MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES
SVANTE JANSON
Abstract.
We study maximal clades in random phylogenetic trees with the Yule–Harding model or, equivalently, in binary search trees. We use probabilistic methods to reprove and extend earlier results on moment asymptotics and asymptotic normality. In particular, we give an explanation of the curious phenomenon observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but one should normalize using half the variance.

1. Introduction
Recall that there are two types of binary trees; we fix the notation as follows. A full binary tree is a rooted tree where each node has either 0 or 2 children; in the latter case the two children are designated as left child and right child. A binary tree is a rooted tree where each node has 0, 1 or 2 children; moreover, each child is designated as either left child or right child, and each node has at most one child of each type. (Both versions can be regarded as ordered trees, with the left child before the right when there are two children.) It is convenient to regard also the empty tree ∅ as a binary tree (but not as a full binary tree). In a full binary tree, the leaves (nodes with no children) are called external nodes; the other nodes (having 2 children) are internal nodes. There is a simple, well-known bijection between full binary trees and binary trees: given a full binary tree, its internal nodes form a binary tree; this is a bijection, with inverse given by adding, to any given binary tree, external nodes as children at all free places. Note that a full binary tree with n internal nodes has n + 1 external nodes, and thus 2n + 1 nodes in total. In particular, the bijection just described yields a bijection between the full binary trees with 2n + 1 nodes and the binary trees with n nodes.

If T is a binary, or full binary, tree, we let T_L and T_R be the subtrees rooted at the left and right child of the root, with T_L = ∅ [T_R = ∅] if the root has no left [right] child.

Date: 27 August, 2014.
2010
Mathematics Subject Classification.

A phylogenetic tree is the same as a full binary tree. In this context, the clade of an external node v is defined to be the set of external nodes that are descendants of the parent of v. (This is called a minimal clade by Blum and François [3] and Chang and Fuchs [6].) Note that two clades are either nested or disjoint; furthermore, each external node belongs to some clade (for example its own). Hence, the set of maximal clades forms a partition of the set of external nodes. We let F(T) denote the number of maximal clades of a phylogenetic tree T. (Except that for technical reasons, see Section 2, we define F(T) = 0 for a phylogenetic tree T with only one external node. Obviously, this does not affect asymptotics.) The maximal clades, and the number of them, were introduced by Durand, Blum and François [11], together with a biological motivation, and further studied by Drmota, Fuchs and Lee [10].

The phylogenetic trees that we consider are random; more precisely, we consider the Yule–Harding model of a random phylogenetic tree T̄_n with a given number n of internal, and thus n + 1 external, nodes. These can be defined recursively, with T̄_0 the unique phylogenetic tree with 1 node (the root), and T̄_{n+1} obtained from T̄_n (n ≥
0) by choosing an external node uniformly at random and converting it to an internal node with two external children. (Alternatively, we obtain the same random model by constructing the tree bottom-up by Kingman's coalescent [17]; see further Aldous [2], Blum and François [3] and Chang and Fuchs [6].) Recall that, for any n ≥
1, the number of internal nodes in the left subtree T̄_{n,L} (or the right subtree T̄_{n,R}) is uniformly distributed on {0, ..., n − 1}, and that conditioned on this number being m, T̄_{n,L} has the same distribution as T̄_m; see also Remark 5.1. Under the bijection above, the Yule–Harding random tree T̄_n corresponds to the random binary search tree T_n with n nodes; see e.g. Blum, François and Janson [4] and Drmota [9].

The random variable that we study is thus X_n := F(T̄_n), the number of maximal clades in the Yule–Harding model. It was proved by Durand and François [12] that the mean number of maximal clades E X_n ∼ αn, where

    α = (1 − e^{−2})/4.    (1.1)

This was reproved by Drmota, Fuchs and Lee [10], in a sharper form:

Theorem 1.1 ([12; 10]).

    E X_n = E F(T_n) = αn + O(1),    (1.2)

where α is given by (1.1).

Moreover, Drmota, Fuchs and Lee [10] also found corresponding results for the variance and higher central moments:
Theorem 1.2 ([10]). As n → ∞,

    E(X_n − E X_n)² ∼ 4α² n log n,    (1.3)

and for any fixed integer k ≥ 3,

    E(X_n − E X_n)^k ∼ (−1)^k [2k/(k − 2)] α^k n^{k−1}.    (1.4)

As a consequence of (1.3)–(1.4), the limit distribution of F(T̄_n) (after centering and normalization) cannot be found by the method of moments. Nevertheless, [10] further proved asymptotic normality, where, unusually, the normalization uses (the square root of) half the variance:

Theorem 1.3 ([10]). As n → ∞,

    (X_n − E X_n)/√(2α² n log n) →d N(0, 1).    (1.5)

Here and below, →d denotes convergence in distribution; similarly, →p will denote convergence in probability. Unspecified limits (including implicit ones such as ∼ and o(1)) will be as n → ∞. Furthermore, Y_n = o_p(a_n), for random variables Y_n and positive numbers a_n, means Y_n/a_n →p
0. We let
C, C₁, C₂, ... denote some unspecified positive constants.

The purpose of the present paper is to use probabilistic methods to reprove these theorems, together with some further results; we hope that this can give additional insight, and it might perhaps also suggest future generalizations to other types of random trees.

In particular, we can explain the appearance of half the variance in Theorem 1.3 as follows. Fix a sequence of numbers N = N(n), and say that a clade is small if it has at most N + 1 elements, and large otherwise. (We use N + 1 in the definition only for later notational convenience; the subtree corresponding to a small clade has at most N internal nodes.) Let X_n^N be the number of maximal small clades, i.e., the small clades that are not contained in any other small clade. It turns out that a suitable choice of N is about √n; we give two versions in the next theorem.

Theorem 1.4. (i)
Let N := √n. Then Var(X_n^N) ∼ 2α² n log n and

    (X_n^N − E X_n^N)/√(Var X_n^N) →d N(0, 1).    (1.6)

Furthermore, X_n − X_n^N = o_p(√(Var X_n^N)) and E X_n − E X_n^N = o(√(Var X_n^N)), so we may replace X_n^N by X_n in the numerator of (1.6). However,

    Var(X_n − X_n^N) ∼ Var(X_n^N) ∼ 2α² n log n.    (1.7)

(ii) Let √n ≪ N ≪ √(n log n), for example N := √n log log n. Then the conclusions of (i) still hold; moreover, P(X_n = X_n^N) → 1.

The theorem thus shows that the large clades are rare, and do not contribute to the asymptotic distribution; however, when they appear, the large clades give a large (actually negative) contribution to X_n, and as a result, half the variance of X_n comes from the large clades. (When there is a large clade, there is less room for other clades, so X_n tends to be smaller than usual. See also (2.4) and (2.2) below.)

For higher moments, the large clades play a similar, but even more extreme, role. Note that (for n ≥
2) with probability 2/n, the root of T̄_n has one internal and one external child, and then there is a clade consisting of all external nodes; this is obviously the unique maximal clade, and thus X_n = 1. Since E X_n = αn + O(1) by Theorem 1.1, we thus have X_n − E X_n = −αn + O(1) with probability 2/n, and this single exceptional event gives a contribution ∼ (−1)^k 2α^k n^{k−1} to E(X_n − E X_n)^k, which explains a fraction (k − 2)/k of the moment (1.4); in particular, this explains why the moment is of order n^{k−1}.

We shall see later that, roughly speaking, the moment asymptotic in (1.4) is completely explained by extremely large clades of size Θ(n), which appear in the O(1) first generations of the tree. This will also lead to a version of (1.4) for absolute central moments:

Theorem 1.5.
For any fixed real p > 2, as n → ∞,

    E|X_n − E X_n|^p ∼ [2p/(p − 2)] α^p n^{p−1}.    (1.8)

In Section 2, we transfer the problem from random phylogenetic trees to random binary search trees, which we shall use in the proofs. The theorems above are proved in Sections 3–7.

2. Binary trees
We find it technically convenient to work with binary trees instead of full binary trees (phylogenetic trees), so we use the bijection in Section 1 to define F(T) also for binary trees T. (We use the same notation F; this should not cause any confusion.) With this translation, our problem is thus to study X_n := F(T_n), where T_n is the binary search tree with n nodes.

The clades in a phylogenetic tree correspond to the internal nodes that have at least one external child, i.e., the nodes in the corresponding binary tree that have outdegree at most 1. We call such nodes green. For a binary tree T, the number F(T) is thus the number of maximal green nodes, i.e., the number of green nodes that have no green ancestor. (This holds also for the phylogenetic tree T with a single node, and thus for the empty binary tree, with our definition F(T) = 0 in this case.)

It follows that, for any binary tree T,

    F(T) := { 1,                 T has a green root,
            { F(T_L) + F(T_R),   otherwise.    (2.1)
Define, for a binary tree T,

    f(T) := F(T) − F(T_L) − F(T_R) = { 1 − F(T_R),  T_L = ∅, T ≠ ∅,
                                      { 1 − F(T_L),  T_R = ∅, T ≠ ∅,
                                      { 0,           otherwise.    (2.2)

Then F(T) is given by the recursion

    F(T) = F(T_L) + F(T_R) + f(T),    (2.3)

and thus

    F(T) = Σ_{v ∈ T} f(T_v),    (2.4)

where T_v is the subtree rooted at v, consisting of v and all its descendants. In other words, F(T) is the additive functional defined by the toll function f(T). The advantage of this point of view is that we have eliminated the maximality condition and now sum over all subtrees T_v, and that we can use general results for this type of sums; see Holmgren and Janson [16].

We let T denote the random binary search tree with a random number of elements such that P(|T| = n) = 2/((n + 1)(n + 2)), n ≥
1. The random binary tree T can be constructed by a continuous-time branching process: let (T̃_t)_{t≥0} be the growing tree that starts with an isolated root at time t = 0 and such that each existing node gets a left and a right child after random waiting times that are independent and Exp(1); we stop the process at a random time τ ∼ Exp(1), independent of everything else, and can take T = T̃_τ; see Aldous [1] (where it is also proved that T is the limit in distribution of a random fringe tree in a binary search tree).

3. The mean
Recall that T_n is the random binary search tree with n nodes. Define ν_n := E F(T_n) and µ_n := E f(T_n), with F and f as in Section 2. (In particular, ν_0 = µ_0 = 0, while ν_1 = µ_1 = 1 since F(T_1) = f(T_1) = 1.) For n ≥ 2, T_{n,L} is empty with probability 1/n, and conditioned on this event, T_{n,R} has the same distribution as T_{n−1}. The same holds if we interchange L and R. Hence, taking the expectation in (2.2),

    µ_n = (2/n)(1 − E F(T_{n−1})) = (2/n)(1 − ν_{n−1}),  n ≥ 2.    (3.1)

Furthermore, we see that (2.2) implies

    P(f(T_n) ≠ 0) ≤ 2/n.    (3.2)

Since obviously 0 ≤ F(T) ≤ |T|, we have by (2.2) also

    −|T| ≤ f(T) ≤ 1 and |f(T)| ≤ |T|    (3.3)

for any binary tree T. In particular, this and (3.2) yield

    |µ_n| ≤ E|f(T_n)| ≤ n P(f(T_n) ≠ 0) ≤ 2.    (3.4)
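Taking expectations in (2.3), with the root split uniform as recalled in the introduction, gives ν_n = (2/n) Σ_{j<n} ν_j + µ_n; together with (3.1) this determines every ν_n and µ_n. The following numerical sketch (ours; all variable names are our own, not from the paper) computes them and compares with the constant α of (1.1):

```python
from math import exp

# nu[n] = E F(T_n), mu[n] = E f(T_n).
# mu[n] = (2/n)(1 - nu[n-1]) is (3.1); nu[n] = (2/n)*sum_{j<n} nu[j] + mu[n]
# follows by taking expectations in (2.3) with a uniform root split.
NMAX = 20000
nu = [0.0] * (NMAX + 1)
mu = [0.0] * (NMAX + 1)
nu[1] = mu[1] = 1.0
acc = nu[0] + nu[1]          # running sum nu[0] + ... + nu[n-1]
for n in range(2, NMAX + 1):
    mu[n] = (2.0 / n) * (1.0 - nu[n - 1])
    nu[n] = (2.0 / n) * acc + mu[n]
    acc += nu[n]

# Partial sum of the series for alpha = E f(T), using P(|T| = k) = 2/((k+1)(k+2)).
alpha_series = sum(2.0 * mu[n] / ((n + 1) * (n + 2)) for n in range(1, NMAX + 1))
alpha = (1.0 - exp(-2.0)) / 4.0   # the closed form claimed in (1.1)

print(nu[3], nu[4])               # exact values 4/3 and 3/2
print(nu[NMAX] / NMAX, alpha)     # nu_n / n approaches alpha, by (3.5)
print(alpha_series)
```

Both ν_n/n and the truncated series agree with (1 − e^{−2})/4 ≈ 0.2162 to within roughly 10^{−4} at this truncation, consistent with the O(1) error term in (3.5).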
It is now a simple consequence of general results that ν_n := E F(T_n) is asymptotically linear in n. Recall the random binary tree T defined in Section 2.

Lemma 3.1.

    ν_n := E F(T_n) = nα + O(1),    (3.5)

where

    α := E f(T) = Σ_{n=1}^∞ [2/((n + 1)(n + 2))] E f(T_n) = Σ_{n=1}^∞ [2/((n + 1)(n + 2))] µ_n
       = 1/3 + Σ_{n=2}^∞ [4/(n(n + 1)(n + 2))] (1 − ν_{n−1}).    (3.6)

Proof.
An instance of Holmgren and Janson [16, Theorem 3.8]. More explicitly, see [16, Theorem 3.4],

    E F(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] µ_k + µ_n,    (3.7)

which implies the result by (3.4) and (3.1). □

In order to prove Theorem 1.1, it remains to show that α defined in (3.6) equals (1 − e^{−2})/4.

Lemma 3.2.

    E f(T) = (1 − e^{−2})/4.    (3.8)

We can prove Lemma 3.2 by probabilistic methods, using the construction of T by a branching process in Section 2. However, this proof is considerably longer than the proof of Theorem 1.1 by singularity analysis of generating functions in [12] and [10]; we nevertheless find the probabilistic proof interesting, and perhaps useful for future generalizations, but since the methods in it are not needed for other results in the present paper, we postpone our proof of Lemma 3.2 to Section 7.

4. Variance
Let γ_n := Var(f(T_n)) and σ²_n := Var(F(T_n)). Then γ_0 = γ_1 = σ²_0 = σ²_1 = 0 and, for n ≥
2, using (2.2),

    γ_n = E f(T_n)² − µ²_n = (2/n) E(F(T_{n−1}) − 1)² − µ²_n ≤ (2/n) n² = 2n.    (4.1)

Before proving the variance asymptotics in (1.3), we begin with a weaker estimate.

Lemma 4.1.
For n ≥ 1,

    σ²_n := Var F(T_n) = O(n log² n).    (4.2)
Proof.
By [16, Theorem 3.9], where it suffices to sum to n since we may replace f(T) by 0 for |T| > n without changing F(T_n),

    σ²_n ≤ Cn [(Σ_{k=1}^n γ_k^{1/2} k^{−3/2})² + sup_{k≤n} γ_k/k + (Σ_{k=1}^n |µ_k|/k)²] = O(n log² n),    (4.3)

using (4.1) and (3.4), provided n ≥
2. The case n = 1 is trivial. □

Write f(T) = g(T) + h(T), where

    g(T) := { 1 − ν_{|T|−1},  T_L = ∅, T ≠ ∅, or T_R = ∅, T ≠ ∅,
            { 0,              otherwise,    (4.4)

and thus, see (2.2),

    h(T) := { ν_{|T_R|} − F(T_R),  T_L = ∅, T ≠ ∅,
            { ν_{|T_L|} − F(T_L),  T_R = ∅, T ≠ ∅,
            { 0,                   otherwise.    (4.5)

Then g(T_1) = 1, h(T_1) = 0, and, for k ≥
2, using (3.1) and (3.4),

    E g(T_k) = (2/k)(1 − ν_{k−1}) = µ_k = O(1),    (4.6)

    E h(T_k) = (2/k) E(ν_{k−1} − F(T_{k−1})) = 0,    (4.7)

and, using Lemma 4.1,

    Var h(T_k) = (2/k) E(ν_{k−1} − F(T_{k−1}))² = (2/k) σ²_{k−1} = O(log² k).    (4.8)

Let, for an arbitrary binary tree T,

    G(T) := Σ_{v ∈ T} g(T_v) and H(T) := Σ_{v ∈ T} h(T_v),    (4.9)

so by (2.4),

    F(T) = G(T) + H(T).    (4.10)

Lemma 4.2.
For n ≥ 1,

    E G(T_n) = ν_n,    (4.11)
    E H(T_n) = 0,    (4.12)
    Var H(T_n) = O(n).    (4.13)

Proof.
By [16, Theorem 3.4], cf. (3.7), and (4.7),

    E H(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] E h(T_k) + E h(T_n) = 0,    (4.14)

which proves (4.12). This implies (4.11), since by (4.10),

    E G(T_n) = E F(T_n) − E H(T_n) = ν_n.    (4.15)
Similarly, by [16, Theorem 3.9], cf. (4.3), and (4.7)–(4.8),

    Var H(T_n) ≤ Cn [(Σ_{k=1}^∞ (log k) k^{−3/2})² + sup_{k≥1} (log² k)/k + 0] = O(n).  □

We shall see that this means that H(T_n) is asymptotically negligible, and thus it suffices to consider G(T_n). Note that g(T) depends only on the sizes |T_L| and |T_R|. This enables us to easily estimate the variance of G(T_n).

Theorem 4.3.
For all n ≥ 1,

    Var G(T_n) = 4α² n log n + O(n).    (4.16)

Proof.
Write g(T) = g(|T|, |T_L|, |T_R|). (We only care about g(k, j, l) when j + l = k − 1, but use three arguments for emphasis.) Thus g(k, 0, k − 1) = g(k, k − 1, 0) = 1 − ν_{k−1}, and otherwise g(k, j, k − j − 1) = 0. Let, as in [16, Theorem 1.29], I_k be uniformly distributed on {0, ..., k − 1} and

    ψ_k := E(ν_{I_k} + ν_{k−1−I_k} + g(k, I_k, k − 1 − I_k) − ν_k)²
         = (1/k) Σ_{j=1}^{k−2} (ν_j + ν_{k−1−j} − ν_k)² + (2/k)(ν_{k−1} + 1 − ν_{k−1} − ν_k)²
         = (1/k) Σ_{j=1}^{k−2} (ν_j + ν_{k−1−j} − ν_k)² + (2/k)(ν_k − 1)²
         = O(1) + (2/k)(αk + O(1))² = 2α² k + O(1),    (4.17)

where we used that ν_j = αj + O(1) by Theorem 1.1. By [16, Lemma 7.1], then

    Var G(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ_k + ψ_n
               = (n + 1) Σ_{k=1}^{n−1} [4α² k + O(1)]/((k + 1)(k + 2)) + O(n)
               = (n + 1) Σ_{k=1}^{n−1} 4α²/k + O(n)
               = 4α² n log n + O(n).    (4.18)  □

We can now prove (1.3) in Theorem 1.2. (Higher moments are treated in Section 6.)
Theorem 4.4.
As n → ∞,

    Var F(T_n) = 4α² n log n + o(n log n).    (4.19)

This follows from (4.10), (4.16) and (4.13) by Minkowski's inequality (the triangle inequality for √Var).
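The recursion (2.1), combined with the uniform root split, determines the exact distribution of F(T_n): for n ≥ 2 the root is green with probability 2/n and then F(T_n) = 1; otherwise F(T_n) = F(T_j) + F(T_{n−1−j}) with j uniform on {1, ..., n − 2} and independent subtrees. A small dynamic-programming sketch (ours; names are not from the paper) computes this distribution exactly and compares its mean and variance with (1.2) and (4.19):

```python
import math

# dist[n][k] = P(F(T_n) = k), computed exactly via (2.1): with
# probability 2/n the root split is extreme, the root is green and
# F = 1; otherwise F is the sum of two independent copies for the
# subtree sizes j and n-1-j, with j uniform on {1, ..., n-2}.
NMAX = 100
dist = [dict() for _ in range(NMAX + 1)]
dist[0] = {0: 1.0}
dist[1] = {1: 1.0}
for n in range(2, NMAX + 1):
    d = {1: 2.0 / n}
    for j in range(1, n - 1):
        for a, pa in dist[j].items():
            for b, pb in dist[n - 1 - j].items():
                d[a + b] = d.get(a + b, 0.0) + pa * pb / n
    dist[n] = d

def moments(n):
    m1 = sum(k * p for k, p in dist[n].items())
    m2 = sum(k * k * p for k, p in dist[n].items())
    return m1, m2 - m1 * m1

alpha = (1.0 - math.exp(-2.0)) / 4.0        # the constant of (1.1)
mean, var = moments(NMAX)
print(mean / NMAX, alpha)                   # mean is approximately alpha * n
print(var / (4 * alpha**2 * NMAX * math.log(NMAX)))  # cf. (4.19)
```

The mean is within a few percent of αn already at n = 100, while the variance ratio printed last converges much more slowly, as the o(n log n) term in (4.19) suggests.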
5. Asymptotic normality
We prove the central limit theorem, Theorem 1.3, by a martingale central limit theorem for a suitable martingale that we construct in this section.

Consider the infinite binary tree T_∞, where each node has two children, and denote its root by o. We may regard any binary tree T as a subtree of T_∞ with the same root o. (In the general sense that the node set V(T) is a subset of V_∞ := V(T_∞), and that the left and right children are the same as in T_∞, when they exist.) In particular we regard the random binary search tree T_n as a subtree of T_∞.

Order the nodes in T_∞ in breadth-first order as v(1) = o, v(2), ..., and let V_j := {v(1), ..., v(j)} be the set of the first j nodes. Let F_j be the σ-field generated by the sizes |T_{n,v,L}| and |T_{n,v,R}| of the two child subtrees of T_n at each node v ∈ V_j. Equivalently, we may regard V_j as the internal nodes in a full binary tree; let ∂V_j be the corresponding set of j + 1 external nodes. Then F_j is generated by the subtree sizes |T_{n,v}| for all v ∈ ∂V_j, together with the indicators 1{v ∈ T_n}, v ∈ V_j, that describe T_n ∩ V_j. (We regard the subtree T_{n,v} as defined for all v ∈ V_∞, with T_{n,v} = ∅ if v ∉ T_n.) Then, conditioned on F_j, T_n consists of some given subtree of V_j together with attached subtrees T_{n,v} at all nodes v ∈ ∂V_j; these are independent binary search trees of some given orders.

We allow here j = 0; V_0 = ∅ and F_0 is the trivial σ-field.

Remark 5.1.
As is well known, see e.g. [9], another construction of the random binary search tree T_n (n ≥
1) is to let the random variable I_n be uniformly distributed on {0, ..., n − 1}, and to let T_n be defined recursively such that, given I_n, T_{n,L} and T_{n,R} are independent binary search trees with |T_{n,L}| = I_n and |T_{n,R}| = n − 1 − I_n. (When the tree is used to sort n keys, I_n tells how many of the keys are assigned to the left subtree.) The pair (I_n, n − 1 − I_n) thus tells how the tree is split at the root, and there is a similar pair for each node. Then F_j is generated by these pairs (i.e., splits) for the nodes v(1), ..., v(j).

Recall that g(T) by (4.4) depends only on the sizes |T_L| and |T_R|. Hence, F_j specifies the value of g(T_{n,v}) for every v ∈ V_j, and it follows that

    E(G(T_n) | F_j) = E(Σ_{v ∈ V_∞} g(T_{n,v}) | F_j) = Σ_{v ∈ V_j} g(T_{n,v}) + Σ_{v ∈ ∂V_j} ν_{|T_{n,v}|}.    (5.1)

Since the sequence of σ-fields (F_j)_{j=0}^∞ is increasing, the sequence M_{n,j} := E(G(T_n) | F_j), j ≥
0, is a martingale (for any fixed n). It follows from (5.1) that the martingale differences are

    ΔM_{n,j} := M_{n,j} − M_{n,j−1} = g(T_{n,v(j)}) + ν_{|T_{n,v(j)L}|} + ν_{|T_{n,v(j)R}|} − ν_{|T_{n,v(j)}|},    (5.2)

where v(j)L and v(j)R are the children of v(j). It follows easily that, with ψ_k defined in (4.17),

    E(|ΔM_{n,j}|² | F_{j−1}) = E(|ΔM_{n,j}|² | |T_{n,v(j)}|) = ψ_{|T_{n,v(j)}|}.    (5.3)

Consequently, the conditional square function is given by

    W_n := Σ_{j=1}^∞ E(|ΔM_{n,j}|² | F_{j−1}) = Σ_{v ∈ V_∞} ψ_{|T_{n,v}|} = Σ_{v ∈ T_n} ψ_{|T_{n,v}|}.    (5.4)

(It suffices to sum over v ∈ T_n, since ψ_0 = 0.) This is again a sum of the same type as (2.4) and (4.9), for the random tree T_n. (Note that the toll function ψ_{|T|} here depends only on the size of T.) In particular, [16, Theorem 3.4] applies (in this case we can also use [7], [8] or [13]); this yields

    E W_n = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ_k + ψ_n.    (5.5)

If j is large enough, say j ≥ 2^n, then V(T_n) ⊆ V_j and thus M_{n,j} = G(T_n). In particular, G(T_n) = M_{n,∞}. Thus, by a standard (and simple) martingale identity, Var G(T_n) = Var M_{n,∞} = E W_n; hence (5.5) yields the first equality in (4.18). (This is no coincidence; the proof just given of (5.5) is essentially the same as the proof of [16, Lemma 7.1] that was used in (4.18), but stated in martingale formulation.)

We now split the sum G(T_n) into two parts, roughly corresponding to small and large clades. We fix a cut-off N = N(n); for definiteness and simplicity we choose N = N(n) := √n, but we note that the arguments below hold with a few minor modifications for any N ≥ √n with N = o(√(n log n)). We then define, for binary trees T,

    g′(T) := g(T) · 1{|T| ≤ N},    (5.6)
    g″(T) := g(T) · 1{|T| > N} = g(T) − g′(T).
(5.7)

In analogy with (2.4) and (4.9), we define further

    G′(T) := Σ_{v ∈ T} g′(T_v) and G″(T) := Σ_{v ∈ T} g″(T_v);    (5.8)

thus G(T) = G′(T) + G″(T). We shall see that, asymptotically, both G′(T_n) and G″(T_n) contribute to the variance with equal amounts, but nevertheless G″(T_n) is negligible (in probability).

We begin with the main term G′(T_n).

Lemma 5.2. As n → ∞,

    Var(G′(T_n)) = 2α² n log n + O(n),    (5.9)

    (G′(T_n) − E G′(T_n))/√(2α² n log n) →d N(0, 1).    (5.10)

Proof.
We define ν′_n := E G′(T_n). Note that g′(T) depends only on the sizes |T_L| and |T_R|. Hence we can repeat the argument above and define a martingale M′_{n,j} := E(G′(T_n) | F_j), j ≥
0, with G′(T_n) = M′_{n,∞} and martingale differences

    ΔM′_{n,j} = φ′(T_{n,v(j)}),    (5.11)

where we define, cf. (5.2),

    φ′(T) := g′(T) + ν′_{|T_L|} + ν′_{|T_R|} − ν′_{|T|}.    (5.12)

By [16, Theorem 3.4] again, cf. (3.7) and (5.5), using E g(T_k) = µ_k = O(1) by (4.6),

    ν′_m = (m + 1) Σ_{k=1}^{m−1} [2/((k + 1)(k + 2))] E g′(T_k) + E g′(T_m)
         = (m + 1) Σ_{k=1}^{(m−1)∧N} [2/((k + 1)(k + 2))] E g(T_k) + O(1)
         = (m + 1) Σ_{k=1}^N [2/((k + 1)(k + 2))] µ_k + O(1).    (5.13)

Hence, (5.12) yields, after cancellations,

    φ′(T) = g′(T) + O(1) = { g(T) + O(1),  |T| ≤ N,
                           { O(1),         |T| > N.    (5.14)

Let

    ψ′_k := E|φ′(T_k)|².    (5.15)

Then, by (5.14), (4.4) and (3.5), cf. (4.17),

    ψ′_k = { E(g(T_k) + O(1))² = 2α² k + O(1),  k ≤ N,
           { O(1),                              k > N.    (5.16)

Furthermore, by (5.11) and (5.15),

    E(|ΔM′_{n,j}|² | F_{j−1}) = E(|φ′(T_{n,v(j)})|² | |T_{n,v(j)}|) = ψ′_{|T_{n,v(j)}|}.    (5.17)

Hence, the conditional square function of (M′_{n,j})_j is

    W′_n := Σ_{j=1}^∞ E(|ΔM′_{n,j}|² | F_{j−1}) = Σ_{v ∈ V_∞} ψ′_{|T_{n,v}|} = Σ_{v ∈ T_n} ψ′_{|T_{n,v}|}.    (5.18)

Yet another application of [16, Theorem 3.4] yields, using (5.16),

    E W′_n = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ′_k + ψ′_n
           = (n + 1) Σ_{k=1}^N 4α² k/((k + 1)(k + 2)) + O(n)
           = 4α² n log N + O(n) = 2α² n log n + O(n).    (5.19)

Since Var G′(T_n) = Var(M′_{n,∞}) = E W′_n, (5.9) follows from (5.19).

Moreover, the representation (5.18) and [16, Theorem 3.9] (again summing only to n, as we may) yield, noting that the toll function ψ′_{|T|} depends only on the size of T, using (5.16),

    Var(W′_n) ≤ Cn Σ_{k=1}^n (ψ′_k)²/k² ≤ C₁ n Σ_{k=1}^N 1 + C₁ n Σ_{k=N+1}^n k^{−2} = O(nN) = O(n^{3/2}).    (5.20)

Hence, Var(W′_n/(n log n)) → 0 as n → ∞, which together with (5.19) implies

    W′_n/(n log n) →p 2α².
(5.21)

Note also that g(T) = O(|T|) by (4.4) and (3.5), and thus (5.14) implies φ′(T) = O(N) for all trees T. Thus (5.11) yields

    sup_j |ΔM′_{n,j}|/√(n log n) = O(N/√(n log n)) = o(1).    (5.22)

We now apply the central limit theorem for martingale triangular arrays, in the form in [5, Corollary 1] (see also [15, Theorem 3.1]), which shows that (5.21) and (5.22) together imply

    (G′(T_n) − E G′(T_n))/√(n log n) = (M′_{n,∞} − E M′_{n,∞})/√(n log n) →d N(0, 2α²).    (5.23)

(Actually, [5, Corollary 1] assumes instead of (5.22) only a conditional Lindeberg condition, which is a trivial consequence of the uniform bound (5.22).) □

Remark 5.3.
We used the breadth-first order above as just one convenient order. It is perhaps more natural to consider, instead of the sets V_j, arbitrary node sets V of (finite) subtrees of T_∞ that include the root o. This would give us, instead of (M_{n,j})_j, a martingale indexed by binary trees. However, we have no use for this exotic object here, and use instead the standard martingales above.

Lemma 5.4.

    E|G″(T_n)| = O(√n),    (5.24)
    Var(G″(T_n)) = 2α² n log n + O(n).    (5.25)

Proof.
By (5.7), (4.4) and (4.6),

    E|g″(T_k)| = |E g(T_k)| · 1{k > N} = O(1) · 1{k > N}    (5.26)

and thus, using the triangle inequality and [16, Theorem 3.4],

    E|G″(T_n)| ≤ (n + 1) Σ_{k=N}^{n−1} [2/((k + 1)(k + 2))] E|g″(T_k)| + E|g″(T_n)| = O(n/N),

yielding (5.24).
For the variance, we use either [16, Theorem 1.29] as in the proof of Theorem 4.3, or the (essentially equivalent) martingale argument in (5.11)–(5.19), and conclude that, with some ψ″_k satisfying

    ψ″_k = { O(1),                              k ≤ N,
           { E(g(T_k) + O(1))² = 2α² k + O(1),  k > N,    (5.27)

we have

    Var G″(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ″_k + ψ″_n
                = (n + 1) Σ_{k=⌊N⌋+1}^{n−1} 4α²/k + O(n)
                = 4α² n log(n/N) + O(n) = 2α² n log n + O(n).  □

Proof of Theorem 1.3.
It follows from (5.24) that

    (G″(T_n) − E G″(T_n))/√(2α² n log n) →p 0,    (5.28)

which together with (5.10) yields

    (G(T_n) − E G(T_n))/√(2α² n log n) →d N(0, 1).    (5.29)

Similarly, (4.13) implies

    (H(T_n) − E H(T_n))/√(2α² n log n) →p 0,    (5.30)

which together with (5.29) yields (1.5), recalling X_n = F(T_n) = G(T_n) + H(T_n) by (4.10). □

Proof of Theorem 1.4. (i). Define, similarly to (5.6)–(5.7),

    f′(T) := f(T) · 1{|T| ≤ N},  f″(T) := f(T) · 1{|T| > N},    (5.31)
    h′(T) := h(T) · 1{|T| ≤ N},  h″(T) := h(T) · 1{|T| > N},    (5.32)

and corresponding sums F′(T) := Σ_{v ∈ T} f′(T_v) and similarly F″(T), H′(T), H″(T). The argument in (2.1)–(2.4) is easily modified and shows that

    X_n^N = F′(T_n) = G′(T_n) + H′(T_n).    (5.33)

The same proof as for Lemma 4.2 yields also

    Var H′(T_n) = O(n) and Var H″(T_n) = O(n).    (5.34)

Hence, (1.6) follows from Lemma 5.2 and (5.33). Furthermore,

    X_n − X_n^N = F″(T_n) = G″(T_n) + H″(T_n).    (5.35)

By (5.33) and (5.35), (1.7) follows from (5.9) and (5.25), using (5.34) and Minkowski's inequality. Similarly,

    E|X_n − X_n^N| ≤ E|G″(T_n)| + E|H″(T_n)| = O(√n),    (5.36)

using (5.24), (5.34) and Hölder's inequality, together with E H″(T_n) = 0, which is proved as (4.12).

(ii). The conclusions of (i) hold by the same proofs (with some minor modifications in some estimates). Moreover, let Z_{n,k} be the number of clades of size k + 1. Then, for n ≥ 2,

    E Z_{n,k} = { 2n/(k(k + 1)(k + 2)),  k < n,
                { 2/n,                   k = n,
                { 0,                     k > n,    (5.37)

see [6, Theorem 1]. (This can be seen as another example of [16, Theorem 3.4].) Consequently,

    P(X_n ≠ X_n^N) ≤ P(Σ_{k>N} Z_{n,k} ≥ 1) ≤ E Σ_{k>N} Z_{n,k}
                   = Σ_{k=⌊N⌋+1}^{n−1} 2n/(k(k + 1)(k + 2)) + 2/n
                   = O(n/N²) + O(1/n) = o(1),    (5.38)

which completes the proof. □

6. Higher moments
We begin the proof of Theorem 1.5 by proving a weaker estimate. We let ‖X‖_p := (E|X|^p)^{1/p} for any random variable X. Recall that ν_n := E F(T_n).

Lemma 6.1.
For any fixed real p > 2, and all n ≥ 0,

    E|F(T_n) − ν_n|^p ≤ C(p) n^{p−1}.    (6.1)

Equivalently,

    ‖F(T_n) − ν_n‖_p = O(n^{1−1/p}).    (6.2)

Proof.
Fix p > 2 and an integer m ≥ 1, to be chosen later. (The constants C_i below may depend on p but not on m.) Let V_j and F_j be as in Section 5, and write V′_m := V_{2^m−1}, F′_m := F_{2^m−1}. Thus ∂V′_m consists of the 2^m nodes in T_∞ of depth m, and V′_m consists of the 2^m − 1 nodes of depth less than m. Then, for any binary tree T,

    F(T) = Σ_{v ∈ V′_m} f(T_v) + Σ_{v ∈ ∂V′_m} F(T_v).    (6.3)
Furthermore, by (1.2),

    Σ_{v ∈ ∂V′_m} ν_{|T_v|} = Σ_{v ∈ ∂V′_m} (α|T_v| + O(1)) = α Σ_{v ∈ ∂V′_m} |T_v| + O(2^m)
                            = α|T| + O(2^m) = ν_{|T|} + O(2^m).    (6.4)

Hence, by combining (6.3) and (6.4),

    F(T) − ν_{|T|} = Σ_{v ∈ V′_m} f(T_v) + Σ_{v ∈ ∂V′_m} (F(T_v) − ν_{|T_v|}) + O(2^m).    (6.5)

We shall use this decomposition for the binary search tree T_n. Note first that by (3.2)–(3.3),

    E|f(T_n)|^p ≤ n^p P(f(T_n) ≠ 0) ≤ 2n^{p−1}.    (6.6)

(This holds for any p ≥ 1; cf. (3.4) for the case p = 1.) Hence, for any v ∈ V_∞,

    E(|f(T_{n,v})|^p | |T_{n,v}|) ≤ 2|T_{n,v}|^{p−1} ≤ 2n^{p−1},    (6.7)

and thus

    E|f(T_{n,v})|^p ≤ 2n^{p−1}.    (6.8)

Let Y := Σ_{v ∈ V′_m} f(T_{n,v}) be the first sum in (6.5) for T = T_n. By Minkowski's inequality and (6.8),

    ‖Y‖_p ≤ Σ_{v ∈ V′_m} ‖f(T_{n,v})‖_p ≤ 2^m 2^{1/p} n^{(p−1)/p}.    (6.9)

Let Z := Σ_{v ∈ ∂V′_m} (F(T_{n,v}) − ν_{|T_{n,v}|}) be the second sum in (6.5) for T = T_n. The σ-field F′_m specifies the sizes of the subtrees T_{n,v} for v ∈ ∂V′_m, and conditioned on F′_m, these subtrees are independent and distributed as binary search trees of the given sizes. Hence, conditionally on F′_m, the terms in the sum Z are independent and have means zero, so we can apply Rosenthal's inequality [14, Theorem 3.9.1], which yields

    E(|Z|^p | F′_m) ≤ C₁ Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|^p | F′_m)
                    + C₁ (Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m))^{p/2}.    (6.10)

We note first that by (1.3),

    E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m) ≤ C₂ |T_{n,v}| log |T_{n,v}| ≤ C₂ |T_{n,v}| log n,    (6.11)

and thus

    Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m) ≤ C₂ log n Σ_{v ∈ ∂V′_m} |T_{n,v}| ≤ C₂ n log n.    (6.12)
(6.13)Let A n := E | F ( T n ) − ν n | p . We can write (6.5) for T = T n as F ( T n ) − ν n = Y + Z + O (2 m ) . (6.14)Thus, by Minkowski’s inequality, (6.9) and (6.13), A n = E (cid:12)(cid:12) Y + Z + O (2 m ) (cid:12)(cid:12) p p (cid:0) E | Y | p + E | Z | p + O (2 m ) (cid:1) C mp n p − + C E | Z | p + C m C E | Z | p + C mp n p − . (6.15)Furthermore, (6.13) can be written E | Z | p C X v ∈ ∂V ′ m E A |T n,v | + C ( n log n ) p/ . (6.16)We prove the lemma by induction, and assume that A k Ck p − forall k < n . Since |T n,v | < n for every v ∈ ∂V ′ m , (6.16) and the inductivehypothesis yield E | Z | p C C X v ∈ ∂V ′ m E |T n,v | p − + C ( n log n ) p/ . (6.17)If v is a child of the root, then |T n,v | is uniformly distributed on { , . . . , n − } ,so |T n,v | d = ⌊ nU ⌋ nU , where U ∼ U (0 ,
1) is uniformly distributed on [0, 1]. Iterating this m times, it follows that for any v ∈ ∂V′_m,

    |T_{n,v}| ≤ n Π_{i=1}^m U_i (stochastically),    (6.18)

with U_1, ..., U_m independent and U(0, 1). Hence,

    E|T_{n,v}|^{p−1} ≤ E(n^{p−1} Π_{i=1}^m U_i^{p−1}) = n^{p−1} Π_{i=1}^m E U_i^{p−1} = n^{p−1} (1/p)^m,    (6.19)

since E U_i^{p−1} = ∫₀¹ u^{p−1} du = 1/p. There are 2^m nodes in ∂V′_m, and thus (6.17) yields

    E|Z|^p ≤ C₁ C (2/p)^m n^{p−1} + C₃ (n log n)^{p/2},    (6.20)

which together with (6.15) yields, since (n log n)^{p/2} = O(n^{p−1}) when p > 2,

    A_n ≤ C₅ C₁ C (2/p)^m n^{p−1} + C₅ C₃ (n log n)^{p/2} + C₇ 2^{mp} n^{p−1}
        ≤ C₅ C₁ C (2/p)^m n^{p−1} + C₈ 2^{mp} n^{p−1}.    (6.21)

Now choose m such that (2/p)^m C₅ C₁ < 1/2; this is possible since p > 2. Then choose C := 2^{mp+1} C₈. With these choices, (6.21) yields

    A_n ≤ (C/2) n^{p−1} + (C/2) n^{p−1} = C n^{p−1}.    (6.22)
In other words, we have proved the inductive step: A_k ≤ C k^{p−1} for k < n implies A_n ≤ C n^{p−1}. Consequently, this is true for all n ≥
0, i.e., (6.1) holds. (The initial cases n = 0 and n = 1 are trivial, since A_0 = A_1 = 0.) □

Lemma 6.2.
For any fixed real p > 2, as n → ∞,

    ‖F(T_n)‖_p ∼ αn,    (6.23)
    ‖f(T_n)‖_p ∼ 2^{1/p} α n^{1−1/p}.    (6.24)

Proof.
By Minkowski's inequality, (6.2) and (1.2),

    ‖F(T_n)‖_p = |E F(T_n)| + O(n^{1−1/p}) = αn + O(n^{1−1/p}) ∼ αn,    (6.25)

which is (6.23). For n ≥
2, it follows from (2.2) that E | f ( T n ) | p = 2 n E | − F ( T n − ) | p = 2 n k F ( T n − ) − k pp ∼ α p n p − , (6.26)since (6.23) obviously implies also k F ( T n ) − k p ∼ αn . (cid:3) The idea in the proof of Theorem 1.5 is to approximate E | X n − E X n | p = E (cid:12)(cid:12)P v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12) p by E P v (cid:12)(cid:12) f ( T n,v ) − E f ( T n,v ) (cid:12)(cid:12) p , or simpler by E P v (cid:12)(cid:12) f ( T n,v ) (cid:12)(cid:12) p = P v E (cid:12)(cid:12) f ( T n,v ) (cid:12)(cid:12) p . The heuristic reason for this is that themoment E (cid:12)(cid:12)P v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12) p is dominated by the event when thereis one large term (corresponding to one large clade, cf. the discussion beforeTheorem 1.5), and then (cid:12)(cid:12)(cid:12)X v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12)(cid:12) p ≈ X v (cid:12)(cid:12) f ( T n,v ) − E f ( T n,v ) (cid:12)(cid:12) p ≈ X v | f ( T n,v ) | p . (6.27)We shall justify this in several steps. We begin by finding the expectationof the final sum in (6.27), cf. the sought result (1.8). Lemma 6.3. As n → ∞ , E X v ∈T n | f ( T n,v ) | p ∼ pp − α p n p − . (6.28) Proof.
We apply again [16, Theorem 3.4] and obtain

E Σ_{v∈T_n} |f(T_{n,v})|^p = (n+1) Σ_{k=1}^{n−1} (2/((k+1)(k+2))) E|f(T_k)|^p + E|f(T_n)|^p. (6.29)

By (6.26),

(2/((k+1)(k+2))) E|f(T_k)|^p ∼ (2/k^2) · 2 α^p k^{p−1} = 4 α^p k^{p−3} (6.30)

as k → ∞, and it follows that, as n → ∞, using p > 2,

E Σ_{v∈T_n} |f(T_{n,v})|^p ∼ (n+1) Σ_{k=1}^{n−1} 4 α^p k^{p−3} + 2 α^p n^{p−1} ∼ n · (4 α^p/(p−2)) n^{p−2} + 2 α^p n^{p−1} = (2p/(p−2)) α^p n^{p−1}. □

Next we take again some m ≥ 1. By (6.20) and Lemma 6.1, since p > 2,

||Z||_p ≤ C (2/p)^{m/p} n^{1−1/p} + O((n log n)^{1/2}) = C (2/p)^{m/p} n^{1−1/p} + o(n^{1−1/p}). (6.31)

Consequently, by (6.14) and Minkowski's inequality,

| ||F(T_n) − ν_n||_p − ||Y||_p | ≤ ||Z||_p + O(2^m) = C (2/p)^{m/p} n^{1−1/p} + o(n^{1−1/p}). (6.32)

In particular, (6.32) and (6.2) imply ||Y||_p = O(n^{1−1/p}). By the mean value theorem,

|x^p − y^p| ≤ p |x − y| max{x^{p−1}, y^{p−1}} (6.33)

for any x, y ≥ 0; hence (6.32) implies, using also (6.2) again,

E|F(T_n) − ν_n|^p − E|Y|^p = O( (2/p)^{m/p} n^{p−1} ) + o(n^{p−1}). (6.34)

Let δ > 0, and let J_v be the indicator of the event that v is green and |T_{n,v}| ≥ δn. (The idea is that the significant contributions only come from nodes v with J_v = 1.)

Lemma 6.4.
For each fixed m ≥ 1 and δ > 0, and all n ≥ 1,

P( Σ_{v∈V'_m} J_v ≥ 1 ) ≤ 2^{m+1} δ^{−1} n^{−1} = O(n^{−1}), (6.35)

P( Σ_{v∈V'_m} J_v ≥ 2 ) ≤ 2^{2m+1} δ^{−2} n^{−2} = O(n^{−2}). (6.36)

Proof. We use again the σ-fields F_j from Section 5. Since F_{j−1} specifies |T_{n,v_j}|, but not how this subtree is split at v_j, we have

P( J_{v_j} = 1 | F_{j−1} ) ≤ (2/|T_{n,v_j}|) 1{|T_{n,v_j}| ≥ δn} ≤ 2/(δn), (6.37)

and thus, by taking the expectation, P(J_{v_j} = 1) ≤ 2/(δn). Since there are fewer than 2^m nodes in V'_m, (6.35) follows.

Furthermore, for any two nodes v_i and v_j with i < j, J_{v_i} is determined by F_{j−1}, and (6.37) thus gives also

P( J_{v_i} J_{v_j} = 1 | F_{j−1} ) = E( J_{v_i} J_{v_j} | F_{j−1} ) = J_{v_i} P( J_{v_j} = 1 | F_{j−1} ) ≤ (2/(δn)) J_{v_i}. (6.38)

Thus, by taking the expectation and using (6.37) again, P( J_{v_i} J_{v_j} = 1 ) ≤ 4/(δn)^2. Summing over the fewer than 2^{2m−1} pairs (v_i, v_j) with v_i, v_j ∈ V'_m yields (6.36). □

Proof of Theorem 1.5.
We show this in several steps.
Step 1.
Define

Y₁ := Σ_{v∈V'_m} J_v f(T_{n,v}). (6.39)

Since f(T_{n,v}) = 0 unless v is green, we have

Y − Y₁ = Σ_{v∈V'_m} (1 − J_v) f(T_{n,v}) = Σ_{v∈V'_m} f(T_{n,v}) 1{|T_{n,v}| < δn}. (6.40)

For each v, it follows from (6.6) by conditioning on |T_{n,v}| that

E| f(T_{n,v}) 1{|T_{n,v}| < δn} |^p ≤ C (δn)^{p−1}. (6.41)

Hence, (6.40) and Minkowski's inequality yield

| ||Y||_p − ||Y₁||_p | ≤ ||Y − Y₁||_p ≤ Σ_{v∈V'_m} || f(T_{n,v}) 1{|T_{n,v}| < δn} ||_p ≤ 2^m C^{1/p} (δn)^{1−1/p}. (6.42)

Thus ||Y₁||_p = O(n^{1−1/p}) + O(2^m δ^{1−1/p} n^{1−1/p}), and (6.33) yields

E|Y|^p − E|Y₁|^p = O( (2^m δ^{1−1/p} + 2^{mp} δ^{p−1}) n^{p−1} ). (6.43)

Step 2.
Similarly, using (6.41) again,

E( Σ_{v∈V'_m} |f(T_{n,v})|^p − Σ_{v∈V'_m} J_v |f(T_{n,v})|^p ) = Σ_{v∈V'_m} E( |f(T_{n,v})|^p 1{|T_{n,v}| < δn} ) ≤ 2^m C (δn)^{p−1}. (6.44)

Step 3.
By (6.39), |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p = 0 unless Σ_{v∈V'_m} J_v ≥ 2, and in the latter case we have by (3.3) the trivial bounds |Y₁|^p ≤ (2^m n)^p and Σ_{v∈V'_m} |J_v f(T_{n,v})|^p ≤ 2^m n^p, and thus | |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p | ≤ 2^{mp+1} n^p. Consequently, by (6.36),

E| |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p | ≤ 2^{mp+1} n^p P( Σ_{v∈V'_m} J_v ≥ 2 ) = O(n^{p−2}). (6.45)

Thus, for fixed m ≥ 1 and δ > 0,

E|Y₁|^p − Σ_{v∈V'_m} E|J_v f(T_{n,v})|^p = O(n^{p−2}) = o(n^{p−1}). (6.46)

Step 4.
Define F^{(p)}(T) := Σ_{v∈T} |f(T_v)|^p. Then, in analogy with (6.3),

F^{(p)}(T) = Σ_{v∈V'_m} |f(T_v)|^p + Σ_{v∈∂V'_m} F^{(p)}(T_v). (6.47)

Note that Lemma 6.3 implies E F^{(p)}(T_n) = O(n^{p−1}). Hence, by first conditioning on F'_m, and using (6.19),

E Σ_{v∈∂V'_m} F^{(p)}(T_{n,v}) ≤ C E Σ_{v∈∂V'_m} |T_{n,v}|^{p−1} ≤ C (2/p)^m n^{p−1}. (6.48)

Taking T = T_n in (6.47) and taking the expectation, we thus find

E Σ_{v∈T_n} |f(T_{n,v})|^p − E Σ_{v∈V'_m} |f(T_{n,v})|^p = O( (2/p)^m n^{p−1} ). (6.49)

Step 5.
Finally, combining (6.34), (6.43), (6.46), (6.44), (6.49) and (6.28), we obtain

E|F(T_n) − ν_n|^p = (2p/(p−2)) α^p n^{p−1} + O( (2/p)^{m/p} n^{p−1} ) + O( 2^m δ^{1−1/p} n^{p−1} ) + O( 2^{mp} δ^{p−1} n^{p−1} ) + o(n^{p−1}). (6.50)

For any ε > 0, we can make each of the error terms on the right-hand side less than ε n^{p−1} by first choosing m large, then δ small, and finally n large. Consequently, E|F(T_n) − ν_n|^p = (2p/(p−2)) α^p n^{p−1} + o(n^{p−1}). □

Proof of (1.4). Now p = k is an integer. If k is even, then (1.4) is the same as (1.8), so we may assume that p = k ≥ 3 is odd. Note that (6.33) holds in the form |x^p − y^p| ≤ p |x − y| ( |x|^{p−1} + |y|^{p−1} ) for arbitrary real x, y. Thus for any random variables X and Y, using also Hölder's inequality,

E|X^p − Y^p| ≤ p E( |X − Y| |X|^{p−1} + |X − Y| |Y|^{p−1} ) ≤ p ||X − Y||_p ( ||X||_p^{p−1} + ||Y||_p^{p−1} ). (6.51)

It is now easy to modify the proof of Theorem 1.5 and obtain

E( F(T_n) − ν_n )^p = E Σ_{v∈T_n} f(T_{n,v})^p + o(n^{p−1}). (6.52)

Furthermore, it follows from (2.2) that f(T) > 0 only when |T| = 1, and then f(T) = 1. Hence,

Σ_{v∈T_n} f(T_{n,v})^p = − Σ_{v∈T_n} |f(T_{n,v})|^p + O(n). (6.53)

The estimate (1.4) now follows from (6.52), (6.53) and (6.28). □

7. Proof of Lemma 3.2
Define a chain of length k in a (binary) tree T to be a sequence of k nodes v_1 ⋯ v_k such that v_{i+1} is a (strict) descendant of v_i for each i = 1, ..., k−1. In other words, v_1, ..., v_k are some nodes (in order) on some path from the root. We say that the chain v_1 ⋯ v_k is green if all the nodes v_1, ..., v_k are green. (The nodes between the v_i's may have any colour.)
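Chains of green nodes are easy to count mechanically, which gives a direct sanity check on the inclusion-exclusion identities proved in Lemma 7.1 below. The following sketch relies on our reading of definitions given earlier in the paper (not shown in this section), so it is an illustration rather than a definitive implementation: we take a node to be green when it has at most one child, F(T) to be the number of green nodes with no green proper ancestor, and, as we read (2.2), f(T) = 1{root green}·(1 − F(T_L) − F(T_R)).

```python
def trees(n):
    """All binary trees with n nodes, as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        return [None]
    out = []
    for i in range(n):
        for L in trees(i):
            for R in trees(n - 1 - i):
                out.append((L, R))
    return out

def green(t):
    """Assumed reading: a node is green when it has at most one child."""
    return t[0] is None or t[1] is None

def F(t):
    """Number of maximal green nodes (green, with no green proper ancestor)."""
    if t is None:
        return 0
    if green(t):
        return 1  # every green node below t has the green ancestor t
    return F(t[0]) + F(t[1])

def f(t):
    """Root functional, as we read (2.2): 1{root green} * (1 - F(T_L) - F(T_R))."""
    if t is None or not green(t):
        return 0
    return 1 - F(t[0]) - F(t[1])

def chain_counts(t):
    """counts[k] = number of green chains v_1 ... v_k in t (any starting node)."""
    counts = {}
    def walk(node, ends):
        # ends[j] = number of green chains of length j ending at a proper ancestor
        if node is None:
            return
        if green(node):
            here = {1: 1}  # green chains ending at this node, by length
            for j, c in ends.items():
                here[j + 1] = here.get(j + 1, 0) + c
            for j, c in here.items():
                counts[j] = counts.get(j, 0) + c
            ends = {j: ends.get(j, 0) + here.get(j, 0) for j in set(ends) | set(here)}
        walk(node[0], ends)
        walk(node[1], ends)
    walk(t, {})
    return counts

def root_chain_counts(t):
    """Chains forced to start at the root, via f_k(T) = F_k(T) - F_k(T_L) - F_k(T_R), cf. (7.1)."""
    tot = dict(chain_counts(t))
    for sub in (t[0], t[1]):
        for j, c in chain_counts(sub).items():
            tot[j] = tot.get(j, 0) - c
    return tot

# Exhaustive check of the alternating-sum identities on all small binary trees.
for n in range(1, 7):
    for t in trees(n):
        assert F(t) == sum((-1) ** (k - 1) * c for k, c in chain_counts(t).items())
        assert f(t) == sum((-1) ** (k - 1) * c for k, c in root_chain_counts(t).items())
print("identities (7.2)-(7.3) hold on all binary trees with at most 6 nodes")
```

For instance, the path of three nodes has chain counts {1: 3, 2: 3, 3: 1}, and 3 − 3 + 1 = 1 matches F = 1 (only the root is a maximal green node).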
For a binary tree T and k >
1, let F k ( T ) be the number of green chains v · · · v k in T , and let f k ( T ) be the number of such chains where v is theroot. Obviously, cf. (2.4), F k ( T ) = X v ∈ T f k ( T v ) . (7.1)These functionals are useful to us because of the following simple relations,that are cases of inclusion-exclusion. Lemma 7.1.
For any binary tree T , f ( T ) = ∞ X k =1 ( − k − f k ( T ) , (7.2) F ( T ) = ∞ X k =1 ( − k − F k ( T ) . (7.3) Proof.
Let v be a node in T and consider the contribution to the sum in(7.3) of all chains with final node v k = v . This is clearly 0 if 1 if v is notgreen, and it is 1 if v is a maximal green node; furthermore, if v is greenbut has j > P ji =0 (cid:0) ji (cid:1) ( − i = (1 − j = 0. Hence the right-hand side of (7.3) is thenumber of maximal green nodes, i.e., F ( T ).For (7.2) we can argue similarly: Both sides are 0 unless the root o isgreen. If it is, the chain o gives contribution 1, and by inclusion-exclusion,the chains with a given final node v = o yield together a ycontribution − v is green and there are no green nodes between v and o , and 0otherwise. Hence the sum equals f ( T ) by (2.2). (Alternatively, (7.2) followsby induction from (7.3), (2.4) and (7.1).) (cid:3) Lemma 7.2.
For every k ≥ 1,

E f_k(T) = ( k(k+3)/((k+1)(k+2)) ) · 2^{k−1}/k! = 2^{k−1}/k! − 2^k/(k+2)!. (7.4)

Proof.
We use the construction of T = T̃_τ in Section 2, which we formulate as follows. Consider again the infinite binary tree T_∞, and grow T̃_t as a subtree of T_∞, cf. Section 5. To do this, we equip each node v in T_∞ with two clocks C_L(v) and C_R(v). These are started when v is added to the growing tree T̃_t, and each chimes after a random time with an exponential distribution with mean 1; when the clock chimes we add a left or right child, respectively, to v. There is also a doomsday clock C_0, started at 0 and with the same Exp(1) distribution; when it chimes (at time τ), the process is stopped and the tree T̃_τ is output. All clocks are independent of each other.

Fix a chain v_1 ⋯ v_k in the infinite tree T_∞, with v_1 = o, the root. Let ℓ_i ≥ 0 be the number of nodes (strictly) between v_i and v_{i+1}. We compute the probability that v_1 ⋯ v_k is a green chain in T = T̃_τ by following the construction of T̃_t as time progresses, checking in several steps whether v_1 ⋯ v_k is still a candidate for a green chain, and computing the probability of this. (We use throughout the proof the Markov property and the memoryless property of the exponential distribution.) We assume for notational convenience that the path from v_1 to v_k always uses the left child of each node. (By symmetry, this does not affect the result.)

1. If k > 1, we first need that v_1 = o has a left child but no right child (in order to be green); in particular, of the three clocks C_L(v_1), C_R(v_1), C_0 that run from the beginning, C_L(v_1) has to chime first. This has probability 1/3.

2. Suppose that this happens, so that v_1 gets a left child w. If ℓ_1 > 0, we need a left child of w, and still no right child at v_1. (But we do not care whether we get a right child at w or not.) Hence we need that C_L(w) chimes first among the three clocks C_L(w), C_R(v_1), C_0 (ignoring all other clocks). This has probability 1/3. The same applies at each of the ℓ_1 nodes between v_1 and v_2; thus, the total probability that steps 1 and 2 succeed is 3^{−(ℓ_1+1)}.

3. This takes us to v_2. If k > 2, we need a left child but no right child at v_2, and still no right child at v_1. Hence, the next chime from the four clocks C_L(v_2), C_R(v_2), C_R(v_1), C_0 has to come from C_L(v_2). This has probability 1/4.

4. Step 3 is repeated at each of the ℓ_2 nodes between v_2 and v_3; again the probability of success at each of these nodes is 1/4. Hence the probability that steps 3 and 4 succeed is 4^{−(ℓ_2+1)}.

5. Steps 3 and 4 are repeated for v_i for each i < k, yielding a probability (i+2)^{−(ℓ_i+1)} of success for each i.

6. Finally, we have obtained v_k, and wait for the doomsday clock. Until it chimes, we must not get any right child at v_1, ..., v_{k−1}, and we must get at most one child at v_k. Hence, among the k+2 clocks C_R(v_1), ..., C_R(v_{k−1}), C_L(v_k), C_R(v_k), C_0, the next chime must be either from C_0 (probability 1/(k+2)), or from C_L(v_k) or C_R(v_k), followed by C_0 (probability (2/(k+2)) · (1/(k+1))). The probability of success in this step is thus

1/(k+2) + (2/(k+2)) · (1/(k+1)) = (k+3)/((k+1)(k+2)). (7.5)

Combining the six steps above, we see that the probability that v_1 ⋯ v_k is a green chain in T̃_τ is

( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} (1/(i+2))^{ℓ_i+1}. (7.6)

Given ℓ_1, ..., ℓ_{k−1}, there are Π_{i=1}^{k−1} 2^{ℓ_i+1} choices of the chain v_1 ⋯ v_k, all with the same probability, so summing over all ℓ_1, ..., ℓ_{k−1} ≥ 0, we obtain

E f_k(T) = ( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} Σ_{ℓ_i=0}^∞ (2/(i+2))^{ℓ_i+1} = ( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} (2/i) = ( (k+3)/((k+1)(k+2)) ) · 2^{k−1}/(k−1)! = ( k(k+3)/((k+1)(k+2)) ) · 2^{k−1}/k!. □

Proof of Lemma 3.2.
By Lemmas 7.1 and 7.2, and a simple calculation,

E f(T) = Σ_{k=1}^∞ (−1)^{k−1} E f_k(T) = Σ_{k=1}^∞ ( (−1)^{k−1} 2^{k−1}/k! + (−1)^k 2^k/(k+2)! ) = (1 − e^{−2})/4,

noting that we may take the expectation inside the sum since it also follows from Lemma 7.2 that Σ_{k=1}^∞ E|f_k(T)| = Σ_{k=1}^∞ E f_k(T) < ∞. □

Recall that this, together with Lemma 3.1, completes our probabilistic proof of Theorem 1.1.
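The calculation above is easy to check numerically. The following sketch verifies the partial-fraction form in (7.4) exactly with rational arithmetic, sums the alternating series from the proof of Lemma 3.2, and compares with the closed form (1 − e^{−2})/4; as a further consistency check, at rate λ = 1 the expression (7.8) of the remark below, as we read it, must reduce to the same value, since ₁F₁(1; 1; z) = e^z.

```python
from fractions import Fraction
from math import exp, factorial

# Exact check of the two forms in (7.4):
#   k(k+3)/((k+1)(k+2)) * 2^(k-1)/k!  ==  2^(k-1)/k! - 2^k/(k+2)!
for k in range(1, 40):
    lhs = Fraction(k * (k + 3), (k + 1) * (k + 2)) * Fraction(2 ** (k - 1), factorial(k))
    rhs = Fraction(2 ** (k - 1), factorial(k)) - Fraction(2 ** k, factorial(k + 2))
    assert lhs == rhs

# The alternating series from the proof of Lemma 3.2, summed to k = 30
# (the terms decay factorially, so the truncation error is negligible).
series = sum((-1) ** (k - 1) * 2 ** (k - 1) / factorial(k)
             + (-1) ** k * 2 ** k / factorial(k + 2)
             for k in range(1, 31))
closed = (1 - exp(-2)) / 4

# Consistency with (7.8), as we read it, at lambda = 1, where 1F1(1; 1; z) = e^z:
lam = 1.0
from_remark = 0.25 + 1 / (2 * lam) - 1 / (lam * (lam + 1)) - 0.25 * exp(-2.0)

print(series, closed, from_remark)  # all three agree to machine precision
```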
Remark 7.3.
If we in the proof above change the doomsday clock and let it have an arbitrary rate λ > 0, and denote the resulting random binary tree by T^{(λ)}, then the same argument yields

E f_k(T^{(λ)}) = ( (k+λ+2)/((k+λ)(k+λ+1)) ) Π_{i=1}^{k−1} Σ_{ℓ_i=0}^∞ (2/(i+λ+1))^{ℓ_i+1} = ( (k+λ+2)/((k+λ)(k+λ+1)) ) Π_{i=1}^{k−1} (2/(i+λ−1))

= ( (k+λ−1)(k+λ+2)/((k+λ)(k+λ+1)) ) · 2^{k−1}/(λ)_k = 2^{k−1}/(λ)_k − 2^k/(λ)_{k+2}, (7.7)

where (λ)_k := λ(λ+1) ⋯ (λ+k−1). Thus by Lemma 7.1, letting ₁F₁ denote the confluent hypergeometric function, see e.g. [18],

E f(T^{(λ)}) = Σ_{k=1}^∞ (−1)^{k−1} E f_k(T^{(λ)}) = Σ_{k=1}^∞ ( (−1)^{k−1} 2^{k−1}/(λ)_k + (−1)^k 2^k/(λ)_{k+2} )

= −(1/2)( ₁F₁(1; λ; −2) − 1 ) + (1/4)( ₁F₁(1; λ; −2) − ( 1 − 2/λ + 4/(λ(λ+1)) ) )

= 1/4 + 1/(2λ) − 1/(λ(λ+1)) − (1/4) ₁F₁(1; λ; −2). (7.8)

Furthermore, if λ > 1, we can compute E F(T^{(λ)}) by the same method; the only difference is that we also allow a path of length ℓ_0 ≥ 0 from the root to v_1, which gives an additional probability factor (1+λ)^{−ℓ_0} for each of the 2^{ℓ_0} such paths, leading to

E F_k(T^{(λ)}) = Σ_{ℓ_0=0}^∞ (2/(λ+1))^{ℓ_0} E f_k(T^{(λ)}) = ( (λ+1)/(λ−1) ) E f_k(T^{(λ)}), (7.9)

and hence, using both parts of Lemma 7.1,

E F(T^{(λ)}) = Σ_{k=1}^∞ (−1)^{k−1} E F_k(T^{(λ)}) = ( (λ+1)/(λ−1) ) E f(T^{(λ)}). (7.10)

Moreover, a simple argument shows that, for any n ≥ 1,

P( |T^{(λ)}| = n ) = Π_{i=2}^{n} ( i/(i+λ) ) · λ/(n+1+λ) = λ · n!/(2+λ)_n, (7.11)

and conditioned on |T^{(λ)}| = n, T^{(λ)} has the same distribution as T_n, i.e., ( T^{(λ)} | |T^{(λ)}| = n ) =^d T_n. Hence,

E F(T^{(λ)}) = Σ_{n=1}^∞ ( λ · n!/(2+λ)_n ) ν_n, (7.12)

which can be interpreted as an unusual type of generating function for the sequence (ν_n); note that (7.10) and (7.8) yield an explicit expression for it.

References

[1] David Aldous, Asymptotic fringe distributions for general families of random trees.
Ann. Appl. Probab. (1991), no. 2, 228–266.

[2] David Aldous, Probability distributions on cladograms. Random Discrete Structures (Minneapolis, MN, 1993), 1–18, IMA Vol. Math. Appl., 76, Springer, New York, 1996.

[3] Michael G. B. Blum and Olivier François, Minimal clade size and external branch length under the neutral coalescent. Adv. in Appl. Probab. (2005), no. 3, 647–662.

[4] Michael G. B. Blum, Olivier François and Svante Janson, The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance. Ann. Appl. Probab. (2006), no. 4, 2195–2214.

[5] B. M. Brown and G. K. Eagleson, Martingale convergence to infinitely divisible laws with finite variances. Trans. Amer. Math. Soc. (1971), 449–453.

[6] Huilan Chang and Michael Fuchs, Limit theorems for patterns in phylogenetic trees. J. Math. Biol. (2010), no. 4, 481–512.

[7] Luc Devroye, Limit laws for local counters in random binary search trees. Random Structures Algorithms (1991), no. 3, 303–315.

[8] Luc Devroye, Limit laws for sums of functions of subtrees of random binary search trees. SIAM J. Comput. (2002/03), no. 1, 152–171.

[9] Michael Drmota, Random Trees. Springer, Vienna, 2009.

[10] Michael Drmota, Michael Fuchs and Yi-Wen Lee, Limit laws for the number of groups formed by social animals under the extra clustering model. (Extended abstract.) Proceedings, 2014 Conference on Analysis of Algorithms, AofA '14 (Paris, 2014), DMTCS Proceedings, 2014.

[11] Eric Durand, Michael G. B. Blum and Olivier François, Prediction of group patterns in social mammals based on a coalescent model. J. Theoret. Biol. (2007), no. 2, 262–270.

[12] Eric Durand and Olivier François, Probabilistic analysis of a genealogical model of animal group patterns. J. Math. Biol. (2010), no. 3, 451–468.

[13] Philippe Flajolet, Xavier Gourdon and Conrado Martínez, Patterns in random binary search trees. Random Structures Algorithms (1997), no. 3, 223–244.

[14] Allan Gut, Probability: A Graduate Course, 2nd ed., Springer, New York, 2013.

[15] P. Hall and C. C. Heyde, Martingale Limit Theory and its Application. Academic Press, New York, 1980.

[16] Cecilia Holmgren and Svante Janson, Limit laws for functions of fringe trees for binary search trees and recursive trees. Preprint, 2014. arXiv:1406.6883v1

[17] J. F. C. Kingman, The coalescent. Stochastic Process. Appl. (1982), no. 3, 235–248.

[18] NIST Handbook of Mathematical Functions. Edited by Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert and Charles W. Clark. Cambridge Univ. Press, 2010. Also available as NIST Digital Library of Mathematical Functions, http://dlmf.nist.gov/

Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden

E-mail address: [email protected]