Clades and clans: a comparison study of two evolutionary models

Abstract

The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model are two binary tree generating models that are widely used in evolutionary biology. Understanding the distributions of clade sizes under these two models provides valuable insights into macro-evolutionary processes, and is important in hypothesis testing and Bayesian analyses in phylogenetics. Here we show that these distributions are log-convex, which implies that very large clades or very small clades are more likely to occur under these two models. Moreover, we prove that there exists a critical value κ(n) for each n⩾4 such that for a given clade with size k , the probability that this clade is contained in a random tree with n leaves generated under the YHK model is higher than that under the PDA model if 1<k<κ(n) , and lower if κ(n)<k<n . Finally, we extend our results to binary unrooted trees, and obtain similar results for the distributions of clan sizes.


Journal of Mathematical Biology manuscript No. (will be inserted by the editor)


Sha Zhu · Cuong Than · Taoyang Wu


Keywords

Phylogenetic trees · Null models · Clade · Clan · Log-convexity

Distributions of genealogical features such as shapes, subtrees, and clades are of interest in phylogenetics and population genetics. By comparing biological data with these distributions, which can be derived from null models such as the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model, we can obtain insights into macro-evolutionary processes underlying the data (Felsenstein, 2004; Mooers and Heard, 1997, 2002; Nordborg,

SZ was supported in part by the New Zealand Marsden Fund, CT by the National Science Foundation contract DBI-1146722, and TW by the Singapore MOE grant R-146-000-134-112.

Sha Zhu: Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom
Cuong Than: Department of Computer Science, University of Tuebingen, Germany
Taoyang Wu (corresponding author): School of Computing Sciences, University of East Anglia, United Kingdom

n > 4, neither the PDA model nor the YHK model gives rise to a uniform prior on clades (Steel and Pickett, 2006). As an attempt to further elucidate these relationships, in this paper we study the distributions of clade sizes in the PDA model, and then conduct a comparison study of these distributions with those in the YHK model. In addition, we conduct a similar study on clans, the counterpart of clades for unrooted trees.

The remainder of the paper is organized as follows. Sections 2 and 3 contain necessary notation and background used in the paper and a brief review of the YHK and PDA models. We then present in Section 4 the results concerning clade probabilities under the two null models, and those related to clan probabilities in Section 5. Finally, we conclude in Section 6 with discussions and remarks.

Fig. 1 Example of a rooted phylogenetic tree (left) and an unrooted phylogenetic tree (right).

In this section, we present some basic notation and background concerning phylogenetic trees and log-convexity that will be used in this paper. From now on, $X$ will be used to denote the leaf set, and we assume that $X$ is a finite set of size $n = |X| \geq 3$.

A tree is a connected acyclic graph. A vertex will be referred to as a leaf if its degree is one, and an interior vertex otherwise. An unrooted tree is binary if all interior vertices have degree three. A rooted tree is a tree that has exactly one distinguished node designated as the root, which is usually denoted by $\rho$. A rooted tree is binary if the root has degree two and all other interior vertices have degree three.

A phylogenetic tree on $X$ is a binary tree with leaves bijectively labeled by elements of $X$. The sets of rooted and unrooted phylogenetic trees on $X$ are denoted by $\mathcal{T}_X$ and $\mathcal{T}^*_X$, respectively. Two examples of phylogenetic trees on $X = \{1, \dots, 7\}$, one rooted and the other unrooted, are presented in Figure 1.

Let $T$ be a rooted phylogenetic tree on $X$. Given two vertices $v$ and $u$ in $T$, $u$ is below $v$ if $v$ is contained in the path between $u$ and the root of $T$. In this case, we also say $u$ is a descendant of $v$ if $v$ and $u$ are distinct. A clade of $T$ is a subset of $X$ that contains precisely all the leaves below a vertex in $T$. A clade $A$ is called trivial if $|A| = 1$ or $A = X$ holds, and non-trivial otherwise. Since $T$ has $2n-1$ vertices, it contains precisely $2n-1$ clades, including $n+1$ trivial ones. For example, the rooted tree on $X = \{1, \dots, 7\}$ depicted in Figure 1 has 13 clades: the five non-trivial ones consist of three clades of size two, one of size three, and one of size four.

Suppressing the root of a tree $T$ in $\mathcal{T}_X$, that is, removing $\rho$ and replacing the two edges incident with $\rho$ with an edge connecting the two vertices adjacent to $\rho$, results in an unrooted tree in $\mathcal{T}^*_X$, which will be denoted by $\rho^{-1}(T)$. For instance, for the rooted tree $T$ and unrooted tree $T^*$ in Figure 1, we have $T^* = \rho^{-1}(T)$. Note that for each $T^*$ in $\mathcal{T}^*_X$, there are precisely $2n-3$ rooted trees $T$ in $\mathcal{T}_X$ such that $T^* = \rho^{-1}(T)$ holds.

Recall that a split $A|B$ on $X$ is a bipartition of $X$ into two disjoint non-empty sets $A$ and $B$, that is, $A \cap B = \emptyset$ and $A \cup B = X$. Let $T^*$ be an unrooted tree in $\mathcal{T}^*_X$. Every edge $e$ of $T^*$ induces a necessarily unique split $A|B$ of $X$ obtained as the two sets of leaves separated by $e$. In other words, the path between a pair of leaves in $X$ contains $e$ if and only if one of these two leaves is in $A$ and the other one is in $B$. In this case, we say $A|B$ is a split contained in $T^*$. A clan $A$ of $T^*$ is a subset of $X$ such that $A|(X \setminus A)$ is a split contained in $T^*$. Since $T^*$ has $2n-3$ edges, it contains precisely $2(2n-3)$ clans.

2.2 Log-convexity

A sequence $\{y_1, \dots, y_m\}$ of real numbers is called positive if each number contained in the sequence is greater than zero. It is called log-convex if $y_{k-1} y_{k+1} \geq y_k^2$ holds for $2 \leq k \leq m-1$. Clearly, a positive sequence $\{y_k\}_{1 \leq k \leq m}$ is log-convex if and only if the sequence $\{y_{k+1}/y_k\}_{1 \leq k \leq m-1}$ is increasing. Therefore, a log-convex sequence of positive numbers is necessarily unimodal, that is, there exists an index $1 \leq k \leq m$ such that

$$y_1 \geq y_2 \geq \cdots \geq y_k \quad \text{and} \quad y_k \leq y_{k+1} \leq \cdots \leq y_m \tag{1}$$

hold. Recall that a sequence $\{y_i\}_{1 \leq i \leq m}$ is also called unimodal if $y_1 \leq y_2 \leq \cdots \leq y_k$ and $y_k \geq y_{k+1} \geq \cdots \geq y_m$ hold for some $1 \leq k \leq m$. However, in this paper, unimodal always refers to the situation specified in Eq. (1).

For later use, we end this section with the following result concerning log-convex sequences (see, e.g., Liu and Wang (2007)).

Lemma 1 If $\{y_i\}_{1 \leq i \leq m}$ and $\{y'_i\}_{1 \leq i \leq m}$ are two positive and log-convex sequences, then the sequences $\{y_i + y'_i\}_{1 \leq i \leq m}$ and $\{y_i \cdot y'_i\}_{1 \leq i \leq m}$ are positive and log-convex. □

In this section, we present a formal definition of the two null models investigated in this paper: the proportional to distinguishable arrangements (PDA) model and the

Yule–Harding–Kingman (YHK) model.

To begin with, recall that the number of rooted phylogenetic trees with leaf set $X$, where $n = |X|$, is

$$\varphi(n) := (2n-3)!! = 1 \cdot 3 \cdots (2n-3) = \frac{(2n-2)!}{2^{n-1}(n-1)!}.$$

Here we use the convention that $\varphi(1) = 1$. Under the PDA model, each tree has the same probability of being generated, that is, we have

$$P_{PDA}(T) = \frac{1}{\varphi(n)} \tag{2}$$

for every $T$ in $\mathcal{T}_X$.

Under the Yule–Harding model, a rooted phylogenetic tree on $X$ is generated as follows. Beginning with a two-leaf tree, we "grow" it by repeatedly splitting a leaf into two new leaves. The splitting leaf is chosen uniformly at random among all the leaves present in the current tree. After obtaining an unlabeled tree with $n$ leaves, we label each of its leaves with a label sampled uniformly at random (without replacement) from $X$. When branch lengths are ignored, the Yule–Harding model is shown by Aldous (1996) to be equivalent to the trees generated by Kingman's coalescent process, and so we call it the YHK model. Under this model, the probability of generating a tree $T$ in $\mathcal{T}_X$ is (Semple and Steel, 2003):

$$P_{YHK}(T) = \frac{2^{n-1}}{n!} \prod_{v \in \mathring{V}(T)} \frac{1}{\lambda_v}, \tag{3}$$

where $\mathring{V}(T)$ is the set of interior nodes of $T$, and $\lambda_v$ is the number of interior nodes of $T$ that are below $v$ (including $v$ itself). For example, the probability of the rooted tree in Figure 1 is $2^{7-1}/(7! \times 6 \times 3 \times 2)$.

For an unrooted tree $T^*$ in $\mathcal{T}^*_X$, let $\rho(T^*)$ denote the set of rooted trees $T$ in $\mathcal{T}_X$ with $T^* = \rho^{-1}(T)$. As noted previously in Section 2, $T^*$ can be obtained from each of the $2n-3$ rooted trees $T$ in $\rho(T^*)$ by removing the root of $T$. Using this correspondence scheme, a probability measure $P$ on $\mathcal{T}_X$ induces a probability measure $P_u$ on the set $\mathcal{T}^*_X$. That is, we have

$$P_u(T^*) = \sum_{T \in \rho(T^*)} P(T). \tag{4}$$

In particular, let $P^u_{YHK}$ and $P^u_{PDA}$ denote the probability measures on $\mathcal{T}^*_X$ induced by $P_{YHK}$ and $P_{PDA}$, respectively. Note that this implies

$$P^u_{PDA}(T^*) = \frac{1}{\varphi(n-1)} \tag{5}$$

for every $T^*$ in $\mathcal{T}^*_X$.
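The counting function and Eq. (3) can be evaluated exactly for small trees. The following is a minimal sketch, not part of the original manuscript: it assumes a rooted binary tree is encoded as nested Python tuples with arbitrary leaf labels, and the helper names `phi`, `leaf_count` and `yhk_probability` are hypothetical. The example tree only reproduces the clade sizes reported for Figure 1; its leaf labels are an assumption.

```python
from fractions import Fraction
from math import factorial


def phi(n):
    """Number of rooted binary phylogenetic trees on n leaves, (2n-3)!!, with phi(1) = 1."""
    result = 1
    for odd in range(3, 2 * n - 2, 2):
        result *= odd
    return result


def leaf_count(tree):
    """Number of leaves below a node; a leaf is anything that is not a tuple."""
    if not isinstance(tree, tuple):
        return 1
    return sum(leaf_count(child) for child in tree)


def yhk_probability(tree):
    """Evaluate Eq. (3): 2^(n-1)/n! times the product of 1/lambda_v over interior nodes."""
    n = leaf_count(tree)
    prob = Fraction(2 ** (n - 1), factorial(n))
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            # lambda_v = number of interior nodes below v (v included) = leaves below v minus 1
            prob /= leaf_count(node) - 1
            stack.extend(node)
    return prob


# Hypothetical labelling with non-trivial clades of sizes 2, 2, 4, 2 and 3.
example_tree = (((1, 2), (3, 4)), ((5, 6), 7))
print(phi(7), yhk_probability(example_tree))
```

Exact rational arithmetic is used so that the value can be compared directly with the closed-form expression in the text.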
Since the number of unrooted phylogenetic trees on $X$ is $|\mathcal{T}^*_X| = \varphi(n-1) = (2n-5)!!$, each tree in $\mathcal{T}^*_X$ has the same probability under $P^u_{PDA}$.

We end this section with a property of the PDA and YHK models that will play an important role in obtaining our results. Recall that a probability measure $P$ on $\mathcal{T}_X$ has the exchangeability property if $P$ depends only on tree shapes, that is, if two rooted trees $T'$ and $T$ can be obtained from each other by permuting their leaves, then $P(T) = P(T')$ holds. Similarly, a probability measure on $\mathcal{T}^*_X$ has the exchangeability property if it depends only on tree shapes. It is well known that both $P_{YHK}$ and $P_{PDA}$, the probability measures on the set of rooted trees $\mathcal{T}_X$ induced by the YHK and PDA models, have the exchangeability property (Aldous, 1996). By Eqs. (4) and (5), we can conclude that the probability measures $P^u_{YHK}$ and $P^u_{PDA}$ on the set of unrooted trees $\mathcal{T}^*_X$ also have the exchangeability property.

In this section, we shall present our main results on clade probabilities. To this end, we need some further notation and definitions. Given a rooted binary tree $T$, let

$$I_T(A) = \begin{cases} 1, & \text{if } A \text{ is a clade of } T, \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$

be the 'indicator' function that maps a subset $A$ of $X$ to 1 if $A$ is a clade of $T$, and 0 otherwise. Now for a subset $A$ of $X$, the probability of $A$ being a clade of a random tree sampled according to a probability distribution $P$ on $\mathcal{T}_X$ is defined as

$$P(A) = \sum_{T \in \mathcal{T}_X} P(T)\, I_T(A). \tag{7}$$

Since $\sum_{A \subseteq X} I_T(A) = 2n-1$ for each $T \in \mathcal{T}_X$ and $\sum_{T \in \mathcal{T}_X} P(T) = 1$, we have

$$\sum_{A \subseteq X} P(A) = \sum_{A \subseteq X} \sum_{T \in \mathcal{T}_X} P(T)\, I_T(A) = \sum_{T \in \mathcal{T}_X} P(T) \sum_{A \subseteq X} I_T(A) = 2n-1.$$

By the last equation, we note that each probability measure $P$ on $\mathcal{T}_X$ induces a measure on the set of all subsets of $X$, which can be normalized to a probability measure by a factor of $1/(2n-1)$.

The above definitions on a subset of $X$ can be extended to a collection of subsets of $X$. That is, given a collection of subsets $\{A_1, \dots, A_m\}$ of $X$, we have

$$I_T(A_1, \dots, A_m) = I_T(A_1) \cdots I_T(A_m), \tag{8}$$

and

$$P(A_1, \dots, A_m) = \sum_{T \in \mathcal{T}_X} P(T) \big( I_T(A_1) \cdots I_T(A_m) \big). \tag{9}$$

Note that $I_T(A_1, \dots, A_m) = 1$ if and only if $A_i$ is a clade of $T$ for $1 \leq i \leq m$. On the other hand, it is well known (see, e.g., Semple and Steel (2003)) that given a collection of subsets $\{A_1, \dots, A_m\}$ of $X$, there exists a tree $T \in \mathcal{T}_X$ with $I_T(A_1, \dots, A_m) = 1$ if and only if $\{A_1, \dots, A_m\}$ forms a hierarchy, that is, $A_i \cap A_j \in \{\emptyset, A_i, A_j\}$ holds for $1 \leq i < j \leq m$.

The following result shows that if a probability measure depends only on tree shapes, then the clade probabilities derived from it are also independent of the 'labeling' of the elements.

Lemma 2

Let $P$ be a probability measure on $\mathcal{T}_X$ that has the exchangeability property. Then for each pair of subsets $A$ and $A'$ of $X$ with $|A| = |A'|$, we have

$$P(A) = P(A') \quad \text{and} \quad P(A, X \setminus A) = P(A', X \setminus A'). \tag{10}$$

Proof

Suppose that $A$ and $A'$ are two subsets of $X$ that have the same size. Then there exists a permutation $\pi$ on $X$ such that $A' = A^\pi := \{\pi(x) \mid x \in A\}$. Now for each tree $T$ in $\mathcal{T}_X$, let $T^\pi$ be the tree obtained from $T$ by relabeling the leaves of $T$ according to the permutation $\pi$. Then $A$ is a clade of $T$ if and only if $A^\pi$ is a clade of $T^\pi$. Together with Eq. (7), we have

$$P(A) = \sum_{T \in \mathcal{T}_X} P(T)\, I_T(A) = \sum_{T \in \mathcal{T}_X} P(T)\, I_{T^\pi}(A^\pi) = \sum_{T \in \mathcal{T}_X} P(T^\pi)\, I_{T^\pi}(A^\pi) = \sum_{T^\pi \in \mathcal{T}_X} P(T^\pi)\, I_{T^\pi}(A^\pi) = P(A^\pi),$$

where the third equality follows from the exchangeability property of $P$. This shows $P(A) = P(A')$, and a similar argument leads to $P(A, X \setminus A) = P(A', X \setminus A')$. □

Since $P_{YHK}$ has the exchangeability property, by Lemma 2 we know that $P_{YHK}(A)$ is determined by the size of $A$ only. Therefore, we denote

$$p_n(a) = P_{YHK}(A)$$

as the probability that a random tree in $\mathcal{T}_X$, where $n = |X|$, induces a specific clade $A$ of size $a$ under the YHK model. Similarly, we let

$$q_n(a) = P_{PDA}(A)$$

be the probability that a random tree in $\mathcal{T}_X$ induces a specific clade $A$ of size $a$ under the PDA model. In addition, we also denote

$$p_n(a, n-a) = P_{YHK}(A, X \setminus A) \quad \text{and} \quad q_n(a, n-a) = P_{PDA}(A, X \setminus A),$$

the probabilities that both $A$ and $X \setminus A$ are clades of a tree in $\mathcal{T}_X$ generated under the YHK and PDA models, respectively. Note that if both $A$ and $X \setminus A$ are clades of a tree $T$, then they are precisely the clades consisting of the leaves below the two children of the root of $T$.

Corollary 1

Let $P$ be a probability measure on $\mathcal{T}_X$ that has the exchangeability property. For each $1 \leq a \leq n$, the expected number of clades with size $a$ contained in a random tree sampled according to $P$ is

$$\binom{n}{a} P(A),$$

where $A$ is an arbitrary subset of $X$ with $|A| = a$.

Proof Denote the collection of subsets of $X$ with size $a$ by $\mathcal{X}_a$ and fix a subset $A \in \mathcal{X}_a$. Let $Z_T(a) := \sum_{Y \in \mathcal{X}_a} I_T(Y)$ be the number of clades with size $a$ contained in a tree $T$. Then the expected number of clades with size $a$ contained in a random tree sampled according to $P$ is given by

$$\sum_{T \in \mathcal{T}_X} P(T)\, Z_T(a) = \sum_{T \in \mathcal{T}_X} \sum_{Y \in \mathcal{X}_a} P(T)\, I_T(Y) = \sum_{Y \in \mathcal{X}_a} \sum_{T \in \mathcal{T}_X} P(T)\, I_T(Y) = \sum_{Y \in \mathcal{X}_a} P(Y) = \binom{n}{a} P(A),$$

where the last equality holds because by Lemma 2 we have $P(Y) = P(A)$ for all $Y \in \mathcal{X}_a$. □

We begin with the formulae for $p_n(a)$ and $p_n(a, n-a)$, which were discovered and rediscovered several times in the literature (see, e.g., Blum and Francois (2005); Brown (1994); Heard (1992); Rosenberg (2003, 2006)).

Theorem 1

For a positive integer $a \leq n-1$ we have:

(i) $p_n(a) = \dfrac{2n}{a(a+1)} \dbinom{n}{a}^{-1}$.

(ii) $p_n(a, n-a) = \dfrac{2}{n-1} \dbinom{n}{a}^{-1}$.

By the above results, we show below that clade probabilities under the YHK model form a log-convex sequence. This implies that clades of small or large size are more likely to be generated than those of intermediate size under the model.
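As a sanity check on Theorem 1, the two formulae can be combined with Corollary 1: summing the expected number of clades of each size must give $2n-1$, the total number of clades of any tree, and the expected number of root splits counted over all sizes must give 2. The sketch below is an illustration only; the function names `p_clade`, `p_split`, `expected_clade_total` and `expected_root_clades` are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_clade(n, a):
    """p_n(a) from Theorem 1(i); the full leaf set (a = n) is always a clade."""
    if a == n:
        return Fraction(1)
    return Fraction(2 * n, a * (a + 1)) / comb(n, a)


def p_split(n, a):
    """p_n(a, n-a) from Theorem 1(ii), for 1 <= a <= n-1."""
    return Fraction(2, n - 1) / comb(n, a)


def expected_clade_total(n):
    """Sum over sizes a of C(n, a) * p_n(a), the expected total number of clades."""
    return sum(comb(n, a) * p_clade(n, a) for a in range(1, n + 1))


def expected_root_clades(n):
    """Sum over sizes a of C(n, a) * p_n(a, n-a), the expected number of clades below the root's children."""
    return sum(comb(n, a) * p_split(n, a) for a in range(1, n))


for n in (4, 10, 30):
    print(n, expected_clade_total(n), expected_root_clades(n))
```

For $n = 4$ the value $p_4(2) = 2/9$ also agrees with a direct count: the balanced shape (probability 1/3) has two cherries and the caterpillar shape (probability 2/3) has one, giving an expected $4/3 = \binom{4}{2} \cdot 2/9$ clades of size two.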

Theorem 2

For $n \geq 3$, the sequences $\{p_n(a)\}_{1 \leq a \leq n}$ and $\{p_n(a, n-a)\}_{1 \leq a < n}$ are log-convex. Moreover, let

$$\Delta(n) := \sqrt{n + \Big(\frac{n-3}{4}\Big)^2} + \frac{n-3}{4};$$

then we have

(i) $p_n(a) \geq p_n(a+1)$ for $a \leq \Delta(n)$, and $p_n(a) < p_n(a+1)$ for $a > \Delta(n)$, and

(ii) $p_n(a, n-a) \geq p_n(a+1, n-a-1)$ for $a < n/2$, and $p_n(a, n-a) < p_n(a+1, n-a-1)$ for $a \geq n/2$.

Proof Let $y_a = \frac{2n}{a(a+1)}$ for $1 \leq a \leq n-1$, $y_n = 1$, and $y'_a = \binom{n}{a}^{-1}$ for $1 \leq a \leq n$. Since $\{y_a\}_{1 \leq a \leq n}$ and $\{y'_a\}_{1 \leq a \leq n}$ are both log-convex, by Lemma 1 and Theorem 1 we can conclude that the sequence $\{p_n(a)\}_{1 \leq a \leq n}$ is log-convex. A similar argument shows that $\{p_n(a, n-a)\}_{1 \leq a < n}$ is also log-convex.

By Theorem 1, we have

$$\frac{p_n(a+1)}{p_n(a)} = \frac{a(a+1)\binom{n}{a}}{(a+1)(a+2)\binom{n}{a+1}} = \frac{a(a+1)}{(a+2)(n-a)},$$

for $1 \leq a \leq n-2$. The last expression is less than or equal to 1 if and only if

$$a(a+1) \leq (a+2)(n-a) \iff 2a^2 - (n-3)a - 2n \leq 0.$$

Therefore, $p_n(a+1) \leq p_n(a)$ if and only if $a \leq \Delta(n)$. This establishes Part (i) of the theorem.

Part (ii) of the theorem follows from the fact that $\binom{n}{a} \leq \binom{n}{a+1}$ for $a < n/2$ and $\binom{n}{a} > \binom{n}{a+1}$ for $a \geq n/2$. □
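Both claims about $\{p_n(a)\}$ can be checked numerically for small $n$. The sketch below is illustrative rather than part of the proof; the helper names `p_clade`, `is_log_convex` and `delta` are hypothetical, and the threshold $\Delta(n)$ is evaluated in floating point, which is adequate here because $2a^2 - (n-3)a - 2n$ has no positive integer root $a$.

```python
from fractions import Fraction
from math import comb, sqrt


def p_clade(n, a):
    """p_n(a) from Theorem 1(i), with p_n(n) = 1."""
    if a == n:
        return Fraction(1)
    return Fraction(2 * n, a * (a + 1)) / comb(n, a)


def is_log_convex(seq):
    """Check y[k-1] * y[k+1] >= y[k]^2 for every interior index k."""
    return all(seq[k - 1] * seq[k + 1] >= seq[k] ** 2 for k in range(1, len(seq) - 1))


def delta(n):
    """Threshold Delta(n) of Theorem 2 (floating-point evaluation)."""
    return sqrt(n + ((n - 3) / 4) ** 2) + (n - 3) / 4


def turning_point(n):
    """Largest clade size a with p_n(a) >= p_n(a+1)."""
    return max(a for a in range(1, n) if p_clade(n, a) >= p_clade(n, a + 1))


for n in (5, 10, 30):
    seq = [p_clade(n, a) for a in range(1, n + 1)]
    print(n, is_log_convex(seq), turning_point(n), delta(n))
```

The printed turning point should equal the integer part of $\Delta(n)$, the clade size at which the sequence stops decreasing.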

Theorem 3

For a positive integer $a \leq n-1$ we have:

(i) $q_n(a) = \dfrac{\varphi(a)\varphi(n-a+1)}{\varphi(n)} = \dbinom{n-1}{a-1}\dbinom{2n-2}{2a-2}^{-1}$.

(ii) $q_n(a, n-a) = \dfrac{\varphi(a)\varphi(n-a)}{\varphi(n)} = \dfrac{1}{2n-2a-1}\dbinom{n-1}{a-1}\dbinom{2n-2}{2a-2}^{-1}$.

Proof

To derive the formula for $q_n(a)$, it suffices to show that there are $\varphi(a)\varphi(n-a+1)$ trees in $\mathcal{A}$, the subset of trees in $\mathcal{T}_X$ containing $A$ as a clade, because the probability of each tree in $\mathcal{T}_X$ is $1/\varphi(n)$. Without loss of generality, we can assume that $X = \{1, 2, \dots, n\}$ and $A = \{n-a+1, \dots, n\}$. Let $X' := (X - A) \cup \{n-a+1\} = \{1, 2, \dots, n-a, n-a+1\}$; then each tree in $\mathcal{A}$ can be generated by the following two steps: picking a tree in $\mathcal{T}_{X'}$, and replacing the leaf with label $n-a+1$ by a tree from $\mathcal{T}_A$. In addition, a different choice of trees in the first step or the second step will result in a different tree in $\mathcal{A}$. Since there are $\varphi(n-a+1)$ possible choices in the first step and $\varphi(a)$ in the second step, we can conclude that the number of trees in $\mathcal{A}$ is $\varphi(a)\varphi(n-a+1)$. In addition, using the fact that

$$\varphi(m) = (2m-3)!! = \frac{(2m-2)!}{2^{m-1}(m-1)!}$$

holds for $m \geq 1$, we have

$$q_n(a) = \frac{\varphi(a)\varphi(n-a+1)}{\varphi(n)} = \frac{(2a-2)!\,(2n-2a)!\,(n-1)!}{(2n-2)!\,(a-1)!\,(n-a)!} = \binom{n-1}{a-1}\binom{2n-2}{2a-2}^{-1}.$$

The proof of the formula for $q_n(a, n-a)$ is similar to the one for $q_n(a)$. Let $\mathcal{A}^*$ be the collection of the trees in $\mathcal{T}_X$ containing both $A$ and $X - A$ as clades. Then a tree in $\mathcal{A}^*$ is uniquely determined by choosing a tree in $\mathcal{T}_A$, and subsequently another tree from $\mathcal{T}_{X-A}$. This implies that the number of trees in $\mathcal{A}^*$ is $\varphi(a)\varphi(n-a)$. Hence

$$q_n(a, n-a) = \frac{\varphi(a)\varphi(n-a)}{\varphi(n)} = \frac{1}{2n-2a-1}\, q_n(a) = \frac{1}{2n-2a-1}\binom{n-1}{a-1}\binom{2n-2}{2a-2}^{-1}.$$ □

Recall that in Theorem 2 we show that clade probabilities under the YHK model form a log-convex sequence. Here we establish a similar result for the PDA model, which implies that the sequences $\{q_n(a)\}_{1 \leq a < n}$ and $\{q_n(a, n-a)\}_{1 \leq a < n}$ are also unimodal.

Theorem 4

For $n \geq 3$, the sequences $\{q_n(a)\}_{1 \leq a \leq n}$ and $\{q_n(a, n-a)\}_{1 \leq a < n}$ are log-convex. Moreover, we have

(i) $q_n(a+1) \geq q_n(a)$ when $a \geq n/2$, and $q_n(a+1) \leq q_n(a)$ when $a \leq n/2$.

(ii) $q_n(a+1, n-a-1) \geq q_n(a, n-a)$ when $a \geq (n-1)/2$, and $q_n(a+1, n-a-1) \leq q_n(a, n-a)$ when $a \leq (n-1)/2$.

Proof

By Theorem 3 and $q_n(n) = 1$, for $1 \leq a < n$ we have

$$\frac{q_n(a+1)}{q_n(a)} = \frac{2a-1}{2n-2a-1},$$

which is greater than or equal to 1 when $2a-1 \geq 2n-2a-1$, or equivalently when $a \geq n/2$. Thus Part (i) follows. Moreover, we have

$$\frac{q_n(a+1)\, q_n(a-1)}{q_n(a)^2} = \Big(\frac{2a-1}{2a-3}\Big)\Big(\frac{2n-2a+1}{2n-2a-1}\Big) > 1,$$

for $2 \leq a < n$, and hence $\{q_n(a)\}_{1 \leq a \leq n}$ is log-convex.

Similarly, we have

$$\frac{q_n(a+1, n-a-1)}{q_n(a, n-a)} = \Big(\frac{2n-2a-1}{2n-2a-3}\Big)\Big(\frac{q_n(a+1)}{q_n(a)}\Big) = \frac{2a-1}{2n-2a-3},$$

which is greater than or equal to 1 when $2a-1 \geq 2n-2a-3$, or equivalently when $a \geq (n-1)/2$. Moreover, we have

$$\frac{q_n(a+1, n-a-1)\, q_n(a-1, n-a+1)}{q_n(a, n-a)^2} = \Big(\frac{2a-1}{2a-3}\Big)\Big(\frac{2n-2a-1}{2n-2a-3}\Big) > 1,$$

and hence $\{q_n(a, n-a)\}_{1 \leq a < n}$ is log-convex. □

We now compare $p_n(a)$ and $q_n(a)$, the probabilities of a specific (and fixed) clade of size $a$ under the YHK and PDA models, respectively. As an example, consider the ratio $p_n(a)/q_n(a)$ with $n = 30$ as depicted in Figure 2. Then it is clear that, except for $a = 1$, for which $p_n(a) = q_n(a) = 1$, the ratio is strictly decreasing and is less than 1 when $a$ is greater than a certain value. This 'phase transition' type phenomenon holds for all $n > 3$, as the following theorem shows.
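The behaviour of the ratio described above can be tabulated directly from Theorems 1 and 3. The following sketch is an illustration under stated assumptions, not the computation used to produce Figure 2; the names `phi`, `p_clade`, `q_clade`, `ratio` and `crossover` are hypothetical.

```python
from fractions import Fraction
from math import comb


def phi(n):
    """Number of rooted binary phylogenetic trees on n leaves, (2n-3)!!, with phi(1) = 1."""
    result = 1
    for odd in range(3, 2 * n - 2, 2):
        result *= odd
    return result


def p_clade(n, a):
    """p_n(a) under the YHK model (Theorem 1), with p_n(n) = 1."""
    if a == n:
        return Fraction(1)
    return Fraction(2 * n, a * (a + 1)) / comb(n, a)


def q_clade(n, a):
    """q_n(a) under the PDA model (Theorem 3)."""
    return Fraction(phi(a) * phi(n - a + 1), phi(n))


def ratio(n, a):
    """The ratio p_n(a) / q_n(a)."""
    return p_clade(n, a) / q_clade(n, a)


def crossover(n):
    """Largest clade size a in [2, n-1] for which the YHK probability still exceeds the PDA one."""
    return max(a for a in range(2, n) if ratio(n, a) > 1)


n = 30
for a in range(1, n):
    print(a, float(ratio(n, a)))
print("last size with p > q:", crossover(n))
```

Using exact fractions avoids any ambiguity about whether a ratio close to 1 lies above or below it.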

Theorem 5

For $n \geq 4$, there exists a number $\kappa(n)$ in $[2, n-1]$ such that $p_n(a) > q_n(a)$ for $2 \leq a < \kappa(n)$, and $p_n(a) < q_n(a)$ for $\kappa(n) < a \leq n-1$.

Proof Let

$$g_n(a) = \frac{p_n(a)}{q_n(a)} = \frac{2n}{a(a+1)} \binom{2n-2}{2a-2}\binom{n}{a}^{-1}\binom{n-1}{a-1}^{-1}.$$

Using the identity $\binom{m}{k+1} = \frac{m-k}{k+1}\binom{m}{k}$, we obtain

$$\frac{g_n(a+1)}{g_n(a)} = \frac{a(a+1)(2n-2a-1)}{(a+2)(2a-1)(n-a)}.$$

We have

$$a(a+1)(2n-2a-1) < (a+2)(2a-1)(n-a) \iff a > \frac{2n}{n+3},$$

and hence $g_n(a) > g_n(a+1)$ for $2n/(n+3) < a \leq n-2$. Since $2n/(n+3) < 2$, we have $g_n(2) > g_n(3) > \cdots > g_n(n-1)$. It is easy to see that for $n \geq 4$,

$$g_n(2) = \frac{2(2n-3)}{3(n-1)} > 1 \quad \text{and} \quad g_n(n-1) = \frac{2(2n-3)}{n(n-1)} < 1.$$

This and the fact that $g_n(a)$ is strictly decreasing on $[2, n-1]$ imply the existence of the number $\kappa(n)$ in the theorem. □

Fig. 2 Plots of the ratios $p_n(a)/q_n(a)$ and $p_n(a, n-a)/q_n(a, n-a)$, with $n = 30$.

Next, we consider $p_n(a, n-a)$ and $q_n(a, n-a)$. Note that by definition, both $p_n(a, n-a)$ and $q_n(a, n-a)$ are symmetric about $n/2$, as demonstrated by the plot of the ratio $p_n(a, n-a)/q_n(a, n-a)$ with $n = 30$ in Figure 2. In addition, the figure shows that the ratio is strictly increasing on the interval $[1, \lfloor n/2 \rfloor]$ (and by the symmetry of the ratio, it is strictly decreasing on the interval $[\lceil n/2 \rceil, n-1]$). This observation is made precise and rigorous in the following theorem.

Theorem 6

For $n \geq 4$, there exists a number $\lambda(n)$ in $[1, \lfloor n/2 \rfloor]$ such that $p_n(a, n-a) < q_n(a, n-a)$ for $1 \leq a \leq \lambda(n)$, and $p_n(a, n-a) > q_n(a, n-a)$ for $\lambda(n) < a \leq \lfloor n/2 \rfloor$.

Proof Let

$$h_n(a) = \frac{p_n(a, n-a)}{q_n(a, n-a)} = \frac{2(2n-2a-1)}{n-1}\binom{2n-2}{2a-2}\binom{n}{a}^{-1}\binom{n-1}{a-1}^{-1}.$$

Then

$$\frac{h_n(a+1)}{h_n(a)} = \frac{(a+1)(2n-2a-3)}{(2a-1)(n-a)} > 1,$$

where the last inequality follows from the observation that $(a+1)(2n-2a-3) - (n-a)(2a-1) = 3(n-2a-1) > 0$ for $1 \leq a \leq \lfloor n/2 \rfloor - 1$. This implies that the function $h_n(a)$ is strictly increasing on the interval $[1, \lfloor n/2 \rfloor]$.

Thus, it now suffices to show that $h_n(1) < 1$ and $h_n(\lfloor n/2 \rfloor) > 1$, from which the existence of $\lambda(n)$ follows. We have

$$h_n(1) = \frac{p_n(1, n-1)}{q_n(1, n-1)} = \frac{2(2n-3)}{n(n-1)} < 1,$$

if $n > 3$. Let $k = \lfloor n/2 \rfloor$. If $n$ is even (i.e., $k = n/2$), then for $k \geq 2$ we have

$$h_{2k}(k) = \frac{2(4k-2k-1)}{2k-1}\binom{4k-2}{2k-2}\binom{2k}{k}^{-1}\binom{2k-1}{k-1}^{-1} = \binom{4k-2}{2k-2}\binom{2k-1}{k-1}^{-2} > 1.$$

The inequality in the last equation can be seen as follows. Let $A$ and $B$ be two disjoint sets, each having $2k-1$ elements. The number of subsets of $A \cup B$ that have $k-1$ elements from each of $A$ and $B$ is $\binom{2k-1}{k-1}^2$. On the other hand, the total number of $(2k-2)$-subsets of $A \cup B$ is $\binom{4k-2}{2k-2}$.

If $n$ is odd (i.e., $k = (n-1)/2$), then we have

$$h_{2k+1}(k) = \frac{2k+1}{k}\binom{4k}{2k-2}\binom{2k+1}{k}^{-1}\binom{2k}{k-1}^{-1} = \frac{2k+1}{k}\binom{4k}{2k-2}\frac{k}{2k+1}\binom{2k}{k-1}^{-2} = \binom{4k}{2k-2}\binom{2k}{k-1}^{-2}.$$

Using the same argument as in proving $h_{2k}(k) > 1$, we also have $h_{2k+1}(k) > 1$ for $k \geq 2$. □

Fig. 3 Plot of the function $u_n(a) = p_n(a, n-a)/p_n(a) - q_n(a, n-a)/q_n(a)$ with $n = 30$.

Let $A$ be a fixed subset of $X$ with size $a$, where $1 \leq a \leq n-1$. In the previous two theorems, we present comparison results for $P(A)$ and $P(A, X \setminus A)$ under the YHK and PDA models. We end this subsection with a comparison study of $P(A, X \setminus A)/P(A)$, that is, the probability that a tree $T \in \mathcal{T}_X$ sampled according to the probability measure $P$ contains both $A$ and $X \setminus A$ as its clades (which means that $A$ and $X \setminus A$ are the clades below the two children of the root of $T$), given that $A$ is a clade of $T$. To this end, let

$$u_n(a) = \frac{p_n(a, n-a)}{p_n(a)} - \frac{q_n(a, n-a)}{q_n(a)} = \frac{a(a+1)}{n(n-1)} - \frac{1}{2n-2a-1}.$$

We are interested in the sign of $u_n(a)$, as it indicates a 'phase transition' between these two models. For instance, considering the values of $u_n(a)$ for $n = 30$ as depicted in Figure 3, there exists a unique change of sign. Indeed, the observation that there exists a unique change of sign of $u_n(a)$ holds for general $n$, as the following theorem shows.

Theorem 7

For $n \geq 3$, there exists $\tau(n) \in [1, n-1]$ such that $u_n(a) \leq 0$ if $a \leq \tau(n)$ and $u_n(a) \geq 0$ if $a > \tau(n)$.

Proof Consider the function

$$f_n(x) = \frac{x(x+1)}{n(n-1)} - \frac{1}{2n-2x-1}, \quad x \in \mathbb{R}.$$

Clearly $f_n(x)$ agrees with $u_n(a)$ when $x = a$. Then

$$f'_n(x) = \frac{2x+1}{n(n-1)} - \frac{2}{(2n-2x-1)^2} = \frac{t(2n-t)^2 - 2n(n-1)}{n(n-1)(2n-t)^2},$$

where $t = 2x+1$. The sign of $f'_n(x)$ thus depends on the sign of $g_n(t) = t(2n-t)^2 - 2n(n-1)$. We see that $g_n(t)$ is a polynomial in $t$ of degree 3, and hence it can have at most three (real) roots. On the other hand, for $n \geq 3$, we have

$$g_n(0) = -2n(n-1) < 0, \quad g_n(1) = n^2 + (n-1)^2 > 0, \quad g_n(2n-1) = -2n(n-2) - 1 < 0,$$

and $\lim_{t \to \infty} g_n(t) = \infty$. Therefore, $g_n(t)$ has exactly three roots $t_1 \in (0, 1)$, $t_2 \in (1, 2n-1)$, and $t_3 > 2n-1$. Note further that $g_n(n) = n^3 - 2n(n-1) = n((n-1)^2 + 1) > 0$, and hence $t_2 > n$. Denoting $x_i = (t_i - 1)/2$ for $1 \leq i \leq 3$, we have $f'_n(x) = 0$ for $x \in \{x_1, x_2, x_3\}$, $f'_n(x) < 0$ for $x \in (-\infty, x_1) \cup (x_2, x_3)$, and $f'_n(x) > 0$ for $x \in (x_1, x_2) \cup (x_3, \infty)$.

Since $x_1 = (t_1 - 1)/2 < 0$ and $f_n(a) = u_n(a)$, the sign of $f'_n(x)$ implies that $u_n(1) < u_n(2) < \cdots < u_n(\lfloor x_2 \rfloor)$. Similarly, we also have $u_n(\lceil x_2 \rceil) > \cdots > u_n(n-2) > u_n(n-1)$. It is easy to see that for $n \geq 3$,

$$u_n(1) = \frac{2}{n(n-1)} - \frac{1}{2n-3} = -\frac{(n-2)(n-3)}{n(n-1)(2n-3)} \leq 0, \qquad u_n(n-1) = \frac{n(n-1)}{n(n-1)} - \frac{1}{2n-2(n-1)-1} = 0.$$

Since $x_2 = (t_2 - 1)/2 < n-1$ and $x_3 = (t_3 - 1)/2 > n-1$, we have $\lceil x_2 \rceil \leq n-1 < x_3$. This implies that $u_n(\lceil x_2 \rceil) > \cdots > u_n(n-2) > u_n(n-1) = 0$. Therefore, there exists a positive number $\tau(n) \in [1, x_2]$ such that $u_n(a) \leq 0$ if $a \leq \tau(n)$ and $u_n(a) \geq 0$ if $a > \tau(n)$. □

We next consider the clade probabilities under the PDA model for a collection of disjoint subsets of $X$, and then show that the two indicator variables $I_T(A)$ and $I_T(B)$ are positively correlated.

Theorem 8

Let $A_1, \dots, A_k$ be $k$ disjoint (nonempty) subsets of $X$. Denoting $|A_1| + \cdots + |A_k|$ by $m$, we have

$$P_{PDA}(A_1, \dots, A_k) = \frac{\varphi(n-m+k) \prod_{i=1}^{k} \varphi(|A_i|)}{\varphi(n)}.$$

Proof

We first compute the number of trees that have $A_1, \dots, A_k$ as clades. To this end, note that such a tree can be constructed in two steps:

1. Build a tree on $\big(X \setminus \bigcup_{i=1}^{k} A_i\big) \cup \{x'_1, \dots, x'_k\}$, where $x'_1, \dots, x'_k$ are leaves not in $X$ serving as "placeholders" used in the second step.
2. Replace each $x'_i$ with a tree in $\mathcal{T}_{A_i}$.

There are $\varphi(n-m+k)$ different choices for a tree in the first step, and $\prod_{i=1}^{k} \varphi(|A_i|)$ different ways to replace $x'_1, \dots, x'_k$ by trees in $\mathcal{T}_{A_1}, \dots, \mathcal{T}_{A_k}$ in the second step. Therefore the number of trees that have $A_1, \dots, A_k$ as clades is $\varphi(n-m+k) \prod_{i=1}^{k} \varphi(|A_i|)$. Together with the fact that each tree in $\mathcal{T}_X$ is chosen with probability $1/\varphi(n)$ under the PDA model, this implies the theorem. □

Note that $|A_1| + \cdots + |A_k| = n$ when $A_1, \dots, A_k$ form a partition of $X$. Therefore, we obtain the following result as a simple consequence of Theorem 8 (see Theorem 5.1 in Zhu et al (2011) for a parallel result on the YHK model).

Corollary 2

If $A_1, \dots, A_k$ form a partition of $X$, then

$$P_{PDA}(A_1, \dots, A_k) = \frac{\varphi(k) \prod_{i=1}^{k} \varphi(|A_i|)}{\varphi(n)}.$$

Theorem 8 is a general result concerning a collection of clades. When there are only two clades, the theorem below provides a more detailed analysis.
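The counting argument behind Theorem 8 can be checked by brute force for small leaf sets. The sketch below is an illustration only: it enumerates every rooted binary tree on $\{1, \dots, n\}$ as its set of clades, using a recursive root-split enumeration chosen here for brevity, and compares the empirical frequency with the closed formula. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def phi(n):
    """Number of rooted binary phylogenetic trees on n leaves, (2n-3)!!, with phi(1) = 1."""
    result = 1
    for odd in range(3, 2 * n - 2, 2):
        result *= odd
    return result


def all_clade_sets(leaves):
    """Yield the clade set of every rooted binary phylogenetic tree on `leaves`."""
    leaves = frozenset(leaves)
    if len(leaves) == 1:
        yield {leaves}
        return
    first = min(leaves)
    rest = sorted(leaves - {first})
    # The root child containing `first` is `first` plus a proper subset of the rest,
    # so every unordered root split is visited exactly once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = frozenset((first,) + extra)
            right = leaves - left
            for left_clades in all_clade_sets(left):
                for right_clades in all_clade_sets(right):
                    yield {leaves} | left_clades | right_clades


def pda_joint_bruteforce(n, subsets):
    """Fraction of trees on {1..n} that contain every given subset as a clade."""
    targets = [frozenset(s) for s in subsets]
    trees = list(all_clade_sets(range(1, n + 1)))
    hits = sum(all(t in clades for t in targets) for clades in trees)
    return Fraction(hits, len(trees))


def pda_joint_formula(n, subsets):
    """Closed formula of Theorem 8 for pairwise disjoint nonempty subsets."""
    m = sum(len(s) for s in subsets)
    numerator = phi(n - m + len(subsets))
    for s in subsets:
        numerator *= phi(len(s))
    return Fraction(numerator, phi(n))


example = [{1, 2}, {3, 4, 5}]
print(pda_joint_bruteforce(6, example), pda_joint_formula(6, example))
```

The enumeration grows as $(2n-3)!!$, so this check is only practical for roughly $n \leq 8$.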

Theorem 9

Let $A$ and $B$ be two subsets of $X$ with $a \leq b$, where $a = |A|$ and $b = |B|$. Then we have

$$P_{PDA}(A, B) = \begin{cases} \dfrac{\varphi(a)\varphi(n-b+1)\varphi(b-a+1)}{\varphi(n)}, & \text{if } A \subseteq B, \\[2ex] \dfrac{\varphi(a)\varphi(b)\varphi(n-a-b+2)}{\varphi(n)}, & \text{if } A \text{ and } B \text{ are disjoint}, \\[2ex] 0, & \text{otherwise.} \end{cases}$$

Proof The first case follows by applying the counting argument of Theorem 3 twice. The second case is a special case of Theorem 8. The third case holds because if $A \cap B \notin \{A, B, \emptyset\}$, then there exists no tree that contains both $A$ and $B$ as its clades. □

To establish the last result of this subsection, we need the following technical lemma.

Lemma 3

Let $m, n, m', n'$ be positive numbers with $(m-m')(n-n') \geq 0$; then

$$\varphi(m'+n')\,\varphi(m+n) \geq \varphi(m+n')\,\varphi(m'+n). \tag{11}$$

In particular, if $a \leq b \leq b' \leq a'$ are positive numbers with $a + a' = b + b'$, then we have

$$\varphi(a)\varphi(a') \geq \varphi(b)\varphi(b'). \tag{12}$$

Proof

To establish the first claim, we may assume $m \geq m'$ and $n \geq n'$, as the proof of the other case, $m \leq m'$ and $n \leq n'$, is similar. Now Eq. (11) holds because we have

$$\frac{\varphi(m+n)}{\varphi(m+n')} = \frac{(2(m+n)-3) \cdot (2(m+n)-5) \cdots 3 \cdot 1}{(2(m+n')-3) \cdot (2(m+n')-5) \cdots 3 \cdot 1}$$
$$= (2m+2n-3)(2m+2n-5) \cdots (2m+2n'+1)(2m+2n'-1) \tag{13}$$
$$\geq (2m'+2n-3)(2m'+2n-5) \cdots (2m'+2n'+1)(2m'+2n'-1) \tag{14}$$
$$= \frac{\varphi(m'+n)}{\varphi(m'+n')}.$$

Here Eq. (13) follows from $n \geq n'$ and Eq. (14) from $m \geq m'$.

The second assertion follows from the first one by setting $m' = n' = a/2$, $m = b - a/2$ and $n = b' - a/2$. □

We end this section with the following result, which says that the random variables $I_T(A)$ and $I_T(B)$ are positively correlated when $A$ and $B$ are compatible, that is, $A \cap B \in \{\emptyset, A, B\}$.

Theorem 10

Let $A$ and $B$ be two compatible non-empty subsets of $X$; then

$$P_{PDA}(A, B) \geq P_{PDA}(A)\, P_{PDA}(B).$$

Proof

Set $a = |A|$ and $b = |B|$. By symmetry we may assume without loss of generality that $a \leq b$ holds. Since $A$ and $B$ are compatible, we have either $A \cap B = \emptyset$ or $A \subseteq B$.

Since $n-a-b+2 \leq n-b+1 \leq n-a+1 \leq n$, by Lemma 3 we have

$$\varphi(n)\,\varphi(n-a-b+2) \geq \varphi(n-b+1)\,\varphi(n-a+1),$$

and hence

$$\frac{\varphi(a)\varphi(b)\varphi(n-a-b+2)}{\varphi(n)} \geq \frac{\varphi(b)\varphi(n-b+1)}{\varphi(n)} \cdot \frac{\varphi(a)\varphi(n-a+1)}{\varphi(n)}.$$

Together with Theorem 9, this shows that the theorem holds for the case $A \cap B = \emptyset$.

On the other hand, noting that $b-a+1 \leq b \leq n$ and $b-a+1 \leq n-a+1 \leq n$ hold, by Lemma 3 we have

$$\varphi(n)\,\varphi(b-a+1) \geq \varphi(b)\,\varphi(n-a+1),$$

and hence

$$\frac{\varphi(a)\varphi(b-a+1)}{\varphi(b)} \cdot \frac{\varphi(b)\varphi(n-b+1)}{\varphi(n)} \geq \frac{\varphi(b)\varphi(n-b+1)}{\varphi(n)} \cdot \frac{\varphi(a)\varphi(n-a+1)}{\varphi(n)}.$$

Together with Theorem 9, this shows that the theorem holds for the case $A \subseteq B$, as required. □

In this section, we study clan probabilities, the counterpart of clade probabilities for unrooted trees. To this end, given a subset $A \subseteq X$ and an unrooted tree $T^* \in \mathcal{T}^*_X$, let $I_{T^*}(A)$ be the indicator function defined as

$$I_{T^*}(A) = \begin{cases} 1, & \text{if } A \text{ is a clan of } T^*, \\ 0, & \text{otherwise.} \end{cases}$$

Then the probability that a random unrooted tree sampled according to $P_u$ contains $A$ as a clan is

$$P_u(A) = \sum_{T^* \in \mathcal{T}^*_X} P_u(T^*)\, I_{T^*}(A).$$

Note that the clan probability defined above can be extended to a collection of subsets in a natural way, that is, we have

$$P_u(A_1, \dots, A_m) = \sum_{T^* \in \mathcal{T}^*_X} P_u(T^*) \big( I_{T^*}(A_1) \cdots I_{T^*}(A_m) \big).$$

As a generalization of Lemma 6.1 in Zhu et al (2011), the following technical result relates clan probabilities to clade probabilities.

Lemma 4

Suppose that $P$ is a probability measure on $\mathcal{T}_X$ and $P_u$ is the probability measure on $\mathcal{T}^*_X$ induced by $P$. Then for a nonempty subset $A \subset X$, we have

$$P_u(A) = P(A) + P(X \setminus A) - P(A, X \setminus A).$$

Proof

It is well-known (see, e.g., Lemma 6.1 in Zhu et al (2011)) that for a rooted binary tree $T$, a set $A$ is a clan of $\rho^{-1}(T)$ if and only if either $A$ is a clade of $T$ or $X \setminus A$ is a clade of $T$. Now the lemma follows from the definitions and the inclusion-exclusion principle. □

Now we proceed to studying the clan probabilities under the YHK and PDA models. To begin with, recall that the probabilities of an unrooted tree $T^* \in \mathcal{T}^*_X$ under the YHK and PDA models are

$$P^u_{\mathrm{YHK}}(T^*) = \sum_{T \in \rho(T^*)} P_{\mathrm{YHK}}(T) \quad \text{and} \quad P^u_{\mathrm{PDA}}(T^*) = \sum_{T \in \rho(T^*)} P_{\mathrm{PDA}}(T),$$

where $\rho(T^*)$ denotes the set of rooted trees $T$ in $\mathcal{T}_X$ with $T^* = \rho^{-1}(T)$.

By the definition of clan probabilities, we have

$$P^u_{\mathrm{YHK}}(A) = \sum_{T^* \in \mathcal{T}^*_X} P^u_{\mathrm{YHK}}(T^*)\, I_{T^*}(A) \quad \text{and} \quad P^u_{\mathrm{PDA}}(A) = \sum_{T^* \in \mathcal{T}^*_X} P^u_{\mathrm{PDA}}(T^*)\, I_{T^*}(A).$$

It can be verified, as with the case of clade probabilities, that the exchangeability property of $P^u_{\mathrm{YHK}}$ and $P^u_{\mathrm{PDA}}$ implies that both $P^u_{\mathrm{YHK}}(A)$ and $P^u_{\mathrm{PDA}}(A)$ depend only on the size $a = |A|$, not on the particular elements in $A$. Therefore, we will denote them as $p^*_n(a)$ and $q^*_n(a)$, respectively.

By Lemma 4, we can derive the following formulae to calculate clan probabilities under the two models, the first of which is established in Zhu et al (2011). Note that the second formula reveals an interesting relationship between clan probability and clade probability under the PDA model. Intuitively, it is related to the observation that there exists a bijective mapping from $\mathcal{T}_X$ to $\mathcal{T}^*_Y$ with $Y = X \cup \{y\}$ for some $y \notin X$ that maps each rooted tree $T$ in $\mathcal{T}_X$ to the unique tree in $\mathcal{T}^*_Y$ obtained from $T$ by adding the leaf $y$ to the root of $T$.

Theorem 11

For $1 \le a < n$, we have

$$p^*_n(a) = 2n \left[ \frac{1}{a(a+1)} + \frac{1}{(n-a)(n-a+1)} - \frac{1}{(n-1)n} \right] \binom{n}{a}^{-1}; \tag{15}$$

\begin{align*}
q^*_n(a) &= \frac{\varphi(a)\varphi(n-a+1) + \varphi(n-a)\varphi(a+1) - \varphi(a)\varphi(n-a)}{\varphi(n)} \tag{16} \\
&= \frac{\varphi(a)\varphi(n-a)}{\varphi(n-1)} = q_{n-1}(a).
\end{align*}

Proof

Since the first equation is established in Zhu et al (2011), it remains to show the second one. The first equality follows from Lemma 4 and Theorem 3. To establish the second equality, it suffices to see that

\begin{align*}
\varphi(n-1)\big[\varphi(a)\varphi(n-a+1) + \varphi(n-a)\varphi(a+1)\big] &= \varphi(n-1)\varphi(a)\varphi(n-a)\big[(2n-2a-1) + (2a-1)\big] \\
&= \varphi(n-1)(2n-2)\varphi(a)\varphi(n-a) \\
&= \big(\varphi(n) + \varphi(n-1)\big)\varphi(a)\varphi(n-a).
\end{align*}

□

Fig. 4 Plot of the ratio $p^*_n(a)/q^*_n(a)$ with $n = 30$.

Recall that in Theorems 2 and 4 we show that the sequences $\{p_n(a)\}_{1 \le a < n}$ and $\{q_n(a)\}_{1 \le a < n}$ are log-convex. The theorem below establishes a similar result for clan probabilities.

Theorem 12

For $n > 3$, the sequences $\{p^*_n(a)\}_{1 \le a < n}$ and $\{q^*_n(a)\}_{1 \le a < n}$ are log-convex. Moreover, we have

(i) $p^*_n(a) = p^*_n(n-a)$ and $q^*_n(a) = q^*_n(n-a)$ for $1 \le a < n$.

(ii) $q^*_n(a+1) \le q^*_n(a)$ when $a \le \lfloor (n-1)/2 \rfloor - 1$, and $q^*_n(a+1) \ge q^*_n(a)$ when $a \ge \lceil (n-1)/2 \rceil$.

Proof

Part (i) follows from Theorem 11. Since $q^*_n(a) = q_{n-1}(a)$ by Theorem 11, Part (ii) and the log-convexity of $\{q^*_n(a)\}_{1 \le a < n}$ follow from Theorem 4.

It remains to show that $\{p^*_n(a)\}_{1 \le a < n}$ is log-convex. To this end, fix a number $n > 3$, and let $y_a = \frac{1}{a(a+1)}$ for $1 \le a < n$. Then clearly $\{y_a\}_{1 \le a < n}$ is log-convex. This implies that $\{y'_a\}_{1 \le a < n}$ with $y'_a = y_{n-a}$ is also log-convex. In addition, since $2y_a > y_{a+1} + y_{a-1}$ for $2 \le a \le n-2$, we know that $\{y^*_a\}_{1 \le a < n}$ with $y^*_a = y_a - \frac{1}{n(n-1)}$ is log-convex as well. By Lemma 1, we know $\{y'_a + y^*_a\}_{1 \le a < n}$ is log-convex. As $\{\binom{n}{a}^{-1}\}_{1 \le a < n}$ is log-convex, by Lemma 1 and Theorem 11 we conclude that $\{p^*_n(a)\}_{1 \le a < n}$ is log-convex, as required. □

Next, we consider the relationships between clan probabilities under the two models. For instance, consider the ratio $p^*_n(a)/q^*_n(a)$ with $n = 30$ (see Figure 4). The ratios are symmetric about $a = 15$, which is consistent with Part (i) in Theorem 12. In addition, by the figure it is clear that, except for $a = 1$, for which $p^*_n(a) = q^*_n(a) = 1$, the ratio is strictly decreasing on $[2, \lfloor n/2 \rfloor]$ and is less than 1 when $a$ is greater than a critical value. We shall show that this observation holds for general $n$. To this end, we need the following technical lemma.

Lemma 5

For $n > 5$, we have $p^*_n(\lfloor n/2 \rfloor) < q^*_n(\lfloor n/2 \rfloor)$.

Proof

For simplicity, let $k = \lfloor n/2 \rfloor$. To establish the lemma, we consider the following two cases.

The first case is when $n$ is even, that is, $n = 2k$. Then we have

\begin{align*}
p^*_{2k}(k) &= 4k \left( \frac{2}{k(k+1)} - \frac{1}{2k(2k-1)} \right) \binom{2k}{k}^{-1} = \left( \frac{8}{k+1} - \frac{2}{2k-1} \right) \binom{2k}{k}^{-1} \\
&= \frac{2(7k-5)}{(k+1)(2k-1)} \binom{2k}{k}^{-1},
\end{align*}

and

\begin{align*}
\alpha(k) := \frac{q^*_{2k}(k)}{p^*_{2k}(k)} &= \frac{\varphi(k)\varphi(k)}{\varphi(2k-1)} \binom{2k}{k} \frac{(k+1)(2k-1)}{2(7k-5)} \\
&= \frac{(2k-2)!\,(2k-2)!\,(2k-2)!\,(2k)!}{(4k-4)!\,(k-1)!\,(k-1)!\,k!\,k!} \cdot \frac{(k+1)(2k-1)}{2(7k-5)}.
\end{align*}

Note that $\alpha(3) = 15/14 > 1$, and $\alpha(k)$ is increasing for $k \ge 3$, because

\begin{align*}
\frac{\alpha(k+1)}{\alpha(k)} &= \frac{2(2k-1)(2k+1)^2(k+2)(7k-5)}{(4k-1)(4k-3)(7k+2)(k+1)^2} \\
&= \frac{112k^5 + 200k^4 - 116k^3 - 130k^2 + 22k + 20}{112k^5 + 144k^4 - 59k^3 - 96k^2 + k + 6} > 1
\end{align*}

holds for $k \ge 3$. In other words, for $k \ge 3$, we have $\alpha(k) > 1$ and hence $q^*_{2k}(k) > p^*_{2k}(k)$.

The second case is when $n$ is odd, that is, $n = 2k+1$. Then we have

\begin{align*}
p^*_{2k+1}(k) &= 2(2k+1) \left( \frac{1}{k(k+1)} + \frac{1}{(k+1)(k+2)} - \frac{1}{2k(2k+1)} \right) \binom{2k+1}{k}^{-1} \\
&= \frac{7k+2}{k(k+2)} \binom{2k+1}{k}^{-1},
\end{align*}

and

\begin{align*}
\beta(k) := \frac{q^*_{2k+1}(k)}{p^*_{2k+1}(k)} &= \frac{\varphi(k)\varphi(k+1)}{\varphi(2k)} \binom{2k+1}{k} \frac{k(k+2)}{7k+2} \\
&= \frac{(2k-2)!\,(2k-1)!\,(2k)!\,(2k+1)!\,(k+2)}{(4k-2)!\,(k-1)!\,(k-1)!\,k!\,(k+1)!\,(7k+2)}.
\end{align*}

Now we have $\beta(3) = 25/23 > 1$. In addition, $\beta(k)$ is increasing for $k \ge 3$, because

\begin{align*}
\frac{\beta(k+1)}{\beta(k)} &= \frac{2(2k-1)(2k+1)(2k+3)(k+1)(k+3)(7k+2)}{(7k+9)(4k+1)(4k-1)\,k\,(k+2)^2} \\
&= \frac{112k^6 + 648k^5 + 1156k^4 + 630k^3 - 152k^2 - 198k - 36}{112k^6 + 592k^5 + 1017k^4 + 539k^3 - 64k^2 - 36k} > 1
\end{align*}

holds for $k \ge 3$. In other words, for $k \ge 3$ and $n$ being odd, we also have $\beta(k) > 1$ and hence $q^*_{2k+1}(k) > p^*_{2k+1}(k)$. This completes the proof. □

Parallel to Theorem 5, which compares $p_n(a)$ and $q_n(a)$, the following theorem provides a comparison between $p^*_n(a)$ and $q^*_n(a)$.

Theorem 13

For $n > 5$, there exists a number $\kappa^*(n)$ in $(1, \lfloor n/2 \rfloor)$ such that $p^*_n(a) > q^*_n(a)$ for $2 \le a \le \kappa^*(n)$, and $p^*_n(a) < q^*_n(a)$ for $\kappa^*(n) < a \le \lfloor n/2 \rfloor$.

Proof

For simplicity, let b : = n − a . Since we have p ∗ n ( ) = (cid:16) + n ( n − )( n − ) (cid:17) n − > ( n − ) > n − = q ∗ n ( ) , and p ∗ n ( ⌊ n / ⌋ ) < q ∗ n ( ⌊ n / ⌋ ) by Lemma 5, it sufﬁces to prove that g n ( a ) = p ∗ n ( a ) q ∗ n ( a ) is strictly decreasing on [ , ⌊ n / ⌋ ] . To this end, let f n ( a ) = a ( a + ) + b ( b + ) − n ( n − ) . From the deﬁnition of g n ( a ) and Theorem 11, we have g n ( a + ) g n ( a ) = f n ( a + ) f n ( a ) ( a + )( b − ) b ( a − ) , which is less than 1 for 2 a ⌊ n / ⌋ − b n ( a ) : = f n ( a ) b ( a − ) − f n ( a + )( a + )( b − ) > a ⌊ n / ⌋ − . (17)In the rest of the proof, we shall establish Eq. (17). To begin with, note that b n ( a ) = n − − ( a + ) n ( n − ) + a + an + a − na ( a + )( a + )+ a − n ( b − )( b + ) + a + n + ( b − ) b ( b + ) . (18)This implies b n ( ) = n − n − n + n − n ( n − )( n − )( n − )= n ( n − n − ) + ( n − ) n ( n − )( n − )( n − ) > n > b ( ) = / b ( ) = /

70 and n − n − > n >

8. In addition, we have b t + ( t ) = t + t − t ( t + )( t + ) + − t + t ( t + ) + t ( t + ) > t > b t + ( t ) = t − − t + + t + t − t ( t + )( t + ) − t + ( t + )( t + ) + ( t + )( t + )( t + )( t + )= ( t − )( t + ) + t − t ( t + )( t + )( t + ) > t >

2. Therefore, we have b n ( ⌊ n / ⌋ − ) > n > b n ( a ) is strictly decreasing, that is, b n ( a ) − b n ( a + ) > a ⌊ n / ⌋ −

1. Indeed, byEqn. (18) we have b n ( a ) − b n ( a + ) = n ( n − ) + a + an + a − na ( a + )( a + )( a + ) + a − an + n − n − ( b − )( b − ) b ( b + ) > n − n − + a ( b − )( b − ) b ( b + ) > . lades and clans: a comparison study of two evolutionary models 19 Here the ﬁrst inequality follows from a > a ⌊ n / ⌋ − ( n − ) / n − an > n , and the secondone from a > n >

6. This completes the proof. □
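The comparison between $p^*_n(a)$ and $q^*_n(a)$ can be checked numerically for small $n$. The following is a minimal sketch in exact rational arithmetic, assuming $\varphi(m) = (2m-3)!!$ and the formulae (15) and (16) of Theorem 11. The function names (`phi`, `p_star`, `q_star`, `q_star_long`, `kappa_star`) are hypothetical helpers introduced here for illustration and are not part of the paper; the threshold $n > 5$ is likewise taken as an assumption.

```python
from fractions import Fraction
from math import comb


def phi(m):
    """phi(m) = (2m-3)!!, the number of rooted binary trees on m leaves."""
    result = 1
    for odd in range(3, 2 * m - 2, 2):
        result *= odd
    return result


def p_star(n, a):
    """Clan probability under the YHK model, following Eq. (15)."""
    bracket = (Fraction(1, a * (a + 1))
               + Fraction(1, (n - a) * (n - a + 1))
               - Fraction(1, (n - 1) * n))
    return 2 * n * bracket / comb(n, a)


def q_star(n, a):
    """Clan probability under the PDA model, closed form in Eq. (16)."""
    return Fraction(phi(a) * phi(n - a), phi(n - 1))


def q_star_long(n, a):
    """First expression in Eq. (16), obtained from Lemma 4."""
    numerator = (phi(a) * phi(n - a + 1)
                 + phi(n - a) * phi(a + 1)
                 - phi(a) * phi(n - a))
    return Fraction(numerator, phi(n))


def kappa_star(n):
    """Largest a in [2, floor(n/2)] with p*_n(a) > q*_n(a) (assumes n > 5)."""
    sizes = range(2, n // 2 + 1)
    ratios = [p_star(n, a) / q_star(n, a) for a in sizes]
    # The ratio should be strictly decreasing on [2, floor(n/2)] ...
    assert all(x > y for x, y in zip(ratios, ratios[1:]))
    # ... starting above 1 and ending below 1 (the latter is Lemma 5).
    assert ratios[0] > 1 > ratios[-1]
    return max(a for a, r in zip(sizes, ratios) if r > 1)


if __name__ == "__main__":
    for n in (6, 10, 30):
        print(n, kappa_star(n))
```

Using `Fraction` avoids floating-point rounding, so the strict inequalities are tested exactly; the check is of course only evidence for the values of $n$ tried, not a substitute for the proof.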

We end this section with some correlation results about clan probabilities under the PDA model.

Theorem 14

Let $A_1, \ldots, A_k$ be $k$ disjoint (nonempty) subsets of $X$, and let $m = |A_1| + \cdots + |A_k|$. Then we have

$$P^u_{\mathrm{PDA}}(A_1, \ldots, A_k) = \frac{\varphi(n-m+k-1) \prod_{i=1}^{k} \varphi(|A_i|)}{\varphi(n-1)}.$$

Proof

Since $P^u_{\mathrm{PDA}}(T^*) = 1/\varphi(n-1)$ for each tree $T^*$ in $\mathcal{T}^*_X$, it remains to show that the number of trees that have $A_1, \ldots, A_k$ as clans is $\varphi(n-m+k-1) \prod_{i=1}^{k} \varphi(|A_i|)$. To this end, note that such a tree can be constructed in two steps:

1. Build an unrooted tree on $\big(X \setminus \bigcup_{i=1}^{k} A_i\big) \cup \{x_1, \ldots, x_k\}$, where $x_1, \ldots, x_k$ are leaves not in $X$ serving as “placeholders” used in the second step.
2. Replace each $x_i$ with a tree in $\mathcal{T}_{A_i}$.

There are $\varphi(n-m+k-1)$ different choices for a tree in the first step, and there are $\prod_{i=1}^{k} \varphi(|A_i|)$ different ways to replace $x_1, \ldots, x_k$ by trees in $\mathcal{T}_{A_1}, \ldots, \mathcal{T}_{A_k}$. The claim then follows. □

Theorem 15

Let $A$ and $B$ be two subsets of $X$ with $a \le b$, where $a = |A|$ and $b = |B|$. Then we have

$$P^u_{\mathrm{PDA}}(A, B) = \begin{cases} \dfrac{\varphi(b)\varphi(n-b)}{\varphi(n-1)} \cdot \dfrac{\varphi(a)\varphi(b-a)}{\varphi(b-1)}, & \text{if } A \subseteq B, \\[2ex] \dfrac{\varphi(a)\varphi(b)\varphi(n-a-b+1)}{\varphi(n-1)}, & \text{if } A \text{ and } B \text{ are disjoint}, \\[2ex] 0, & \text{otherwise.} \end{cases}$$

Proof

The first case follows by applying Theorem 11 twice; the second case follows from Theorem 14. □
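The counting argument behind Theorem 14 can be sanity-checked by brute force on small leaf sets. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it enumerates unrooted trees on $\{0, \ldots, n-1\}$ through the bijection described before Theorem 11 (a rooted tree on the first $n-1$ leaves with the last leaf attached at the root), represents each rooted tree by its set of clades, and compares the observed frequency of disjoint clans with the closed formula. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def phi(m):
    """phi(m) = (2m-3)!!, the number of rooted binary trees on m leaves."""
    result = 1
    for odd in range(3, 2 * m - 2, 2):
        result *= odd
    return result


def rooted_trees(leaves):
    """Yield every rooted binary tree on `leaves` as a frozenset of clades."""
    leaves = tuple(leaves)
    if len(leaves) == 1:
        yield frozenset([frozenset(leaves)])
        return
    first, rest = leaves[0], leaves[1:]
    # The child clade containing `first` fixes the root split, so each
    # unordered split {left, right} is generated exactly once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(x for x in rest if x not in extra)
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    yield left_tree | right_tree | {frozenset(leaves)}


def is_clan(clades, outgroup, subset, all_leaves):
    """Test whether `subset` is a clan of the unrooted tree obtained by
    attaching the leaf `outgroup` at the root of the rooted tree `clades`."""
    subset = frozenset(subset)
    if outgroup in subset:
        subset = all_leaves - subset  # a set is a clan iff its complement is
    return subset in clades


def clan_probability_bruteforce(n, subsets):
    """Fraction of unrooted trees on n leaves having all `subsets` as clans."""
    all_leaves = frozenset(range(n))
    outgroup = n - 1
    trees = list(rooted_trees(range(n - 1)))
    hits = sum(all(is_clan(t, outgroup, s, all_leaves) for s in subsets)
               for t in trees)
    return Fraction(hits, len(trees))


def clan_probability_formula(n, subsets):
    """Closed formula of Theorem 14 for pairwise disjoint nonempty subsets."""
    m = sum(len(s) for s in subsets)
    k = len(subsets)
    numerator = phi(n - m + k - 1) * prod(phi(len(s)) for s in subsets)
    return Fraction(numerator, phi(n - 1))


if __name__ == "__main__":
    example = [{0, 1}, {2, 3}]
    print(clan_probability_bruteforce(6, example),
          clan_probability_formula(6, example))
```

The enumeration grows like $(2n-5)!!$, so this is only practical for very small $n$; it is intended as a check of the placeholder construction in the proof, not as a way of computing clan probabilities in general.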

Corollary 3

Let $A$ and $B$ be two compatible subsets of $X$. Then we have $P^u_{\mathrm{PDA}}(A, B) \ge P^u_{\mathrm{PDA}}(A)\, P^u_{\mathrm{PDA}}(B)$.

Proof

Set $a = |A|$ and $b = |B|$. By symmetry we may assume without loss of generality that $a \le b$ holds. Since $A$ and $B$ are compatible, we have either $A \cap B = \emptyset$ or $A \subseteq B$.

To establish the corollary for the first case, note first that $n-a-b+1 \le n-b \le n-a \le n-1$ implies

$$\varphi(n-a-b+1)\,\varphi(n-1) \ge \varphi(n-b)\,\varphi(n-a),$$

and hence

$$\frac{\varphi(a)\varphi(b)\varphi(n-a-b+1)}{\varphi(n-1)} \ge \left( \frac{\varphi(b)\varphi(n-b)}{\varphi(n-1)} \right) \left( \frac{\varphi(a)\varphi(n-a)}{\varphi(n-1)} \right).$$

Together with Theorem 15, this shows that the corollary holds for the case $A \cap B = \emptyset$.

For the second case, note that $b-a \le n-a \le n-1$ and $b-a \le b-1 \le n-1$ imply

$$\varphi(n-1)\,\varphi(b-a) \ge \varphi(b-1)\,\varphi(n-a),$$

and hence

$$\frac{\varphi(b)\varphi(n-b)}{\varphi(n-1)} \cdot \frac{\varphi(a)\varphi(b-a)}{\varphi(b-1)} \ge \left( \frac{\varphi(b)\varphi(n-b)}{\varphi(n-1)} \right) \left( \frac{\varphi(a)\varphi(n-a)}{\varphi(n-1)} \right).$$

Together with Theorem 15, this shows that the corollary holds for the case $A \subseteq B$, as required. □

Clade sizes are an important genealogical feature in the study of phylogenetics and population genetics. In this paper we present a comparison study between the clade probabilities under the YHK and PDA models, two null models which are commonly used in evolutionary biology.

Our first main result reveals a common feature, that is, the clade probability sequences are log-convex under both models. This implies that compared with ‘mid-sized’ clades, very ‘large’ clades and very ‘small’ clades are more likely to occur under these two models, and hence provides a theoretical explanation for the empirical result on the PDA model observed by Pickett and Randle (2005). One implication of this result is that in Bayesian analysis where the two null models are used as prior distributions, the distribution on clades is not uninformative, as bias is given to those whose sizes are extreme. Therefore, further considerations or adjustments, such as introducing a Bayes factor to account for the bias on prior clade probabilities, are important for interpreting posterior Bayesian clade supports.

The second result reveals a ‘phase transition’ type feature when comparing the sequences of clade probabilities under the two null models. That is, we prove that there exists a critical value $\kappa(n)$ such that the probability that a given clade with size $k$ is contained in a random tree with $n$ leaves generated under the YHK model is higher than that under the PDA model for $1 < k < \kappa(n)$, and lower for $\kappa(n) < k < n$. This implies that typically the trees generated under the YHK model contain relatively more ‘small’ clades than those under the PDA model.

The above two results are also extended to unrooted trees by considering the probabilities of ‘clans’, the sets of taxa that are all on one side of an edge in an unrooted phylogenetic tree. This extension is relevant because in many tree reconstruction approaches, the problem of finding the root is either ignored or left as the last step. Here we study the sequences formed by clan probabilities for unrooted trees generated by the two null models, and obtain several results similar to those for rooted trees.

Note that the two models studied here are special instances of the $\beta$-splitting model introduced by Aldous (1996), a critical branching process in which the YHK model corresponds to $\beta = 0$ and the PDA model to $\beta = -1.5$. Therefore, it would be of interest to study clade and clan probabilities under this more general model. In particular, it would be interesting to see whether the relationships between the two models revealed in this paper also hold for general $\beta$.

Acknowledgements

We thank Prof. Kwok Pui Choi and Prof. Noah A. Rosenberg for stimulating discussions and useful suggestions. We would also like to thank two anonymous referees for their helpful and constructive comments on the first version of this paper.