Mutation effects in ordered trees

Abstract

A mutation will affect an individual and some or all of its descendants. In this paper, we investigate ordered trees with a distinguished vertex called the mutator. We describe various mutations in ordered trees, and find the generating functions for statistics concerning trees with those mutations. The examples give new interpretations to several known sequences and also introduce many new sequences and their combinatorial interpretations.


arXiv [math.CO], October

Gi-Sang Cheon (a,*), Hana Kim (b,†) and Louis W. Shapiro (c)

(a) Department of Mathematics, Sungkyunkwan University, Suwon 440-746, Rep. of Korea
(b) National Institute for Mathematical Sciences, 70 Yuseong-daero, 1689 beon-gil, Yuseong-gu, Daejeon 305-811, Rep. of Korea
(c) Department of Mathematics, Howard University, Washington, DC 20059, USA


AMS classifications: Primary: 05A15; secondary: 05C05

Key words: mutation, ordered trees, short lived mutation, toggle tree, right path tree.

A mutation is a change in the genome of an organism as well as a genotype that exhibits high rates of mutation. It can result in several different types of change in the nucleotide sequences. For instance, mutations in genes can have no effect, alter the product of a gene, or prevent the gene from functioning properly or completely. These phenomena may be reflected in a family tree, which is an example of an ordered tree whose subtrees are usually ordered by date of birth but could be ordered by some other attribute such as height.

* This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2012-007648).
† This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2013R1A6A3A03024342).

In the present paper, we consider ordered trees with one distinguished vertex called the mutator. The vertices changed by the mutator are said to be of a new type. We include the mutator itself as being of the new type. Ordered trees with a mutation may reflect many biological or social structures where changes occur. There is an abundant literature on applications. For instance, if the mutator represents a genetic mutation in a family then the new type vertices are those carrying this mutation. If an ordered tree represents a river network then a mutator could be a spot where pollution has been detected and the vertices above it could be the possible source of the pollution.

It is well known that the number of ordered trees with $n$ edges is counted by the $n$th Catalan number $C_n = \frac{1}{n+1}\binom{2n}{n}$, and its generating function is $C = \sum_{n\ge 0} C_n z^n = \frac{1-\sqrt{1-4z}}{2z}$, resulting from $C = 1 + zC^2 = \frac{1}{1-zC}$. In particular, the number of ordered trees with a distinguished vertex is counted by the $n$th central binomial coefficient $B_n = \binom{2n}{n}$ with $B = \sum_{n\ge 0} B_n z^n = \frac{1}{\sqrt{1-4z}}$. A key fact directly obtained from the uplift principle (see Proposition 2.1) provides the generating function $B/C$ for the ordered trees with a distinguished leaf (i.e. terminal vertex).

The purpose of this paper is to investigate five kinds of ordered trees with mutations according to some conditions on the children of the mutator. In particular, we postulate that such conditions are given or can be explained in terms of generating functions. We then enumerate such trees and their vertices as well as the vertices of the new type. The asymptotic behavior of those numbers will also be discussed. Finally, in Section 3 we change from ordered trees to complete binary trees.
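The three counts quoted above (ordered trees, trees with a distinguished vertex, trees with a distinguished leaf) can be checked by brute force for small $n$. The following Python sketch is not part of the paper: the nested-tuple encoding and the helper names `trees`, `vertices` and `leaves` are assumptions made for illustration, and the single-vertex tree is taken to have one leaf, matching the constant term of $B/C$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(n):
    """All ordered trees with n edges; a tree is the tuple of its child subtrees."""
    if n == 0:
        return ((),)
    result = []
    # Mirror C = 1 + z*C^2: split off the first child subtree (k edges, plus
    # one edge joining it to the root); the remaining children form a tree
    # with n - 1 - k edges hanging from the same root.
    for k in range(n):
        for first in trees(k):
            for rest in trees(n - 1 - k):
                result.append((first,) + rest)
    return tuple(result)


def vertices(t):
    """Number of vertices of the tree t."""
    return 1 + sum(vertices(child) for child in t)


def leaves(t):
    """Number of leaves of t (the single-vertex tree counts as one leaf)."""
    return 1 if not t else sum(leaves(child) for child in t)


for n in range(1, 7):
    family = trees(n)
    assert len(family) == comb(2 * n, n) // (n + 1)              # C_n
    assert sum(vertices(t) for t in family) == comb(2 * n, n)    # B_n
    assert sum(leaves(t) for t in family) == comb(2 * n - 1, n)  # [z^n] B/C
```

The recursion follows the decomposition $C = 1 + zC^2$, so each tree is produced exactly once; the exhaustive enumeration is only practical for small $n$.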

Throughout this section, a tree means an ordered tree. Let us begin with trees affected by a mutation. By the children we shall mean the vertices directly connected to the mutator, and by the descendants we shall mean all subsequent vertices above the mutator. There are a variety of conditions that we could set for the children or the descendants of the mutator.

At one extreme the mutation causes sterility. In this case the generating function for the number of trees is $T_M := L = B/C$ and we have only one new type vertex, the mutator, per tree. At the other extreme all the descendants of the mutator are of the new type. As an intermediate case, once a new type child appears all subsequent descendants are new type. This could model a case where the mutator is a person who moves to a new country. Other possibilities include exactly one new and sterile child. We also consider what happens if the new type vertices are all on the rightmost branch or on the rightmost path.

There are many further variations possible. In one direction we could look at different kinds of trees such as complete (or incomplete) binary trees, Motzkin 0-1-2 trees, even trees, Riordan trees, and spoiled child trees. A second direction would allow a variety of new types, and a third would allow more than one mutator. All of the mutation possibilities can be considered with these variations, but to keep this paper focused we discuss these only briefly.

In this section, we consider five kinds of trees with mutations arising from different conditions on the children or descendants. We compute the number of vertices of each new type by the following procedure. First we find the generating function for such trees and designate it as $T_M$. Since each tree with $n$ edges has $n+1$ vertices, the generating function for the vertices is the derivative $(zT_M)'$. We then find the generating function $V_N$ for vertices of the new type. In this step, the uplift principle can be used for transferring results established at the root to an arbitrary vertex. We note that, as is usual, a mutator does not change the conditions on its children wherever it appears.

Proposition 2.1 (The uplift principle, [1])

First, find the generating function for whatever is being counted at the root. Then uplift the result at the root to an arbitrary vertex by multiplying by the leaf generating function $L = B/C$.

Next, we compute the proportion of new type vertices among all the vertices. For this step we use a few asymptotic results depending on Stirling's approximation or the ratio test. These are

(i) $\binom{2n}{n} \sim \frac{4^n}{\sqrt{\pi n}}$ or $4^n \sim \binom{2n}{n}\sqrt{\pi n}$, and $\binom{2n}{n} \sim 4\binom{2n-2}{n-1}$;
(ii) $C_n \sim 4C_{n-1}$, $B_n \sim 4B_{n-1}$.

We also adopt singularity analysis for a few complicated cases. In addition, here are some other facts we will use:

(i) $L = B/C$, $C' = BC^2$, $B' = 2B^3$;
(ii) $B = 1 + 2zBC = \frac{1}{1-2zC} = \frac{C}{1-zC^2}$, so that $\frac{B-1}{2} = zBC$;
(iii) $[z^n]C^s = \frac{s}{2n+s}\binom{2n+s}{n}$ and $[z^n]BC^s = \binom{2n+s}{n}$,

where $[z^n]$ is the coefficient extraction operator.

Let us now describe the first example of ordered trees with a mutation, a variation of an extreme case. Here the rightmost child of the mutator is of the new type. The mutator has no more children of the new type and the child of the new type has no children. In this sense, we call such a mutation the short lived mutation. As an example, if a male donkey mates with a female horse, it stops reproducing and the child, a mule, is sterile. The horse, traumatized by the experience, has no more offspring.

Every such tree has exactly two vertices of the new type, so the question of interest is the number of such trees. The generating function is
\[ L\cdot C\cdot z = \frac{B}{C}\cdot C\cdot z = zB = \frac{z}{\sqrt{1-4z}} = z + 2z^2 + 6z^3 + 20z^4 + \cdots \]
(an entry in the OEIS [4]). To illustrate with $n = 3$ edges, we consider the ordered trees with a short lived mutator and three edges. There are 6 trees as shown in Figure 1. From now on, 'x' denotes the root and the mutator is circled. Along with the mutator, we will mark the edges above the vertices of the new type as this is easier to see.

We note that the total number of vertices is $(n+1)\binom{2n-2}{n-1}$ for $n \ge 1$, so the proportion of new type vertices, $\frac{2\binom{2n-2}{n-1}}{(n+1)\binom{2n-2}{n-1}} = \frac{2}{n+1}$, approaches 0 as $n$ gets larger, as is obvious.

Figure 1. Trees with the short lived mutation
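The count of trees with a short lived mutation can be confirmed by exhaustive enumeration. The sketch below is an illustration rather than the authors' method: it assumes the nested-tuple encoding of ordered trees and reads the decomposition $L\cdot C\cdot z$ as "the mutator has at least one child and its rightmost child is a leaf"; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(n):
    """All ordered trees with n edges; a tree is the tuple of its child subtrees."""
    if n == 0:
        return ((),)
    return tuple(
        (first,) + rest
        for k in range(n)
        for first in trees(k)
        for rest in trees(n - 1 - k)
    )


def short_lived_positions(t):
    """Number of vertices of t that can serve as a short lived mutator."""
    # A vertex qualifies when it has a child and its rightmost child is a
    # leaf: that leaf plays the role of the single sterile new type child.
    here = 1 if t and t[-1] == () else 0
    return here + sum(short_lived_positions(child) for child in t)


def short_lived_count(n):
    """Number of ordered trees with n edges carrying a short lived mutation."""
    return sum(short_lived_positions(t) for t in trees(n))


for n in range(1, 7):
    # Coefficient of z^n in z*B is the central binomial coefficient B_{n-1}.
    assert short_lived_count(n) == comb(2 * n - 2, n - 1)
```

For $n = 3$ the function returns 6, in agreement with the six trees of Figure 1.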

However, if the horse should resume an active social life, the generating function for the trees becomes $zBC$.

For the second example we look at toggle trees. Once the mutator has a child of the new type, all the later descendants are also new type. In other words, the first child of the mutator of the new type plays the role of a 'toggle' that divides all children of the mutator into two groups: those on the left are normal, those on the right are new type.

Theorem 2.2

The number of toggle trees with $n$ edges is $\binom{2n+1}{n}$. In particular, the proportion of new type vertices is asymptotically $\frac{1}{2}\sqrt{\frac{\pi}{n}}$.

Proof. The number of toggle trees with a mutator at the root has the generating function $C^2$, where each of the first family and the new family contributes a $C$. If we allow a mutator to be anywhere, applying the uplift principle gives
\[ T_M = \frac{B}{C}\cdot C^2 = BC = \sum_{n\ge 0}\binom{2n+1}{n}z^n = 1 + 3z + 10z^2 + 35z^3 + \cdots \]
(an entry in the OEIS [4]). Then the generating function for vertices is
\[ (zT_M)' = \left(\frac{B-1}{2}\right)' = B^3 = \sum_{n\ge 0}(2n+1)\binom{2n}{n}z^n. \]
To count new type vertices we see that if the mutator is at the root, we have $C\cdot(zC)' = CB$ possibilities, with $C$ for the pretoggle subtree and $(zC)' = B$ counting the new type vertices. Multiplying by $L = B/C$ allows the mutator to be anywhere and our generating function is $V_N = \frac{B}{C}\cdot CB = B^2 = 1 + 4z + 16z^2 + 64z^3 + \cdots$.

To estimate the proportion of new type vertices, we apply Stirling's approximation and get
\[ \frac{[z^n]B^2}{[z^n]B^3} = \frac{4^n}{(2n+1)\binom{2n}{n}} \sim \frac{\sqrt{\pi n}\binom{2n}{n}}{(2n+1)\binom{2n}{n}} \sim \frac{1}{2}\sqrt{\frac{\pi}{n}}. \tag{1} \]
Figure 2 illustrates the ten toggle trees with 30 vertices of which 16 are of the new type. The result (1) is reasonable since a mutator high up will usually have few descendants. If, by way of contrast, we specify that the mutator be at height 1 then the number of toggle trees is counted by
\[ zC^4 = \sum_{n\ge 1}\frac{4}{n+3}\binom{2n+1}{n-1}z^n = z + 4z^2 + 14z^3 + 48z^4 + 165z^5 + \cdots. \]

Figure 2. Toggle trees with 16 new type vertices.
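Theorem 2.2 lends itself to a direct check. The sketch below is an assumption-laden illustration: it models a toggle tree as an ordered tree together with a mutator and a choice of how many of the mutator's rightmost children are of the new type (the reading suggested by the factor $C^2$ at the root), and it uses a nested-tuple encoding with hypothetical helper names.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(n):
    """All ordered trees with n edges; a tree is the tuple of its child subtrees."""
    if n == 0:
        return ((),)
    return tuple(
        (first,) + rest
        for k in range(n)
        for first in trees(k)
        for rest in trees(n - 1 - k)
    )


def size(t):
    """Number of vertices of the tree t."""
    return 1 + sum(size(child) for child in t)


def toggle_stats(t):
    """Return (toggle structures on t, new type vertices summed over them)."""
    count = new = 0
    # Mutator at the root of t: the k rightmost children and all of their
    # descendants are of the new type, for k = 0, ..., deg(root).
    for k in range(len(t) + 1):
        count += 1
        new += 1 + sum(size(child) for child in t[len(t) - k:])
    # Mutator somewhere inside one of the subtrees.
    for child in t:
        sub_count, sub_new = toggle_stats(child)
        count += sub_count
        new += sub_new
    return count, new


def toggle_totals(n):
    """Totals of toggle_stats over all ordered trees with n edges."""
    stats = [toggle_stats(t) for t in trees(n)]
    return sum(c for c, _ in stats), sum(v for _, v in stats)


for n in range(7):
    count, new = toggle_totals(n)
    assert count == comb(2 * n + 1, n)  # [z^n] BC
    assert new == 4 ** n                # [z^n] B^2
```

For $n = 2$ this gives 10 toggle trees and 16 new type vertices, the numbers quoted for Figure 2.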

The number of vertices has the generating function $(z^2C^4)' = 2zC^4B = 2\sum_{n\ge 1}\binom{2n+2}{n-1}z^n = 2z + 12z^2 + 56z^3 + 240z^4 + 990z^5 + \cdots$. The generating function for vertices of the new type is
\[ zC^3B = \sum_{n\ge 1}\binom{2n+1}{n-1}z^n = z + 5z^2 + 21z^3 + 84z^4 + \cdots. \]
For instance, for $n = 3$ there are $4\cdot 14 = 56$ vertices of which 21 vertices are of the new type, see Figure 3.

Figure 3. Toggle trees with a mutator at height 1.

We note that the proportion of new type vertices in this height 1 case is
\[ \frac{\binom{2n+1}{n-1}}{\frac{4n+4}{n+3}\binom{2n+1}{n-1}} = \frac{n+3}{4n+4} \to \frac{1}{4}. \]

In a toggle tree, if a child of the new type of the mutator appears, all later offspring of the mutator and their descendants are of the new type. Suppose instead that every child of the mutator and recursively every child of a new type vertex has a 50% chance of being new type. We call such trees embedded new type (ENT) trees since the result is a tree with a subtree of new type vertices. This concept coincides with an autosomal dominant mutation when we assume that we only keep track of the genetic history of a single family, and every member met a spouse not having the mutant gene, so that the appearance of a mutation only depends on a member of the family. Figure 4 illustrates the 12 possible trees with 2 edges.

Figure 4. The embedded new type trees with 2 edges.

Theorem 2.3

The number of embedded new type trees with $n$ edges is $\sum_{k=0}^{n}\frac{1}{k+1}\binom{2n}{n-k}\binom{2k}{k}$. In particular, the proportion of new type vertices is asymptotically $\frac{3}{5}$.

Proof. Let $T$ be the generating function for ENT trees having a mutator at the root. If a mutator has $k$ children, there are $2^k$ possible distributions of the mutation over the children, where each normal child and each child of the new type are the roots of subtrees described by $C$ and $T$, respectively. It then follows that the generating function for ENT trees having a mutator at the root of updegree $k$ is $z^k(C+T)^k$. So $T$ satisfies
\[ T = \frac{1}{1 - z(C+T)} \]
and solving the functional equation gives
\[ T = \frac{1-\sqrt{5-4C}}{2zC} = 1 + 2z + 7z^2 + 29z^3 + 131z^4 + \cdots \quad \text{(A007852)}. \]
A simple computation shows $T = C\cdot(C\circ zC^2)$. By the uplift principle,
\[ T_M = \frac{B}{C}\cdot T = B\cdot(C\circ zC^2) = 1 + 3z + 12z^2 + 52z^3 + 236z^4 + \cdots. \]
This is (A007856) in the OEIS [4] and is also known [3] to count the number of subtrees in ordered trees with $n$ edges. It can be shown that $[z^n]T_M = \sum_{k=0}^{n} C_k\binom{2n}{n-k}$. The singular expansion of $C$ at the dominant singularity $z = \frac{4}{25}$ of $C\circ zC^2$ gives $[z^n]T_M \sim \frac{5\sqrt{15}}{9}\frac{1}{\sqrt{\pi n^3}}\left(\frac{25}{4}\right)^n$, which implies that the asymptotic number of vertices of ENT trees with $n$ edges is $\frac{5\sqrt{15}}{9}\frac{1}{\sqrt{\pi n}}\left(\frac{25}{4}\right)^n$.

In order to count the vertices of the new type, first consider ENT trees with a mutator at the root. Let $\tilde V_N$ be the generating function for such ENT trees where we have marked one of the new type vertices. Suppose the root degree is $k$, of which $j$ are new type. The marked vertex, if not the root itself, is in one of the $j$ new type subtrees. The other $j-1$ new type subtrees and the $k-j$ normal subtrees contribute factors of $T$ and $C$, respectively. Hence
\[ \tilde V_N = T + z\tilde V_N + 2z^2\tilde V_N(C+T) + 3z^3\tilde V_N(C+T)^2 + \cdots = T + \frac{z\tilde V_N}{(1 - z(C+T))^2} \]
with $T$ counting the case where the marked vertex is the root. Solving the functional equation yields
\[ V_N = L\cdot\tilde V_N = \frac{B}{C}\cdot T\cdot\left(1 - \frac{z}{(1 - z(C+T))^2}\right)^{-1} = \sqrt{\frac{2 - 5z + 2\sqrt{1-4z}}{(4-25z)(1-4z)}} = 1 + 4z + 20z^2 + 106z^3 + 580z^4 + \cdots. \]
It follows from $V_N \sim \frac{\sqrt{15}}{3}\left(1 - \frac{25}{4}z\right)^{-1/2}$ that $[z^n]V_N \sim \frac{\sqrt{15}}{3}\frac{1}{\sqrt{\pi n}}\left(\frac{25}{4}\right)^n$. Thus the proportion of vertices of the new type is asymptotically equal to
\[ \frac{\frac{\sqrt{15}}{3}\frac{1}{\sqrt{\pi n}}\left(\frac{25}{4}\right)^n}{\frac{5\sqrt{15}}{9}\frac{1}{\sqrt{\pi n}}\left(\frac{25}{4}\right)^n} = \frac{3}{5}. \]
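The series appearing in this proof can be reproduced with truncated power series arithmetic. The Python sketch below is illustrative only: the truncation order `N`, the helper names and the fixed-point iteration used to solve $T = 1/(1 - z(C+T))$ are assumptions, not the authors' procedure. It uses the identity $1/(1 - z(C+T)) = T$ to rewrite the denominator of $V_N$ as $1 - zT^2$.

```python
from math import comb

N = 8  # number of series coefficients kept (an arbitrary choice)


def mul(a, b):
    """Product of two power series, truncated to N coefficients."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < N:
                out[i + j] += x * y
    return out


def inv(a):
    """Reciprocal of a power series whose constant term is 1."""
    out = [1] + [0] * (N - 1)
    for n in range(1, N):
        out[n] = -sum(a[k] * out[n - k] for k in range(1, n + 1))
    return out


def one_minus_z_times(a):
    """Coefficients of 1 - z*a(z), truncated to N coefficients."""
    return [1] + [-x for x in a[:N - 1]]


C = [comb(2 * n, n) // (n + 1) for n in range(N)]  # Catalan numbers
B = [comb(2 * n, n) for n in range(N)]             # central binomial coefficients
L = mul(B, inv(C))                                 # leaf generating function B/C

# Each pass of T <- 1/(1 - z(C + T)) fixes at least one more coefficient.
T = [1] + [0] * (N - 1)
for _ in range(N):
    T = inv(one_minus_z_times([c + t for c, t in zip(C, T)]))

T_M = mul(L, T)
V_N = mul(T_M, inv(one_minus_z_times(mul(T, T))))

assert T[:5] == [1, 2, 7, 29, 131]
assert T_M[:5] == [1, 3, 12, 52, 236]
assert V_N[:5] == [1, 4, 20, 106, 580]
for n in range(N):
    assert T_M[n] == sum(C[k] * comb(2 * n, n - k) for k in range(n + 1))
```

The ratio `V_N[n] / ((n + 1) * T_M[n])` can be printed for increasing `N` to watch it move toward $3/5$; the convergence is slow, so small truncation orders give only a rough picture.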

In fact, we see for example that, for one large value of $n$,
\[ \frac{[z^n]V_N}{[z^n](zT_M)'} = \frac{642784246122173091957609761927581466320}{1086960365349865718238126127455484769220} \approx 0.5914. \]

One instance where ENT trees occur is classical: family names as in England. The mutator, often the root, passes his name to his male children (these would be new type) whereas the female children would not carry on the family name, see [5].

In a right branch new type (RBNT) tree, the rightmost branch from the mutator is nontrivial and all the vertices of the new type constitute this branch, including the mutator.

Figure 5. A right branch new type tree with a mutator.

Theorem 2.4

The number of right branch new type trees with $n$ edges is $\binom{2n-1}{n}$. In particular, the proportion of new type vertices is asymptotically $\frac{1}{2}\sqrt{\frac{\pi}{n}}$.

Proof. The generating function for RBNT trees follows from Figure 5:
\[ T_M = L\cdot C\cdot zC = \frac{B}{C}\cdot C\cdot zC = zBC = \frac{B-1}{2} = \sum_{n\ge 1}\binom{2n-1}{n}z^n. \]
The number of vertices involved is then counted by
\[ (zT_M)' = \frac{B-1}{2} + zB^3 = \sum_{n\ge 1}(n+1)\binom{2n-1}{n}z^n = 2z + 9z^2 + 40z^3 + 175z^4 + \cdots \tag{2} \]
which is (A097070). It also counts the number of parts equal to 1 over all weak compositions of $n+1$ into $n+1$ parts. Since the right branch from the mutator contains all vertices of the new type, it follows from Figure 5 that
\[ V_N = L\cdot C\cdot(z(zC))' = B\cdot(z^2C)' = B\cdot(2zC + z^2BC^2) = 2z + 7z^2 + 26z^3 + 99z^4 + \cdots. \]
This is (A114121) in [4] except for the initial term. We would like to find the proportion of new type vertices among all the vertices. For $n \ge 1$, the denominator is $[z^n](zT_M)' = (n+1)\binom{2n-1}{n} = \frac{n+1}{2}\binom{2n}{n}$ and the numerator is
\[ [z^n]V_N = [z^n](2zBC + z^2B^2C^2) = [z^n]\left(B - 1 + \left(\frac{B-1}{2}\right)^2\right) = \binom{2n}{n} + \frac{1}{4}\left(4^n - 2\binom{2n}{n}\right) = 4^{n-1} + \frac{1}{2}\binom{2n}{n}. \]
Hence the proportion of new type vertices is
\[ \frac{4^{n-1} + \frac{1}{2}\binom{2n}{n}}{\frac{n+1}{2}\binom{2n}{n}} \sim \frac{\frac{1}{4}\sqrt{\pi n} + \frac{1}{2}}{\frac{n+1}{2}} \sim \frac{1}{2}\sqrt{\frac{\pi}{n}}. \]

Another intermediate case of mutations in trees is obtained by assuming that the vertices of the new type go from the mutator to the rightmost leaf above the mutator. For instance, Figure 6 shows a tree with 5 vertices of the new type. We call trees involving such a mutation right path trees.

Figure 6. A right path tree with 5 new type vertices.
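Returning to Theorem 2.4, the closed form $4^{n-1} + \frac{1}{2}\binom{2n}{n}$ for the new type vertices of RBNT trees can be tested by enumeration. The sketch assumes, following the decomposition $L\cdot C\cdot zC$, that an admissible mutator is any vertex with at least one child and that the new type vertices are the mutator together with the whole subtree of its rightmost child; the encoding and names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(n):
    """All ordered trees with n edges; a tree is the tuple of its child subtrees."""
    if n == 0:
        return ((),)
    return tuple(
        (first,) + rest
        for k in range(n)
        for first in trees(k)
        for rest in trees(n - 1 - k)
    )


def size(t):
    """Number of vertices of the tree t."""
    return 1 + sum(size(child) for child in t)


def rbnt_stats(t):
    """Return (admissible mutators in t, new type vertices summed over them)."""
    count = new = 0
    if t:  # the rightmost branch from the mutator must be nontrivial
        count = 1
        new = 1 + size(t[-1])  # the mutator plus its entire rightmost branch
    for child in t:
        sub_count, sub_new = rbnt_stats(child)
        count += sub_count
        new += sub_new
    return count, new


def rbnt_totals(n):
    """Totals of rbnt_stats over all ordered trees with n edges."""
    stats = [rbnt_stats(t) for t in trees(n)]
    return sum(c for c, _ in stats), sum(v for _, v in stats)


for n in range(1, 7):
    count, new = rbnt_totals(n)
    assert count == comb(2 * n - 1, n)
    # Doubled to stay in integers: 2*new = 2*4^(n-1) + binom(2n, n).
    assert 2 * new == 2 * 4 ** (n - 1) + comb(2 * n, n)
```

The first few totals of new type vertices are 2, 7, 26, 99, matching the expansion of $V_N$.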

Theorem 2.5

The number of right path trees with $n$ edges is $\binom{2n}{n}$. In particular, the proportion of new type vertices is asymptotically $\frac{2}{n}$.

Proof. It can be easily seen that the generating function for right path trees is
\[ T_M = L\cdot(1 + zC + z^2C^2 + \cdots) = \frac{B}{C}\cdot\frac{1}{1-zC} = \frac{B}{C}\cdot C = B. \]
Hence the number of vertices in right path trees is counted by $(zB)' = B + 2zB^3 = \sum_{n\ge 0}(n+1)\binom{2n}{n}z^n = 1 + 4z + 18z^2 + 80z^3 + \cdots$. Next we want to compute the number of new type vertices. If the right path from the mutator has length $k$, we have $k+1$ new type vertices and the generating function for vertices is $(k+1)z^kC^k$, since a subtree can be attached to the left of all vertices except the last. Including the location of the mutator and summing over $k$ gives the generating function
\[ V_N = L\cdot(1 + 2zC + 3z^2C^2 + \cdots) = \frac{B}{C}\cdot\frac{1}{(1-zC)^2} = \frac{B}{C}\cdot C^2 = BC = \sum_{n\ge 0}\binom{2n+1}{n}z^n. \]
Thus the proportion of new type vertices is
\[ \frac{\binom{2n+1}{n}}{(n+1)\binom{2n}{n}} = \frac{2n+1}{(n+1)^2} \sim \frac{2}{n}. \]

If we require the mutator to have at least one child, we call the result the right path$^*$ mutation. The number of right path$^*$ trees has the generating function
\[ L\cdot(zC + z^2C^2 + \cdots) = \frac{B}{C}\cdot\frac{zC}{1-zC} = zB\cdot\frac{1}{1-zC} = zBC = \sum_{n\ge 1}\binom{2n-1}{n}z^n, \]
which is the same as $T_M$ of right branch new type trees. So the generating function for vertices is given by (2).

For the number of vertices of the new type we have
\[ L\cdot(2zC + 3(zC)^2 + \cdots) = \frac{B}{C}\cdot\left(\frac{1}{(1-zC)^2} - 1\right) = \frac{B}{C}\cdot(C^2 - 1) = BC - \frac{B}{C} = \sum_{n\ge 1}\frac{3n+1}{2n+2}\binom{2n}{n}z^n = 2z + 7z^2 + 25z^3 + 91z^4 + \cdots. \]
This is (A097613) in [4] except for the initial term. The proportion of new type vertices out of all vertices is
\[ \frac{\frac{3n+1}{2n+2}\binom{2n}{n}}{\frac{n+1}{2}\binom{2n}{n}} = \frac{3n+1}{(n+1)^2} \sim \frac{3}{n}. \]
Here is an illustration for the 10 right path$^*$ trees on 3 edges. There are 40 vertices of which 25 are of the new type.

Figure 7. Right path trees with a mutator having at least one child.
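Both right path variants can be checked by enumeration. In the sketch below, which is an illustration under stated assumptions rather than the paper's method, the new type vertices of a mutator are taken to be the vertices on the path that repeatedly follows the rightmost child until a leaf is reached, and the flag `need_child` (a hypothetical name) switches to the variant in which the mutator must have at least one child.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(n):
    """All ordered trees with n edges; a tree is the tuple of its child subtrees."""
    if n == 0:
        return ((),)
    return tuple(
        (first,) + rest
        for k in range(n)
        for first in trees(k)
        for rest in trees(n - 1 - k)
    )


def right_path_vertices(t):
    """Number of vertices on the rightmost path starting at the root of t."""
    return 1 if not t else 1 + right_path_vertices(t[-1])


def right_path_stats(t, need_child=False):
    """Return (admissible mutators in t, new type vertices summed over them)."""
    admissible = bool(t) or not need_child
    count = 1 if admissible else 0
    new = right_path_vertices(t) if admissible else 0
    for child in t:
        sub_count, sub_new = right_path_stats(child, need_child)
        count += sub_count
        new += sub_new
    return count, new


def right_path_totals(n, need_child=False):
    """Totals of right_path_stats over all ordered trees with n edges."""
    stats = [right_path_stats(t, need_child) for t in trees(n)]
    return sum(c for c, _ in stats), sum(v for _, v in stats)


for n in range(1, 7):
    count, new = right_path_totals(n)
    assert count == comb(2 * n, n)    # [z^n] B
    assert new == comb(2 * n + 1, n)  # [z^n] BC
    count_star, new_star = right_path_totals(n, need_child=True)
    assert count_star == comb(2 * n - 1, n)  # [z^n] zBC
    assert (2 * n + 2) * new_star == (3 * n + 1) * comb(2 * n, n)
```

For 3 edges the starred variant gives 10 trees and 25 new type vertices, the figures quoted for Figure 7, and the unstarred totals give the ratio $\binom{2n+1}{n} / ((n+1)\binom{2n}{n})$ directly.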

The following table summarizes the main results in this section.

Mutation                 Number of trees with the mutation                              Asymptotic ratio of vertices of the new type
Short lived              $\binom{2n-2}{n-1}$                                            $2\binom{2n-2}{n-1} / ((n+1)\binom{2n-2}{n-1})$
Toggle                   $\binom{2n+1}{n}$                                              $\frac{1}{2}\sqrt{\pi/n}$
Embedded new type        $\sum_{k=0}^{n}\frac{1}{k+1}\binom{2n}{n-k}\binom{2k}{k}$      $\frac{3}{5}$
Right branch new type    $\binom{2n-1}{n}$                                              $\frac{1}{2}\sqrt{\pi/n}$
Right path               $\binom{2n}{n}$                                                $\frac{2}{n}$
Right path$^*$           $\binom{2n-1}{n}$                                              $\frac{3}{n}$

Another class of ordered trees

What if a mutation occurs in another class of ordered trees instead of the usual ordered trees? We end this paper with the complete binary trees with a simple mutation in which a mutator changes all of its descendants to the new type. A complete binary tree has every vertex with updegree 0 or 2. It is known that there are $C_n$ complete binary trees with $n$ internal vertices. To have the coefficient of $z^n$ count edges instead of internal vertices, the appropriate generating function is $\sum_{n\ge 0} C_n z^{2n} = C(z^2) = \frac{1-\sqrt{1-4z^2}}{2z^2}$. This change brings about various other small changes. By setting $\tilde C(z) = \tilde C = C(z^2)$ and $\tilde B(z) = \tilde B = B(z^2)$, we have $\tilde C' = 2z\tilde B\tilde C^2$, $\tilde B' = 4z\tilde B^3$ and $\tilde B = 1 + 2z^2\tilde B\tilde C$. The vertex and leaf generating functions $V$ and $L$ are
\[ V = (z\tilde C)' = \tilde C\cdot(1 + 2z^2\tilde B\tilde C) = \tilde C\tilde B \quad\text{and}\quad L = \frac{\tilde C\tilde B}{\tilde C} = \tilde B. \]
Let $T_M$ be the generating function for complete binary trees with a mutator, where all the descendants of the mutator are of the new type. Then $T_M = V = \tilde C\tilde B = 1 + 3z^2 + 10z^4 + 35z^6 + \cdots$. Writing $\tilde T = T_M$, the number of vertices of these mutator enhanced trees has the generating function
\[ \tilde V = (z\tilde T)' = \tilde T + z\tilde T' = \tilde B\tilde C + z(\tilde B\tilde C)' = \tilde B\tilde C + z(\tilde B\cdot 2z\tilde B\tilde C^2 + 4z\tilde B^3\tilde C) = \tilde B\tilde C\cdot(1 + 2z^2\tilde B\tilde C + 4z^2\tilde B^2) = \tilde B\tilde C\cdot\left(\tilde B + \frac{4z^2}{1-4z^2}\right) = \sum_{n\ge 0}(2n+1)^2C_nz^{2n} = 1 + 9z^2 + 50z^4 + 245z^6 + \cdots. \]
This is a new entry in the OEIS [4].

To figure out the number of new type vertices, we apply the uplift principle again and get the generating function
\[ L\cdot V = \tilde B\cdot\tilde B\tilde C = \tilde B^2\tilde C = \sum_{n\ge 0}\left(2^{2n+1} - \binom{2n+1}{n}\right)z^{2n} = 1 + 5z^2 + 22z^4 + 93z^6 + \cdots. \]
The proportion of new type vertices is
\[ \frac{[z^{2n}]\tilde B^2\tilde C}{[z^{2n}]\tilde V} = \frac{2^{2n+1} - \binom{2n+1}{n}}{\frac{(2n+1)^2}{n+1}\binom{2n}{n}} \sim \frac{2\binom{2n}{n}\sqrt{\pi n} - 2\binom{2n}{n}}{\frac{(2n+1)^2}{n+1}\binom{2n}{n}} = \frac{2\sqrt{\pi n} - 2}{\frac{(2n+1)^2}{n+1}} \sim \frac{1}{2}\sqrt{\frac{\pi}{n}}. \]

Complete binary trees and ordered trees are both examples of trees satisfying a uniform updegree requirement. The methods used in this paper generalize to all such trees but that will be the subject of a separate manuscript.
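The complete binary tree counts can also be verified by enumeration. The sketch below is illustrative and rests on assumptions: a leaf is encoded as `None`, an internal vertex as a pair of subtrees, trees are indexed by their number $n$ of internal vertices (so they have $2n$ edges), and the new type vertices of a mutator are taken to be the mutator and all of its descendants. The helper names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def binary_trees(n):
    """All complete binary trees with n internal vertices."""
    if n == 0:
        return (None,)  # a single leaf
    return tuple(
        (left, right)
        for k in range(n)
        for left in binary_trees(k)
        for right in binary_trees(n - 1 - k)
    )


def size(t):
    """Number of vertices of t (internal vertices and leaves)."""
    return 1 if t is None else 1 + size(t[0]) + size(t[1])


def new_type_total(t):
    """Sum over all mutator positions in t of the size of the mutator's subtree."""
    if t is None:
        return 1
    return size(t) + new_type_total(t[0]) + new_type_total(t[1])


def binary_totals(n):
    """Return (trees with a mutator, all vertices, new type vertices)."""
    family = binary_trees(n)
    marked = sum(size(t) for t in family)              # one tree per mutator position
    all_vertices = sum(size(t) ** 2 for t in family)   # vertices of those marked trees
    new = sum(new_type_total(t) for t in family)
    return marked, all_vertices, new


for n in range(7):
    catalan = comb(2 * n, n) // (n + 1)
    marked, all_vertices, new = binary_totals(n)
    assert len(binary_trees(n)) == catalan
    assert marked == comb(2 * n + 1, n)                 # [z^(2n)] of T_M
    assert all_vertices == (2 * n + 1) ** 2 * catalan   # [z^(2n)] of the vertex series
    assert new == 2 ** (2 * n + 1) - comb(2 * n + 1, n)
```

For $n = 3$ this yields 35 marked trees, 245 vertices and 93 new type vertices.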

References