A Practical Fixed-Parameter Algorithm for Constructing Tree-Child Networks from Multiple Binary Trees

Abstract

We present the first fixed-parameter algorithm for constructing a tree-child phylogenetic network that displays an arbitrary number of binary input trees and has the minimum number of reticulations among all such networks. The algorithm uses the recently introduced framework of cherry picking sequences and runs in $O((8k)^k \cdot \mathrm{poly}(n,t))$ time, where $n$ is the number of leaves of every tree, $t$ is the number of trees, and $k$ is the reticulation number of the constructed network. Moreover, we provide an efficient parallel implementation of the algorithm and show that it can deal with up to 100 input trees on a standard desktop computer, thereby providing a major improvement over previous phylogenetic network construction methods.


Leo van Iersel†, Remie Janssen†, Mark Jones†, Yukihiro Murakami†, Norbert Zeh‡

July 22, 2019

∗ Leo van Iersel, Remie Janssen, Mark Jones and Yukihiro Murakami were supported by the Netherlands Organization for Scientific Research (NWO), including Vidi grant 639.072.602, and van Iersel also by the 4TU Applied Mathematics Institute. Norbert Zeh was supported by the Natural Sciences and Engineering Research Council of Canada.
† Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, The Netherlands, {L.J.J.vanIersel,R.Janssen-2,M.E.L.Jones,Y.Murakami}@tudelft.nl.
‡ Faculty of Computer Science, Dalhousie University, 6050 University Ave, Halifax, NS B3H 1W5, Canada, [email protected].
arXiv preprint [cs.DM].

Evolutionary histories are usually described by phylogenetic trees or networks. A phylogenetic tree describes how a collection of studied taxa (e.g., species, strains or languages) have evolved over time by divergence events, often also called speciation events. A phylogenetic network can additionally describe events where lineages merge, such as hybridization or lateral gene transfer, which are called reticulation events. A central goal of computational phylogenetics is to develop methods for reconstructing phylogenetic networks from various types of inputs.

One of the most fundamental problems in this area, HYBRIDIZATION NUMBER, is to find a phylogenetic network with the minimum number of reticulation events among all networks that contain a given collection of phylogenetic trees. The network is said to display each of the input trees. Each of these trees represents the evolution, through speciation events and mutation, of a particular gene. Reticulation events such as hybridization or lateral gene transfer can lead to discordance between gene trees. The requirement that each gene tree should be contained in the constructed network ensures that the network provides the required paths along which each gene could be passed from ancestors to descendants in a manner consistent with its gene tree. Following the parsimony principle, a network with the minimum number of reticulations that displays all input trees offers a simplest possible model of the evolution of a set of taxa consistent with the given gene trees. Hence, the goal is to compute a phylogenetic network with as few reticulations as possible. Since not all discordance between gene trees is due to reticulation events, such a network provides only an estimate of the actual number of reticulation events. Nevertheless, hybridization networks have proven to be a valuable tool in the study of the evolution of different sets of taxa. Computing hybridization networks with the minimum number of reticulations, however, has proven to be a major challenge.

Initial research focused on the special case that the input consists of only two trees, in which case there exists a nice mathematical characterization of the problem in terms of maximum agreement forests (MAFs) [ ]. This characterization has proven extremely useful for the development of fixed-parameter algorithms for phylogenetic network construction problems on two trees [7, 9, 19], with the currently fastest algorithm for HYBRIDIZATION NUMBER running in time exponential in $k$ but linear in $n$ [ ].

When the input consists of more than two trees, the problem becomes significantly harder. Kernelization is still possible [12, 13]. However, existing algorithms for solving kernelized instances, TREETISTIC [ ], PIRN [ ], PIRNs [ ] and HYBROSCALE [1, 2], are limited to (very) small numbers of input trees and/or (very) small numbers of reticulation events. None of these algorithms is fixed-parameter tractable (FPT) unless combined with kernelization. A bounded-search FPT algorithm with running time $O(c^k \cdot \mathrm{poly}(n))$ for the special case of three input trees was proposed in [ ] ($n$ is the number of taxa, $k$ the number of reticulations), but the constant $c$ is much too big for the algorithm to be useful in practice.

The main bottleneck hindering the development of practical algorithms seemed to be the missing mathematical characterization for the problem on more than two trees, analogous to the MAF characterization for two trees. Such a characterization, in terms of cherry picking sequences, was developed recently and is very different from the MAF characterization for two trees. The first characterization in terms of cherry picking sequences was developed for the restricted class of temporal networks [ ]. Subsequently, it was generalized to the larger class of tree-child networks [ ], in which each non-leaf vertex is required to have at least one non-reticulate child. However, Humphries, Linz, and Semple [ ] provide only a theoretical FPT result based on kernelization for temporal networks, and Linz and Semple [ ] do not present any algorithmic results. Hence, the fixed-parameter tractability of the tree-child version of HYBRIDIZATION NUMBER remained open, as did the development of practical FPT algorithms based on the new characterization.

Our contribution is to fill this algorithmic gap. We show that there exists an FPT algorithm for HYBRIDIZATION NUMBER restricted to tree-child networks on an arbitrary collection of binary input trees. Its running time is $O((8k)^k \cdot \mathrm{poly}(n,t))$, where $n$ is the number of taxa, $t$ is the number of trees, and $k$ is the number of reticulations in the computed network. We verify experimentally that, combined with two heuristic improvements that both preserve the correctness of the algorithm, it can solve fairly complex instances of tree-child HYBRIDIZATION NUMBER. These two heuristics are cluster reduction [ ] and a redundant branch elimination technique introduced in this paper. The implementation used in our experiments is available from https://github.com/nzeh/tree_child_code.

The main practical benefit of our algorithm is that it can handle many more input trees than existing methods. Indeed, in experiments on synthetic inputs, the running time grows roughly linearly in the number of trees and taxa. On the other hand, the running time still has a large exponential dependency on the number of reticulation events $k$. Nevertheless, as long as $k$ is small (at most 7–12), our algorithm can solve inputs with up to 100 input trees and 200 taxa. In our experiments on real-world data, we observed that these data sets have substantially more structure than random synthetic data sets, which makes cluster reduction and redundant branch elimination more effective and allowed our algorithm to solve inputs with up to 8 trees and 50 reticulations. As the number of trees increases, however, the inputs become less "clusterable", which reduces the number of reticulations our algorithm can handle.

We also compared our algorithm directly to HYBROSCALE. For instances consisting of two input trees, HYBROSCALE is much faster because it exploits the MAF characterization for this case. When the number of input trees is at least three, our algorithm turns out to be much faster than HYBROSCALE, which could handle only very few instances with more than five trees.

We restrict our attention to tree-child networks for two reasons. First, although Linz and Semple [ ] also provided a characterization of unrestricted hybridization networks in terms of cherry picking sequences, this characterization is based on adding leaves; since it is not known where to add these leaves, this characterization does not seem to be directly useful for developing FPT algorithms. Second, we observed in our experiments that the optimal tree-child network for a set of trees often has the same number of reticulations as an optimal unrestricted hybridization network. Hence, the restriction to tree-child networks allows us to deal with larger numbers of input trees without changing the problem substantially.

The remainder of this paper is organized as follows: Section 2 formally defines the key concepts including the HYBRIDIZATION NUMBER and TREE-CHILD HYBRIDIZATION problems. Section 3 presents our FPT algorithm for TREE-CHILD HYBRIDIZATION. Section 4 presents our redundant branch elimination heuristic for speeding up the algorithm in practice. This section also shows that redundant branch elimination preserves the correctness of the computed cherry picking sequence. Section 5 presents some details of our implementation of the algorithm and discusses our experimental results. We present some concluding remarks in Section 6.

Throughout this paper, we denote by $X$ a finite non-empty set of taxa. A phylogenetic network on a subset $X' \subseteq X$ is a directed acyclic graph $N$ whose nodes satisfy the following properties: There is a single node of in-degree 0 and out-degree 2, called the root; the nodes of in-degree 1 and out-degree 0 are bijectively labelled with elements from $X'$ (the leaves); all other nodes either have in-degree 1 and out-degree 2 (the tree nodes) or have out-degree 1 and in-degree at least 2 (the reticulations). This is illustrated in Figure 1a. A phylogenetic tree on $X'$ is a phylogenetic network on $X'$ without reticulations; see Figure 1b. Given a directed edge $uv$ in a phylogenetic network or tree, we say that $u$ is a parent of $v$ and $v$ is a child of $u$. If $|X'| = 1$, a phylogenetic network on $X'$ consists of a single node labelled with the unique element of $X'$.

For brevity, we usually refer to phylogenetic networks and phylogenetic trees as networks and trees, respectively. When we feel the need to state the label set $X'$ of a phylogenetic tree explicitly, especially when we want to emphasize that a set of trees all share the same leaf set, we refer to this tree as an $X'$-tree.

Given a directed edge $uv$ in a network $N$, we call $uv$ a reticulation edge if $v$ is a reticulation; otherwise, $uv$ is a tree edge. A tree path in $N$ is a directed path composed of only tree edges. A tree path is shown in red in Figure 1a. The reticulation number of $N$ is the number of reticulation edges in $N$ minus the number of reticulations. Alternatively, the reticulation number is the number of edges that need to be deleted from the network to obtain a tree.

The restriction of an $X$-tree $T$ to a subset $X' \subseteq X$ is the smallest subtree of $T$ that contains all edges on paths between leaves in $X'$. If $T$ is an $X$-tree and $T'$ is the restriction of $T$ to some subset $X' \subseteq X$, we write $T' \subseteq T$. We also write $T \setminus T'$ to denote the difference $X \setminus X'$ of the label sets of the two trees.

Let $N'$ be a subgraph (e.g., a path) of the network $N$. Any edge $uv \in N$ such that $u \in N'$ and $v \notin N'$ is called a pendant edge of $N'$; $v$ is a pendant node of $N'$. When $N$ is a tree, we say the subtree rooted at $v$ is a pendant subtree of $N'$.

Figure 1: (a) A phylogenetic network that is not tree-child because both children of the red node are reticulations. Its reticulation number is 2. A tree path from the root to the leaf labelled $a$ is shown in red. (b) The four phylogenetic trees displayed by the network in (a). For example, the first tree can be obtained by deleting the dotted edges in (a). The red and black edges constitute an embedding of this tree into the network.

Remark. We note that phylogenetic networks as defined in this paper have out-degree at most 2 on all nodes. This is consistent with the definitions used by Linz and Semple [ ]. As noted by Linz and Semple, restricting network nodes to have out-degree at most 2 does not result in any loss of generality. In particular, for the problems discussed in this paper, any instance that has a network with out-degree greater than 2 as a solution also has a network with out-degree at most 2 as a solution.

While phylogenetic trees may in general have unbounded out-degree, we require phylogenetic trees to have maximum out-degree 2 in this paper, that is, we restrict our attention to what are normally called "binary" trees. It is an open question whether our algorithm can be extended to input trees of unbounded out-degree. We note that Linz and Semple's result relating tree-child networks to tree-child sequences imposes no restriction on the out-degree of phylogenetic trees but does not offer any algorithm to find an optimal tree-child sequence or network even for binary trees.

Given a network $N$ on a set of taxa $X$ and a tree $T$ on a subset $X' \subseteq X$, we say that $N$ displays $T$ if $T$ can be obtained from a subgraph of $N$ by suppressing nodes of out-degree and in-degree 1 (a node $v$ with out-degree and in-degree 1 is suppressed by deleting $v$ and replacing the edges $uv$ and $vw$ with the single edge $uw$). Equivalently, $N$ displays $T$ if there exists a function $f$, called an embedding of $T$ into $N$, that maps nodes of $T$ to nodes of $N$, and edges of $T$ to directed paths in $N$, such that

• Every leaf of $T$ is mapped to the leaf of $N$ with the same label;
• For each edge $uv$ in $T$, the path $f(uv)$ is a directed path in $N$ from $f(u)$ to $f(v)$; and
• For any two distinct edges $e$ and $e'$ of $T$, the paths $f(e)$ and $f(e')$ are edge-disjoint.

For any embedding $f$ and any node or edge $x$, we call $f(x)$ the image of $x$ (under $f$). This definition extends naturally to arbitrary subgraphs $T' \subseteq T$ by defining the image $f(T')$ of $T'$ to be the union of the images of all nodes and edges in $T'$. For a set of trees $\mathcal{T} = \{T_1, \ldots, T_t\}$, we say that $N$ displays $\mathcal{T}$ if $N$ displays every tree $T_i \in \mathcal{T}$. For example, the network in Figure 1a displays all trees in Figure 1b. An embedding of the first tree into the network is shown.

The MINIMUM HYBRIDIZATION problem takes as input a set $\mathcal{T}$ of phylogenetic trees and an integer $k$, and asks for a network displaying $\mathcal{T}$ and with reticulation number at most $k$, if such a network exists. In this paper, we focus on a restricted version of MINIMUM HYBRIDIZATION, described below.

A network $N$ is tree-child if every non-leaf node of $N$ has at least one child that is a tree node. Note that this is equivalent to requiring that every node in $N$ has a tree path to a leaf. The network in Figure 1a is not tree-child because the children of the red node are both reticulations. A tree-child network displaying the trees in Figure 1b is shown in Figure 2a.

Figure 2: (a) An optimal tree-child network for the four trees in Figure 1b. Note that this network has reticulation number 3, one more than the non-tree-child hybridization network for these trees in Figure 1a. (b) The tree-child cherry picking sequence corresponding to this network: $\langle (a,b), (c,d), (c,b), (c,e), (b,e), (a,e), (e,d), (d,-) \rangle$.
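The two network properties defined above, the reticulation number and the tree-child condition, can be checked mechanically. The following is a minimal sketch, not part of the paper's implementation. It assumes a network is stored as a dictionary mapping each node to its list of children, and it reads the tree-child condition as "every non-leaf node has at least one child that is not a reticulation", as phrased in the introduction. The function names and the two small example networks are hypothetical and are not the networks of Figures 1 and 2.

```python
from collections import Counter


def in_degrees(children):
    """In-degree of every node of a DAG given as {node: [child, ...]}."""
    indeg = Counter()
    for node, kids in children.items():
        indeg[node] += 0          # make sure the root gets an entry
        for kid in kids:
            indeg[kid] += 1
    return indeg


def reticulation_number(children):
    """Number of reticulation edges minus number of reticulations."""
    return sum(d - 1 for d in in_degrees(children).values() if d >= 2)


def is_tree_child(children):
    """True if every non-leaf node has a child that is not a reticulation."""
    indeg = in_degrees(children)
    for kids in children.values():
        if kids and all(indeg[kid] >= 2 for kid in kids):
            return False
    return True


# Hypothetical example: one reticulation h with parents u and v.
tree_child_net = {
    "root": ["u", "v"],
    "u": ["a", "h"],
    "v": ["h", "b"],
    "h": ["c"],
}

# Hypothetical example: both children of w (h1 and h2) are reticulations.
non_tree_child_net = {
    "root": ["u", "v"],
    "u": ["w", "h1"],
    "w": ["h1", "h2"],
    "v": ["h2", "b"],
    "h1": ["a"],
    "h2": ["c"],
}

print(reticulation_number(tree_child_net), is_tree_child(tree_child_net))
print(reticulation_number(non_tree_child_net), is_tree_child(non_tree_child_net))
```

Leaves may be omitted from the dictionary, since only nodes with at least one child are tested for the tree-child condition.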

MINIMUM TREE-CHILD HYBRIDIZATION

Input: A set $\mathcal{T} = \{T_1, \ldots, T_t\}$ of phylogenetic trees on $X$ and an integer $k$.

Output: A tree-child phylogenetic network $N$ on $X$ that displays $\mathcal{T}$ and has at most $k$ reticulations, if such a network exists; NONE otherwise.

For a set $\mathcal{T} = \{T_1, T_2, \ldots, T_t\}$ of $X$-trees, let $h(\mathcal{T})$ denote the hybridization number of $\mathcal{T}$, that is, the minimum reticulation number of all networks that display $\mathcal{T}$. Similarly, let $h_{tc}(\mathcal{T})$ denote the tree-child hybridization number of $\mathcal{T}$, that is, the minimum reticulation number of all tree-child networks that display $\mathcal{T}$.

For any tree $T$ on $X' \subseteq X$ and any two taxa $x, y \in X'$, we say that $\{x, y\}$ is a cherry of $T$ if the leaves labelled with $x$ and $y$ are siblings in $T$. Observe that any tree with two or more leaves contains at least one cherry. A pair $\{x, y\}$ is a cherry of a set of trees $\mathcal{T}$ if it is a cherry of at least one tree in $\mathcal{T}$. It is a trivial cherry of $\mathcal{T}$ if $\{x, y\}$ is a cherry of every tree in $\mathcal{T}$ that contains both $x$ and $y$.

Linz and Semple [ ] gave a characterization of tree-child hybridization number in terms of cherry picking sequences, which we define next. Informally, a cherry picking sequence is a sequence of pairs of leaves, describing a sequence of operations on a set of trees $\mathcal{T}$. In particular, a pair of the form $(x, y)$ denotes the operation of removing leaf $x$ from any tree in $\mathcal{T}$ that has $\{x, y\}$ as a cherry, while a pair of the form $(x, -)$ is used when at least one tree in $\mathcal{T}$ has been reduced to the single leaf $x$.

Formally, a cherry picking sequence is a sequence

$$S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -), (x_{r+2}, -), \ldots, (x_s, -) \rangle$$

with $\{x_1, x_2, \ldots, x_s, y_1, y_2, \ldots, y_r\} \subseteq X$. We write $|S|$ to denote the length $s$ of $S$. It may be that $s = r$, in which case the last element is $(x_r, y_r)$, that is, there are no pairs of the form $(x_j, -)$. We call such a sequence a partial cherry picking sequence. A sequence is full if $s > r$ and $\{x_1, \ldots, x_s\} = X$. For any $1 \le i \le j \le s$, we denote by $S_{i,j}$ the subsequence $\langle (x_i, y_i), \ldots, (x_j, y_j) \rangle$ (where $y_h$ is replaced with $-$ for $h > r$).

Given two sequences $S = \langle (x_1, y_1), \ldots, (x_r, y_r) \rangle$ and $S' = \langle (x'_1, y'_1), \ldots, (x'_{r'}, y'_{r'}), (x'_{r'+1}, -), \ldots, (x'_{s'}, -) \rangle$, we denote by $S \circ S'$ the sequence $\langle (x_1, y_1), \ldots, (x_r, y_r), (x'_1, y'_1), \ldots, (x'_{r'}, y'_{r'}), (x'_{r'+1}, -), \ldots, (x'_{s'}, -) \rangle$. We say that $S \circ S'$ is an extension of $S$, and that $S$ is a prefix of $S \circ S'$. If $S' \ne \langle\rangle$, then we call $S$ a proper prefix of $S \circ S'$.

For a tree $T$ on $X' \subseteq X$, the sequence $S$ defines a sequence of trees $\langle T^{(0)}, T^{(1)}, \ldots, T^{(r)} \rangle$ as follows:

• $T^{(0)} = T$;
• If $\{x_j, y_j\}$ is a cherry of $T^{(j-1)}$, then $T^{(j)}$ is obtained from $T^{(j-1)}$ by removing $x_j$ and suppressing $y_j$'s parent. Otherwise, $T^{(j)} = T^{(j-1)}$.

For notational convenience, we refer to $T^{(r)}$ as $T/S$, the tree obtained by applying the sequence $S$ to $T$. In addition, for a set of trees $\mathcal{T} = \{T_1, \ldots, T_t\}$, we write $\mathcal{T}^{(j)}$ to denote the set $\{T_1^{(j)}, \ldots, T_t^{(j)}\}$, and $\mathcal{T}/S$ to denote the set $\{T_1/S, \ldots, T_t/S\}$.

A full cherry picking sequence $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -), (x_{r+2}, -), \ldots, (x_s, -) \rangle$ is a cherry picking sequence for a set of trees $\mathcal{T}$ if every tree in $\mathcal{T}/S$ has a single leaf and that leaf is in $\{x_{r+1}, \ldots, x_s\}$. (Note in particular that every cherry picking sequence for a set of $X$-trees is full.) The weight $w(S)$ of $S$ is defined to be $|S| - |X|$.

A cherry picking sequence $S$ is tree-child if $s \le r + 1$ and $y_j \ne x_i$ for all $1 \le i < j \le s$. (Thus, if $S$ is a tree-child cherry picking sequence for $\mathcal{T}$, then $T/S$ consists of the single leaf $x_s$ for every tree $T \in \mathcal{T}$.)
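As an illustration of these definitions, the sketch below applies a cherry picking sequence to a set of binary trees, checks the tree-child condition, and computes the weight. It is a minimal example under stated assumptions, not the paper's implementation: trees are encoded as nested 2-tuples of leaf labels, a pair $(x, -)$ is written `(x, None)`, and all function names are hypothetical. The two three-leaf input trees are made up for the example; the eight-pair sequence is the one given in the caption of Figure 2.

```python
def pick(tree, x, y):
    """Remove leaf x if {x, y} is a cherry of tree; otherwise return tree unchanged."""
    if isinstance(tree, str):          # a single leaf has no cherry
        return tree
    left, right = tree
    if {left, right} == {x, y}:        # x and y are sibling leaves
        return y                       # drop x and suppress y's parent
    return (pick(left, x, y), pick(right, x, y))


def apply_sequence(trees, seq):
    """Compute T/S for every tree T in trees; pairs (x, None) remove nothing."""
    for x, y in seq:
        if y is not None:
            trees = [pick(t, x, y) for t in trees]
    return trees


def is_tree_child_sequence(seq):
    """Check s <= r + 1 and y_j != x_i for all i < j."""
    r = sum(1 for _, y in seq if y is not None)
    if len(seq) > r + 1:
        return False
    earlier_first_elements = set()
    for x, y in seq:
        if y in earlier_first_elements:
            return False
        earlier_first_elements.add(x)
    return True


def weight(seq, taxa):
    """w(S) = |S| - |X|."""
    return len(seq) - len(taxa)


# Two hypothetical trees on X = {a, b, c} that disagree on the cherry.
t1 = (("a", "b"), "c")
t2 = ("a", ("b", "c"))
seq = [("a", "b"), ("b", "c"), ("a", "c"), ("c", None)]

print(apply_sequence([t1, t2], seq))        # both trees reduce to the leaf 'c'
print(is_tree_child_sequence(seq), weight(seq, {"a", "b", "c"}))

# The sequence listed in the caption of Figure 2, over five taxa.
fig2 = [("a", "b"), ("c", "d"), ("c", "b"), ("c", "e"),
        ("b", "e"), ("a", "e"), ("e", "d"), ("d", None)]
print(is_tree_child_sequence(fig2), weight(fig2, set("abcde")))
```

The Figure 2 sequence has eight pairs over five taxa, so its weight is 3, which agrees with the reticulation number stated in that caption.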
The tree-child cherry picking sequence for the set of trees in Figure 1b that corresponds to the tree-child network in Figure 2a is shown in Figure 2b. If $S$ is a tree-child cherry picking sequence, we refer to the leaves $\{x_1, \ldots, x_r\}$ as forbidden leaves with respect to $S$, since they are forbidden to appear as the second element of any cherry $(x_j, y_j)$ with $j > r$ in any tree-child extension of $S$. We say that $S \circ S'$ is an optimal tree-child extension of $S$ if $S \circ S'$ is a tree-child sequence for $\mathcal{T}$ and every extension $S \circ S''$ of $S$ that is a tree-child sequence for $\mathcal{T}$ satisfies $w(S \circ S'') \ge w(S \circ S')$. For the purposes of algorithmic construction of sequences, we adopt the convention that $S \circ \text{NONE} = \text{NONE}$ for any sequence $S$ and that $w(\text{NONE}) = \infty$.

Let $s_{tc}(\mathcal{T})$ be the minimum weight of all tree-child sequences for $\mathcal{T}$. Linz and Semple showed that the problem of finding the tree-child hybridization number of a set $\mathcal{T}$ of $X$-trees is equivalent to finding the minimum weight of a tree-child cherry picking sequence for $\mathcal{T}$:

Theorem 1 (Linz and Semple [ ]). Let $X$ be a set of taxa, and $\mathcal{T} = \{T_1, T_2, \ldots, T_t\}$ a collection of phylogenetic $X$-trees. Then $s_{tc}(\mathcal{T}) = h_{tc}(\mathcal{T})$.

In this section, we show that MINIMUM TREE-CHILD HYBRIDIZATION is fixed-parameter tractable with respect to $k$. Our proof is based on Linz and Semple's characterization of tree-child hybridization number in terms of tree-child cherry picking sequences (see Theorem 1). As such, our main technical contribution is to give a fixed-parameter algorithm, TCS, for the problem of finding a tree-child cherry picking sequence of weight at most $k$, if such a sequence exists. By the following proposition, a corresponding tree-child network can then be found in polynomial time.

Proposition 2 (Linz and Semple [ ]). There exists a linear-time algorithm that, given a set $\mathcal{T}$ of $X$-trees and a tree-child cherry picking sequence $S$ for $\mathcal{T}$, computes a tree-child network $N$ displaying $\mathcal{T}$ with $h(N) \le w(S)$.

For completeness, the pseudocode of this algorithm, TREECHILDNETWORKFROMSEQUENCE, is given in the appendix. (Linz and Semple do not state a running time for this algorithm, but it is easy to observe that their algorithm takes linear time in $n = |X|$, given that there are at most $n$ reticulations.)

Our algorithm for computing a tree-child cherry picking sequence of weight at most $k$ has the following structure: Starting with the set of trees $\mathcal{T}$ and the empty sequence $S = \langle\rangle$, the algorithm repeats the following as long as $\mathcal{T}/S$ still has a cherry. If $\mathcal{T}/S$ has a trivial cherry $\{x, y\}$ such that $y$ is not forbidden with respect to $S$, it adds $(x, y)$ to the end of $S$. If $\mathcal{T}/S$ has no trivial cherry, we show that $\mathcal{T}/S$ has at most $4k$ unique cherries or $h_{tc}(\mathcal{T}) > k$. The algorithm makes one recursive call for each pair $(x, y)$ such that $\{x, y\}$ is a cherry of $\mathcal{T}/S$, starting each recursive call by adding $(x, y)$ to the end of $S$. (Note that every cherry $\{x, y\}$ of $\mathcal{T}/S$ gives rise to two recursive calls, one for the pair $(x, y)$ and one for the pair $(y, x)$.) As this kind of branching step cannot occur more than $k$ times in a sequence of weight at most $k$, this gives a search tree for our algorithm of depth $k$ and branching number at most $8k$.

Procedure TCS($\mathcal{T}$, $S$, $k$)
Input: A collection of phylogenetic trees $\mathcal{T}$, a partial tree-child cherry picking sequence $S$, and an integer $k$
Output: An optimal solution of $(\mathcal{T}, S)$ if $(\mathcal{T}, S)$ has a solution of weight at most $k$; NONE otherwise
 1  while there exists a trivial cherry $\{x, y\}$ of $\mathcal{T}/S$ with $y$ not forbidden with respect to $S$ do
 2      $S \leftarrow S \circ \langle (x, y) \rangle$
 3  $\mathcal{T}' \leftarrow \mathcal{T}/S$
 4  if $\mathcal{T}'$ contains a cherry $\{x, y\}$ with $x, y$ both forbidden with respect to $S$ then
 5      return NONE
 6  else
 7      $n' \leftarrow |\{x \in X : x \text{ is a leaf of a tree in } \mathcal{T}'\}|$
 8      $k' \leftarrow |S| - |X| + n'$
 9      $C \leftarrow \{(x, y) \mid \{x, y\} \text{ is a cherry of some tree in } \mathcal{T}'\}$
10      if $|C| = 0$ then
11          return $S \circ \langle (x, -) \rangle$, where $x$ is the last remaining leaf in all trees
12      else if $|C| > 8k$ or $k' \ge k$ then
13          return NONE
14      else
15          $S_{opt} \leftarrow$ NONE
16          foreach $(x, y) \in C$ with $y$ not forbidden with respect to $S$ do
17              $S_{temp} \leftarrow$ TCS($\mathcal{T}$, $S \circ \langle (x, y) \rangle$, $k$)
18              if $w(S_{temp}) < w(S_{opt})$ then
19                  $S_{opt} \leftarrow S_{temp}$
20          return $S_{opt}$

In the remainder of this section, we prove the correctness of procedure TCS and analyze its running time. This is summarized in the following theorem (we denote by lg the logarithmic function with base 2).

Theorem 3.
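The branching procedure TCS described in this section can be sketched compactly as follows. This is an illustrative, unoptimized sketch and not the authors' implementation (which is parallel and uses cluster reduction and redundant branch elimination). It assumes that trees are nested 2-tuples of leaf labels over a common taxon set, that the argument `trees` already equals $\mathcal{T}/S$ for the partial sequence `seq`, and that $(x, -)$ is written `(x, None)`. All function names are hypothetical.

```python
import math


def leaves(t):
    """Leaf labels of a binary tree given as nested 2-tuples of strings."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])


def cherries(t):
    """Cherries of t, each as a frozenset {x, y} of sibling leaves."""
    if isinstance(t, str):
        return set()
    a, b = t
    if isinstance(a, str) and isinstance(b, str):
        return {frozenset((a, b))}
    return cherries(a) | cherries(b)


def pick(t, x, y):
    """Remove leaf x if {x, y} is a cherry of t; otherwise return t unchanged."""
    if isinstance(t, str):
        return t
    a, b = t
    if {a, b} == {x, y}:
        return y
    return (pick(a, x, y), pick(b, x, y))


def weight(seq, n_taxa):
    """w(S) = |S| - |X|, with w(NONE) = infinity."""
    return math.inf if seq is None else len(seq) - n_taxa


def trivial_pair(trees, forbidden):
    """Return (x, y) for a trivial cherry {x, y} with y not forbidden, else None."""
    found = set().union(*(cherries(t) for t in trees))
    for ch in sorted(found, key=sorted):
        if all(ch in cherries(t) for t in trees if ch <= leaves(t)):
            x, y = sorted(ch)
            for pair in ((x, y), (y, x)):
                if pair[1] not in forbidden:
                    return pair
    return None


def tcs(trees, seq, k, n_taxa):
    """Optimal tree-child extension of seq of weight at most k, or None."""
    seq = list(seq)
    forbidden = {x for x, _ in seq}
    while True:                                        # lines 1-2: trivial cherries
        pair = trivial_pair(trees, forbidden)
        if pair is None:
            break
        trees = [pick(t, *pair) for t in trees]
        seq.append(pair)
        forbidden.add(pair[0])
    found = set().union(*(cherries(t) for t in trees))
    if any(ch <= forbidden for ch in found):           # lines 4-5
        return None
    remaining = set().union(*(leaves(t) for t in trees))
    k_now = len(seq) - n_taxa + len(remaining)         # line 8
    if not found:                                      # lines 10-11
        return seq + [(min(remaining), None)]
    if 2 * len(found) > 8 * k or k_now >= k:           # lines 12-13: |C| = 2 * #cherries
        return None
    best = None
    for ch in sorted(found, key=sorted):               # lines 15-20: branch on both orders
        x, y = sorted(ch)
        for a, b in ((x, y), (y, x)):
            if b in forbidden:
                continue
            cand = tcs([pick(t, a, b) for t in trees], seq + [(a, b)], k, n_taxa)
            if weight(cand, n_taxa) < weight(best, n_taxa):
                best = cand
    return best


t1 = (("a", "b"), "c")
t2 = ("a", ("b", "c"))
print(tcs([t1, t2], [], 1, 3))   # a tree-child sequence of weight 1
print(tcs([t1, t2], [], 0, 3))   # None: the two trees differ, so weight 0 is impossible
```

Unlike the pseudocode, which recomputes $\mathcal{T}/S$ from the original trees in every call, this sketch passes the already reduced trees down the recursion; the two are equivalent because applying $S \circ \langle (x, y) \rangle$ is the same as applying $(x, y)$ to $\mathcal{T}/S$.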

Corollary 4. Given a collection $\mathcal{T}$ of $t$ $X$-trees with $|X| = n$, it takes $O((8k)^k \, nt \lg t + nt \lg nt)$ time to decide whether $\mathcal{T}$ has tree-child hybridization number at most $k$ and, if so, compute a corresponding tree-child hybridization network that displays $\mathcal{T}$.

It is easy to see that procedure TCS returns a sequence $S$ only if it is a valid tree-child cherry picking sequence for $\mathcal{T}$. Thus, it suffices to show that if a partial tree-child cherry picking sequence $S$ has an extension $S \circ S'$ of weight at most $k$ that is a cherry picking sequence for $\mathcal{T}$, then the invocation TCS($\mathcal{T}$, $S$, $k$) finds a shortest such extension. In the remainder of this section, we call an extension $S \circ S'$ of a partial tree-child cherry picking sequence $S$ a solution of $(\mathcal{T}, S)$ if $S \circ S'$ is a cherry picking sequence for $\mathcal{T}$; $S \circ S'$ is an optimal solution of $(\mathcal{T}, S)$ if there is no solution of $(\mathcal{T}, S)$ that is shorter than $S \circ S'$.

We split the proof of Theorem 3 into two parts. First, we show that we deal with trivial cherries correctly: if $(\mathcal{T}, S)$ has a solution of weight at most $k$ and $\mathcal{T}' = \mathcal{T}/S$ has a trivial cherry $\{x, y\}$ such that $y$ is not forbidden with respect to $S$, then $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ has a solution of weight at most $k$ and any optimal solution of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ is also an optimal solution of $(\mathcal{T}, S)$. Thus, adding trivial cherries to $S$ as TCS does in lines 1–2 is safe. Section 3.1 presents this first part of our proof. Second, we show that if $\mathcal{T}'$ has no trivial cherries, then either the trees in $\mathcal{T}'$ have at most $4k$ unique cherries or $(\mathcal{T}, S)$ has no solution of weight at most $k$. Thus, aborting the search if $|C| > 8k$ (since $C$ contains two pairs for each cherry of $\mathcal{T}'$), as we do in line 13, is correct. The proof of this bound on the number of unique cherries is divided into two parts. In Section 3.2, we show that this bound holds if $S = \langle\rangle$, that is, if all trees in $\mathcal{T}'$ are $X$-trees. In Section 3.3, we extend this result to arbitrary partial tree-child cherry picking sequences $S$. Section 3.4 then completes the proof of Theorem 3.

Our algorithm begins by repeatedly pruning trivial cherries in lines 1–2; that is, as long as there exists a trivial cherry $\{x, y\}$ in $\mathcal{T}/S$ with $y$ not forbidden with respect to $S$, the algorithm extends $S$ by adding the pair $(x, y)$ to $S$. In this section, we show that this is safe: if $(\mathcal{T}, S)$ has a solution of weight at most $k$, then so does $(\mathcal{T}, S \circ \langle (x, y) \rangle)$, and any optimal solution of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ is an optimal solution of $(\mathcal{T}, S)$. We begin with some simple observations.

Proposition 5.

Let $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -) \rangle$ be a tree-child cherry picking sequence for a set of $X$-trees $\mathcal{T}$. Then the following properties hold for all $j \in [r]$:

(i) If $y \in X$ is not forbidden with respect to $S_{1,j}$, then $y$ is a leaf in every tree in $\mathcal{T}^{(j)}$.
(ii) If $\{x, y\}$ is a cherry of $\mathcal{T}^{(j)}$, then either $(x, y)$ or $(y, x)$ is a pair in $S_{j+1,r}$.
(iii) If $\{x_j, y_j\}$ is a trivial cherry of $\mathcal{T}^{(j-1)}$, then $x_j$ is not in any tree in $\mathcal{T}^{(j)}$.

Proof. Property (i) holds because $y$ is not forbidden with respect to $S_{1,j}$ and, thus, $y \ne x_i$ for all $1 \le i \le j$. Property (ii) follows because $S_{j+1,r}$ must delete at least one of $x, y$ from the tree containing $\{x, y\}$ as a cherry and only the pair $(x, y)$ or $(y, x)$ achieves this. To see why Property (iii) holds, observe that $y_j$ is not forbidden with respect to $S_{1,j-1}$. Thus, by Property (i), every tree in $\mathcal{T}^{(j-1)}$ contains $y_j$ as a leaf. In particular, every tree in $\mathcal{T}^{(j-1)}$ containing $x_j$ also contains $y_j$. Thus, by the definition of a trivial cherry, every tree in $\mathcal{T}^{(j-1)}$ containing $x_j$ contains the cherry $\{x_j, y_j\}$. Thus, applying the pair $(x_j, y_j)$ to $\mathcal{T}^{(j-1)}$ deletes $x_j$ from any tree containing $x_j$, and no tree in $\mathcal{T}^{(j)}$ contains $x_j$.

Lemma 6.

Let $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -) \rangle$ be a tree-child cherry picking sequence for a set of $X$-trees $\mathcal{T}$ and suppose that $\{x, y\}$ is a trivial cherry of $\mathcal{T}^{(j)}$ and $y$ is not forbidden with respect to $S_{1,j}$. Then there exists a tree-child cherry picking sequence $S'$ for $\mathcal{T}$ such that $|S'| = |S|$, $S'_{1,j} = S_{1,j}$, and $(x, y)$ is a pair in $S'_{j+1,r}$.

Proof. We start with the following trivial observation: Let $\mathcal{T}$ be a set of trees and let $S$ be a tree-child cherry picking sequence for $\mathcal{T}$. For an arbitrary permutation $\pi$ of $X$ and any $X$-tree $T$, let $T|_\pi$ be the tree obtained from $T$ by changing the label of each leaf from its label $z$ in $T$ to the label $\pi(z)$ in $T|_\pi$. Let $\mathcal{T}|_\pi = \{T|_\pi \mid T \in \mathcal{T}\}$. Similarly, let $S|_\pi$ be the sequence obtained from $S$ by replacing every occurrence of an element $z \in X$ in $S$ with $\pi(z)$. Then $S|_\pi$ is a tree-child cherry picking sequence for $\mathcal{T}|_\pi$. Here, we consider the permutation $\pi$ such that $\pi(x) = y$, $\pi(y) = x$, and $\pi(z) = z$ for all $z \in X \setminus \{x, y\}$, where $\{x, y\}$ is a trivial cherry of $\mathcal{T}^{(j)}$.

By Proposition 5(ii), either $(x, y)$ or $(y, x)$ is a pair in $S_{j+1,r}$. In the former case, the sequence $S' = S$ satisfies the lemma. In the latter case, neither $x$ nor $y$ is forbidden with respect to $S_{1,j}$. It follows from Proposition 5(i) and the fact that $\{x, y\}$ is a trivial cherry of $\mathcal{T}^{(j)}$ that every tree in $\mathcal{T}^{(j)}$ has $\{x, y\}$ as a cherry. In particular, neither $x$ nor $y$ is part of a pair in $S_{1,j}$. Thus, since $S$ is a tree-child cherry picking sequence, the sequence $S' = S_{1,j} \circ (S_{j+1,r+1})|_\pi$ is a tree-child cherry picking sequence such that $S'_{1,j} = S_{1,j}$ and $(x, y) \in S'_{j+1,r}$. To see that $S'$ is a tree-child cherry picking sequence for $\mathcal{T}$, observe that $S_{j+1,r+1}$ is a tree-child cherry picking sequence for $\mathcal{T}^{(j)}$. Thus, as just observed, $(S_{j+1,r+1})|_\pi$ is a tree-child cherry picking sequence for $\mathcal{T}^{(j)}|_\pi$. However, since $\{x, y\}$ is a cherry of every tree in $\mathcal{T}^{(j)}$, we have $\mathcal{T}^{(j)}|_\pi = \mathcal{T}^{(j)}$, that is, $(S_{j+1,r+1})|_\pi$ is a tree-child cherry picking sequence for $\mathcal{T}^{(j)}$ and $S' = S_{1,j} \circ (S_{j+1,r+1})|_\pi$ is a tree-child cherry picking sequence for $\mathcal{T}$.

Footnote: We use $[m]$ to denote the set of integers $\{1, \ldots, m\}$ and $[m]_0$ to denote the set of integers $\{0, \ldots, m\}$.

Lemma 7.

Let $T$ be an $X$-tree, let $T' \subseteq T$, and let $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r) \rangle$ be a partial tree-child cherry picking sequence such that $(T \setminus T') \cap \{y_1, y_2, \ldots, y_r\} = \emptyset$. Then $T'/S \subseteq T/S$.

Proof. We prove the claim by induction on $|S|$. If $|S| = 0$, then $T'/S = T' \subseteq T = T/S$, so the claim holds in this case. If $|S| > 0$, then let $R' = T'/S_{1,1}$ and $R = T/S_{1,1}$. Note that $R \supseteq T - x_1$. If $x_1 \notin T'$, then $R' = T' \subseteq T - x_1 \subseteq R$. If $y_1 \notin T'$, then $y_1 \notin T$ because $y_1 \notin T \setminus T'$. Thus, $R' = T' \subseteq T = R$.

So assume that $x_1, y_1 \in T'$. If $\{x_1, y_1\}$ is a cherry of $T'$, then $R' = T' - x_1 \subseteq T - x_1 \subseteq R$. If $\{x_1, y_1\}$ is not a cherry of $T'$, then $x_1, y_1 \in T'$ implies that the path from $x_1$ to $y_1$ in $T'$ has at least one pendant subtree. Since $T' \subseteq T$, this implies that the path from $x_1$ to $y_1$ in $T$ also has at least one pendant subtree, that is, $\{x_1, y_1\}$ is not a cherry of $T$. Therefore, $R' = T' \subseteq T = R$.

We have shown that in all possible cases, $R' \subseteq R$. Now observe that $R \setminus R' \subseteq (T \setminus T') \cup \{x_1\}$. Since $S$ is a partial tree-child cherry picking sequence, $S_{2,r}$ is a partial tree-child cherry picking sequence and $x_1 \notin \{y_2, y_3, \ldots, y_r\}$. Since $(T \setminus T') \cap \{y_1, y_2, \ldots, y_r\} = \emptyset$, this implies that $(R \setminus R') \cap \{y_2, y_3, \ldots, y_r\} = \emptyset$. Thus, by the induction hypothesis, $T'/S = R'/S_{2,r} \subseteq R/S_{2,r} = T/S$.

We are now ready to prove a stronger version of Lemma 6, which establishes that pruning trivial cherries is safe.

Proposition 8.

Let $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -) \rangle$ be a tree-child cherry picking sequence for a set of $X$-trees $\mathcal{T}$ and suppose that $\{x, y\}$ is a trivial cherry of $\mathcal{T}^{(j)}$ and $y$ is not forbidden with respect to $S_{1,j}$. Then there exists a tree-child cherry picking sequence $S' = \langle (x'_1, y'_1), (x'_2, y'_2), \ldots, (x'_{r'}, y'_{r'}), (x'_{r'+1}, -) \rangle$ for $\mathcal{T}$ such that $|S'| \le |S|$, $S'_{1,j} = S_{1,j}$, and $(x'_{j+1}, y'_{j+1}) = (x, y)$.

Proof. By Lemma 6, there exists a tree-child cherry picking sequence $S' = \langle (x'_1, y'_1), (x'_2, y'_2), \ldots, (x'_{r'}, y'_{r'}), (x'_{r'+1}, -) \rangle$ for $\mathcal{T}$ such that $r' \le r$, $S'_{1,j} = S_{1,j}$ and $(x, y) \in S'_{j+1,r'}$. We choose $S'$ from the set of all such cherry picking sequences so that the index $j' > j$ with $(x'_{j'}, y'_{j'}) = (x, y)$ is minimized. If $j' = j + 1$, the proposition holds. If $j' > j + 1$, we obtain a contradiction to the choice of $S'$ by transforming $S'$ into another tree-child cherry picking sequence $S'' = \langle (x''_1, y''_1), \ldots, (x''_{r''}, y''_{r''}), (x''_{r''+1}, -) \rangle$ for $\mathcal{T}$ such that $|S''| \le |S'| \le |S|$, $S''_{1,j} = S'_{1,j} = S_{1,j}$, and $(x''_{j'-1}, y''_{j'-1}) = (x, y)$.

So assume that $j' > j + 1$ and let $(x'_{j'-1}, y'_{j'-1}) = (v, w)$. We distinguish two cases:

$w = x$: In this case, we set $r'' = r' - 1$, $(x''_h, y''_h) = (x'_h, y'_h)$ for all $1 \le h \le j' - 2$, and $(x''_h, y''_h) = (x'_{h+1}, y'_{h+1})$ for all $j' - 1 \le h \le r'' + 1$; that is, we obtain $S''$ by deleting the pair $(x'_{j'-1}, y'_{j'-1})$ from $S'$. Thus, $S'' \subset S'$, $|S''| < |S'|$, and $(x''_{j'-1}, y''_{j'-1}) = (x'_{j'}, y'_{j'}) = (x, y)$. Since $S'$ is a tree-child cherry picking sequence, this implies that $S''$ also is a tree-child cherry picking sequence. To see that $S''$ is a tree-child cherry picking sequence for $\mathcal{T}$, it suffices to prove that $T/S'_{1,j'-1} = T/S'_{1,j'-2}$ and, thus, $T/S'' = (T/S'_{1,j'-2})/S'_{j',r'} = (T/S'_{1,j'-1})/S'_{j',r'} = T/S'$ for every tree $T \in \mathcal{T}$.

To prove this, observe that $v \ne y$ and $y \in T/S'_{1,h}$ for all $T \in \mathcal{T}$ and all $1 \le h < j'$ because $y'_{j'} = y$, that is, $y$ is not forbidden with respect to $S'_{1,j'-1}$. Thus, since $\{x, y\}$ is a trivial cherry of $\mathcal{T}/S_{1,j} = \mathcal{T}/S'_{1,j}$ and $j < j'$, $\{x, y\}$ is a cherry of every tree $T/S'_{1,j}$ in $\mathcal{T}/S'_{1,j}$ that contains $x$. Since $y$ is also a leaf of every tree $T/S'_{1,j'-2}$ in $\mathcal{T}/S'_{1,j'-2}$ (again, because $y$ is not forbidden with respect to $S'_{1,j'-1}$), this implies that $\{x, y\}$ is also a cherry of every tree in $\mathcal{T}/S'_{1,j'-2}$ that contains $x$. In particular, since $v \ne y$, $\{v, w\} = \{v, x\}$ is not a cherry of any tree $T/S'_{1,j'-2}$ in $\mathcal{T}/S'_{1,j'-2}$ and $T/S'_{1,j'-1} = T/S'_{1,j'-2}$ for all $T \in \mathcal{T}$.
$w \ne x$: In this case, we set $(x''_{j'-1}, y''_{j'-1}) = (x'_{j'}, y'_{j'})$, $(x''_{j'}, y''_{j'}) = (x'_{j'-1}, y'_{j'-1})$, and $(x''_h, y''_h) = (x'_h, y'_h)$ for all $h \notin \{j' - 1, j'\}$, that is, we obtain $S''$ by swapping $(x'_{j'-1}, y'_{j'-1}) = (v, w)$ and $(x'_{j'}, y'_{j'}) = (x, y)$ in $S'$. This clearly implies that $|S''| = |S'|$ and $(x''_{j'-1}, y''_{j'-1}) = (x'_{j'}, y'_{j'}) = (x, y)$. To see that $S''$ is a tree-child cherry picking sequence, observe that every pair $(x''_h, y''_h)$ in $S''$ with $h \ne j'$ is preceded by a subset of the pairs that precede it in $S'$. Thus, since $S'$ is a tree-child cherry picking sequence, $y''_h$ is not forbidden with respect to $S''_{1,h-1}$. For the pair $(x''_{j'}, y''_{j'})$, $y''_{j'}$ is not forbidden with respect to $S''_{1,j'-2}$ because $S''_{1,j'-2} = S'_{1,j'-2}$ and $(x''_{j'}, y''_{j'}) = (x'_{j'-1}, y'_{j'-1})$. This implies that $y''_{j'}$ is not forbidden with respect to $S''_{1,j'-1}$ because $y''_{j'} = y'_{j'-1} = w \ne x = x'_{j'} = x''_{j'-1}$.

It remains to show that $T/S'' = T/S'$ for all $T \in \mathcal{T}$.
To this end, it suffices to show that $T/S'' \subseteq T/S'$ because $T/S'$ has only one leaf, $x_{r+1}$, and $T/S'' \ne \emptyset$, that is, $T/S'' \subseteq T/S'$ implies that $T/S'' = T/S'$.

To see that $T/S'' \subseteq T/S'$, let $T' = T/S'_{1,j'-2}$. Then $T'/\langle (x, y) \rangle \subseteq T'$, $T' \setminus (T'/\langle (x, y) \rangle) \subseteq \{x\}$, and $x \notin \{w, y\}$. By Lemma 7, this implies that $T'/\langle (x, y), (v, w), (x, y) \rangle \subseteq T'/\langle (v, w), (x, y) \rangle$. However, as argued above, $\{x, y\}$ is a cherry of $T'$, so $x \notin T'/\langle (x, y) \rangle$ and, thus, $x \notin T'/\langle (x, y), (v, w) \rangle$. This implies that $T'/\langle (x, y), (v, w), (x, y) \rangle = T'/\langle (x, y), (v, w) \rangle$ and, therefore, $T'/\langle (x, y), (v, w) \rangle \subseteq T'/\langle (v, w), (x, y) \rangle$. Since $T' = T/S'_{1,j'-2} = T/S''_{1,j'-2}$, $S'_{1,j'} = S'_{1,j'-2} \circ \langle (v, w), (x, y) \rangle$, and $S''_{1,j'} = S''_{1,j'-2} \circ \langle (x, y), (v, w) \rangle$, this shows that $T/S''_{1,j'} \subseteq T/S'_{1,j'}$. Using Lemma 7 again, this shows that $T/S'' = (T/S''_{1,j'})/S''_{j'+1,r'} = (T/S''_{1,j'})/S'_{j'+1,r'} \subseteq (T/S'_{1,j'})/S'_{j'+1,r'} = T/S'$.

$X$-trees

Once the algorithm has eliminated all trivial cherries from a set of input trees, each of the remaining (non-trivial) cherries of $\mathcal{T}/S$ is a candidate for being the next pair to be added to $S$. Our algorithm makes one recursive call for each possible choice of this next pair (lines 15–20).
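The reduction $\mathcal{T}/S$ and the choice of branching pairs can be illustrated with a small Python sketch. It is not the authors' implementation: trees are modelled as nested pairs of leaf labels, and all helper names (`pick`, `reduce_tree`, `classify_cherries`, `branching_pairs`) are hypothetical. Following the usage in this section, a cherry is treated as trivial when every tree containing both of its leaves has it as a cherry. The forbidden-leaf test of the tree-child condition and the cap on the number of branches are omitted.

```python
def leaves(tree):
    """Set of leaf labels of a binary tree given as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def cherries(tree):
    """Set of cherries (unordered leaf pairs with a common parent) of a tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)


def pick(tree, x, y):
    """Apply the pair (x, y): delete x if {x, y} is a cherry, else do nothing."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return y if {left, right} == {x, y} else tree
    return (pick(left, x, y), pick(right, x, y))


def reduce_tree(tree, seq):
    """Compute T/S by applying the pairs of seq from left to right."""
    for x, y in seq:
        tree = pick(tree, x, y)
    return tree


def classify_cherries(trees):
    """Split the unique cherries of a set of trees into trivial and non-trivial."""
    trivial, non_trivial = set(), set()
    for cherry in set().union(*(cherries(t) for t in trees)):
        # Only trees containing both leaves can witness non-triviality.
        relevant = [t for t in trees if cherry <= leaves(t)]
        if all(cherry in cherries(t) for t in relevant):
            trivial.add(cherry)
        else:
            non_trivial.add(cherry)
    return trivial, non_trivial


def branching_pairs(trees, seq):
    """Ordered pairs (x, y) from non-trivial cherries of T/S: two per cherry."""
    reduced = [reduce_tree(t, seq) for t in trees]
    _, non_trivial = classify_cherries(reduced)
    return sorted((x, y) for c in non_trivial for x in c for y in c if x != y)


trees = [(("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")]
trivial, non_trivial = classify_cherries(trees)
pairs_after_pruning = branching_pairs(trees, [("a", "b")])
```

In this toy input, $\{a, b\}$ is a cherry of both trees and is therefore trivial, while $\{c, d\}$ is not. Each non-trivial cherry contributes two ordered pairs, which is why a bound on the number of unique cherries doubles when it is restated as a bound on the number of branches.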
In order to limit the number of recursive calls it makes, the algorithm aborts and reports failure if there are more than $8k$ choices to branch on. To prove that this does not prevent us from finding a tree-child cherry picking sequence of weight at most $k$, if such a sequence exists, we need to prove the following claim:

Proposition 9. If $(\mathcal{T}, S)$ has a solution of weight at most $k$ and $\mathcal{T}/S$ has no trivial cherries, then the number of unique cherries in $\mathcal{T}/S$ is at most $4k$.

Note that this claim refers to the weight $k$ of the whole sequence $S \circ S'$, not the weight of $S'$. This is because the proof uses the structure of $S$ as well as $S'$ to bound the number of unique cherries in $\mathcal{T}/S$.

Our proof has two parts: In this subsection, we consider the case when $S = \langle\rangle$, that is, when we have a set of $X$-trees $\mathcal{T}$ with hybridization number at most $k$ and no trivial cherries. In the next subsection, we prove the claim for $S \ne \langle\rangle$, via a reduction to the case when $S = \langle\rangle$.

Lemma 10. If $\mathcal{T}$ is a set of $X$-trees without trivial cherries and with tree-child hybridization number $k$, then the total number of cherries of the trees in $\mathcal{T}$ is at most $4k$.

Proof. Let $N$ be a tree-child network with $k$ reticulations that displays $\mathcal{T}$ and, for each tree $T_i \in \mathcal{T}$, let $f_i$ be an embedding of $T_i$ into $N$. Our strategy is to "charge" each cherry $\{x, y\}$ of $\mathcal{T}$ to some reticulation edge in a manner that charges every reticulation edge for at most two cherries. Since $N$ has hybridization number at most $k$ and, therefore, at most $2k$ reticulation edges, this proves the lemma.

We start by proving a number of auxiliary claims about how the images of cherries interact with reticulation edges and with each other. The first three claims consider a fixed cherry $\{x, y\}$ of some tree $T_i \in \mathcal{T}$ and a fixed tree $T_j$ that does not have $\{x, y\}$ as a cherry. Since $\{x, y\}$ is non-trivial, such a tree $T_j$ exists.
Let $p$ be the common parent of $x$ and $y$ in $T_i$ and let $e_x = px$ and $e_y = py$ be the parent edges of $x$ and $y$ in $T_i$, respectively. Since $T_j$ is an $X$-tree, we have $x, y \in T_j$. Let $u$ be the LCA of $x$ and $y$ in $T_j$, and let $P_x$ and $P_y$ be the paths from $u$ to $x$ and from $u$ to $y$ in $T_j$, respectively. Since $\{x, y\}$ is not a cherry of $T_j$, the path $P_x \cup P_y$ has at least one pendant edge.

Claim 1.

All pendant nodes of $f_i(e_x) \cup f_i(e_y)$ are reticulations.

Proof. Consider any pendant node $w$ of $f_i(e_x) \cup f_i(e_y)$ and let $e$ be the edge connecting $w$ to a node $v$ in $f_i(e_x) \cup f_i(e_y)$. Neither endpoint of $e$ is the root of $N$. Since $N$ is a tree-child network, there exists a tree path $Q$ from $w$ to a leaf $f_i(\ell_w)$. Consider the path $P$ from the root to $\ell_w$ in $T_i$. Since $e_x$ and $e_y$ are not in $P$, $f_i(e_x) \cup f_i(e_y)$ and $f_i(P)$ are edge-disjoint. On the other hand, since $Q$ is a tree path, $Q \subseteq f_i(P)$. Since $w$ is not the root of $N$ and $f_i(P)$'s top endpoint is the root of $N$, $Q$ is a proper subpath of $f_i(P)$, that is, $f_i(P)$ contains a parent edge of $w$. If $f_i(P)$ contained $e$, then $f_i(P)$ would be a proper superpath of $Q \cup e$ because $e$'s top endpoint also is not the root of $N$. Thus, $f_i(P)$ would contain the parent edge of $v$, that is, $f_i(P)$ and $f_i(e_x) \cup f_i(e_y)$ would not be edge-disjoint, a contradiction. Therefore, $e \notin f_i(P)$ and $w$ has another parent edge, that is, $w$ is a reticulation.

Claim 2.

The path $f_i(e_x) \cup f_i(e_y)$ contains at most one reticulation. This reticulation is incident to $f_i(p)$.

Proof. We prove that only the top edge of $f_i(e_x)$ can be a reticulation edge. An analogous argument shows that only the top edge of $f_i(e_y)$ can be a reticulation edge. Thus, all reticulation edges in $f_i(e_x) \cup f_i(e_y)$ are incident to $f_i(p)$. If the top edges of $f_i(e_x)$ and $f_i(e_y)$ are both reticulation edges, then both children of $f_i(p)$ are reticulations, a contradiction because $N$ is a tree-child network. Thus, $f_i(e_x) \cup f_i(e_y)$ contains at most one reticulation.

So assume that $f_i(e_x)$ contains a reticulation edge and choose such an edge $e$ that is closest to $f_i(p)$. If $e$ is incident to $p$, our claim holds. So assume $e$ is not incident to $p$ and let $z$ be its top endpoint. By the choice of $e$, $z$ is a tree node. However, by Claim 1, this implies that both of $z$'s children are reticulations, a contradiction again because $N$ is a tree-child network.

Claim 3.

If the path $f_i(e_x) \cup f_i(e_y)$ contains no reticulation, then it has at least one pendant node.

Proof. If $f_i(e_x) \cup f_i(e_y)$ contains no reticulation and has no pendant nodes, then $f_i(x)$ and $f_i(y)$ are children of $f_i(p)$ in $N$. Thus, both $f_j(P_x)$ and $f_j(P_y)$ include $f_i(p)$. Since $f_j(P_x)$ and $f_j(P_y)$ share only their top endpoint $f_j(u)$, we have $f_j(u) = f_i(p)$ and thus $f_j(P_x) = f_i(e_x)$ and $f_j(P_y) = f_i(e_y)$. This, however, is a contradiction because $f_i(e_x) \cup f_i(e_y)$ has no pendant nodes but $P_x \cup P_y$ has a pendant node in $T_j$, that is, $f_j(P_x) \cup f_j(P_y)$ must also have a pendant node in $N$.

For the next two claims, fix two distinct cherries $\{x, y\}$ and $\{w, z\}$ of two trees $T_i \in \mathcal{T}$ and $T_j \in \mathcal{T}$, respectively. Let $p$ be the common parent of $x$ and $y$ in $T_i$, and let $q$ be the common parent of $w$ and $z$ in $T_j$.

Claim 4. $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$ do not share any reticulation edge.

Proof. Assume the contrary. Then let $e$ be a reticulation edge in $(f_i(e_x) \cup f_i(e_y)) \cap (f_j(e_w) \cup f_j(e_z))$ and assume w.l.o.g. that $e \in f_i(e_x) \cap f_j(e_w)$. By Claim 2, $f_i(p) = f_j(q)$; $e$ is the first edge in both $f_i(e_x)$ and $f_j(e_w)$; $f_i(e_y)$ and $f_j(e_z)$ are both tree paths from $f_i(p)$ to $f_i(y)$ and $f_j(z)$, respectively; and the subpaths of $f_i(e_x)$ and $f_j(e_w)$ from $e$'s bottom endpoint to $f_i(x)$ and $f_j(w)$, respectively, are also tree paths.

Since every pendant node of $f_i(e_y)$ is a reticulation, by Claim 1, none of these pendant nodes can belong to $f_j(e_z)$. Thus, $f_j(z) = f_i(y)$, that is, $z = y$. Similarly, none of the pendant nodes of the subpath of $f_i(e_x)$ from $e$'s bottom endpoint to $f_i(x)$ can belong to $f_j(e_w)$. Thus, $f_j(w) = f_i(x)$, that is, $w = x$. This shows that $\{x, y\} = \{w, z\}$, a contradiction.

Claim 5.

If neither $f_i(e_x) \cup f_i(e_y)$ nor $f_j(e_w) \cup f_j(e_z)$ contains a reticulation edge, then these two paths are vertex-disjoint.

Proof. Assume that neither $f_i(e_x) \cup f_i(e_y)$ nor $f_j(e_w) \cup f_j(e_z)$ contains a reticulation edge and assume first that $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$ are not edge-disjoint. Then, w.l.o.g., $f_i(e_x)$ and $f_j(e_w)$ share an edge $e$. Since $f_i(e_x)$ and $f_j(e_w)$ are tree paths, the same argument as in the proof of Claim 4 shows that $x = w$. If $f_i(e_y)$ and $f_j(e_z)$ also share an edge, then the same argument shows that $y = z$. Otherwise, w.l.o.g. $f_j(q)$ is an internal node of $f_i(e_x)$ and the first node after $f_j(q)$ in $f_j(e_z)$ is a pendant node of $f_i(e_x)$. By Claim 1, this node is a reticulation, a contradiction. This shows that $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$ are edge-disjoint.

If $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$ are edge-disjoint but not vertex-disjoint, then their shared vertex $v$ satisfies either $v \ne f_i(p)$ and $v \ne f_j(q)$ or w.l.o.g. $v = f_i(p)$. In the former case, the parent edge of $v$ belongs to both $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$, a contradiction. In the latter case, both child edges of $v$ belong to $f_i(e_x) \cup f_i(e_y)$ and $f_j(e_w) \cup f_j(e_z)$ has to contain at least one of them, again a contradiction.

Now we call a cherry $\{x, y\}$ of some tree $T_i$ a type-I cherry if the path $f_i(e_x) \cup f_i(e_y)$ contains a reticulation edge; otherwise, it is a type-II cherry. We charge each type-I cherry $\{x, y\}$ to the reticulation edge in $f_i(e_x) \cup f_i(e_y)$. By Claim 4, every reticulation edge is charged for at most one type-I cherry. For every type-II cherry $\{x, y\}$, Claim 3 shows that w.l.o.g., $f_i(x)$'s sibling $v$ in $N$ is a pendant node of $f_i(e_x)$. By Claim 1, $v$ is a reticulation.
Thus, the edge $e$ between $v$ and the parent of $f_i(x)$ is a reticulation edge. We charge the cherry $\{x, y\}$ to $e$. Since $e$ has an endpoint in $f_i(e_x)$, Claim 5 implies that $e$ is charged for only one type-II cherry. This proves that every reticulation edge is charged for at most two cherries, one of type I and one of type II. This finishes the proof.

Having shown, in Lemma 10, that Proposition 9 holds when $S = \langle\rangle$, we extend the proof to arbitrary partial tree-child cherry picking sequences in this section, thereby completing the proof of Proposition 9. The main idea is to construct a set of $X$-trees $\hat{\mathcal{T}}$ that has the same set of cherries as $\mathcal{T}/S$ (and in particular has no trivial cherries) and then show that $\hat{\mathcal{T}}$ has reticulation number at most $k$. By Lemma 10, this implies that $\hat{\mathcal{T}}$, and thus $\mathcal{T}/S$, has at most $4k$ cherries.

Lemma 11.

Let $\mathcal{T}$ be a set of $X$-trees and let $S = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_r, y_r), (x_{r+1}, -) \rangle$ be a tree-child cherry picking sequence for $\mathcal{T}$ of weight at most $k$. For any $j \in [r]$, either there exists a trivial cherry of $\mathcal{T}^{(j)}$, or $\mathcal{T}^{(j)}$ has at most $4k$ unique cherries.

Figure 3: The construction of the tree $\hat{T}^{(j)}$ from $T^{(j)}$ and a caterpillar $C$ with leaf set $\{z, x_{i_1}, \ldots, x_{i_\ell}\}$.

Proof.

For $j = 0$, the claim holds by Lemma 10. For $j > 0$, we cannot apply Lemma 10 directly because the trees in $\mathcal{T}^{(j)}$ may have different leaf sets. Assume that $\mathcal{T}^{(j)}$ has no trivial cherry, because otherwise the lemma holds. In order to use Lemma 10 to bound the number of unique cherries in $\mathcal{T}^{(j)}$, we transform $\mathcal{T}^{(j)}$ into a set of $X$-trees $\hat{\mathcal{T}}^{(j)}$ with the following properties:

1. $\hat{\mathcal{T}}^{(j)}$ has the same unique cherries as $\mathcal{T}^{(j)}$;
2. $\hat{\mathcal{T}}^{(j)}$ has no trivial cherries; and
3. $\hat{\mathcal{T}}^{(j)}$ has tree-child hybridization number at most $k$.

By Properties 2 and 3 and Lemma 10, $\hat{\mathcal{T}}^{(j)}$ has at most $4k$ unique cherries. Thus, by Property 1, $\mathcal{T}^{(j)}$ has at most $4k$ unique cherries.

To obtain $\hat{\mathcal{T}}^{(j)}$ from $\mathcal{T}^{(j)}$, let $\mathcal{T}' \subseteq \mathcal{T}$ be the subset of trees $T \in \mathcal{T}$ such that $T^{(j)}$ has at least two leaves. We can assume that $\mathcal{T}' \ne \emptyset$ because otherwise, $\mathcal{T}^{(j)}$ has no cherries and the claim holds. Also note that every cherry of $\mathcal{T}^{(j)}$ is a cherry of some tree $T^{(j)}$ with $T \in \mathcal{T}'$. Now consider any tree $T \in \mathcal{T}'$ and let $i_1 < \ldots < i_\ell$ be the indices in $[j]$ such that $(x_{i_h}, y_{i_h})$ is a cherry of $T^{(i_h - 1)}$ for all $1 \le h \le \ell$. In other words, $T^{(i)} \ne T^{(i-1)}$ if and only if $i \in \{i_1, \ldots, i_\ell\}$. Observe that $T^{(j)}$ has label set $X \setminus \{x_{i_1}, \ldots, x_{i_\ell}\}$. Let $C$ be a caterpillar with leaf set $\{z, x_{i_1}, \ldots, x_{i_\ell}\}$, from bottom to top. We construct a tree $\hat{T}^{(j)}$ from $T^{(j)}$ and $C$ by identifying $z$ with the root of $T^{(j)}$. This is illustrated in Figure 3.
$\hat{\mathcal{T}}^{(j)}$ is the set of all such trees $\hat{T}^{(j)}$: $\hat{\mathcal{T}}^{(j)} = \{\hat{T}^{(j)} \mid T \in \mathcal{T}'\}$.

Property 1 holds because the trees in $\mathcal{T}^{(j)} \setminus (\mathcal{T}')^{(j)}$ have no cherries and, for every tree $T \in \mathcal{T}'$, $\hat{T}^{(j)}$ has the same cherries as $T^{(j)}$: $T^{(j)}$ is a pendant subtree of $\hat{T}^{(j)}$, so every cherry of $T^{(j)}$ is a cherry of $\hat{T}^{(j)}$. Every cherry of $\hat{T}^{(j)}$ that is not a cherry of $T^{(j)}$ would have to involve some leaf $x_{i_h}$, but none of these leaves is part of a cherry because $T^{(j)}$ has at least two leaves.

To see that Property 2 holds, observe that every trivial cherry $\{x, y\}$ would have to be a cherry of every tree in $\hat{\mathcal{T}}^{(j)}$ because all trees in $\hat{\mathcal{T}}^{(j)}$ have the same label set. Thus, by Property 1, $\{x, y\}$ would be a cherry of every tree $T^{(j)}$ such that $T \in \mathcal{T}'$. By the definition of $\mathcal{T}'$, $\{x, y\}$ would therefore be a trivial cherry of $\mathcal{T}^{(j)}$, but $\mathcal{T}^{(j)}$ has no trivial cherries. Thus, $\hat{\mathcal{T}}^{(j)}$ has no trivial cherries.

To prove that $\hat{\mathcal{T}}^{(j)}$ has tree-child hybridization number at most $k$ (Property 3), we construct a tree-child cherry picking sequence $\hat{S}$ of weight at most $k$ for $\hat{\mathcal{T}}^{(j)}$. This sequence is defined as

$\hat{S} = \langle (x_{j+1}, y_{j+1}), \ldots, (x_r, y_r), (x_1, x_{r+1}), \ldots, (x_j, x_{r+1}), (x_{r+1}, -) \rangle$,

that is, we swap the subsequences $\langle (x_1, y_1), \ldots, (x_j, y_j) \rangle$ and $\langle (x_{j+1}, y_{j+1}), \ldots, (x_r, y_r) \rangle$ of $S$ and then replace $y_i$ with $x_{r+1}$ in each pair $(x_i, y_i)$ with $1 \le i \le j$. By construction, $\hat{S}$ has the same weight as $S$, that is, its weight is at most $k$.

To see that $\hat{S}$ is a tree-child cherry picking sequence, observe that $\langle (x_{j+1}, y_{j+1}), \ldots, (x_r, y_r) \rangle$ is a subsequence of a tree-child cherry picking sequence, namely $S$, and is thus a partial tree-child cherry picking sequence. Since $S$ reduces each tree in $\mathcal{T}$ to the single leaf $x_{r+1}$, we have $x_{r+1} \notin \{x_1, \ldots, x_r\}$, so $x_{r+1}$ is not forbidden with respect to $\langle (x_{j+1}, y_{j+1}), \ldots, (x_r, y_r), (x_1, x_{r+1}), \ldots, (x_i, x_{r+1}) \rangle$, for any $i \in [j]$. Thus, $\hat{S}$ is a tree-child cherry picking sequence.

It remains to prove that $\hat{S}$ is a cherry picking sequence for every tree $\hat{T}^{(j)} \in \hat{\mathcal{T}}^{(j)}$. Observe that the sequence $S' = \langle (x_{j+1}, y_{j+1}), \ldots, (x_r, y_r) \rangle$ reduces $T^{(j)}$ to the single leaf $x_{r+1}$. Thus, after applying $S'$ to $\hat{T}^{(j)}$, we obtain a subtree $C'$ of the caterpillar $C$ with $z$ replaced with $x_{r+1}$. ($S'$ may also delete some leaves of $C$.) Since the leaves $x_{i_1}, \ldots, x_{i_\ell}$ of $C$ appear in this order from bottom to top in $C$, the sequence $\langle (x_1, x_{r+1}), \ldots, (x_j, x_{r+1}) \rangle$ reduces $C'$ to the single leaf $x_{r+1}$. Thus, $\hat{S}$ is a cherry picking sequence for $\hat{T}^{(j)}$.

Using the results from the previous three subsections, we are now ready to prove Theorem 3. While our algorithm computes $\mathcal{T}'$ only in line 3, and $n'$, $k'$, and $C$ only in lines 7–9, it is convenient for the sake of this proof to view them as quantities that evolve over time, as functions of $S$. We define $n'(\mathcal{T}, S) = |\{x \in X \mid x \text{ is a leaf of a tree in } \mathcal{T}/S\}|$ and $k'(\mathcal{T}, S) = |S| - |X| + n'(\mathcal{T}, S)$ for any partial tree-child cherry picking sequence $S$.

We divide the proof of Theorem 3 into three parts. First, we prove that $k'(\mathcal{T}, S)$ is invariant over the course of any invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ and that $0 \le k'(\mathcal{T}, S) \le k$ in every invocation the algorithm makes. This will be used in the analysis of the running time of the algorithm and in proving the correctness of the algorithm in the case when it returns a sequence in line 11. Then, we bound the running time of the algorithm by $O((8k)^k \, nt \lg t + nt \lg nt)$, where $n = |X|$ and $t = |\mathcal{T}|$.
This implies in particular that the number of recursive calls the algorithm makes is finite, a fact that will be used in the correctness proof. Finally, we consider the tree of recursive calls the algorithm makes and use induction on the number of descendant invocations of any invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ to prove the correctness of this invocation.

Lemma 12.

For a collection of $X$-trees $\mathcal{T}$, any partial cherry picking sequence $S$, and any non-trivial cherry $\{x, y\}$ of $\mathcal{T}/S$, $k'(\mathcal{T}, S \circ \langle (x, y) \rangle) = k'(\mathcal{T}, S) + 1$.

Proof. Since $\{x, y\}$ is a non-trivial cherry of $\mathcal{T}/S$, there exists a tree $T/S \in \mathcal{T}/S$ that contains both $x$ and $y$ but not the cherry $\{x, y\}$. Thus, applying the pair $(x, y)$ to $\mathcal{T}/S$ does not remove $x$ from all trees in $\mathcal{T}/S$. In particular, $n'(\mathcal{T}, S \circ \langle (x, y) \rangle) = n'(\mathcal{T}, S)$ and, therefore, $k'(\mathcal{T}, S \circ \langle (x, y) \rangle) = |S \circ \langle (x, y) \rangle| - |X| + n'(\mathcal{T}, S \circ \langle (x, y) \rangle) = |S| + 1 - |X| + n'(\mathcal{T}, S) = k'(\mathcal{T}, S) + 1$.

Lemma 13.

The value of $k'(\mathcal{T}, S)$ is invariant over the course of any invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ and satisfies $0 \le k'(\mathcal{T}, S) \le k$. Moreover, an invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ satisfies $k'(\mathcal{T}, S) = 0$ if and only if $S = \langle\rangle$.

Proof. First we prove that $k'(\mathcal{T}, S)$ does not change over the course of any invocation $\mathrm{TCS}(\mathcal{T}, S, k)$. Note that in a given invocation $\mathrm{TCS}(\mathcal{T}, S, k)$, $S$ changes only in line 2. Each execution of line 2 adds a pair $(x, y)$ to $S$, thereby increasing $|S|$ by one. Since $\{x, y\}$ is a trivial cherry of $\mathcal{T}'$ and $y$ is not forbidden with respect to $S$ in this case, this also removes $x$ from all trees in $\mathcal{T}/S$, so $n'(\mathcal{T}, S)$ decreases by one and $k'(\mathcal{T}, S) = |S| - |X| + n'(\mathcal{T}, S)$ remains unchanged.

We prove the bounds on $k'(\mathcal{T}, S)$ for each invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ by induction on $|S|$.

If $|S| = 0$, then $S = \langle\rangle$. In this case, $\mathcal{T}/S = \mathcal{T}$, so $n'(\mathcal{T}, S) = |X|$, that is, $k'(\mathcal{T}, S) = |S| - |X| + n'(\mathcal{T}, S) = |S| - |X| + |X| = 0$.

If $|S| > 0$, then $\mathrm{TCS}(\mathcal{T}, S, k)$ is called by another invocation $\mathrm{TCS}(\mathcal{T}, S', k)$ with $|S'| < |S|$. By the induction hypothesis, we have $k'(\mathcal{T}, S') \ge 0$. Let $S''$ be a snapshot of $S'$ in line 12 of the invocation $\mathrm{TCS}(\mathcal{T}, S', k)$. Then $S = S'' \circ \langle (x, y) \rangle$, where $\{x, y\}$ is a non-trivial cherry of $\mathcal{T}/S''$. Thus, by Lemma 12, $k'(\mathcal{T}, S) = k'(\mathcal{T}, S'') + 1$. Since $k'(\mathcal{T}, S'') = k'(\mathcal{T}, S')$, this implies that $k'(\mathcal{T}, S) > k'(\mathcal{T}, S') \ge 0$. By the second condition in line 12, we have $k'(\mathcal{T}, S'') < k$ (because $\mathrm{TCS}(\mathcal{T}, S', k)$ makes the recursive call $\mathrm{TCS}(\mathcal{T}, S, k)$), so $k'(\mathcal{T}, S) = k'(\mathcal{T}, S'') + 1 \le k$.

The following proposition now establishes the running time bound stated in Theorem 3.

Proposition 14.

The total running time of the invocation $\mathrm{TCS}(\mathcal{T}, \langle\rangle, k)$ and all its descendant invocations is $O((8k)^k \, nt \lg t + nt \lg nt)$, where $n = |X|$ and $t = |\mathcal{T}|$.

Proof. We only provide a sketch of the argument that the algorithm's state can be initialized in $O(nt \lg nt)$ time and that each invocation of procedure TCS, excluding the recursive calls it makes, has cost $O(nt \lg t)$. A careful proof is straightforward but tedious. To prove the proposition, it then suffices to prove that the algorithm makes $O((8k)^k)$ recursive calls.

Instead of computing $\mathcal{T}'$ from scratch as in the pseudo-code of procedure TCS, we first construct the state of the top-level invocation $\mathrm{TCS}(\mathcal{T}, \langle\rangle, k)$ consisting of $\mathcal{T}'$ and the lists of trivial and non-trivial cherries. Whenever an invocation makes a recursive call, it makes a copy of its state to be modified by the recursive call.

Identifying the cherries in $\mathcal{T}' = \mathcal{T}$ for the top-level invocation $\mathrm{TCS}(\mathcal{T}, \langle\rangle, k)$ takes $O(nt \lg nt)$ time using appropriate dictionaries (e.g., balanced binary search trees) to identify leaves with the same labels in different trees and to collect all occurrences of the same cherry in different trees.

Copying the state of the current invocation for each recursive call the algorithm makes takes $O(nt)$ time because the state is easily seen to have size $O(nt)$. We charge this cost to the recursive call. Each pair added to $S$ eliminates the corresponding cherry from up to $t$ trees and thereby creates up to $t$ new cherries. Updating $\mathcal{T}'$ and the lists of trivial and non-trivial cherries for each such cherry takes $O(\lg t)$ time, $O(t \lg t)$ time in total for each pair added to $S$. Each invocation adds at most $n$ pairs corresponding to trivial cherries to $S$, in line 2. Each pair $(x, y)$ added to $S$ in line 17 can be charged to the recursive call $\mathrm{TCS}(\mathcal{T}, S \circ \langle (x, y) \rangle, k)$ made in line 17. Thus, each invocation adds at most one pair corresponding to a non-trivial cherry to $S$.
The cost of updating $\mathcal{T}'$ and the list of trivial and non-trivial cherries in each invocation is thus $O(nt \lg t)$. Adding the cost of making a copy of the parent invocation's state at the beginning of each invocation, the cost per invocation is thus $O(nt \lg t)$. To obtain the time bound stated in the proposition, it remains to bound the number of recursive calls the algorithm makes by $O((8k)^k)$.

Let $m_{k'}$ be the number of invocations $\mathrm{TCS}(\mathcal{T}, S, k)$ with $k'(\mathcal{T}, S) = k'$. By Lemma 13, every invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ the algorithm makes satisfies $0 \le k'(\mathcal{T}, S) \le k$ and the total number of invocations is therefore $\sum_{k'=0}^{k} m_{k'}$. Also by Lemma 13, there is exactly one invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ with $k'(\mathcal{T}, S) = 0$: the invocation $\mathrm{TCS}(\mathcal{T}, \langle\rangle, k)$. Finally, by Lemma 12, every child invocation $\mathrm{TCS}(\mathcal{T}, S_2, k)$ of an invocation $\mathrm{TCS}(\mathcal{T}, S_1, k)$ satisfies $k'(\mathcal{T}, S_2) = k'(\mathcal{T}, S_1) + 1$. Thus, since each invocation makes at most $8k$ recursive calls in line 17, we obtain $m_{k'+1} \le 8k \cdot m_{k'}$. A simple inductive argument now shows that $m_{k'} \le (8k)^{k'}$ for all $0 \le k' \le k$. Thus, the total number of recursive calls the algorithm makes is at most

$\sum_{k'=0}^{k} (8k)^{k'} = \frac{(8k)^{k+1} - 1}{8k - 1} = O((8k)^k)$.

To establish the correctness of procedure TCS, we need a few simple auxiliary lemmas.

Lemma 15.

Let $S$ be a partial cherry picking sequence without any pairs of the form $(x, -)$. Any solution of $(\mathcal{T}, S)$ has weight at least $k'(\mathcal{T}, S)$.

Proof. Consider any cherry picking sequence $S \circ S'$ for $\mathcal{T}$. Let $X_1$ be the set of leaf labels of the trees in $\mathcal{T}/(S \circ S')$, and let $X_2$ be the subset of leaf labels of the trees in $\mathcal{T}/S$ that are not in $X_1$. Then $n'(\mathcal{T}, S) = |X_1| + |X_2|$.

Every leaf $x \in X_2$ must be removed from the trees in $\mathcal{T}/S$ by at least one pair $(x, y) \in S'$. For every leaf $x \in X_1$, $S'$ must contain a pair $(x, -)$. Thus, $|S'| \ge |X_1| + |X_2| = n'(\mathcal{T}, S)$. Therefore, $|S \circ S'| - |X| = |S| + |S'| - |X| \ge |S| - |X| + n'(\mathcal{T}, S) = k'(\mathcal{T}, S)$.

Lemma 16. Let $\mathcal{T}$ be a collection of $X$-trees, and $S$ a partial tree-child cherry picking sequence such that at least one tree in $\mathcal{T}/S$ has more than one leaf. Then any optimal solution of $(\mathcal{T}, S)$ is an extension of some sequence $S \circ \langle (x, y) \rangle$, where $\{x, y\}$ is a cherry of $\mathcal{T}/S$.

Proof.

Consider any optimal solution $S \circ S'$ of $(\mathcal{T}, S)$. Since there exists a tree $T \in \mathcal{T}$ such that $T/S$ has at least two leaves, the first pair in $S'$ is a pair $(x, y)$ with $x, y \in X$. Let $S' = \langle (x, y) \rangle \circ S''$ and assume for the sake of contradiction that $\{x, y\}$ is not a cherry of any tree in $\mathcal{T}/S$. Then $S \circ S'' \subset S \circ S'$, so $S \circ S''$ is a tree-child cherry picking sequence and $|S \circ S''| < |S \circ S'|$. Since $\{x, y\}$ is not a cherry of any tree in $\mathcal{T}/S$, we have $T/(S \circ \langle (x, y) \rangle) = T/S$ for all $T \in \mathcal{T}$. Thus, $T/(S \circ S'') = T/(S \circ \langle (x, y) \rangle \circ S'') = T/(S \circ S')$ for all $T \in \mathcal{T}$. Since $S \circ S'$ is a cherry picking sequence for $\mathcal{T}$, this shows that $S \circ S''$ is a cherry picking sequence for $\mathcal{T}$, a contradiction.

The following proposition now finishes the proof of Theorem 3 by proving that the invocation $\mathrm{TCS}(\mathcal{T}, \langle\rangle, k)$ returns a shortest tree-child cherry picking sequence for $\mathcal{T}$ if and only if $\mathcal{T}$ has a tree-child cherry picking sequence of weight at most $k$.

Proposition 17.

Given a set $\mathcal{T}$ of $X$-trees, a partial tree-child cherry picking sequence $S$, and an integer $k$, $\mathrm{TCS}(\mathcal{T}, S, k)$ returns an optimal solution of $(\mathcal{T}, S)$ if and only if $(\mathcal{T}, S)$ has a solution of weight at most $k$. Otherwise, it returns NONE.

Proof.

1, then $\mathrm{TCS}(\mathcal{T}, S, k)$ makes no recursive calls. Thus, it returns a sequence in line 11 or NONE in line 5 or 13. (Note that $\mathrm{TCS}(\mathcal{T}, S, k)$ cannot reach line 20 without making a recursive call, as this is only possible if $|C| = 0$ or a cherry $\{x, y\}$ of some tree in $\mathcal{T}'$ has $x, y$ both forbidden, and these cases are covered by lines 11 and 5 respectively.) By Proposition 8, if $S_1$ is a snapshot of $S$ at the start of the invocation $\mathrm{TCS}(\mathcal{T}, S, k)$ and $S_2$ is a snapshot of $S$ in line 3, then $(\mathcal{T}, S_1)$ has a solution of weight at most $k$ if and only if $(\mathcal{T}, S_2)$ has a solution of weight at most $k$, and any optimal solution of $(\mathcal{T}, S_2)$ also is an optimal solution of $(\mathcal{T}, S_1)$.

If $\mathrm{TCS}(\mathcal{T}, S, k)$ returns NONE in line 5, then $\mathcal{T}/S_2$ has a cherry $\{x, y\}$ with both $x$ and $y$ forbidden with respect to $S_2$. Any solution $S_2 \circ S'$ of $(\mathcal{T}, S_2)$ must include the pair $(x, y)$ or $(y, x)$ in $S'$ because otherwise the tree in $\mathcal{T}/S_2$ that has $\{x, y\}$ as a cherry is not reduced to a single leaf by $S'$. Since both $x$ and $y$ are forbidden with respect to $S_2$, there is no such extension $S_2 \circ S'$ of $S_2$ that is tree-child. Thus, $(\mathcal{T}, S_2)$ has no solution, and neither does $(\mathcal{T}, S_1)$. It is therefore correct to return NONE.

If $\mathrm{TCS}(\mathcal{T}, S, k)$ returns $S_2 \circ \langle (x, -) \rangle$ in line 11, then observe that $S_2$ is a partial tree-child cherry picking sequence. Indeed, by the assumption of the proposition, $S_1$ is a partial tree-child cherry picking sequence. For every pair $(x, y)$ added to $S$ in line 2, $y$ is not forbidden with respect to $S$, so $S \circ \langle (x, y) \rangle$ is also tree-child. By applying this argument inductively, we conclude that $S_2$ is tree-child.

Since $\mathrm{TCS}(\mathcal{T}, S, k)$ returns $S_2 \circ \langle (x, -) \rangle$ in line 11 only if $|C| = 0$, $S_2$ reduces each tree in $\mathcal{T}$ to a single leaf. Since $S_2$ is tree-child, this is the same leaf $x$ for every tree $T \in \mathcal{T}$. Thus, $S_2 \circ \langle (x, -) \rangle$ is a solution of $(\mathcal{T}, S_2)$.
Since every solution S ◦ S (cid:48) of ( T , S ) must include at least one pair ( z , − ) in S (cid:48) , S ◦ 〈 ( x , − ) 〉 is an optimal solution of ( T , S ) and, therefore, also of ( T , S ) . Finally, by Lemma 13, | S | − | X | + n (cid:48) ( T , S ) = k (cid:48) ( T , S ) ≤ k ; n (cid:48) ( T , S ) = T / S has x as its only leaf. Thus, | S | − | X | < k and | S ◦ 〈 ( x , − ) 〉| − | X | ≤ k , that is, ( T , S ) and ( T , S ) both havesolutions of weight at most k and returning S ◦ 〈 ( x , − ) 〉 is correct.Finally, if TCS ( T , S , k ) returns N ONE in line 13, then | C | > k or C (cid:54) = (cid:59) and k (cid:48) ( T , S , k ) ≥ k .16f | C | > k , then T / S has more than 4 k unique cherries. Since T / S has no trivial cherries,Proposition 9 shows that ( T , S ) has no solution of weight at most k , and neither does ( T , S ) . Thus,returning N ONE is correct.If C (cid:54) = (cid:59) and k (cid:48) ( T , S ) ≥ k , then observe that { x , y } is a non-trivial cherry of T / S for every pair ( x , y ) ∈ C . Lemma 12 shows that k (cid:48) ( T , S ◦ 〈 ( x , y ) 〉 ) = k (cid:48) ( T , S ) + > k for all ( x , y ) ∈ C . By Lemma 15,this shows that ( T , S ◦ 〈 ( x , y ) 〉 ) has no solution of weight at most k for any ( x , y ) ∈ C . By Lemma 16,this implies that ( T , S ) has no solution of weight at most k , and neither does ( T , S ) . Thus, returningN ONE is correct. This ﬁnishes the proof that every invocation TCS ( T , S , k ) that makes no recursive callsgives a correct answer.Next consider an invocation TCS ( T , S , k ) that does make recursive calls. Then C (cid:54) = (cid:59) . By Lemma 16, ( T , S ) (and thus ( T , S ) ) has a solution of weight at most k if and only if there exists a pair ( x , y ) ∈ C such that ( T , S ◦ 〈 ( x , y ) 〉 ) has a solution of weight at most k . 
Moreover, if such a pair exists, then one such pair has the property that any optimal solution of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ is also an optimal solution of $(\mathcal{T}, S)$, both for the value of $S$ in line 3 and, thus, for the value of $S$ at the start of the invocation.

If there exists a pair $(x, y)$ such that $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ has a solution of weight at most $k$, then choose $(x, y)$ so that any optimal solution of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ is also an optimal solution of $(\mathcal{T}, S)$. By the induction hypothesis, the invocation TCS$(\mathcal{T}, S \circ \langle (x, y) \rangle, k)$ in line 17 returns an optimal solution $S'$ of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$. The solution $S_{\mathrm{opt}}$ of $(\mathcal{T}, S)$ returned in line 20 is no longer than $S'$. Since $S_{\mathrm{opt}}$ is a solution of some instance $(\mathcal{T}, S \circ \langle (x', y') \rangle)$ with $(x', y') \in C$, it is a solution of $(\mathcal{T}, S)$ and is thus an optimal solution of $(\mathcal{T}, S)$, again for both values of $S$. Thus, the algorithm produces the correct answer.

If there is no pair $(x, y) \in C$ such that $(\mathcal{T}, S \circ \langle (x, y) \rangle)$ has a solution of weight at most $k$, then all recursive calls made in line 17 of the invocation TCS$(\mathcal{T}, S, k)$ return NONE. Thus, TCS$(\mathcal{T}, S, k)$ also returns NONE. Since Lemma 16 shows that $(\mathcal{T}, S)$ has no solution of weight at most $k$ in this case, this is correct.

In this section, we discuss a method used in our implementation of procedure TCS to improve its running time. We prove that it preserves the correctness of the algorithm, but we do not know whether it provably improves the algorithm's running time. In this sense, it is a heuristic.

The intuition behind redundant branch elimination is the following: Suppose that $\mathcal{T} / \langle (x, y), (z, w) \rangle$ and $\mathcal{T} / \langle (z, w), (x, y) \rangle$ result in the same set of trees. (This can easily happen, for example, if $x, y, z, w$ are all distinct.)
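The supposition above can be checked on small instances. The following Python sketch is not taken from the authors' implementation: it assumes a hypothetical encoding of binary trees as nested tuples of leaf labels, and the helper names `apply_pair` and `reduce_all` are illustrative only. It applies two pairs in both orders and compares the resulting trees.

```python
# Illustrative sketch only: trees are nested 2-tuples of leaf labels (str),
# which is an assumed encoding, not the representation used in the paper.

def apply_pair(tree, x, y):
    """Apply the pair (x, y): if {x, y} is a cherry, delete x and suppress its parent."""
    if isinstance(tree, str):            # a single leaf contains no cherry
        return tree
    left, right = tree
    if {left, right} == {x, y}:          # both children are the leaves x and y
        return y
    return (apply_pair(left, x, y), apply_pair(right, x, y))


def reduce_all(trees, pairs):
    """Compute T / <pairs> by applying every pair, in order, to every tree."""
    for x, y in pairs:
        trees = [apply_pair(t, x, y) for t in trees]
    return trees


trees = [(("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")]

# All four leaves distinct: both orders give the same set of trees.
same = (reduce_all(trees, [("a", "b"), ("c", "d")])
        == reduce_all(trees, [("c", "d"), ("a", "b")]))

# The pairs share leaf b: the order matters, so the branches are not interchangeable.
small = [(("a", "b"), "c")]
differ = (reduce_all(small, [("a", "b"), ("b", "c")])
          != reduce_all(small, [("b", "c"), ("a", "b")]))

print(same, differ)
```

The comparison uses ordered tuples, which suffices here because reductions never swap children; a fuller implementation would compare trees up to child order.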
Then the branch of the algorithm that starts by applying the sequence $\langle (x, y), (z, w) \rangle$ finds a solution if and only if the branch that starts by applying the sequence $\langle (z, w), (x, y) \rangle$ does. So the algorithm does not need to explore this second branch; it is redundant, and redundant branch elimination ensures that the algorithm does not make this recursive call.

Procedure TCS2 below is a modified version of procedure TCS that uses redundant branch elimination. The only difference between procedures TCS and TCS2 is that TCS2 maintains a set $R$ of redundant pairs (with $R$ set to $\emptyset$ in the top-level invocation TCS2$(\mathcal{T}, \langle\rangle, k, \emptyset)$) and ignores extensions $S \circ \langle (x, y) \rangle$ of the current sequence $S$ such that $(x, y) \in R$. If $\{x, y\}$ is a trivial cherry, this means that the invocation TCS2$(\mathcal{T}, S, k, R)$ returns NONE. If $\{x, y\}$ is a non-trivial cherry, then TCS2$(\mathcal{T}, S, k, R)$ does not make the recursive call TCS2$(\mathcal{T}, S \circ \langle (x, y) \rangle, k, R)$. Note that $R$ does not contain all redundant pairs for $S$, only a subset for which we prove below that they can safely be ignored based on the recursive calls the algorithm has made so far.

Procedure TCS2 calls a procedure UpdateR in lines 3 and 22. Given a partial tree-child cherry picking sequence $S$, a set of pairs $R$ that are redundant for $S$, and a pair $(x, y)$, UpdateR$(\mathcal{T}, S, (x, y), R)$ returns the subset $R' \subseteq R$ containing all pairs that are redundant also for $S \circ \langle (x, y) \rangle$.

The following definition formalizes the concept of a redundant pair.

Definition 1.

Let $\mathcal{T}$ be a set of $X$-trees, $S$ a tree-child sequence, and $(x, y) \in X \times X$. Let $\mathrm{count}(x, y, \mathcal{T}/S)$ be the number of trees in $\mathcal{T}/S$ that have $\{x, y\}$ as a cherry. An extension $S \circ S'$ of $S$ is dominated by $S \circ \langle (x, y) \rangle$ if there exists an index $j > 1$ such that
• $(x, y)$ is the $j$th element of $S'$;
• $\mathrm{count}(x, y, \mathcal{T}/S) = \mathrm{count}(x, y, \mathcal{T}/(S \circ S'_{j-1}))$; and
• for all $(x', y') \in S'_{j-1}$, $y' \neq x$ and $\{x', y'\} \neq \{x, y\}$.
If a sequence $S \circ S' \circ \langle (x, y) \rangle$ is dominated by $S \circ \langle (x, y) \rangle$, we say that $(x, y)$ is a redundant pair for $S \circ S'$.

Lemma 18.

If a sequence $S \circ S'$ is dominated by $S \circ \langle (x, y) \rangle$, $(x, y)$ is the $j$th pair in $S'$, and $(x, y) \notin S'_{j-1}$, then $\mathrm{count}(x, y, \mathcal{T}/(S \circ S'_i)) = \mathrm{count}(x, y, \mathcal{T}/(S \circ S'_{i-1}))$ for all $i \in [j-1]$.

Proof. Let $\mathcal{T}' = \mathcal{T}/S$ and let $(x'_i, y'_i)$ be the $i$th pair in $S'$, for some $i \in [j-1]$. Since $\{x'_i, y'_i\} \neq \{x, y\}$, the pair $(x'_i, y'_i)$ does not eliminate the cherry $\{x, y\}$ from any tree in $\mathcal{T}'/S'_{i-1}$ that contains this cherry, so $\mathrm{count}(x, y, \mathcal{T}'/S'_i) \geq \mathrm{count}(x, y, \mathcal{T}'/S'_{i-1})$. Since $(x, y) \in S'_j$, Observation 19 shows that $\mathrm{count}(x, y, \mathcal{T}') = \mathrm{count}(x, y, \mathcal{T}'/S'_{j-1})$. Thus, if $\mathrm{count}(x, y, \mathcal{T}'/S'_i) > \mathrm{count}(x, y, \mathcal{T}'/S'_{i-1})$, then there also exists an index $i' \in [j-1]$ such that $\mathrm{count}(x, y, \mathcal{T}'/S'_{i'}) < \mathrm{count}(x, y, \mathcal{T}'/S'_{i'-1})$, a contradiction. This proves that $\mathrm{count}(x, y, \mathcal{T}'/S'_i) = \mathrm{count}(x, y, \mathcal{T}'/S'_{i-1})$ for all $i \in [j-1]$.

The next observation follows immediately from Definition 1 and Lemma 18.

Observation 19.

If a sequence $S \circ S'$ is dominated by $S \circ \langle (x, y) \rangle$, then so is any extension of $S \circ S'$ and any prefix $S \circ S'' \subseteq S \circ S'$ such that $(x, y) \in S''$.

Lemma 20.

Let $(x, y) \in X \times X$, and $S_1 \circ S_2 \circ S_3 \circ S_4$ a cherry picking sequence. If $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ \langle (x, y) \rangle$ and $S_1 \circ S_2 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S'$, for some sequence $S'$, then $S_1 \circ \langle (x, y) \rangle$ also dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$, for any sequence $S''$.

Proof. First assume that $|S_2| > 0$, $|S_4| > 0$, and $(x, y) \notin S_2 \circ S_3 \circ S_4$. Then $(x, y)$ is the $j$th element of $S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$, for $j = |S_2 \circ S_3 \circ S_4| + 1 > 1$. Since $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ \langle (x, y) \rangle$ and $(x, y) \notin S_2 \circ S_3$, we have $y' \neq x$ and $\{x', y'\} \neq \{x, y\}$ for every pair $(x', y') \in S_2 \circ S_3$, and Lemma 18 shows that $\mathrm{count}(x, y, \mathcal{T}/S_1) = \mathrm{count}(x, y, \mathcal{T}/(S_1 \circ S_2)) = \mathrm{count}(x, y, \mathcal{T}/(S_1 \circ S_2 \circ S_3))$. Similarly, since $S_1 \circ S_2 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S'$ and $(x, y) \notin S_3 \circ S_4$, we have $y' \neq x$ and $\{x', y'\} \neq \{x, y\}$ for every pair $(x', y') \in S_3 \circ S_4$, and $\mathrm{count}(x, y, \mathcal{T}/(S_1 \circ S_2)) = \mathrm{count}(x, y, \mathcal{T}/(S_1 \circ S_2 \circ S_3 \circ S_4))$. Together, these two observations imply that $\mathrm{count}(x, y, \mathcal{T}/S_1) = \mathrm{count}(x, y, \mathcal{T}/(S_1 \circ S_2 \circ S_3 \circ S_4))$, and that $y' \neq x$ and $\{x', y'\} \neq \{x, y\}$ for every pair $(x', y') \in S_2 \circ S_3 \circ S_4$. Thus, $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$.

If $|S_2| = 0$, then $S_1 \circ \langle (x, y) \rangle = S_1 \circ S_2 \circ \langle (x, y) \rangle$ and it follows immediately that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S'$. By Observation 19, this implies that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle$ and thus also $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$.

If $|S_4| = 0$, then $S_1 \circ S_2 \circ S_3 \circ \langle (x, y) \rangle = S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle$, so it follows immediately that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle$. By Observation 19, this implies that $S_1 \circ \langle (x, y) \rangle$ also dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$.

If $(x, y) \in S_2 \circ S_3$, then the fact that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ \langle (x, y) \rangle$ and Observation 19 imply that it also dominates $S_1 \circ S_2 \circ S_3$ and thus also $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$.

Finally, if $(x, y) \in S_4$, then consider the longest prefix $S'_4 \subseteq S_4$ such that $(x, y) \notin S'_4$. Then, by Observation 19, $S_1 \circ S_2 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S'_4 \circ \langle (x, y) \rangle$. As shown so far, this implies that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S'_4 \circ \langle (x, y) \rangle$. Since $S'_4 \circ \langle (x, y) \rangle$ is a prefix of $S_4$ and, thus, of $S_4 \circ \langle (x, y) \rangle \circ S''$, Observation 19 now shows that $S_1 \circ \langle (x, y) \rangle$ dominates $S_1 \circ S_2 \circ S_3 \circ S_4 \circ \langle (x, y) \rangle \circ S''$.

Procedure TCS2$(\mathcal{T}, S, k, R)$
Input:

A collection of phylogenetic trees $\mathcal{T}$, a partial tree-child cherry picking sequence $S$, an integer $k$, and a set $R$ of redundant pairs for $S$
Output: An optimal solution of $(\mathcal{T}, S)$ if $(\mathcal{T}, S)$ has a solution of weight at most $k$ and there do not exist a proper prefix $S_p \subset S$ and a pair $(x, y) \in R$ such that $S_p \circ \langle (x, y) \rangle$ dominates some optimal solution of $(\mathcal{T}, S)$. NONE if $(\mathcal{T}, S)$ has no solution of weight at most $k$. In any other case, the output may be NONE or a (possibly suboptimal) solution of $(\mathcal{T}, S)$.
 1  while there exists a trivial cherry $\{x, y\}$ of $\mathcal{T}/S$ with $y$ not forbidden with respect to $S$ do
 2      if $(x, y) \notin R$ then
 3          $R \leftarrow$ UpdateR$(\mathcal{T}, S, (x, y), R)$
 4          $S \leftarrow S \circ \langle (x, y) \rangle$
 5      else
 6          return NONE
 7  $\mathcal{T}' \leftarrow \mathcal{T}/S$
 8  if $\mathcal{T}'$ contains a cherry $\{x, y\}$ with $x, y$ both forbidden with respect to $S$ then
 9      return NONE
10  else
11      $n' \leftarrow |\{x \in X : x \text{ is a leaf of a tree in } \mathcal{T}'\}|$
12      $k' \leftarrow |S| - |X| + n'$
13      $C \leftarrow \{(x, y) \mid \{x, y\} \text{ is a cherry of some tree in } \mathcal{T}'\}$
14      if $|C| = 0$ then
15          return $S \circ \langle (x, -) \rangle$, where $x$ is the last remaining leaf in all trees
16      else if $|C| > 8k$ or $k' \geq k$ then
17          return NONE
18      else
19          $S_{\mathrm{opt}} \leftarrow$ NONE
20          $R' \leftarrow R$
21          foreach $(x, y) \in C \setminus R$ with $y$ not forbidden with respect to $S$ do
22              $R'' \leftarrow$ UpdateR$(\mathcal{T}, S, (x, y), R')$
23              $S_{\mathrm{temp}} \leftarrow$ TCS2$(\mathcal{T}, S \circ \langle (x, y) \rangle, k, R'')$
24              if $w(S_{\mathrm{temp}}) < w(S_{\mathrm{opt}})$ then
25                  $S_{\mathrm{opt}} \leftarrow S_{\mathrm{temp}}$
26              $R' \leftarrow R' \cup \{(x, y)\}$
27          return $S_{\mathrm{opt}}$

Procedure UpdateR$(\mathcal{T}, S, (x, y), R)$
Input: A collection of phylogenetic $X$-trees $\mathcal{T}$, a partial tree-child cherry picking sequence $S$, a pair $(x, y) \in X \times X$, a set $R$ of redundant pairs for $S$
Output: A subset $R' \subseteq R$ of redundant pairs for $S \circ \langle (x, y) \rangle$
 1  return $\{(x', y') \in R \mid x' \neq y \text{ and } \mathrm{count}(x', y', \mathcal{T}/(S \circ \langle (x, y) \rangle)) = \mathrm{count}(x', y', \mathcal{T}/S)\}$

Proposition 21.

Let $\mathcal{T}$ be a set of $X$-trees, and $S \circ S'$ a tree-child cherry picking sequence for $\mathcal{T}$. Suppose that $S \circ S'$ is dominated by $S \circ \langle (x, y) \rangle$, for some pair $(x, y) \in X \times X$. Then there exists a tree-child cherry picking sequence $S \circ \langle (x, y) \rangle \circ S''$ for $\mathcal{T}$ with $w(S \circ \langle (x, y) \rangle \circ S'') \leq w(S \circ S')$.

In other words: If some branch of the algorithm already looks for an optimal solution of $(\mathcal{T}, S \circ \langle (x, y) \rangle)$, then there is no need to also look for an optimal solution of $(\mathcal{T}, S \circ S''')$, for any sequence $S \circ S'''$ that is dominated by $S \circ \langle (x, y) \rangle$.

Proof.

We can write $S' = S'' \circ \langle (x, y) \rangle \circ S'''$ such that $(x, y) \notin S''$. Let $|S''| = k$. For $0 \leq i \leq k$, let $S'_i = S''_i \circ \langle (x, y) \rangle \circ S''_{i+1,k} \circ S'''$, where $S''_i$ denotes the first $i$ pairs of $S''$ and $S''_{i+1,k}$ denotes its pairs $i+1, \ldots, k$. We prove by induction on $k - i$ that $S \circ S'_i$ is a tree-child cherry picking sequence for $\mathcal{T}$, for all $0 \leq i \leq k$. Since $S'_0 = \langle (x, y) \rangle \circ S'' \circ S'''$ and $w(S \circ S'_0) = w(S \circ S')$, this proves the proposition.

$S \circ S'_k$ is clearly a tree-child cherry picking sequence for $\mathcal{T}$ because $S'_k = S'$. So assume that $i < k$ and that $S \circ S'_{i+1}$ is a tree-child cherry picking sequence for $\mathcal{T}$.

Let $(x', y')$ be the $(i+1)$st pair in $S''$, that is, $(x', y')$ is the predecessor pair of $(x, y)$ in $S'_{i+1}$. Since $S \circ \langle (x, y) \rangle$ dominates $S \circ S'$, the choice of $S''$ implies that $y' \neq x$ and, by Lemma 18, $\mathrm{count}(x, y, \mathcal{T}/(S \circ S''_i)) = \mathrm{count}(x, y, \mathcal{T}/(S \circ S''_{i+1}))$. Since $S \circ S'_{i+1}$ is tree-child, the former implies that $S \circ S'_i$ is tree-child. We use the latter in the following proof that $S \circ S'_i$ is a cherry picking sequence for $\mathcal{T}$.

Let $T \in \mathcal{T}$ be an arbitrary tree, let $T' = T/(S \circ S''_i)$, let $T_a = T'/\langle (x', y'), (x, y) \rangle$, and let $T_b = T'/\langle (x, y), (x', y') \rangle$. We show that $T_b \subseteq T_a$ and that $T_a \setminus T_b \subseteq \{x'\}$. Thus, since $S \circ S'_{i+1}$ is a tree-child cherry picking sequence and, therefore, $x' \neq y''$ for all $(x'', y'') \in S''_{i+2,k} \circ S'''$, Lemma 7 shows that $T/(S \circ S'_i) = T_b/(S''_{i+2,k} \circ S''') \subseteq T_a/(S''_{i+2,k} \circ S''') = T/(S \circ S'_{i+1})$. Since $T/(S \circ S'_{i+1})$ has a single leaf and $T/(S \circ S'_i)$ has at least one leaf, this shows that $T/(S \circ S'_i) = T/(S \circ S'_{i+1})$, that is, $S \circ S'_i$ is a cherry picking sequence for $T$. Since this is true for every tree $T \in \mathcal{T}$, $S \circ S'_i$ is a cherry picking sequence for $\mathcal{T}$.

It remains to show that $T_b \subseteq T_a$ and $T_a \setminus T_b \subseteq \{x'\}$. Since $\mathrm{count}(x, y, \mathcal{T}/(S \circ S''_i)) = \mathrm{count}(x, y, \mathcal{T}/(S \circ S''_{i+1}))$, either both $T' = T/(S \circ S''_i)$ and $T'/\langle (x', y') \rangle = T/(S \circ S''_{i+1})$ contain $\{x, y\}$ as a cherry or neither of them does.

If neither $T'$ nor $T'/\langle (x', y') \rangle$ contains $\{x, y\}$ as a cherry, then $T_a = T'/\langle (x', y'), (x, y) \rangle = T'/\langle (x', y') \rangle = T'/\langle (x, y), (x', y') \rangle = T_b$, so $T_b \subseteq T_a$ and $T_a \setminus T_b = \emptyset \subseteq \{x'\}$.

If both $T'$ and $T'/\langle (x', y') \rangle$ contain $\{x, y\}$ as a cherry, then observe that $T'/\langle (x', y') \rangle$ does not contain $\{x', y'\}$ as a cherry. If $T'$ also does not contain $\{x', y'\}$ as a cherry, then we have that $T_a = T'/\langle (x', y'), (x, y) \rangle = T'/\langle (x, y) \rangle$ and $T_b = T'/\langle (x, y), (x', y') \rangle = T_a/\langle (x', y') \rangle$. Since applying the pair $(x', y')$ to $T_a$ can only remove the leaf $x'$, this shows that $T_b \subseteq T_a$ and $T_a \setminus T_b \subseteq \{x'\}$.

The final case is when $T'$ contains both $\{x, y\}$ and $\{x', y'\}$ as cherries. Since $\{x', y'\} \neq \{x, y\}$, $T'$ must contain distinct vertices $p$ and $q$ such that $p$ is the common parent of $x$ and $y$, and $q$ is the common parent of $x'$ and $y'$. It follows that $T_b$ and $T_a$ can both be derived from $T'$ by deleting $x$ and $x'$ and suppressing $p$ and $q$. Thus, $T_a = T_b$, that is, once again, $T_b \subseteq T_a$ and $T_a \setminus T_b = \emptyset \subseteq \{x'\}$.

While our algorithm uses redundant pairs to ignore some dominated sequences in its search for a shortest tree-child cherry picking sequence, it cannot ignore all dominated sequences. Indeed, in many cases, every possible tree-child cherry picking sequence for $\mathcal{T}$ is dominated by another sequence. Consider, for example, a binary tree on $X = \{a, b, c, d\}$ with cherries $\{a, b\}$ and $\{c, d\}$. Any sequence for this tree must begin with $(a, b)$, $(b, a)$, $(c, d)$ or $(d, c)$. If the first pair is $(a, b)$, then the second pair must be either $(c, d)$ or $(d, c)$. But the sequence $\langle (a, b), (c, d) \rangle$ is dominated by $\langle (c, d) \rangle$, and similarly $\langle (a, b), (d, c) \rangle$ is dominated by $\langle (d, c) \rangle$. A similar argument applies to any other sequence we might try. Thus, if we did ignore all redundant pairs for every sequence, the algorithm would not find any cherry picking sequence for $\mathcal{T}$.
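The four-leaf example can be verified mechanically. The sketch below is a hypothetical illustration, not the paper's implementation: it encodes trees as nested tuples of leaf labels, uses the illustrative names `count` and `is_dominated`, and assumes that the index condition in the definition of domination reads $j > 1$, so that at least one pair of $S'$ precedes $(x, y)$.

```python
# Illustrative sketch only. Trees are nested 2-tuples of leaf labels (an assumed
# encoding), and the index condition of the definition is read here as j > 1.

def apply_pair(tree, x, y):
    """Apply (x, y): if {x, y} is a cherry, delete x and suppress its parent."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if {left, right} == {x, y}:
        return y
    return (apply_pair(left, x, y), apply_pair(right, x, y))


def has_cherry(tree, x, y):
    """Return True if {x, y} is a cherry of the tree."""
    if isinstance(tree, str):
        return False
    left, right = tree
    return ({left, right} == {x, y}
            or has_cherry(left, x, y) or has_cherry(right, x, y))


def reduce_all(trees, pairs):
    """Compute T / <pairs> by applying the pairs in order to every tree."""
    for x, y in pairs:
        trees = [apply_pair(t, x, y) for t in trees]
    return trees


def count(x, y, trees):
    """Number of trees that have {x, y} as a cherry."""
    return sum(has_cherry(t, x, y) for t in trees)


def is_dominated(trees, s, s_prime, pair):
    """Check whether s + s_prime is dominated by s + [pair]."""
    x, y = pair
    base = reduce_all(trees, s)
    for j in range(2, len(s_prime) + 1):          # 1-based index j > 1
        if s_prime[j - 1] != pair:
            continue
        prefix = s_prime[: j - 1]                 # the first j - 1 pairs of s_prime
        same_count = count(x, y, base) == count(x, y, reduce_all(base, prefix))
        no_conflict = all(yp != x and {xp, yp} != {x, y} for xp, yp in prefix)
        if same_count and no_conflict:
            return True
    return False


tree = [(("a", "b"), ("c", "d"))]
print(is_dominated(tree, [], [("a", "b"), ("c", "d")], ("c", "d")))
print(is_dominated(tree, [], [("a", "b"), ("d", "c")], ("d", "c")))
```

Under these assumptions, both calls report domination, matching the example. The check recomputes reductions from scratch for clarity; it makes no attempt at the efficiency discussed for the actual procedure.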
This is the reason why procedure TCS2 explicitly keeps a set $R$ of redundant pairs that are safe to ignore; it ignores a sequence $S \circ \langle (x, y) \rangle$ only if $(x, y) \in R$.

Following the terminology of Linz and Semple [ ], we call a pair $(x_j, y_j)$ in a partial cherry picking sequence $S = \langle (x_1, y_1), \ldots, (x_r, y_r) \rangle$ essential if $\mathcal{T}/S_j \neq \mathcal{T}/S_{j-1}$, that is, $\{x_j, y_j\}$ is a cherry of at least one tree in $\mathcal{T}/S_{j-1}$ and, therefore, applying the pair $(x_j, y_j)$ to $\mathcal{T}/S_{j-1}$ removes $x_j$ from at least one tree in $\mathcal{T}/S_{j-1}$.

Our correctness proof of procedure TCS2 is divided into two parts: First we prove that if, for a given invocation TCS2$(\mathcal{T}, S, k, R)$, every pair in $S$ is essential and every pair in $R$ is redundant for $S$, then
(i) this is true at any time during the execution of this invocation (even though the invocation may modify $S$ and $R$) and
(ii) for every recursive call TCS2$(\mathcal{T}, S'', k, R'')$ this invocation makes, every pair in $S''$ is essential and every pair in $R''$ is redundant for $S''$.
Since the top-level invocation TCS2$(\mathcal{T}, \langle\rangle, k, \emptyset)$ satisfies $S = \langle\rangle$ and $R = \emptyset$, that is, all pairs in $S$ are trivially essential and all pairs in $R$ are trivially redundant for $S$, an inductive argument then implies that every pair in $S$ is essential and every pair in $R$ is redundant for $S$ at any time during the execution of any invocation TCS2$(\mathcal{T}, S, k, R)$. The second part of the proof shows that, under this condition, the invocation TCS2$(\mathcal{T}, \langle\rangle, k, \emptyset)$ returns a shortest tree-child cherry picking sequence for $\mathcal{T}$ if this sequence has weight at most $k$; otherwise, it returns NONE.

The following lemma shows that replacing $R$ with the set returned by UpdateR$(\mathcal{T}, S, (x, y), R)$ whenever we append a pair $(x, y)$ to a sequence $S$ maintains the property that every pair in $R$ is redundant for $S$.

Lemma 22.

Let $S \circ \langle (x, y) \rangle$ be a partial tree-child cherry picking sequence whose pairs are all essential, and let $R \subseteq X \times X$. For every pair $(x', y')$ in the subset $R' \subseteq R$ returned by UpdateR$(\mathcal{T}, S, (x, y), R)$, the sequence $S \circ \langle (x, y), (x', y') \rangle$ is dominated by $S \circ \langle (x', y') \rangle$.

Proof. By the definition of $R'$ in line 1 of procedure UpdateR, we have $x' \neq y$ and $\mathrm{count}(x', y', \mathcal{T}/S) = \mathrm{count}(x', y', \mathcal{T}/(S \circ \langle (x, y) \rangle))$ for all $(x', y') \in R'$. Observe also that $\{x, y\} \neq \{x', y'\}$. Indeed, since every pair in $S \circ \langle (x, y) \rangle$ is essential, there exists a tree in $\mathcal{T}/S$ that has $\{x, y\}$ as a cherry, while there is no tree in $\mathcal{T}/(S \circ \langle (x, y) \rangle)$ that has $\{x, y\}$ as a cherry. Thus, if $\{x, y\} = \{x', y'\}$, we would have $\mathrm{count}(x', y', \mathcal{T}/S) \neq \mathrm{count}(x', y', \mathcal{T}/(S \circ \langle (x, y) \rangle))$, so $(x', y') \notin R'$. Since $(x', y')$ is not the first pair in $\langle (x, y), (x', y') \rangle$, the sequence $S \circ \langle (x, y), (x', y') \rangle$ is therefore dominated by $S \circ \langle (x', y') \rangle$.

We are now ready to prove Claims (i) and (ii) above. Since each invocation TCS2$(\mathcal{T}, S, k, R)$ may modify $S$ and $R$, we distinguish between the values of $S$ and $R$ passed as arguments to this invocation and the current values of $S$ and $R$ at any point during its execution.

Lemma 23.

Consider any invocation

TCS2 ( T , S , k , R ) such that every pair in S is essential and everypair in R is redundant for S . Then(i) At any time during the execution of this invocation, every pair in S is essential and there exists a properpreﬁx S p ⊂ S for each pair ( x (cid:48) , y (cid:48) ) ∈ R such that S p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 ; and(ii) For every recursive call TCS2 ( T , S (cid:48)(cid:48) , k , R (cid:48)(cid:48) ) this invocation makes, every pair in S (cid:48)(cid:48) is essential and everypair in R (cid:48)(cid:48) is redundant for S (cid:48)(cid:48) .Proof. (i) Initially, we have S = S and R = R . Thus, since every pair in S is essential and every pairin R is redundant for S , (i) holds for this choice of S and R . Next we prove that any modiﬁcation theinvocation makes to S and R maintains (i). Observe that TCS2 ( T , S , k , R ) modiﬁes S and R only inlines 3 and 4. Consider one iteration of the loop in lines 1–6 and let ( x , y ) be the pair added to S in this21teration. Since { x , y } is a trivial cherry of T / S in this case and every pair in S essential, every pair in S ◦〈 ( x , y ) 〉 is essential. By Lemma 22, every pair ( x (cid:48) , y (cid:48) ) in the set R (cid:48) returned by U PDATE R ( T , S , ( x , y ) , R ) in line 3 has the property that S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x , y ) , ( x (cid:48) , y (cid:48) ) 〉 . Since R (cid:48) ⊆ R , there exists aproper preﬁx S p ⊂ S such that S p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 . Thus, by Lemma 20, S p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 also dominates S ◦ 〈 ( x , y ) , ( x (cid:48) , y (cid:48) ) 〉 (where S and S ◦ S in Lemma 20 correspond to S p and S respectively, S = 〈〉 , and S = 〈 ( x , y ) 〉 ) . 
Therefore, replacing S with S ◦ 〈 ( x , y ) 〉 , and R with the set returned byU PDATE R ( T , S , ( x , y ) , R ) maintains that every pair in S is essential and, for every every pair ( x (cid:48) , y (cid:48) ) ∈ R ,there exists a proper preﬁx S p ⊂ S such that S p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 .(ii) Consider any recursive call TCS2 ( T , S ◦ 〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) the invocation TCS2 ( T , S , k , R ) makes inline 23. By (i), all pairs in S are essential. Since ( x , y ) ∈ C , { x , y } is a cherry of T / S . Thus, every pair in S ◦ 〈 ( x , y ) 〉 is essential. By Lemma 22, the set R (cid:48)(cid:48) returned by U PDATE R ( T , S , ( x , y ) , R (cid:48) ) in line 22 containsonly pairs that are redundant for S ◦ 〈 ( x , y ) 〉 . Thus, (ii) holds.The following corollary follows by applying Lemma 23 inductively after observing that S = 〈〉 and R = (cid:59) for the top-level invocation TCS2 ( T , 〈〉 , k , (cid:59) ) . Corollary 24.

At any point during the execution of an invocation TCS2$(\mathcal{T}, S, k, R)$, there exists a proper prefix $S_p \subset S$ for each pair $(x', y') \in R$ such that $S_p \circ \langle (x', y') \rangle$ dominates $S \circ \langle (x', y') \rangle$.

The next lemma states the fairly weak correctness guarantee each invocation TCS2$(\mathcal{T}, S, k, R)$ provides. As we show below, in Corollary 26, this lemma implies that the invocation TCS2$(\mathcal{T}, \langle\rangle, k, \emptyset)$ returns a shortest tree-child cherry picking sequence for $\mathcal{T}$ if there is such a sequence of weight at most $k$.

Lemma 25.

Consider any invocation

TCS2 ( T , S , k , R ) the algorithm makes. If ( T , S ) has a solution ofweight at most k, then either TCS2 ( T , S , k , R ) returns an optimal solution of ( T , S ) or there exist anextension S ◦ S (cid:48) of S , a pair ( x , y ) ∈ R , and a proper preﬁx S p ⊂ S such that S p ◦ 〈 ( x , y ) 〉 dominatesS ◦ S (cid:48) .Proof. Since no invocation TCS2 ( T , S , k , R ) makes more recursive calls than the corresponding invocationTCS ( T , S , k ) , Proposition 14 shows that each invocation TCS2 ( T , S , k , R ) has a ﬁnite number of descendantinvocations, which we denote by | TCS2 ( T , S , k , R ) | . Thus, if the lemma does not hold, we can choosean invocation TCS2 ( T , S , k , R ) that violates the lemma and has the minimum number of descendantinvocations | TCS2 ( T , S , k , R ) | among all such invocations.Since TCS2 ( T , S , k , R ) fails to ﬁnd an optimal solution of ( T , S ) , TCS2 ( T , S , k , R ) returns N ONE in line 6, 9, 17 or 27, or it returns a suboptimal solution of ( T , S ) in line 15 or 27. Next we considerthese different cases: TCS2 ( T , S , k , R ) returns N ONE in line 9 or 17:

In this case, TCS ( T , S , k ) would have returned N ONE in line 5 or 13. Thus, by Proposition 17, ( T , S ) has no solution of weight at most k , a contradiction. TCS2 ( T , S , k , R ) returns a sequence S ◦ S (cid:48) in line 15: In this case, TCS ( T , S , k ) would have re-turned the same sequence in line 11. Thus, by Proposition 17, S ◦ S (cid:48) is an optimal solutionof ( T , S ) , a contradiction. TCS2 ( T , S , k , R ) returns N ONE in line 6:

In this case, consider the contents of S and R immediatelybefore TCS2 ( T , S , k , R ) returns. There exists a trivial cherry { x , y } of T / S such that y is notforbidden with respect to S and ( x , y ) ∈ R . Since ( T , S ) has a solution of weight at most k ,Proposition 8 shows that ( T , S ◦ 〈 ( x , y ) 〉 ) also has a solution of weight at most k and any optimalsolution of ( T , S ◦ 〈 ( x , y ) 〉 ) is also an optimal solution of ( T , S ) . By Corollary 24, there exists aproper preﬁx S p ⊆ S such that S p ◦ 〈 ( x , y ) 〉 dominates S ◦ 〈 ( x , y ) 〉 and, thus, by Observation 19, S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) , for any optimal solution S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) of ( T , S ◦ 〈 ( x , y ) 〉 ) , a contradiction.22 CS2 ( T , S , k , R ) returns N ONE or a suboptimal solution in line 27:

In this case, the correspondinginvocation TCS ( T , S , k ) would have reached line 20. Since ( T , S ) has a solution of weight atmost k , Proposition 17 shows that TCS ( T , S , k ) would have returned an optimal solution S ◦ S (cid:48) of ( T , S ) . This solution satisﬁes S ◦ S (cid:48) = S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48)(cid:48) , for some pair ( x , y ) ∈ C , referring tothe state of S in line 3 of TCS ( T , S , k ) . This shows that there exists a pair ( x , y ) ∈ C such that ( T , S ◦ 〈 ( x , y ) 〉 ) has a solution of weight at most k and any optimal solution of ( T , S ◦ 〈 ( x , y ) 〉 ) isalso an optimal solution of ( T , S ) .Now consider the subset C opt ⊆ C of all pairs ( x , y ) such that ( T , S ◦ 〈 ( x , y ) 〉 ) has a solution ofweight at most k and any optimal solution of ( T , S ◦ 〈 ( x , y ) 〉 ) is an optimal solution of ( T , S ) .Order the pairs in C opt so that the pairs in C opt \ R precede the pairs in C opt ∩ R , and the pairs in C opt \ R are arranged in the order in which TCS2 ( T , S , k , R ) makes the corresponding recursivecalls TCS2 ( T , S ◦ 〈 ( x , y ) 〉 , R (cid:48)(cid:48) ) . If for a pair ( x , y ) ∈ C opt , TCS2 ( T , S , k , R ) makes the recursivecall TCS2 ( T , S ◦ 〈 ( x , y ) 〉 , R (cid:48)(cid:48) ) and this recursive call returns an optimal solution S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48)(cid:48) of ( T , S ◦ 〈 ( x , y )) , then TCS2 ( T , S , k , R ) returns a solution S ◦ S (cid:48) of ( T , S ) that is no longer than S ◦〈 ( x , y ) 〉◦ S (cid:48)(cid:48) . By the choice of C opt , S ◦ S (cid:48) is thus an optimal solution of ( T , S ) . 
Since we assumethat TCS2 ( T , S , k , R ) does not return an optimal solution of ( T , S ) , it follows that for each pair ( x , y ) ∈ C opt , either TCS2 ( T , S , k , R ) does not make the recursive call TCS2 ( T , S ◦ 〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) (that is, ( x , y ) ∈ C opt ∩ R ) or it makes this recursive call (that is, ( x , y ) ∈ C opt \ R ) but the recursivecall returns N ONE or a suboptimal solution of ( T , S ◦ 〈 ( x , y ) 〉 ) .Now let ( x , y ) be the ﬁrst pair in C opt according to the ordering deﬁned above. • If TCS2 ( T , S , k , R ) does not make the recursive call TCS2 ( T , S ◦〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) , then ( x , y ) ∈ R . Thus, by Corollary 24, there exists a proper preﬁx S p ⊂ S such that S p ◦ 〈 ( x , y ) 〉 dominates S ◦ 〈 ( x , y ) 〉 . Since S ◦ 〈 ( x , y ) 〉 is an extension of S , this is a contradiction. • If TCS2 ( T , S , k , R ) does make the recursive call TCS2 ( T , S ◦〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) , then TCS2 ( T , S ◦〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) does not return an optimal solution of ( T , S ◦〈 ( x , y ) 〉 ) . Thus, since | TCS2 ( T , S ◦〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) | < | TCS2 ( T , S , k , R ) | , the choice of TCS2 ( T , S , k , R ) implies that there existan extension S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) of S ◦ 〈 ( x , y ) 〉 , a preﬁx S p ⊆ S , and a pair ( x (cid:48) , y (cid:48) ) ∈ R (cid:48)(cid:48) such that S p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) . Now we distinguish two cases. – If ( x (cid:48) , y (cid:48) ) ∈ R , we prove that there exists a proper preﬁx S (cid:48) p ⊂ S such that S (cid:48) p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) . Since S ⊆ S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) and R ⊆ R , this implies thatTCS2 ( T , S , k , R ) does not violate the lemma, a contradiction. If S p ⊂ S , we can set S (cid:48) p = S p . So assume that S ⊆ S p ⊆ S . 
Since ( x (cid:48) , y (cid:48) ) ∈ R , Cororally 24 shows thatthere exists a proper preﬁx S (cid:48) p ⊂ S such that S (cid:48) p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 . ByLemma 20, S (cid:48) p ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 also dominates S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) (where ( x , y ) in Lemma 20corresponds to ( x (cid:48) , y (cid:48) ) , and S , S ◦ S , S ◦ S ◦ S , S ◦ S ◦ S ◦ S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48)(cid:48) correspondto S (cid:48) p , S , S p , S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) respectively). – If ( x (cid:48) , y (cid:48) ) / ∈ R , then ( x (cid:48) , y (cid:48) ) ∈ R (cid:48) \ R , which implies that ( x (cid:48) , y (cid:48) ) ∈ C \ R and, therefore,TCS2 ( T , S , k , R ) makes a recursive call TCS2 ( T , S ◦〈 ( x (cid:48) , y (cid:48) ) 〉 , k , R (cid:48)(cid:48) ) before the recursivecall TCS2 ( T , S ◦ 〈 ( x , y ) 〉 , k , R (cid:48)(cid:48) ) . Since S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 dominates S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) , Propo-sition 21 shows that there exists a solution S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 ◦ S (cid:48)(cid:48) of ( T , S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 ) thatsatisﬁes w ( S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 ◦ S (cid:48)(cid:48) ) ≤ w ( S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) ) . Since S ◦ 〈 ( x , y ) 〉 ◦ S (cid:48) is an optimalsolution of ( T , S ) , this implies that S ◦ 〈 ( x (cid:48) , y (cid:48) ) 〉 ◦ S (cid:48)(cid:48) is also an optimal solution of ( T , S ) .Thus, ( x (cid:48) , y (cid:48) ) ∈ C opt , a contradiction because ( x , y ) is the ﬁrst pair in C opt . Corollary 26.

The invocation TCS2(T, ⟨⟩, k, ∅) returns a shortest tree-child cherry picking sequence for T if there exists such a sequence of weight at most k. Otherwise, TCS2(T, ⟨⟩, k, ∅) returns NONE.

Proof. If there is no tree-child cherry picking sequence for T of weight at most k, then Proposition 17 shows that the invocation TCS(T, ⟨⟩, k) returns NONE. Since each invocation TCS2(T, S, k, R) is easily seen to return a sequence only if TCS(T, S, k) returns a sequence, this implies that TCS2(T, ⟨⟩, k, ∅) returns NONE if there is no tree-child cherry picking sequence of weight at most k.

So assume that there exists a tree-child cherry picking sequence for T of weight at most k. If TCS2(T, ⟨⟩, k, ∅) does not return a shortest tree-child cherry picking sequence for T, then Lemma 25 states that there exist an extension S of ⟨⟩, a proper prefix S_p ⊆ ⟨⟩, and a pair (x, y) ∈ ∅ such that S_p ∘ ⟨(x, y)⟩ dominates S. However, neither S_p nor the pair (x, y) can exist. Thus, TCS2(T, ⟨⟩, k, ∅) returns a shortest tree-child cherry picking sequence for T.

As already observed in the proof of Lemma 25, each invocation TCS2(T, S, k, R) makes at most as many recursive calls as its corresponding invocation TCS(T, S, k), so the total number of recursive calls made by the algorithm is still bounded by O((8k)^k). Using standard techniques, including binary search trees and integer sorting, and a careful implementation of lines 1–6 that avoids calling UPDATER in each iteration, it is possible to show that the cost per recursive call remains O(nt lg t), including the cost to query and maintain R. Thus, the worst-case running time of the algorithm remains O((8k)^k nt lg t + nt lg nt). Since we are interested in using redundant branch elimination mainly as a heuristic improvement of the running time of the algorithm in practice, we do not prove this here. Note that redundant branch elimination is a heuristic only as far as improving the running time is concerned; Corollary 26 above shows that it preserves the algorithm's correctness.

In order to evaluate the usefulness of the algorithm presented in this paper, we implemented it and ran experiments on synthetic and realistic inputs to answer the following questions:

• How difficult are the inputs our algorithm can handle, both in terms of the number of reticulations in the computed network and the number of trees in the input?
• How does the running time of our algorithm compare to that of its closest competitor, HYBROSCALE?

The answer to this second question is that, for inputs with at least 3 trees, our algorithm runs significantly faster than HYBROSCALE. Since HYBROSCALE computes optimal hybridization networks, without any restrictions on their structure, while our algorithm computes optimal tree-child networks, we effectively buy this faster running time at the price of restricting the types of outputs we can compute and, consequently, possibly missing some optimal networks that are not tree-child. This raises the following natural question:

• For inputs for which both our algorithm and HYBROSCALE were able to compute a network, by how much did the reticulation numbers of the computed networks differ?

The discussion of our experimental results is divided into the following subsections: Section 5.1 discusses the hardware and software environment on which we ran our experiments, as well as some high-level characteristics of our implementation. The complete source code, test data, and the programs we used to prepare the test data are available from https://github.com/nzeh/tree_child_code, including detailed documentation. Section 5.2 describes the data sets used in our experiments. Section 5.3 briefly discusses the tuning parameters of our implementation used throughout our experiments. Section 5.4 discusses our experimental results.

5.1 Evaluation Environment and Some Implementation Details

Our evaluation platform was a Linux system with a quad-core Intel Xeon W3570 running at 1.7GHz and 24GB of DDR3 RAM clocked at 1333MHz. The operating system was Debian GNU/Linux 9 with a 4.19.46-64 Linux kernel. Our code for computing a tree-child network was implemented in Rust version 1.27.0. HYBROSCALE was implemented in Java, and we used Java version 1.8.0_161 to run it.

Our code implements procedure TCS2, that is, it uses redundant branch elimination. It also uses a number of additional optimizations:

Check for redundant pairs using occurrence counts: The check for redundant pairs (pairs in R) was implemented by recording for each cherry {x, y} of T/S how many trees contained the cherry {x, y} the last time an ancestor invocation made a recursive call TCS2(T, S ∘ ⟨(x, y)⟩, k, R) or TCS2(T, S ∘ ⟨(x, y)⟩, k, R). It is easy to verify that (x, y) is redundant for the current sequence S if and only if the number of trees in T/S that contain the cherry {x, y} is the same as the number of trees that contained the cherry {x, y} the last time a recursive call TCS2(T, S ∘ ⟨(x, y)⟩, k, R) was made.

No copying of an invocation's state for each recursive call: The state of each invocation (current set of trees, set of trivial cherries, set of non-trivial cherries, partial tree-child cherry picking sequence, and information about the cherries and trees containing each leaf) is fairly large. To avoid the overhead of copying this state for each recursive call, each recursive call instead modifies its parent invocation's state without making a copy. These modifications are recorded in a log and are undone when the recursive call returns, thereby restoring the parent invocation's state.
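The log-and-undo idea described above can be illustrated with a small sketch. This is not the authors' Rust implementation; the class `LoggedState`, the function `explore`, and the dictionary keys are hypothetical names chosen for illustration, and the "state" is a plain dictionary rather than a set of trees and cherries.

```python
_MISSING = object()  # marks keys that did not exist before a change


class LoggedState:
    """Shared mutable state whose changes can be rolled back."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.log = []  # one (key, previous value) entry per change

    def set(self, key, value):
        # Record the old value before overwriting it.
        self.log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def checkpoint(self):
        # The current log length identifies the state to return to.
        return len(self.log)

    def rollback(self, mark):
        # Undo changes in reverse order until the log is back at `mark`.
        while len(self.log) > mark:
            key, old = self.log.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old


def explore(state, depth, visited):
    """Toy recursion: edit the shared state, recurse, then undo the edits."""
    visited.append(dict(state.data))  # snapshot taken only for this demo
    if depth == 0:
        return
    for choice in ("a", "b"):
        mark = state.checkpoint()
        state.set("sequence", state.data["sequence"] + choice)
        state.set("last_choice_depth", depth)
        explore(state, depth - 1, visited)
        state.rollback(mark)  # the caller's state is restored here


state = LoggedState({"sequence": ""})
visited = []
explore(state, 2, visited)
```

After `explore` returns, `state.data` is again `{"sequence": ""}` and the log is empty, even though no invocation ever copied the state. The cost of a rollback is proportional to the number of changes made by the child call, not to the size of the state.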

Search for the optimal k: The search for an optimal tree-child cherry picking sequence calls the procedure TCS2(T, ⟨⟩, k, ∅) with increasing values of k until it reports success. This guarantees that the parameter k is no larger than the tree-child hybridization number of each input.

Parallelization: The different branches of the recursive search for an optimal tree-child cherry picking sequence are clearly independent and can thus be assigned to different threads of a parallel implementation of procedure TCS2. One challenge is that, especially in the presence of redundant branch elimination, the computational costs of different branches can differ substantially. To balance the load between threads, we implemented a work sharing scheduler that allows idle threads to send messages to busy threads to request part of their workload. In response to such a request, the busy thread sends a branch on its recursion stack that is yet to be explored to the requesting thread. In the interest of minimizing the number of messages exchanged between threads, the busy thread always shares the next branch from the bottom of its recursion stack, hopefully corresponding to a large subtree in the algorithm's recursion. The communication protocol was implemented using light-weight spinlocks to minimize the amount of time busy threads spend on communicating with other threads.
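The policy of sharing from the bottom of the recursion stack can be sketched as follows. This is a single-threaded illustration only: the actual scheduler uses threads, messages, and spinlocks, none of which are modelled here, and the names `Worker`, `share`, and `request_work` are hypothetical.

```python
from collections import deque


class Worker:
    """Holds the unexplored branches of one thread's recursion."""

    def __init__(self, name):
        self.name = name
        # Left end = bottom of the stack (shallowest pending branch),
        # right end = top of the stack (deepest pending branch).
        self.stack = deque()

    def push(self, branch):
        self.stack.append(branch)

    def next_branch(self):
        """The owner continues depth-first from the top of its stack."""
        return self.stack.pop() if self.stack else None

    def share(self):
        """Answer a work request with the branch at the bottom of the stack."""
        return self.stack.popleft() if self.stack else None


def request_work(idle, busy):
    """Move one pending branch from `busy` to `idle`, if there is one."""
    branch = busy.share()
    if branch is not None:
        idle.push(branch)
    return branch


busy = Worker("busy")
for depth in range(3):
    busy.push(("pending branch at depth", depth))

idle = Worker("idle")
shared = request_work(idle, busy)
```

Here `shared` is the branch pushed first, that is, the one closest to the root of the recursion. Such a branch is likely to stand for a large unexplored subtree, so one message tends to transfer a lot of work, while the busy worker keeps its deepest branches and continues its depth-first search undisturbed.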

Cluster reduction: Cluster reduction [3, 16] has been observed to be the most important optimization in phylogenetic network construction methods for pairs of trees [ ]. While we expect cluster reduction to be less effective for more than two trees, our implementation still applies cluster reduction because it is relatively cheap and should still have a significant impact on the algorithm's running time for real-world inputs.

In order to complete all our experiments in a reasonable amount of time, we limited every run of our algorithm or of HYBROSCALE to 60 minutes. If the algorithm did not produce a result within this time limit, we consider this input to be unsolvable by the algorithm in the context of this evaluation.

5.2 Test Data

We used synthetic and real-world data for the performance evaluation of our algorithm.

To generate a test instance with t trees over a set of n leaves and with tree-child hybridization number close to k, we generated a random tree-child network N on n leaves and with k reticulations. Then we extracted a random set of t trees displayed by N.

Network generation. To generate the network N, we initialized N to be a tree with two leaves. A network with n leaves and k reticulations can then be obtained by adding s_r = n + k − 2 tree nodes and k_r = k reticulations to N. The total number of non-leaf nodes to be added is s_r + k_r. Thus, as long as s_r > 0 and k_r > 0, we added either a tree node or a reticulation.

To add a tree node, we chose an existing leaf u and added two new leaves v and w with parent u. This turns u into a tree node while not affecting any existing reticulations or tree nodes. Thus, s_r decreases by one while k_r remains unchanged.

To add a reticulation, we chose two leaves u and v; merged v into u, making u and v the same node; and then added a new leaf w with parent u. This turns u into a reticulation while not affecting any existing reticulations or tree nodes. Thus, k_r decreases by one while s_r remains unchanged.

In order to ensure that the network is tree-child, the two nodes u and v to be merged are chosen from the set M of all nodes whose parents and siblings are tree nodes. We also ensure that the network has no parallel edges by picking u and v so that they have different parents. Thus, if M does not contain two nodes with different parents, then there exist no two nodes u and v that can be merged while keeping the network tree-child and not introducing any parallel edges. In this case, we add a new tree node. If it is possible to add a reticulation node, then we add a tree node with probability s_r / (s_r + k_r) and a reticulation with probability k_r / (s_r + k_r).

If we add a tree node, we choose the leaf u to be turned into a tree node uniformly at random from the current set of leaves.

If we add a reticulation, we choose u and v uniformly at random from the set M. If the two chosen nodes u and v have the same parent, we repeat this selection process until they do not.

This random addition of tree nodes and reticulations continues until s_r = 0 or k_r = 0. If k_r = 0 and s_r > 0, we keep adding tree nodes using the procedure above until s_r = 0. If s_r = 0 and k_r > 0, we keep adding reticulations using the procedure above until either k_r = 0 or no further reticulation can be added because M does not contain two nodes with different parents.

Tree generation.

We select t (or fewer) trees displayed by N by repeating the following process: Delete one of the parent edges of each reticulation in N uniformly at random and suppress every node with only one child in the resulting tree. If the newly generated tree already exists within the list of trees (with the same Newick representation), then we do not add it to the list. We maintain a count of the number of times this occurs. Once this count reaches 100 or we have t trees in our list, we terminate the process and return the trees.

Note that the set of trees we obtain using this process may have tree-child hybridization number less than k. First, the network generation does not guarantee that we obtain a network with k reticulations if we stop the network generation with a value of k_r > 0. Second, even though all selected trees are displayed by N, there may exist a tree-child network with fewer reticulations than N that also displays this set of trees.

5.2.2 Real-World Data

The real-world data we used in our experiments was derived from a collection of gene trees for 159,905 distinct homologous gene sets found in a set of 1,173 bacterial and archaeal genomes. These gene trees were constructed by Beiko and are described in more detail in [ ]. They were also used as a test data set, for example, in the evaluation of a method for constructing SPR supertrees [ ]. Beiko's data set (as almost every real-world data set) poses two challenges for our algorithm. First, bipartitions with low support in this data set were collapsed, so the input trees are multifurcating. Second, since not all genes are present in all taxa, the label sets of the input trees differ.

To obtain a collection of binary trees over the same label set, we used a two-step process: First, given the desired number of leaves n as a parameter, we selected a subset of n taxa X and all trees that contain all of these taxa. Then we restricted the selected trees to the chosen label set X, thereby obtaining a collection of multifurcating trees over this set of n taxa. Second, we resolved multifurcations in these trees to obtain a collection of binary trees. If we had resolved multifurcations randomly, it would have been very likely that any network displaying the constructed trees contains many reticulations that result only from inconsistent resolutions of the input trees. To avoid this, we introduced inconsistent resolutions into different input trees only if the input trees forced us to do so. This procedure is described in more detail below and at https://github.com/nzeh/tree_child_code.

We did not evaluate whether the resulting trees are biologically plausible (beyond the degree to which every binary resolution of a well supported multifurcating tree is plausible). Our only goal was to construct a test data set whose characteristics, in terms of number of reticulations and existence of clusters that allow the input to be decomposed into easier inputs, resemble those of typical real-world inputs, in order to evaluate the usefulness of our algorithm to construct phylogenetic networks for non-trivial real-world inputs.

Selection of leaf set and trees.

To extract as many trees as possible with a given number of common leaves n, we used the following strategy: We started with an empty set of leaves X = ∅ and the entire set of 159,905 input trees T. Then we repeated the following process n times: Let Y be the set of all unique taxa of the trees in T and let x ∈ Y \ X be a taxon that occurs in the maximum number of trees in T. Then we added x to X and discarded all trees from T that did not contain x. At the end of this iterative process, we obtained a set of trees T that contained all taxa in X. As already mentioned, the next step was to restrict every tree in T to the label set X.

Binary resolution. Binary resolutions were obtained by repeating the following process until all trees were binary: Inspect the trees in T in an arbitrary order. For each tree, inspect its multifurcations in an arbitrary order. For each multifurcation u, consider all pairs {v, w} such that v and w are children of u. For each such pair, count the number of resolved triplets (triplets of the form ab|c as opposed to a|b|c) that would be introduced by resolving {v, w} (that is, by making v and w children of a new node u′ and making u′ a child of u) and which are also present in at least one other tree in T.

If there exists such a pair {v, w} with at least one introduced resolved triplet that exists also in some other tree in T, then resolve the pair {v, w} that maximizes the number of introduced resolved triplets that exist in other trees. If no such pair is found, then move on to the next multifurcation in the current tree or to the next tree if there are no more multifurcations left to inspect in the current tree.

If the above steps resolve at least one multifurcation, then start another iteration. Otherwise, pick an arbitrary multifurcation in one of the trees and a random pair of children of this multifurcation and resolve it. Then start another iteration. (This random resolution will be matched by all other trees in the next iteration, thus forcing consistency between the trees.)

Test instances. By running the above procedure with parameter n ∈ {10, 20, 30, 40, 50, 60, 80, 100, 150}, we generated tree sets with this number of leaves and with between 21 and 1,684 trees for n = 150 and n = 20, respectively. To obtain an input with a given number of leaves n and a given number of trees t, we selected t of the trees with n leaves uniformly at random.

Our implementation of procedure TCS2 accepts a number of command-line arguments, mainly to facilitate the type of performance evaluation we conducted. The most important options are turning cluster reduction on or off, turning redundant branch elimination on or off, configuring the number of threads across which to distribute the algorithm's work, and controlling how frequently busy threads check for work requests from idle threads. More threads allow the operating system to help with load balancing but too many threads result in scheduling overhead. Similarly, frequent checks for work requests from idle threads help with load balancing by ensuring that idle threads never remain idle for too long but increase the overhead that slows down busy threads.

In preliminary experiments, we determined that we obtained the best performance using 8 threads (-p 8) on our system. The frequency of checks for work requests had negligible impact on the algorithm's performance as long as idle threads do not wait for work for too long. Throughout the experiments discussed here, we made a busy thread check for work requests from idle threads every 100 iterations through its main loop (-w 100). Cluster reduction never hurt performance but helped substantially on most real-world inputs, so we never turned it off. Since redundant branch elimination is a potentially important optimization of our algorithm discussed in Section 4, we dedicate a separate section to discussing its impact on the algorithm's performance.

Our first experiments concerned whether redundant branch elimination helps to reduce the running time of the algorithm in practice. To evaluate this, we ran the algorithm with and without redundant branch elimination on a synthetic data set. For the runs with redundant branch elimination, we used three test inputs for every possible combination of the following parameters:

• Number of trees: t ∈ {2, 5, 10, 15, 20, 50, 100}
• Number of reticulations: k ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
• Number of leaves: n ∈ {20, 50, 100, 150, 200}

resulting in a set of 1,155 inputs. The algorithm was able to solve 1,016 of these inputs within the 1-hour time limit. Without redundant branch elimination, the algorithm was not able to solve any synthetic inputs with k > 8; for k ≤ 8, it was able to solve 658 inputs within the time limit.

Figure 4 shows the speed-up achieved by using redundant branch elimination on the 658 inputs the algorithm was able to solve without it. As can be seen, the effect of redundant branch elimination increases with increasing reticulation number and, correspondingly, with increasing running time of the algorithm, reaching a speed-up of up to 1,000 on some instances with 6 and 7 reticulations.

Figure 5 shows that redundant branch elimination increases the difficulty of inputs our algorithm can solve within the 1-hour time limit. Without redundant branch elimination, the algorithm was able to solve all instances with reticulation numbers up to 6 and some instances with up to 8 reticulations. With redundant branch elimination, the algorithm was able to solve all instances with reticulation numbers up to 8 and some instances with up to 11 reticulations.

Figure 4: The speed-up (running time without redundant branch elimination divided by the running time with redundant branch elimination) achieved by redundant branch elimination on 658 instances solvable with and without redundant branch elimination, (a) as a function of the number of reticulations and (b) as a function of the running time without redundant branch elimination. The shading of reticulation numbers 7 and 8 indicates that not all inputs with 7 or 8 reticulations were solved by the algorithm, so particularly the flattening of the curve may be the result of limiting the running time of the algorithm and testing only a restricted set of inputs. We would expect that the effect of redundant branch elimination keeps increasing as the number of reticulations increases, given that there seems to be no plateauing of the speed-up as a function of running time in panel (b).
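The speed-up plotted in Figure 4 is a plain ratio of two running times. As a minimal sketch of how such a ratio could be computed and averaged per reticulation number, consider the following; the function names and the timing values are hypothetical and are not taken from the experiments.

```python
from collections import defaultdict
from statistics import mean


def speedup(time_without_br, time_with_br):
    """Running time without redundant branch elimination divided by
    the running time with it (both in seconds)."""
    if time_with_br <= 0:
        raise ValueError("running time must be positive")
    return time_without_br / time_with_br


def mean_speedup_by_k(runs):
    """Average the speed-up over all runs with the same reticulation number.

    `runs` is an iterable of (k, time_without_br, time_with_br) triples.
    """
    groups = defaultdict(list)
    for k, t_without, t_with in runs:
        groups[k].append(speedup(t_without, t_with))
    return {k: mean(values) for k, values in sorted(groups.items())}


# Made-up timings, for illustration only.
example_runs = [(2, 1.0, 0.5), (2, 3.0, 1.0), (6, 500.0, 0.5)]
summary = mean_speedup_by_k(example_runs)
```

Only instances solved in both configurations can contribute to such a summary, which is why Figure 4 is restricted to the 658 inputs solvable without redundant branch elimination.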
Figure 5: Running times of our algorithm with and without redundant branch elimination, as functions of the number of reticulations. As in Figure 4, the shaded regions indicate reticulation numbers for which not all input instances were solved within the 1-hour time limit. Transparent dots are data points, opaque dots indicate the average together with the 95% confidence intervals.

Figure 6: Running times of our algorithm on real-world data as a function of the reticulation number (left) or the level (right).

Figure 7: The reticulation number and the level as a function of the number of leaves and trees in the real-world inputs.

Our next experiment tested whether we can solve real-world instances with non-trivial numbers of reticulations efficiently using our algorithm. For this experiment, we extracted 10 test instances from the real-world data set for every possible combination of the following parameters:

• Number of trees: t ∈ {2, 3, 4, 5, 6, 7, 8}
• Number of leaves: n ∈ {10, 20, 30, 40, 50, 60, 80, 100, 150}

The algorithm was run with redundant branch elimination and with cluster reduction. Of the 630 test inputs, our algorithm was able to solve 306 within the 1-hour time limit. The left graph in Figure 6 shows the running time of our algorithm on the instances it was able to solve as a function of the number of reticulations. We make two important observations: First, even though our algorithm was not able to solve any synthetic inputs with more than 11 reticulations even with redundant branch elimination turned on, it was able to solve real-world inputs with up to 50 reticulations. Second, the running time varies greatly across instances with the same number of reticulations. Both observations can be explained by the fact that the real-world data has much more structure and can be decomposed into non-trivial clusters. Figure 7 shows the number of reticulations and the level of the real-world inputs as a function of the number of leaves and of the number of trees. These figures demonstrate that the network levels are significantly lower than the number of reticulations, something that was observed for inputs consisting of two trees and which is the key to the fast running times of MAF-based algorithms for pairs of trees. It comes as a bit of a surprise that the same is true also for more than two trees. However, the right graph in Figure 7 demonstrates that the gap between level and reticulation number narrows as the number of trees increases.

Figure 8: Running times of the algorithm with redundant branch elimination on all synthetic test inputs divided by the number of trees (left) and the number of leaves (right), for k between 1 and 8. Error bars denote a 95% confidence interval.

Using cluster reduction, the running time of the algorithm is determined by the level of the computed network rather than the reticulation number. Thus, the right graph in Figure 6 shows the running time as a function of the level of the computed network. This figure highlights another important fact: We were able to solve real-world instances with level up to 21 whereas level 11 was the limit for synthetic inputs. This suggests that even the clusters seem to have significantly more structure than random instances, which allows the algorithm to branch on fewer non-trivial cherries in each recursive call than on synthetic instances.
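The size of the real-world test set follows directly from the parameter grid stated above. A minimal sketch that enumerates this grid is given below; the constant and function names are hypothetical, while the parameter values and the count of 306 solved inputs are the ones reported in the text.

```python
from itertools import product

TREE_COUNTS = [2, 3, 4, 5, 6, 7, 8]
LEAF_COUNTS = [10, 20, 30, 40, 50, 60, 80, 100, 150]
INSTANCES_PER_COMBINATION = 10


def parameter_grid(tree_counts, leaf_counts, repeats):
    """Return one (t, n, instance index) triple per test input."""
    return [
        (t, n, i)
        for t, n in product(tree_counts, leaf_counts)
        for i in range(repeats)
    ]


grid = parameter_grid(TREE_COUNTS, LEAF_COUNTS, INSTANCES_PER_COMBINATION)
solved = 306  # number of inputs solved within the time limit, as reported

print(len(grid))                     # 630
print(f"{solved / len(grid):.1%}")   # 48.6%
```

The 7 tree counts, 9 leaf counts, and 10 instances per combination give 7 · 9 · 10 = 630 inputs, of which slightly under half were solved within one hour.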

The theoretical analysis of our algorithm predicts an exponential dependence of its running time only on the number of reticulations k, whereas the running time should only depend nearly linearly on both n and t. To verify this, we divided the observed running times, for each value of k between 1 and 8, by n and then by t. Figure 8 shows the results. The negative slopes of these curves confirm that the running time in practice depends at most linearly on each of n and t.

5.4.4 Comparison with HYBROSCALE

The most interesting question is whether optimal tree-child networks can be computed significantly faster than unrestricted hybridization networks. To answer this question, we compared the running time of our algorithm against that of its closest competitor HYBROSCALE, which computes unrestricted hybridization networks. For this comparison, we used synthetic data and real-world data. In order to test a wide range of test inputs, we limited the time per run to 20 minutes for synthetic inputs and to 60 minutes for real-world inputs. Since we ran our algorithm with 8 threads, we did the same for HYBROSCALE.

Synthetic data. We tested both our algorithm and HYBROSCALE on 6 test inputs for every possible combination of the following parameters:

• Number of trees: t ∈ {3, 5, 10, 20}
• Number of reticulations: k ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

and on six inputs with 2 trees and k ∈ {2, 4, ..., 28, 30}. All instances had 20 leaves. We use a wider range of reticulation numbers (and compensate for this by using only three instances for each value of k) for inputs with only two trees because we expected HYBROSCALE to run very fast on such inputs (because MAF-based algorithms are very fast for pairs of trees).

As can be seen in Figure 9 and as expected, HYBROSCALE outperforms our algorithm on inputs consisting of two trees and for more than 7 reticulations. For more than two trees, our algorithm runs faster than HYBROSCALE due to the near-linear dependence of our algorithm on the number of trees and the exponential dependence of HYBROSCALE on the number of trees. The difference becomes very pronounced for 10 and 20 trees, where HYBROSCALE was unable to solve most instances whereas our algorithm solved all test instances within the 20-minute time limit. Additionally, HYBROSCALE ran out of memory on certain occasions.

Real-world data. For this experiment, we used the same data set as in Section 5.4.2. As mentioned before, our algorithm solved 306 of the 630 inputs in the 1-hour time limit; HYBROSCALE solved 152 inputs, which were a subset of the 306 inputs solved by our algorithm. On 5 of the 2-tree inputs, HYBROSCALE outperformed our algorithm. On all other inputs, including all other 2-tree inputs, our algorithm was faster. Figure 10 shows the detailed results.
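The captions of Figures 9 and 10 report significance using an independent t-test with unequal variances, commonly known as Welch's t-test. As a minimal sketch of what this test computes, the following function returns the test statistic and the Welch-Satterthwaite degrees of freedom for two samples of running times. The function name and the sample values are hypothetical; the p-value is omitted because it requires the Student t distribution, which a library routine such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` would provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for two independent samples with unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up running times in seconds, for illustration only.
times_a = [1.0, 2.0, 3.0]
times_b = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(times_a, times_b)
```

Unlike the classical two-sample t-test, this variant does not pool the two variances, which matters here because running times of two different algorithms on the same inputs can have very different spreads.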

The final question we were interested in was whether optimal tree-child hybridization networks have significantly more reticulations than the optimal unrestricted hybridization networks for the same sets of trees or whether tree-child hybridization networks are often also optimal hybridization networks.

Of the 268 synthetic inputs that both our algorithm and HYBROSCALE were able to solve, only 3 had a greater tree-child hybridization number than their hybridization number. For all three inputs, the difference was 1.

Of the 142 real-world inputs solved by both our algorithm and HYBROSCALE, 21 had a greater tree-child hybridization number than their hybridization number. For 20 of these inputs, the difference was 1; for 1 input, the difference was 2.

This indicates that very often, tree-child hybridization networks achieve the optimal hybridization number and, even when they do not, they offer a reasonable approximation of optimal hybridization networks. Given that they are substantially easier to compute, as our results in the previous subsection demonstrate, tree-child networks therefore offer a useful analysis tool that can be used in place of hybridization networks in many instances.
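To put the counts above into proportion, the following sketch computes the share of inputs on which the tree-child hybridization number exceeds the hybridization number. The function name and the dictionary layout are hypothetical; the counts are the ones reported in the text.

```python
def summarize_gaps(total_inputs, gap_counts):
    """Summarize how often, and by how much, the two numbers differ.

    `gap_counts` maps a positive difference d to the number of inputs
    whose tree-child hybridization number exceeds the hybridization
    number by exactly d.
    """
    differing = sum(gap_counts.values())
    if differing > total_inputs:
        raise ValueError("more differing inputs than inputs in total")
    return {
        "differing": differing,
        "share_differing": differing / total_inputs,
        "max_gap": max(gap_counts) if gap_counts else 0,
    }


synthetic = summarize_gaps(268, {1: 3})
real_world = summarize_gaps(142, {1: 20, 2: 1})

print(f"{synthetic['share_differing']:.1%}")   # 1.1%
print(f"{real_world['share_differing']:.1%}")  # 14.8%
```

In other words, the two numbers coincide on roughly 99% of the synthetic inputs and roughly 85% of the real-world inputs that both programs solved, and the gap never exceeds 2.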

Figure 9: Running times (in seconds) of our algorithm (parallel, w = 100) and HYBROSCALE on synthetic inputs, as a function of the number of reticulations found. Since our algorithm solves all test instances and HYBROSCALE does not, we choose the tree-child hybridization number as the x-axis. Bars indicate a 95% confidence interval. Stars indicate significant differences between the running times of the two algorithms using an independent t-test with unequal variances.

Figure 10: Running times of our algorithm and HYBROSCALE on real-world inputs. Since our algorithm solves all test instances that HYBROSCALE was able to solve, we chose the tree-child level as the x-axis. Bars indicate a 95% confidence interval. Stars indicate significant differences between the running times of the two algorithms using an independent t-test with unequal variances.

We have presented the first fixed-parameter algorithm for computing optimal tree-child networks for many trees, based on the recently introduced concept of tree-child cherry picking sequences. While the theoretical running time of our algorithm is substantially greater than that of MAF-based network construction methods for two trees, our experimental results confirm that our algorithm can be used to solve non-trivial real-world inputs efficiently. Similarly to MAF-based algorithms for two trees, a key factor determining whether an instance can be solved efficiently is whether it can be decomposed into non-trivial clusters. While it comes as no surprise that randomly generated inputs consisting of more than two trees (almost) cannot be decomposed into clusters and thus cannot be solved efficiently, except for fairly small numbers of reticulations, the real-world inputs in our experiments contained sufficiently many non-trivial clusters, which allowed us to solve some inputs with up to 50 reticulations within one hour or less.

The closest competitor of our algorithm, HYBROSCALE, which computes unrestricted hybridization networks, outperforms our algorithm on inputs consisting of two trees, which is to be expected because MAF-based methods are very efficient for computing optimal hybridization networks for pairs of trees. Already for 3 trees, our algorithm outperforms HYBROSCALE and, for more than 6 trees, HYBROSCALE cannot solve any of the inputs our algorithm can solve, due to its exponential dependence on the number of trees.

[...] O((ck)^k · poly(n, t)) time, ideally in O(c^k · poly(n, t)) time? For temporal networks, a recent result [ ] shows that this is indeed the case. An interesting open question is whether the techniques used in that algorithm can also be used to obtain faster algorithms for computing general tree-child networks.

Most real-world inputs are multifurcating, as a result of suppressing branches in gene trees with low support. Thus, it would be of great importance to obtain efficient methods for constructing (tree-child) hybridization networks from multifurcating trees. Our algorithm is able to do this but only if we sacrifice the FPT bound on its running time: the bound on the number of non-trivial cherries in Proposition 9, which is the key to bounding the branching number of our algorithm, holds only if the input trees are binary. It remains an open question whether there exists a fixed-parameter algorithm for computing optimal tree-child hybridization networks for multifurcating trees.

References

[1] Benjamin Albrecht. Computing hybridization networks for multiple rooted binary phylogenetic trees by maximum acyclic agreement forests. arXiv:1408.3044, 2014.
[2] Benjamin Albrecht. Computing all hybridization networks for multiple binary phylogenetic input trees. BMC Bioinformatics, 16(1):236, 2015.
[3] M. Baroni, C. Semple, and M. Steel. Hybrids in real time. Systematic Biology, 55:46–56, 2006.
[4] Mihaela Baroni, Stefan Grünewald, Vincent Moulton, and Charles Semple. Bounding the number of hybridisation events for a consistent evolutionary history. Journal of Mathematical Biology, 51(2):171–182, 2005.
[5] Robert G. Beiko. Telling the whole story in a 10,000-genome world. Biology Direct, 6(1):34, 2011.
[6] Magnus Bordewich, Simone Linz, Katherine St. John, and Charles Semple. A reduction algorithm for computing the hybridization number of two trees. Evolutionary Bioinformatics Online, 3:86–98, 2007.
[7] Magnus Bordewich and Charles Semple. Computing the hybridization number of two phylogenetic trees is fixed-parameter tractable. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(3):458–466, 2007.
[8] Sander Borst. Personal communication, June 2019.
[9] Zhi-Zhong Chen and Lusheng Wang. Algorithms for reticulate networks of multiple phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(2):372–384, 2012.
[10] Peter J. Humphries, Simone Linz, and Charles Semple. Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies. Bulletin of Mathematical Biology, 75(10):1879–1890, 2013.
[11] Leo van Iersel, Steven Kelk, Nela Lekic, Chris Whidden, and Norbert Zeh. Hybridization number on three rooted binary trees is EPT. SIAM Journal on Discrete Mathematics, 30(3):1607–1631, 2016.
[12] Leo van Iersel, Steven Kelk, and Celine Scornavacca. Kernelizations for the hybridization number problem on multiple nonbinary trees. Journal of Computer and System Sciences, 82(6):1075–1089, 2016.
[13] Leo van Iersel and Simone Linz. A quadratic kernel for computing the hybridization number of multiple trees. Information Processing Letters, 113(9):318–323, 2013.
[14] Steven Kelk. Treetistic. http://skelk.sdf-eu.org/clustistic/, 2012.
[15] Zhijiang Li and Norbert Zeh. Computing maximum agreement forests without cluster partitioning is folly. In Proceedings of the 25th Annual European Symposium on Algorithms, pages 56:1–56:14, 2017.
[16] Simone Linz and Charles Semple. A cluster reduction for computing the subtree distance between phylogenies. Annals of Combinatorics, 15(3):465–484, 2011.
[17] Simone Linz and Charles Semple. Attaching leaves and picking cherries to characterise the hybridisation number for a set of phylogenies. Advances in Applied Mathematics, 105:102–129, 2019.
[18] Sajad Mirzaei and Yufeng Wu. Fast construction of near parsimonious hybridization networks for multiple phylogenetic trees.

IEEE / ACM Transactions on Computational Biology and Bioinformatics ,13(3):565–570, 2016. [ ] Chris Whidden, Robert G. Beiko, and Norbert Zeh. Fixed-parameter algorithms for maximumagreement forests.

SIAM Journal on Computing , 42(4):1431–1466, 2013. [ ] Christopher Whidden, Norbert Zeh, and Robert G Beiko. Supertrees based on the subtree prune-and-regraft distance.

Systematic biology , 63(4):566–581, 2014. [ ] Yufeng Wu. Close lower and upper bounds for the minimum reticulate network of multiplephylogenetic trees.

Bioinformatics , 26(12):i140–i148, 2010.36

Construction of a Tree-Child Network from a Tree-Child Cherry Picking Sequence

Procedure TREECHILDNETWORKFROMSEQUENCE(T, S)

Input: A set of X-trees T and a tree-child cherry picking sequence $S = \langle (x_1, y_1), \ldots, (x_r, y_r), (x_{r+1}, -) \rangle$ for T

Output: