Drawing Tree-Based Phylogenetic Networks with Minimum Number of Crossings
Jonathan Klawitter and Peter Stumpf
University of Würzburg, Germany
University of Passau, Germany
Abstract.
In phylogenetics, tree-based networks are used to model and visualize the evolutionary history of species where reticulate events such as horizontal gene transfer have occurred. Formally, a tree-based network N consists of a phylogenetic tree T (a rooted, binary, leaf-labeled tree) and so-called reticulation edges that span between edges of T. The network N is typically visualized by drawing T downward and planar, and the reticulation edges in one of several different styles. One aesthetic criterion is to minimize the number of crossings between tree edges and reticulation edges. This optimization problem has not yet been researched. We show that, if reticulation edges are drawn x-monotone, the problem is NP-complete, but fixed-parameter tractable in the number of reticulation edges. If, on the other hand, reticulation edges are drawn like “ears”, the crossing minimization problem can be solved in quadratic time.
Keywords:
Phylogenetic Network · Tree-Based · Crossing Minimization
The evolution of a set of species is usually depicted by a phylogenetic tree [12]. More precisely, a phylogenetic tree T is a rooted, binary tree whose leaves are labeled bijectively by the set of species. The internal vertices of T, each having two children, represent bifurcation events in the evolution of the taxa. The heights assigned to vertices indicate the flow of time from the root, lying furthest in the past, to the present-day species.
However, evolutionary histories cannot always be fully represented by a tree [3]. Indeed, reticulate events such as hybridization, horizontal gene transfer, recombination, and reassortment require the use of vertices with higher indegree [8, 13]. A phylogenetic network N generalizes a phylogenetic tree in exactly this sense: besides the root, the leaves, and vertices with indegree one and outdegree two, N may contain vertices with indegree two and outdegree one.
Tree-Based Networks.
Motivated by the question of whether the evolutionary history of the taxa is fundamentally tree-like, Francis and Steel [4] introduced a class of phylogenetic networks called tree-based networks, which are “merely phylogenetic trees with additional edges”. Formally, a tree-based network N is a phylogenetic network that has a subdivision T′ of a phylogenetic tree T as a spanning tree. Then T is called the base tree of N and T′ the support tree of N. Lately, tree-based networks have received a lot of attention in combinatorial phylogenetics [1, 4, 9, 11], and while drawings of several other types of phylogenetic networks have been investigated in the past [2, 7, 8, 14], this has, to the best of our knowledge, not been done for tree-based networks. In this paper, we look at drawings of tree-based networks with different drawing styles inspired by drawings in the literature.
For a tree-based network N, we assume that both the base tree T and the support tree T′ as spanning tree of N are fixed. We call an edge not contained in the embedding of T′ into N a reticulation edge. Therefore, we can perceive a drawing of N as a drawing of T (or T′) together with the reticulation edges. A vertex of N that is also in T is called a tree vertex.
Drawing styles.
Our drawing conventions are that N is drawn downward with vertices at their fixed associated heights and that T is drawn planar in the style of a dendrogram, that is, each tree edge (u, v) consists of a horizontal line segment starting at u and a vertical line segment ending at v. For reticulation edges, we consider different drawing styles; see Figure 1. In the horizontal style (the only style where the two endpoints of a reticulation edge must have the same height), reticulation edges are drawn as horizontal line segments. This style has, for example, been used by Kumar et al. [10, Figure 4]. We assume that all horizontal edges come with slightly different heights. The next two styles are inspired by Figures 3 and 6 by Vaughan et al. [15]. There, a reticulation edge (u, v) is drawn with two horizontal line segments and one vertical line segment and thus with two bends. The styles differ in where the vertical line segment is placed. We define the vertex ℓ(u, v) as follows. If the lowest common ancestor (lca) w of u and v in T′ is a tree vertex, set ℓ(u, v) = w. Otherwise, let ℓ(u, v) be the first tree vertex below w. In the ear style, the vertical line segment is placed to the right of the subtree rooted at ℓ(u, v). In the snake style, the vertical line segment lies between u and v; in particular, its x-coordinate lies between the x-coordinates of the left and right subtrees of ℓ(u, v).
Fig. 1. Drawings of tree-based networks with the (a) horizontal, (b) snake, and (c) ear style for the red reticulation edges.
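As a small illustration of the definition of ℓ(u, v), the following Python sketch computes it on a support tree T′. It assumes a hypothetical representation of our own: T′ is given by `parent` and `children` dictionaries, and the vertices of the base tree T are collected in a set `tree_vertices`, so every other vertex is a subdivision vertex with exactly one child. For brevity, the lca is found by naive parent walking rather than with the linear-time method of Gabow and Tarjan used in the proof of Theorem 1; the helper names `depth`, `lca`, and `ell` are illustrative only.

```python
from typing import Dict, List, Optional, Set

Parent = Dict[str, Optional[str]]


def depth(parent: Parent, x: str) -> int:
    """Number of edges on the path from x up to the root of T'."""
    d = 0
    while parent[x] is not None:
        x = parent[x]
        d += 1
    return d


def lca(parent: Parent, a: str, b: str) -> str:
    """Lowest common ancestor in T', found by naive parent walking."""
    da, db = depth(parent, a), depth(parent, b)
    while da > db:          # lift the deeper vertex to the same depth
        a, da = parent[a], da - 1
    while db > da:
        b, db = parent[b], db - 1
    while a != b:           # then lift both until they meet
        a, b = parent[a], parent[b]
    return a


def ell(parent: Parent, children: Dict[str, List[str]],
        tree_vertices: Set[str], u: str, v: str) -> str:
    """ℓ(u, v): the lca w of u and v in T' if w is a tree vertex,
    otherwise the first tree vertex below w."""
    w = lca(parent, u, v)
    while w not in tree_vertices:
        (w,) = children[w]  # a subdivision vertex has exactly one child in T'
    return w


# Example: base tree T is r -> (a, b), a -> (x, y), b -> (z, q);
# T' subdivides the edges (r, b), (a, x), (b, z) by s1, s2, s3.
parent: Parent = {"r": None, "a": "r", "s1": "r", "b": "s1", "s2": "a",
                  "x": "s2", "y": "a", "s3": "b", "z": "s3", "q": "b"}
children: Dict[str, List[str]] = {}
for child, par in parent.items():
    if par is not None:
        children.setdefault(par, []).append(child)
tree_vertices = {"r", "a", "b", "x", "y", "z", "q"}

print(ell(parent, children, tree_vertices, "s1", "s3"))  # lca is s1, so this prints b
```

In the example, the lca of s1 and s3 is the subdivision vertex s1 itself, so the sketch descends to the tree vertex b, exactly as the "otherwise" case of the definition prescribes.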
The aesthetic criterion to optimize when constructing a drawing of N, in any of the styles, is the number of crossings. Our focus is on crossings between reticulation edges and tree edges. Crossings between pairs of reticulation edges may be minimized in a post-processing step.
We make the following important observation. The number of crossings in a drawing of N is fully determined by the order of the leaves or, equivalently, by the rotation of each tree vertex. Formally, we use a map c : V(T) → V(T) that assigns to each non-leaf vertex v of T one of its children. In a drawing of N, we then consider v to be rotated left if c(v) is its left child, and rotated right if c(v) is its right child. Two vertices are rotated the same way if they are both rotated left or both rotated right. Let c̄(v) denote the child of v that is not c(v).
Contribution and outline.
First, we show that the number of crossings can be minimized in quadratic time for ear-style drawings. Second, we prove that the problem is NP-hard for the horizontal style and, as a corollary, for the snake style. On the positive side, we devise fixed-parameter tractable (FPT) algorithms for the horizontal and the snake style.
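The observation that a drawing is determined by the rotations of the tree vertices can be made concrete with a short sketch. We assume, for illustration only, that T is stored as a dictionary `children` mapping each inner vertex to its two children and that `c` encodes the map c from above; the hypothetical helper `leaf_order` then derives the left-to-right leaf order from a rotation assignment by a depth-first traversal.

```python
from itertools import product
from typing import Dict, List, Tuple


def leaf_order(children: Dict[str, Tuple[str, str]], c: Dict[str, str],
               rotation: Dict[str, str], root: str) -> List[str]:
    """Left-to-right leaf order of T for one rotation of each inner vertex.

    A vertex v rotated "left" has c[v] as its left child and c̄(v) as its
    right child; a vertex rotated "right" has them the other way round.
    """
    order: List[str] = []
    stack = [root]
    while stack:
        v = stack.pop()
        if v not in children:            # v is a leaf
            order.append(v)
            continue
        a, b = children[v]
        c_bar = b if c[v] == a else a    # the child of v that is not c(v)
        if rotation[v] == "left":
            left, right = c[v], c_bar
        else:
            left, right = c_bar, c[v]
        stack.append(right)              # pushed first, so left is visited first
        stack.append(left)
    return order


# Example: T is r -> (a, z), a -> (x, y) with c(r) = a and c(a) = x.
children = {"r": ("a", "z"), "a": ("x", "y")}
c = {"r": "a", "a": "x"}
for sides in product(("left", "right"), repeat=len(children)):
    rotation = dict(zip(children, sides))
    print(rotation, leaf_order(children, c, rotation, "r"))
```

The four rotation assignments yield the leaf orders x y z, y x z, z x y, and z y x. In general, distinct rotation assignments of a binary tree give distinct leaf orders, which is why the two views of a drawing are equivalent.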
Consider an ear-style drawing of a tree-based network N. Let e = (u, v) be a reticulation edge of N and f = (x, y) a tree edge of N. First, note that the vertical line segment of e is placed such that it does not cross any tree edge. Next, note that if the subtree T(ℓ(u, v)) rooted at ℓ(u, v) does not contain f, then e and f cannot cross. Let l be the horizontal line segment of e starting at v. Assume that T(ℓ(u, v)) contains f and that the y-coordinate range of f contains the y-coordinate of v. Observe that l and f cross if and only if f is in the right subtree of ℓ(v, y); see Figure 2 (a). (An analogous condition holds for the horizontal line segment starting at u.) Rotating ℓ(v, y) thus changes whether f and l cross. More generally, the existence of each possible crossing depends on the rotation of a single tree vertex. We can thus minimize the number of crossings in an ear-style drawing of N by deciding for each tree vertex which rotation results in fewer crossings. We show that this can be done efficiently.
Theorem 1.
Let N be a tree-based network with n leaves and k reticulation edges. Then an ear-style drawing of N with minimum number of crossings can be computed in O(nk) time.
Proof. The idea of the algorithm is to sweep upward through N and, whenever an endpoint v of a reticulation edge is met, to tell v’s ancestor tree vertices how many crossings it costs to have v in the left subtree. Each tree vertex is thus equipped with two counters that indicate which rotation is less favorable; see Figure 2 (a).
Fig. 2. (a) Start of the sweep-line algorithm with counters at 0; (b) adding potential crossings to counters; (c) rotating v based on counters.
Let e = (u, v) be a reticulation edge. Above we observed that a horizontal segment of e can only have crossings with tree edges below ℓ(u, v). Therefore, we first compute and store the lca for each pair of endpoints of each reticulation edge in O(n + k) time with an algorithm by Gabow and Tarjan [5, Section 4.6]. We then start the sweep from the leaves towards the root of N. At every endpoint v of a reticulation edge (u, v) (or (v, u)), we determine in O(n) time, for every tree vertex w, the width of its left and right subtree at the height of v (that is, the number of tree edges of the subtree that span the height of v), for example with a post-order traversal of T. Then, for each tree vertex w from v up to ℓ(u, v), we add the width of the subtree of w not containing v to the respective counter of w; see Figure 2 (b). This way, we count the potential crossings of the horizontal segment at v with the vertical segments of all edges at the height of v in this subtree at once. When the sweep reaches a tree vertex w, as in Figure 2 (c), we pick the best rotation for w based on its counters. In total, we have 2k steps for endpoints of reticulation edges, each taking O(n) time, and O(n) steps for tree vertices, each taking O(1) time.
Hence, the algorithm runs in O(nk) time.
To minimize crossings between pairs of reticulation edges in a post-processing step, we only have to consider pairs of reticulation edges that have their vertical segments to the right of the same subtree and that are nested, that is, two reticulation edges (u, v) and (x, y) with u above x and y above v. The vertical segment of (u, v) should then be to the right of the vertical segment of (x, y).
In this section, we show that the crossing minimization decision problem for horizontal-style drawings is NP-complete. We prove NP-hardness with a reduction from MAX-CUT, which is known to be NP-complete [6]. Recall that in an instance of MAX-CUT we are given a graph G = (V, E) and a parameter p ∈ ℕ, and have to decide whether there exists a bipartition (A, B) of V with at least p edges with one end in A and one end in B.
Theorem 2.
The crossing minimization problem for horizontal-style drawings of a tree-based network is NP-complete.
Proof.
First, since a drawing of N is determined by the rotations of its tree vertices, we can non-deterministically guess a drawing of N and count its crossings in polynomial time; hence, the problem is in NP. Concerning hardness, we reduce in polynomial time a MAX-CUT instance with a graph G = (V, E) to crossing minimization on a tree-based network N. In the following construction of N, assume that leaves are always (re)assigned height 0.
The main idea is to have one edge gadget N_e for each e ∈ E that induces a crossing if and only if e is not in our cut; see Figure 3. Let h : V → ℕ be an arbitrary vertex ordering. Let e = {u, v} ∈ E and suppose h(u) < h(v). The construction of N_e then works as follows. We have a tree vertex u_e with two leaves as children and a tree vertex v_e with u_e and a leaf as children. We set c(v_e) = u_e and the heights of u_e and v_e to h(u) and h(v), respectively. We add a reticulation edge f_e between the tree edges u_e c(u_e) and v_e c̄(v_e). Note that f_e and u_e c̄(u_e) cross if and only if u_e and v_e are rotated the same way. To connect all edge gadgets, we replace the leaves of an arbitrary rooted, binary tree with |E| leaves and a downward planar embedding by the edge gadgets; see Figure 4.
Fig. 3.
An edge gadget N_e; a vertex gadget N_v based on the tree T_v.
We want to ensure that the tree vertices v_1, …, v_deg(v) corresponding to the same node v ∈ V are all rotated the same way. If this is enforced, we can consider all nodes in V whose corresponding tree vertices are rotated left as one partition set and all nodes in V whose corresponding tree vertices are rotated right as the other partition set. If, on the other hand, a cut is given, we simply choose for each node the rotation of the corresponding tree vertices accordingly. Now, to ensure the same rotation for all corresponding tree vertices, we construct a vertex gadget N_v for each node v ∈ V (in some order); see Figure 3. We start with a rooted, binary tree T_v on three leaves l, l, l such that l and l have a common parent. Let v⋆ denote the child of the root of T_v and let c(c(v⋆)) = l. Add a bundle of k_1 = 2(|V| + 1) · |E| reticulation edges between l and l. We will see that k_1 is large enough that this bundle does not induce crossings in a crossing-minimum drawing. It thus enforces that l lies between l and l. We substitute l by our current construction; see Figure 3. Lastly, for 1 ≤ i ≤ deg(v), we add a reticulation edge between v_i c(v_i) and the incoming edge of l, and a reticulation edge between v_i c̄(v_i) and l. Note that if v⋆ and v_i are rotated the same way, we get two crossings fewer than otherwise. However, different rotations can save at most one crossing in the edge gadget containing v_i. Hence, in a crossing-minimum drawing, v⋆ and v_i are rotated the same way. In fact, v_1, …, v_deg(v), v⋆ are all rotated the same way. This completes the construction of N. Note that N has size polynomial in the size of G.
Note that the order of the edge gadgets does not influence the number of crossings with the two reticulation edges added for v_i; this number is fixed for crossing-minimum drawings.
Therefore, we can compute the total number k_2 of crossings induced by vertex gadgets. Furthermore, since k_2 ≤ |V| · |E|, the bundle size k_1 = 2(|V| + 1) · |E| satisfies k_1 ≥ k_2 + |E| + 1; hence, a tree edge crossing an edge bundle would induce more crossings than the vertex gadgets and the edge gadgets induce together. Consequently, no bundle induces crossings in a crossing-minimum drawing.
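The counting behind the reduction can be checked on small graphs. The sketch below models only the edge gadgets: assuming, as the vertex gadgets enforce, that all tree vertices belonging to a node of G share one rotation, f_e contributes a crossing exactly when both endpoints of e get the same rotation, so the edge gadgets contribute |E| minus the cut size. The fixed number of crossings induced by the vertex gadgets is not modelled, the function names are illustrative, and the brute force over all 2^|V| rotations is only meant for tiny examples.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def edge_gadget_crossings(edges: Iterable[Edge], rotation: Dict[str, str]) -> int:
    """f_e crosses u_e c̄(u_e) iff u_e and v_e are rotated the same way,
    i.e. iff both endpoints of e lie on the same side of the cut."""
    return sum(1 for u, v in edges if rotation[u] == rotation[v])


def cut_size(edges: Iterable[Edge], rotation: Dict[str, str]) -> int:
    """Edges with one endpoint rotated left and the other rotated right."""
    return sum(1 for u, v in edges if rotation[u] != rotation[v])


def min_edge_gadget_crossings(nodes: List[str], edges: List[Edge]) -> int:
    """Brute force over all rotations; equals |E| minus the maximum cut size."""
    return min(
        edge_gadget_crossings(edges, dict(zip(nodes, sides)))
        for sides in product(("left", "right"), repeat=len(nodes))
    )


triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
print(min_edge_gadget_crossings(*triangle))  # max cut of a triangle is 2, so this prints 1
```

For every rotation assignment, the edge-gadget crossings and the cut size add up to |E|, so minimizing the former is the same as maximizing the latter.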
Fig. 4.
A crossing-minimum drawing of N inducing a max-cut on G.
We conclude that minimizing crossings boils down to minimizing crossings in edge gadgets. Finally, by the construction of N and our observations, N admits a horizontal-style drawing with at most k_2 + |E| − p crossings (k_2 being the number of crossings induced by the vertex gadgets) if and only if G admits a cut of size at least p. The statement follows.
A snake-style drawing in which the endpoints of each reticulation edge have the same height is a horizontal-style drawing; the reduction thus also works for the snake style.
Corollary 1.
The crossing minimization problem for snake-style drawings of a tree-based network is NP-complete.
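The NP-membership argument in the proof of Theorem 2 (a drawing is fixed by the rotations, and its crossings can be counted in polynomial time) also yields an exponential-time reference solver for tiny instances, which can be handy for testing faster algorithms. The sketch below rests on assumptions of our own: a reticulation edge is encoded by a hypothetical triple (q1, q2, h), a horizontal segment at height h between the tree edges entering q1 and q2; leaves get x-coordinates 0, 1, 2, … and every inner vertex sits at the midpoint of its children; and h differs from all vertex heights, so only vertical parts of tree edges can be hit. Only crossings between reticulation edges and tree edges are counted.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Children = Dict[str, Tuple[str, str]]
Reticulation = Tuple[str, str, float]  # hypothetical encoding (q1, q2, h)


def x_coordinates(children: Children, c: Dict[str, str],
                  rotation: Dict[str, str], root: str) -> Dict[str, float]:
    """Dendrogram x-coordinates: leaves at 0, 1, 2, ... from left to right,
    every inner vertex at the midpoint of its two children."""
    x: Dict[str, float] = {}
    next_leaf = 0

    def place(v: str) -> None:
        nonlocal next_leaf
        if v not in children:
            x[v] = next_leaf
            next_leaf += 1
            return
        a, b = children[v]
        c_bar = b if c[v] == a else a
        left, right = (c[v], c_bar) if rotation[v] == "left" else (c_bar, c[v])
        place(left)
        place(right)
        x[v] = (x[left] + x[right]) / 2

    place(root)
    return x


def count_crossings(children: Children, parent: Dict[str, Optional[str]],
                    height: Dict[str, float], c: Dict[str, str],
                    rotation: Dict[str, str], root: str,
                    reticulations: List[Reticulation]) -> int:
    """Crossings between horizontal reticulation edges and tree edges.

    The vertical part of tree edge (p, q) sits at x[q] and spans heights
    strictly between height[q] and height[p]; the reticulation edge
    (q1, q2, h) crosses it iff h lies in that range and x[q] lies strictly
    between x[q1] and x[q2].
    """
    x = x_coordinates(children, c, rotation, root)
    crossings = 0
    for q1, q2, h in reticulations:
        lo, hi = sorted((x[q1], x[q2]))
        for q, p in parent.items():
            if p is not None and height[q] < h < height[p] and lo < x[q] < hi:
                crossings += 1
    return crossings


def min_crossings_bruteforce(children: Children, parent: Dict[str, Optional[str]],
                             height: Dict[str, float], c: Dict[str, str],
                             root: str, reticulations: List[Reticulation]) -> int:
    """Exhaustive search over all 2^(number of inner vertices) rotations."""
    inner = list(children)
    return min(
        count_crossings(children, parent, height, c, dict(zip(inner, sides)),
                        root, reticulations)
        for sides in product(("left", "right"), repeat=len(inner))
    )


# One edge gadget N_e: c(v_e) = u_e, c(u_e) = l1, and f_e joins the tree
# edges (u_e, l1) and (v_e, l3) at height 0.5.
children: Children = {"ve": ("ue", "l3"), "ue": ("l1", "l2")}
parent: Dict[str, Optional[str]] = {"ve": None, "ue": "ve", "l1": "ue",
                                    "l2": "ue", "l3": "ve"}
height: Dict[str, float] = {"ve": 2, "ue": 1, "l1": 0, "l2": 0, "l3": 0}
c = {"ve": "ue", "ue": "l1"}
f_e: List[Reticulation] = [("l1", "l3", 0.5)]
for sides in product(("left", "right"), repeat=2):
    rotation = dict(zip(children, sides))
    print(rotation, count_crossings(children, parent, height, c, rotation, "ve", f_e))
```

On this single edge gadget, the printout shows one crossing exactly for the two assignments in which u_e and v_e are rotated the same way, in line with the claim in the proof of Theorem 2.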
For the ear style, we have seen that whether a reticulation edge and a tree edge cross depends on the rotation of at most one tree vertex, since horizontal line segments always go to the right. This is not the case for horizontal-style and snake-style drawings. However, fixing the rotation of ℓ(u, v) for each reticulation edge (u, v) also fixes, for the horizontal line segments of (u, v), whether they go to the left or to the right. Further, while the vertical line segment may have a single crossing, this crossing occurs if and only if one endpoint of the reticulation edge is the lca of both endpoints. We can again conclude that the existence of each crossing of a horizontal line segment with a tree edge depends on the rotation of a single tree vertex, with two differences to the ear style: (i) a horizontal line segment can now also go towards the left; (ii) a horizontal line segment of a reticulation edge (u, v) ends between the two subtrees of ℓ(u, v), i.e., each of the two subtrees can have crossings with at most one of the horizontal line segments of (u, v). With these observations, we can now devise a fixed-parameter tractable algorithm.
Theorem 3.
Let N be a tree-based network with n leaves and k reticulation edges. Then a snake-style drawing of N with minimum number of crossings can be computed in O(2^k · nk) time. The computation is thus fixed-parameter tractable when parameterized by k.
Proof. Let L = {ℓ(u, v) | (u, v) is a reticulation edge}. Suppose that the rotation of every v ∈ L is fixed. With the observations above, we can slightly adapt our algorithm from Theorem 1 to compute for every v ∉ L the rotation that induces fewer crossings. Namely, the algorithm has to differentiate whether line segments go to the left or to the right, and it picks a rotation only for the vertices v ∉ L.
We try this for all possible combinations of rotations of the vertices in L and then pick the drawing with the fewest crossings. Since |L| ≤ k, there are at most 2^k such combinations, and the statement on the running time follows.
Since a horizontal-style drawing is a snake-style drawing in which the endpoints of each reticulation edge have the same height, the same statement holds for the horizontal style.
References
1. Anaya, M., Anipchenko-Ulaj, O., Ashfaq, A., Chiu, J., Kaiser, M., Ohsawa, M.S., Owen, M., Pavlechko, E., St. John, K., Suleria, S., Thompson, K., Yap, C.: On determining if tree-based networks contain fixed trees. Bulletin of Mathematical Biology (5), 961–969 (2016). doi:10.1007/s11538-016-0169-x
2. Calamoneri, T., Di Donato, V., Mariottini, D., Patrignani, M.: Visualizing co-phylogenetic reconciliations. Theoretical Computer Science, 228–245 (2020). doi:10.1016/j.tcs.2019.12.024
3. Doolittle, W.F.: Phylogenetic Classification and the Universal Tree. Science (5423), 2124–2128 (1999). doi:10.1126/science.284.5423.2124
4. Francis, A.R., Steel, M.: Which Phylogenetic Networks are Merely Trees with Additional Arcs? Systematic Biology (5), 768–777 (2015). doi:10.1093/sysbio/syv037
5. Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint set union. Journal of Computer and System Sciences (2), 209–221 (1985). doi:10.1016/0022-0000(85)90014-5
6. Garey, M.R., Johnson, D.S.: Computers and intractability, vol. 174. Freeman, San Francisco (1979)
7. Huson, D.H.: Drawing Rooted Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1), 103–109 (2009)
8. Huson, D.H., Rupp, R., Scornavacca, C.: Phylogenetic networks: concepts, algorithms and applications. Cambridge University Press (2010). doi:10.1093/sysbio/syr055
9. Jetten, L., van Iersel, L.: Nonbinary Tree-Based Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1), 205–217 (2018). doi:10.1109/TCBB.2016.2615918
10. Kumar, V., Lammers, F., Bidon, T., Pfenninger, M., Kolter, L., Nilsson, M.A., Janke, A.: The evolutionary history of bears is characterized by gene flow across species. Scientific Reports (1) (2017). doi:10.1038/srep46487
11. Pons, J.C., Semple, C., Steel, M.: Tree-based networks: characterisations, metrics, and support trees. Journal of Mathematical Biology (4), 899–918 (2019)
12. Semple, C., Steel, M.A.: Phylogenetics, vol. 24. Oxford University Press (2003)
13. Steel, M.: Phylogeny: discrete and random processes in evolution. Society for Industrial and Applied Mathematics (2016)
14. Tollis, I.G., Kakoulis, K.G.: Algorithms for visualizing phylogenetic networks. In: Hu, Y., Nöllenburg, M. (eds.) Graph Drawing and Network Visualization. pp. 183–195. Springer International Publishing (2016). doi:10.1007/978-3-319-50106-2_15
15. Vaughan, T.G., Welch, D., Drummond, A.J., Biggs, P.J., George, T., French, N.P.: Inferring Ancestral Recombination Graphs from Bacterial Genomic Data. Genetics (2), 857–870 (2017).