A branching process approach to level-k phylogenetic networks

Abstract

The mathematical analysis of random phylogenetic networks via analytic and algorithmic methods has received increasing attention in the past years. In the present work we introduce branching process methods to their study. This approach appears to be new in this context. Our main results focus on random level-k networks with n labelled leaves. Although the number of reticulation vertices in such networks is typically linear in n, we prove that their asymptotic global and local shape is tree-like in a well-defined sense. We show that the depth process of vertices in a large network converges towards a Brownian excursion after rescaling by n^{-1/2}. We also establish Benjamini--Schramm convergence of large random level-k networks towards a novel random infinite network.


A BRANCHING PROCESS APPROACH TO LEVEL-k PHYLOGENETIC NETWORKS
by Benedikt Stufler (arXiv [math.PR], Feb)


1. Introduction

Phylogenetic networks may be used to model the evolutionary history of species that have undergone reticulate events [ ], such as horizontal gene transfer (by which genes are transferred across species) or hybrid speciation (by which lineages recombine to create a new one) [ ].

The application of phylogenetic networks in evolutionary biology motivates the mathematical study of their number and shape, which has received increasing attention in recent literature. See for example [3, 4, 5, 6, 7, 8, 9] and references given therein. In the present work, we introduce branching process methods to their study. This approach appears to be new in this context and makes a fine addition to the current toolbox of analytic and algorithmic methods.

Level-$k$ networks. — A binary rooted phylogenetic network $N$ on a finite non-empty set $X$ of leaves may be defined as a simple rooted directed graph with no directed cycles that satisfies the following constraints:

1. Its unique root has indegree 0 and outdegree 2.
2. All non-root vertices are either tree nodes (indegree 1, outdegree 2), reticulation nodes (indegree 2, outdegree 1), or leaves (indegree 1, outdegree 0).

Key words and phrases. — Phylogenetic networks, random graphs, branching processes.

The leaves are bijectively labelled with elements of $X$. It will be notationally convenient to additionally admit the network consisting of a single labelled root vertex with no edges.

There is an infinite number of such networks on a given set $X$. For this reason, one restricts to subclasses for which this number is finite. We are going to focus on level-$k$ networks, with $k$ denoting a fixed positive integer. Recall that a cutvertex in a connected graph is a vertex whose removal disconnects the graph. Similarly, a bridge is an edge whose removal disconnects the graph. A block (or 2-connected component) is a connected induced subgraph that is maximal with the property of having no cutvertices of its own. The reader may consult books on the foundations of graph theory for further details [ ]. We say the binary rooted phylogenetic network $N$ is a level-$k$ network if the following conditions are met:

1. Any block of $N$ contains at most $k$ reticulation vertices of $N$.
2. Any block of $N$ with at least 3 vertices contains at least 2 vertices that are the source of bridges of $N$.

Here we view directed edges as joining a source vertex to a destination vertex according to their orientation. The second condition ensures that there are only finitely many such networks on a given set $X$. In fact, their number satisfies the following asymptotic expression:

Lemma 1.1. —

The number $N(k, n)$ of level-$k$ networks on an $n$-element set satisfies

$$N(k, n) \sim a_k \, n^{-3/2} \rho_k^{-n} \, n! \quad \text{as } n \to \infty \tag{1.1}$$

for constants $a_k, \rho_k > 0$ that only depend on $k$.

For $k = 1$ and $k = 2$ this was already shown via analytic methods by [ ], who additionally calculated the involved constants, gave exact enumerating formulas, studied unrooted networks, and proved limit theorems for the numbers of undirected cycles and inner edges in random networks.

Note that each vertex $v$ in $N$ may be reached from the root by following a directed path that only traverses edges according to their orientation. There may be multiple such paths, and we denote the length of a shortest path by $\mathrm{h}_N(v)$. Often, $\mathrm{h}_N(v)$ is called the depth or height of $v$. The maximal height of vertices in $N$ is denoted by $\mathrm{H}(N)$. We let $|N|$ denote the number of vertices of $N$.

Throughout this paper, we fix a positive integer $k$. For each integer $n \ge 1$ let $N_n$ be drawn uniformly at random among all level-$k$ networks on the set $[n] := \{1, \ldots, n\}$. If we order its vertices $v_1, \ldots, v_{|N_n|}$ we may form the corresponding continuous height process $(\mathrm{h}_{N_n}(v_t) : 0 \le t \le |N_n|)$ that starts at $\mathrm{h}_{N_n}(v_0) := 0$ and linearly interpolates the values $\mathrm{h}_{N_n}(v_i)$ for integers $i \in \{1, \ldots, |N_n|\}$. Of course, the height process depends on the order of vertices we choose.
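The vertex types and the height $\mathrm{h}_N(v)$ defined above can be illustrated on a small example. The following is a minimal sketch, not part of the paper: the function names `vertex_types` and `heights` and the seven-vertex example network `EDGES` are assumptions made purely for illustration. Heights are computed by breadth-first search along edge orientations, which yields the length of a shortest directed path from the root.

```python
from collections import deque

# Hypothetical example: a small network with one reticulation vertex "c".
EDGES = [("r", "a"), ("r", "b"), ("a", "leaf1"), ("a", "c"),
         ("b", "c"), ("b", "leaf2"), ("c", "leaf3")]

def vertex_types(edges):
    """Classify each vertex by its (indegree, outdegree) pair."""
    indeg, outdeg = {}, {}
    for s, d in edges:
        outdeg[s] = outdeg.get(s, 0) + 1
        indeg[d] = indeg.get(d, 0) + 1
        indeg.setdefault(s, 0)
        outdeg.setdefault(d, 0)
    names = {(0, 2): "root", (1, 2): "tree",
             (2, 1): "reticulation", (1, 0): "leaf"}
    return {v: names.get((indeg[v], outdeg[v]), "invalid") for v in indeg}

def heights(edges, root):
    """Length of a shortest directed path from the root to every vertex."""
    children = {}
    for s, d in edges:
        children.setdefault(s, []).append(d)
    h = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in children.get(v, []):
            if w not in h:          # first visit in BFS gives the shortest path
                h[w] = h[v] + 1
                queue.append(w)
    return h

types = vertex_types(EDGES)
h = heights(EDGES, "r")
max_height = max(h.values())        # plays the role of H(N)
```

In this example the reticulation vertex `c` can be reached along two directed paths of equal length, and the maximal height is attained at the leaf below it.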

Theorem 1.2. — We may couple $N_n$ with an ordering of its vertices such that the associated height process $(\mathrm{h}_{N_n}(v_t) : 0 \le t \le |N_n|)$ satisfies

$$\left(b_k n^{-1/2} \, \mathrm{h}_{N_n}(v_{s|N_n|}) : 0 \le s \le 1\right) \xrightarrow{d} (e(s) : 0 \le s \le 1) \quad \text{as } n \to \infty \tag{1.2}$$

for a constant $b_k > 0$ that only depends on $k$. Here $(e(s) : 0 \le s \le 1)$ denotes Brownian excursion normalized to have duration 1. The same limit as in (1.2) holds if we form the height process using leaves only.

Questions concerning heights in various classes of random phylogenetic networks were raised in [ , Sec. 7] and [ , Sec. 8].

We prove Theorem 1.2 by establishing a new relation between distances in the random phylogenetic network $N_n$ and distances in a Galton–Watson tree conditioned on having $n$ leaves. This allows us to apply invariance principles for the latter given in [11, 12].

An immediate consequence of Theorem 1.2 is that

$$b_k n^{-1/2} \, \mathrm{H}(N_n) \xrightarrow{d} \sup_{0 \le s \le 1} e(s). \tag{1.3}$$

Likewise, a uniformly selected vertex (or a uniformly selected leaf) $u_n$ of $N_n$ satisfies

$$b_k n^{-1/2} \, \mathrm{h}_{N_n}(u_n) \xrightarrow{d} e(r) \tag{1.4}$$

with $0 \le r \le 1$ denoting a uniformly selected point that is independent of $e$.

Theorem 1.3. —

There are constants $C, c > 0$ such that for all $x > 0$ and $n \ge 1$

$$\mathbb{P}(\mathrm{H}(N_n) > x) \le C \exp(-c x^2 / n). \tag{1.5}$$

Moreover, all higher moments in (1.3) and (1.4) converge.

Our main tool for proving Theorem 1.3 is a similar tail-bound by [ ] for the height of Galton–Watson trees conditioned to be large. Theorem 1.3 entails that the moments converge in (1.3), yielding

$$b_k n^{-1/2} \, \mathbb{E}[\mathrm{H}(N_n)] \to \mathbb{E}\Big[\sup_{0 \le s \le 1} e(s)\Big] = \sqrt{\pi/2} \tag{1.6}$$

and

$$b_k^p \, n^{-p/2} \, \mathbb{E}[\mathrm{H}(N_n)^p] \to 2^{-p/2} \, p(p-1) \, \Gamma(p/2) \, \zeta(p) \tag{1.7}$$

for each integer $p \ge 2$. Here $\Gamma$ refers to Euler's gamma-function, and $\zeta$ to Riemann's zeta-function. Compare with [ , Eq. (1.6)].

Instead of using lengths of shortest directed paths, we may also study the structure of the undirected graph $G_n$ underlying the network $N_n$. The graph distance from the root to a vertex in this graph need not coincide with the length of a shortest directed path. It might be shorter. If we write $\mathrm{H}(G_n)$ for the maximal height with respect to the graph distance, then by construction $\mathrm{H}(G_n) \le \mathrm{H}(N_n)$. Hence, the tail bound of Theorem 1.3 also applies to $G_n$:

$$\mathbb{P}(\mathrm{H}(G_n) > x) \le C \exp(-c x^2 / n). \tag{1.8}$$

Moreover, if we let $\mathrm{h}_{G_n}(v)$ denote the graph distance of a vertex in $G_n$ from the root and construct the height process for $G_n$ accordingly, then it is clear from the proof of Theorem 1.2 that there exists a constant $b'_k \ge b_k$ such that

$$\left(b'_k n^{-1/2} \, \mathrm{h}_{G_n}(v_{s|G_n|}) : 0 \le s \le 1\right) \xrightarrow{d} (e(s) : 0 \le s \le 1). \tag{1.9}$$

[...] $N_n$. (As opposed to the maximal length of a shortest directed or undirected path from the root to any vertex in $N_n$.) It is clear from the proofs of Theorems 1.2 and 1.3 that (1.3), (1.5), (1.6), and (1.7) hold analogously for these parameters, if we replace the constant $b_k$ by some constant $b''_k > 0$ or $b'''_k > 0$.

Theorem 1.4. —

Let $G_n$ be the graph underlying the random level-$k$ phylogenetic network $N_n$, and let $b'_k > 0$ be as in (1.9). Let $\mu_n$ denote either the uniform measure on the vertices or on the leaves of $G_n$. Let $(\mathcal{T}_e, d_{\mathcal{T}_e}, \mu_{\mathcal{T}_e})$ denote the Brownian continuum random tree. Then

$$(G_n, \, b'_k n^{-1/2} d_{G_n}, \, \mu_n) \xrightarrow{d} (\mathcal{T}_e, d_{\mathcal{T}_e}, \mu_{\mathcal{T}_e}) \quad \text{as } n \to \infty \tag{1.10}$$

in the Gromov–Hausdorff–Prokhorov sense.

The intuition behind Theorem 1.4 is that although $G_n$ contains a linear number of cycles, the global shape is tree-like as the cycles are so short (at most $O(\log n)$ in circumference) that they contract to points when rescaling distances by $n^{-1/2}$. Care has to be taken, since they nevertheless influence the global shape. A path between two typical points in $G_n$ has length roughly $\sqrt{n}$ and traverses roughly $\sqrt{n}$ cycles, thus they distort the distance on average by a stretch factor.

The Brownian continuum random tree $(\mathcal{T}_e, d_{\mathcal{T}_e}, \mu_{\mathcal{T}_e})$ was introduced and studied in the series of pioneering papers [14, 15, 16]. In some sense, it is a random "continuum" tree with uncountably many points. Formally, it may be defined as the random metric space $\mathcal{T}_e$ corresponding to the random semi-metric

$$d(x, y) = e(x) + e(y) - 2 \min_{\min(x,y) \le t \le \max(x,y)} e(t), \qquad x, y \in [0, 1]. \tag{1.11}$$

The measure $\mu_{\mathcal{T}_e}$ is the push-forward of the uniform measure on $[0, 1]$ under the canonical surjection $[0, 1] \to \mathcal{T}_e$. The Brownian continuum random tree is universal in the sense that it describes the asymptotic geometry of a variety of different models of random graphs [17, 18, 19, 20], hence linking seemingly unrelated models of random structures.
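A discrete analogue of the semi-metric (1.11) may help intuition. The sketch below is an illustration under stated assumptions, not part of the paper: it replaces the Brownian excursion by the contour function of a small finite tree (the list `contour` is a hypothetical example for a root with one child that has two children), and the function name `excursion_distance` is chosen here for illustration only.

```python
def excursion_distance(e, x, y):
    """Discrete version of d(x, y) = e(x) + e(y) - 2 * min of e between x and y."""
    lo, hi = min(x, y), max(x, y)
    return e[x] + e[y] - 2 * min(e[lo:hi + 1])

# Contour function of a tree: root -> c, and c has two children a and b.
# Visiting times: 0 root, 1 c, 2 a, 3 c, 4 b, 5 c, 6 root.
contour = [0, 1, 2, 1, 2, 1, 0]

d_a_b = excursion_distance(contour, 2, 4)       # a and b are siblings
d_root_a = excursion_distance(contour, 0, 2)    # a has height 2
d_c_c = excursion_distance(contour, 1, 3)       # two visits of the same vertex c
```

Times 1 and 3 are distinct but at distance zero, which is why (1.11) is only a semi-metric: the tree is obtained after identifying points at distance zero.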


The notion of Gromov–Hausdorff–Prokhorov convergence was introduced in [ ]. It allows us to assign a distance to a pair $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$ of compact metric spaces equipped with Borel probability measures. We refer the reader to [ , Sec. 6] for a detailed introduction.

How does the vicinity of the root or a random vertex (or random leaf) in $N_n$ evolve as $n$ becomes large? The frequencies with which we observe given shapes converge, allowing us to describe the asymptotic local shape of $N_n$ via random networks having a countably infinite number of vertices.

In order to make this precise, we require some notation. For any integer $\ell \ge 0$ we consider the operator $U_\ell(\cdot)$ that maps a pair of a phylogenetic network $N$ and one of its vertices $v$ to the $\ell$-neighbourhood subnetwork $U_\ell(N, v)$ induced by all vertices that may be reached from $v$ via an undirected path of length at most $\ell$. That is, all vertices that may be reached from $v$ by crossing at most $\ell$ edges, regardless of whether we follow the direction of the edges or not. We consider the vertices of $U_\ell(N, v)$ as unlabelled, and the edges as directed. If $v$ is equal to the root vertex of $N$ we may simply write $U_\ell(N)$.

Theorem 1.5. —

There is a random infinite level-$k$ network $\hat{N}$ such that

$$N_n \xrightarrow{d} \hat{N} \quad \text{as } n \to \infty \tag{1.12}$$

in the local topology. Even stronger, for any sequence $\ell_n$ of positive integers satisfying $\ell_n = o(\sqrt{n})$ it holds that

$$d_{\mathrm{TV}}(U_{\ell_n}(N_n), U_{\ell_n}(\hat{N})) \to 0. \tag{1.13}$$

The limit (1.12) states that for each integer $\ell \ge 0$ and each finite directed unlabelled graph $G$ it holds that

$$\mathbb{P}(U_\ell(N_n) = G) \to \mathbb{P}(U_\ell(\hat{N}) = G). \tag{1.14}$$

Thus, (1.13) is a much stronger statement, as it allows us to describe the asymptotic shape of larger neighbourhoods of the root. The assumption $\ell_n = o(\sqrt{n})$ is as general as possible: Theorem 1.2 implies that for any constant but arbitrarily small $\epsilon > 0$ the limit (1.13) fails for $\ell_n = \lfloor \epsilon \sqrt{n} \rfloor$.

It is natural to also study what happens in the vicinity of a uniformly selected vertex of $N_n$.

Theorem 1.6. —

Let $u_n$ denote a uniformly selected vertex of $N_n$. There is a random infinite directed vertex-marked graph $\hat{N}^*$ such that

$$(N_n, u_n) \xrightarrow{d} \hat{N}^* \quad \text{as } n \to \infty \tag{1.15}$$

in the local topology. Even stronger, for any sequence $\ell_n$ of positive integers satisfying $\ell_n = o(\sqrt{n})$ it holds that

$$d_{\mathrm{TV}}(U_{\ell_n}(N_n, u_n), U_{\ell_n}(\hat{N}^*)) \to 0. \tag{1.16}$$

Moreover, for any integer $\ell \ge 0$ and any finite directed unlabelled vertex-marked graph $G$ the number $N_{\ell,G}$ of vertices $u$ in $N_n$ with $U_\ell(N_n, u) = G$ satisfies

$$\frac{N_{\ell,G}}{|N_n|} \xrightarrow{p} \mathbb{P}(U_\ell(\hat{N}^*) = G). \tag{1.17}$$

Borrowing terminology from statistical physics, (1.15) may be called annealed local convergence, and (1.17) quenched local convergence.

We might also wonder what happens in the vicinity of a uniform random leaf $w_n$ of $N_n$. Theorem 1.6 actually entails a local limit for the vicinity of $w_n$ as well. Indeed, since $N_n$ has exactly $n$ leaves, it follows directly from Theorem 1.6 that

$$\frac{n}{|N_n|} \xrightarrow{p} \mathbb{P}(\text{marked vertex of } \hat{N}^* \text{ is a leaf}). \tag{1.18}$$

Setting

$$\hat{N}^*_l := (\hat{N}^* \mid \text{marked vertex of } \hat{N}^* \text{ is a leaf}), \tag{1.19}$$

it follows that

$$(N_n, w_n) \xrightarrow{d} \hat{N}^*_l \tag{1.20}$$

and even

$$d_{\mathrm{TV}}(U_{\ell_n}(N_n, w_n), U_{\ell_n}(\hat{N}^*_l)) \to 0. \tag{1.21}$$

Likewise, it follows that the number $N^{\mathrm{leaf}}_{\ell,G}$ of leaves whose $\ell$-neighbourhood equals $G$ satisfies

$$\frac{N^{\mathrm{leaf}}_{\ell,G}}{n} \xrightarrow{p} \mathbb{P}(U_\ell(\hat{N}^*_l) = G). \tag{1.22}$$

Using general principles for locally convergent random graphs [ ], Theorem 1.6 entails laws of large numbers for subgraph counts. That is, the number of copies of some given finite graph in $N_n$ concentrates at a constant multiple of $n$, with the constant factor being given by the expected value of some functional of $\hat{N}^*$. See [ , Lem. 4.3] for details.

Notation. —

Unless otherwise stated, all unspecified limits are taken as $n \to \infty$. The arrows $\xrightarrow{p}$, $\xrightarrow{d}$, and $\xrightarrow{a.s.}$ denote convergence in probability, convergence in distribution, and almost sure convergence. Equality in distribution is denoted by $\overset{d}{=}$. The total variation distance between measures and random variables is denoted by $d_{\mathrm{TV}}$. For a vertex $v$ in a directed graph $N$ its indegree refers to the number of edges in $N$ whose destination is $v$. The outdegree of $v$ is the number of edges in $N$ whose source is $v$. We will refer to a vertex $w$ as a child of $v$ if there is a directed edge from $v$ to $w$ in $N$. In a rooted undirected tree we use the notions "outdegree" and "child" as if all edges were directed as pointing away from the root.

2. A bijective encoding in terms of decorated trees

Throughout the following, we fix an integer $k \ge 1$ and use the term $k$-network to refer to binary rooted phylogenetic level-$k$ networks. The collection of $k$-networks on a given set $X$ will be denoted by $\mathcal{N}[X]$. Recall that we admit the $k$-network consisting of a single labelled root vertex with no edges.

In [23, 24] a decomposition of level-$k$ networks into smaller simple networks was developed. In this section we explain how this leads to a blow-up procedure for the random generation of $k$-networks.

$k$-networks are blow-ups of decorated trees. — Let $N$ be a $k$-network and let $v$ be a vertex of $N$ with outdegree 2. We say the vertex $v$ splits (or is splitting) if $N$ contains no vertex that may be reached via a directed path from both children of $v$. That is, we cannot walk from each of its two children to the same vertex by only crossing edges in accordance with their direction.

This terminology allows us to differentiate three types of $k$-networks:

1. The trivial $k$-network consisting of a single root vertex and nothing else.
2. $k$-networks where the root has outdegree 2 and splits.
3. $k$-networks where the root has outdegree 2 and does not split.

It is easy to describe a $k$-network $N$ on a set $X$ where the root has outdegree 2 and splits. $N$ is obtained in a unique way by taking an unordered pair of $k$-networks with disjoint leaf sets that partition $X$, and adding a root vertex that is the source of two directed edges that point to the respective roots of these two $k$-networks.

Now, suppose that $N$ is a $k$-network where the root has outdegree 2 and does not split. Then it is contained in a unique block $B$ with at least 4 vertices. Vertices with indegree 1 and outdegree 1 in $B$ are tree nodes of $N$, and hence the source of a bridge of $N$ whose destination lies outside of $B$. Likewise, vertices of $B$ with indegree 2 and outdegree 0 are reticulation nodes of $N$, and the source of a bridge of $N$ whose destination lies outside of $B$. Let $S$ be the network obtained from $B$ by additionally adding these bridges and their destinations, which then correspond precisely to the leaves of $S$. We say $S$ is a simple network, and $B$ is its core. Thus, the network $N$ may be obtained from the $k$-network $S$ by identifying the leaves of $S$ with the roots of smaller $k$-networks. The leaf sets of the $k$-networks attached to $S$ in this way partition the set $X$ into non-empty disjoint subsets. We may view the leaf set of a $k$-network attached to a leaf of $S$ as the label of the leaf. Thus, any $k$-network $N$ on a set $X$ whose root does not split may be obtained in a unique way by forming a partition $M$ of $X$ into non-empty subsets, choosing a simple $k$-network $S$ on $M$, and for each partition class $C \in M$ identifying the leaf of $S$ labelled by $C$ with the root of a $k$-network on $C$.

Letting $N(k, n), B(k, n) \ge 0$ denote the numbers of $k$-networks and simple $k$-networks on a given $n$-element set, it follows by standard combinatorial tools [26, 27] that the exponential generating series

$$N(z) := \sum_{n \ge 1} N(k, n) \frac{z^n}{n!} \quad \text{and} \quad B(z) := \sum_{n \ge 1} B(k, n) \frac{z^n}{n!}$$

satisfy the equation

$$N(z) = z + \frac{N(z)^2}{2} + B(N(z)). \tag{2.1}$$

Moreover, it also follows that to each $k$-network $N$ on a set $X$ of at least two leaves we may associate a $k$-network $\mathrm{head}(N)$, which is either a simple network on some partition (with non-empty partition classes) of $X$, or it is the cherry network (consisting of a root with two children) on a 2-partition of $X$ (with non-empty partition classes). The network $N$ is obtained by identifying the leaves of its head with the roots of the subnetworks on the corresponding partition classes.

Whenever one of these subnetworks is not the trivial network consisting of a single leaf, it has a head-structure of its own and is constructed from this head structure and even smaller subnetworks. We may proceed recursively in this way until only trivial subnetworks are left. In this way, we may form a pair $\Lambda(N) := (T, \delta)$ of a rooted unordered tree $T$ whose leaves are labelled bijectively with the elements of $X$, and a function $\delta$ that assigns to each inner vertex $v$ of $T$ a head structure $\delta(v)$ as its decoration. We say $(T, \delta)$ is a decorated tree. The formal definition is as follows:

1. If a network $N$ is trivial, that is, it consists of a single leaf and nothing else, we let $T$ be given by a rooted tree consisting of a single vertex (labelled like the leaf of $N$). As $T$ has no inner vertices, $\delta$ is the trivial function with an empty domain.
2. If a network $N$ has at least two leaves, it consists of the network $\mathrm{head}(N)$ with some number $\ell \ge 2$ of $k$-networks $N_1, \ldots, N_\ell$ attached to the leaves of $\mathrm{head}(N)$. We let $T$ be the tree whose root $o$ has $\ell$ children, such that the $i$th child is the root of the tree corresponding (recursively) to $N_i$. We set $\delta(o) = \mathrm{head}(N)$ and extend $\delta$ according to the decorations in $\Lambda(N_1), \ldots, \Lambda(N_\ell)$.

Thus:

Lemma 2.1. —

For each finite non-empty set $X$, the function $\Lambda$ ($= \Lambda_X$) is a bijection between the collection $\mathcal{N}[X]$ of $k$-networks on $X$, and the collection $\mathcal{P}[X]$ of pairs $(T, \delta)$, where $T$ is a rooted unordered tree whose leaves are bijectively labelled with the elements of $X$, and $\delta$ is a function that assigns to each inner vertex $v$ of $T$ a head structure (that is, a cherry network or a simple network) on the children of $v$ (or equivalently the corresponding partition of $X$).

Lemma 2.1 identifies the combinatorial class of $k$-networks with a special case of a class of Schröder-enriched parenthesizations. In general, if we have a class of combinatorial structures where each structure has a "size" given by a positive integer, we may form the corresponding class of Schröder-enriched parenthesizations. It is the collection of all unordered rooted trees where each inner vertex has a decoration given by a structure whose size agrees with the outdegree of the vertex. See [ ] for details. In the present case, the structures are the head networks, and the "size" of a head network is its number of leaves.

The inverse function $\Lambda^{-1}$ of $\Lambda$ may be described as a blow-up procedure that maps a decorated tree $(T, \delta)$ to the network obtained by "blowing up" each inner vertex $v$ of $T$ by its decoration $\delta(v)$. That is, we delete the edges between $v$ and its children, identify the root of $\delta(v)$ with $v$, and identify each child of $v$ with the corresponding leaf in $\delta(v)$.

Let $P_n$ be uniformly selected from the collection $\mathcal{P}[X]$ for $X := \{1, \ldots, n\}$, $n \ge 2$. Since $\Lambda^{-1} : \mathcal{P}[X] \to \mathcal{N}[X]$ is a bijection, it follows that the $k$-network $\Lambda^{-1}(P_n)$ is distributed like the uniform $k$-network $N_n$ on $X$. Hence we may assume without loss of generality that

$$N_n = \Lambda^{-1}(P_n). \tag{2.2}$$

Letting $H(k, i) \ge 0$ denote the number of head structures with $i$ leaves for all $i \ge 1$ (so that $H(k, 1) = 0$, as a head structure has at least two leaves), we have

$$H(z) := \sum_{i \ge 1} \frac{H(k, i)}{i!} z^i = B(z) + \frac{z^2}{2}. \tag{2.3}$$

We refer to $w := (H(k, i)/i!)_{i \ge 1}$ as a weight-sequence. For each set $Y$ we let $\mathcal{H}[Y]$ denote the collection of head-structures with leaves labelled bijectively by the elements of $Y$. For ease of notation, we set

$$\mathcal{H}[d] := \mathcal{H}[\{1, \ldots, d\}] \tag{2.4}$$

for all integers $d \ge 1$.

A (planted) plane tree is a rooted ordered unlabelled tree. The children of any of its vertices are endowed with a linear order. We refer the reader to [ , Sec. 1.2.2] for a detailed introduction to this type of trees. The following procedure, formulated in [ , Lem. 6.7] for general Schröder-enriched parenthesizations, allows us to generate $P_n$ from a random (weighted) plane tree with $n$ leaves:

Proposition 2.2 ([ , Lem. 6.7]). — Set $p_0 = 1$ and $p_i = H(k, i)/i!$ for all $i \ge 1$. The outcome $(\tau_n, \delta_n)$ of the following procedure is distributed like $P_n$.

1. Generate a random plane tree $\tau_n$ with $n$ leaves according to its distribution

$$\mathbb{P}(\tau_n = T) = \Big( \sum_{P} \prod_{v \in P} p_{d^+_P(v)} \Big)^{-1} \prod_{v \in T} p_{d^+_T(v)}, \tag{2.5}$$

with the sum-index $P$ ranging over the finite collection of plane trees with $n$ leaves, where each internal vertex has outdegree at least two.

2. For each inner vertex $v$ of $\tau_n$ sample a head-structure

$$\delta_n(v) \in \mathcal{H}[d^+_{\tau_n}(v)] \tag{2.6}$$

uniformly at random.

3. Choose a bijection $\sigma$ between the set of leaves of $\tau_n$ and $X$ uniformly at random, and distribute labels to the leaves of $\tau_n$ accordingly.

Note that the decorated tree $P_n$ is unordered, whereas in $(\tau_n, \delta_n)$ the children of any vertex are endowed with a linear order. If we forget about these linear orders, we obtain a decorated unordered tree that is distributed like $P_n$. Note also that the distribution of $\tau_n$ in (2.5) depends on the weight sequence $w$.
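The functional equation $N(z) = z + H(N(z))$, which combines (2.1) and (2.3), determines the numbers $N(k, n)$ coefficient by coefficient. The following is a minimal sketch of this extraction by fixed-point iteration on truncated power series. It is an illustration only: the function names are hypothetical, and since the actual coefficients $H(k, i)$ are not listed here, the example uses the degenerate assumption $B(z) = 0$, that is, $H(z) = z^2/2$, which corresponds to binary rooted phylogenetic trees without reticulation vertices rather than to a level-$k$ class with $k \ge 1$.

```python
from fractions import Fraction
from math import factorial

def compose_truncated(h, a, n_max):
    """Coefficients of h(a(z)) up to degree n_max, assuming a[0] == 0."""
    result = [Fraction(0)] * (n_max + 1)
    power = [Fraction(1)] + [Fraction(0)] * n_max    # a(z)^0
    for h_i in h:
        for d in range(n_max + 1):
            result[d] += h_i * power[d]
        # Multiply the current power by a(z), discarding degrees above n_max.
        nxt = [Fraction(0)] * (n_max + 1)
        for p, c in enumerate(power):
            if c:
                for q in range(1, n_max + 1 - p):
                    nxt[p + q] += c * a[q]
        power = nxt
    return result

def count_networks(h, n_max):
    """Solve N(z) = z + h(N(z)) up to degree n_max; return n! * [z^n] N(z)."""
    a = [Fraction(0)] * (n_max + 1)
    for _ in range(n_max):       # each pass fixes at least one more coefficient
        comp = compose_truncated(h, a, n_max)
        a = [comp[d] + (1 if d == 1 else 0) for d in range(n_max + 1)]
    return [int(a[n] * factorial(n)) for n in range(1, n_max + 1)]

# Hypothetical weights: only the cherry network, H(z) = z^2 / 2.
H_TREES = [Fraction(0), Fraction(0), Fraction(1, 2)]
counts = count_networks(H_TREES, 5)
```

Under this assumption the output reproduces the number of binary rooted phylogenetic trees on $n$ labelled leaves. Exact rational arithmetic is used so that multiplying by $n!$ returns integers without rounding.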

3. Asymptotic enumeration

In this section we will prove Lemma 1.1 and identify $\tau_n$ as a critical Galton–Watson tree conditioned on having $n$ leaves.

The number of possible simple components in $k$-networks is infinite. However, [ ] developed a decomposition of simple networks in terms of so-called generators. A generator is a directed multi-graph obtained from the core of a simple network by contracting each vertex of the core with indegree 1 and outdegree 1. Conversely, simple networks are precisely the networks obtained by blowing up edges of generators into paths (that is, replacing the edge by a path with at least one edge), and afterwards adding outgoing edges to all vertices with indegree 1 and outdegree 1, and all vertices with indegree 2 and outdegree 0. However, care has to be taken when a generator contains multi-edges, that is, pairs of vertices joined by exactly two edges. In this case, for each such pair we have to blow up at least one of the two edges by a path with at least two edges.

For all $i, j, \ell \ge 0$ let $G(k, i, j, \ell)$ denote the number of generators with the following properties:

1. The number of vertices with indegree 2 is at most $k$.
2. There are $i$ vertices with outdegree 0, labelled from 1 to $i$.
3. There are $j$ edges that are not multi-edges, labelled from 1 to $j$.
4. There are $\ell$ pairs of multi-edges joining the same two vertices, and each pair carries a unique label from 1 to $\ell$.

It follows that the series $H(z)$ defined in Equation (2.3) satisfies

$$H(z) = \frac{z^2}{2} + \sum_{i, j, \ell \ge 0} \frac{G(k, i, j, \ell)}{i! \, j! \, \ell!} \, \frac{z^i}{(1 - z)^j} \left( \frac{z}{1 - z} + \frac{z^2}{2(1 - z)^2} \right)^{\ell}. \tag{3.1}$$

The total number of generators that contain at most $k$ vertices with indegree 2 is finite, see [23, 24]. Consequently, it follows from (3.1) that there is a bivariate polynomial $f(z, w) \ne 0$ with $f(0, w) = 0 = f(z, 0)$ such that

$$H(z) = \frac{z^2}{2} + f(z, (1 - z)^{-1}). \tag{3.2}$$

This crucial equation entails that $H(z)$ has radius of convergence 1 and

$$\lim_{t \nearrow 1} H'(t) = \infty. \tag{3.3}$$

Let $0 < t < 1$ be the unique number with $H'(t) = 1$. Let $\xi$ be a random non-negative integer with distribution given by

$$\mathbb{P}(\xi = \ell) = \begin{cases} p_\ell \, t^{\ell - 1}, & \ell \ge 1, \\ 1 - \sum_{i \ge 1} p_i \, t^{i - 1}, & \ell = 0. \end{cases} \tag{3.4}$$

By the choice of $t$, this is well-defined and

$$\mathbb{E}[\xi] = H'(t) = 1. \tag{3.5}$$

As $t < 1$, it follows that $\xi$ has finite exponential moments. That is, there is an $\epsilon > 0$ such that

$$\mathbb{E}[(1 + \epsilon)^{\xi}] < \infty. \tag{3.6}$$

Recall that a $\xi$-Galton–Watson tree $\tau$ is a random plane tree that starts with a single root vertex and where each vertex receives offspring according to an independent copy of $\xi$. Thus, if $T$ is a plane tree with $n$ leaves,

$$\mathbb{P}(\tau = T) = \prod_{v \in T} \mathbb{P}(\xi = d^+_T(v)) = \mathbb{P}(\xi = 0)^n \, t^{n - 1} \prod_{v \in T} p_{d^+_T(v)}. \tag{3.7}$$

Thus, $\mathbb{P}(\tau = T)$ is equal to the product of $\mathbb{P}(\tau_n = T)$ and a factor that only depends on $n$ (and not on $T$). Hence, $\tau_n$ is distributed like the result of conditioning $\tau$ on the event that its number $L(\tau)$ of leaves is equal to $n$. That is,

$$\tau_n \overset{d}{=} (\tau \mid L(\tau) = n). \tag{3.8}$$

Equations (3.5) and (3.6) allow us to apply a general result [ , Thm. 3.1] for the number of leaves in a critical Galton–Watson tree, yielding

$$\mathbb{P}(L(\tau) = n) \sim \sqrt{\frac{\mathbb{P}(\xi = 0)}{2 \pi \mathbb{V}[\xi]}} \; n^{-3/2}. \tag{3.9}$$

The probability generating function $Z(z) := \mathbb{E}[z^{L(\tau)}]$ for $L(\tau)$ satisfies the recursive equation

$$Z(z) = z \, \mathbb{P}(\xi = 0) + \sum_{\ell \ge 1} \mathbb{P}(\xi = \ell) Z(z)^{\ell} = z (1 - H(t)/t) + H(t Z(z))/t. \tag{3.10}$$

Equation (3.5) entails that

$$t = \sum_{i \ge 1} i \, p_i \, t^i > H(t). \tag{3.11}$$

Hence it follows from Equation (3.10) that

$$t Z(z/(t - H(t))) = z + H(t Z(z/(t - H(t)))). \tag{3.12}$$

Equation (2.1) entails that $N(z) = z + H(N(z))$, hence

$$N(z) = t Z(z/(t - H(t))). \tag{3.13}$$

In other words, comparing the coefficients of $z^n$ on both sides, it holds for all $n \ge 1$ that

$$N(k, n) = n! \, \mathbb{P}(L(\tau) = n) \, t \, (t - H(t))^{-n}. \tag{3.14}$$

Using (3.9), it follows that

$$N(k, n) \sim \sqrt{\frac{\mathbb{P}(\xi = 0)}{2 \pi \mathbb{V}[\xi]}} \; t \, n^{-3/2} (t - H(t))^{-n} \, n!. \tag{3.15}$$

This proves Lemma 1.1 for $a_k := \sqrt{\frac{\mathbb{P}(\xi = 0)}{2 \pi \mathbb{V}[\xi]}} \, t$ and $\rho_k = t - H(t)$.
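The identity (3.8) suggests a simple, if inefficient, way to simulate $\tau_n$: sample unconditioned Galton–Watson trees until one has exactly $n$ leaves. The sketch below illustrates this rejection scheme. It is not the paper's method, and the offspring law used in the example ($\mathbb{P}(\xi = 0) = \mathbb{P}(\xi = 2) = 1/2$) is a hypothetical stand-in chosen because it is critical and has $\mathbb{P}(\xi = 1) = 0$; the true law (3.4) depends on the weights $H(k, i)$. All function names are assumptions. A plane tree is encoded by the list of outdegrees of its vertices in depth-first order.

```python
import random

def sample_gw_outdegrees(offspring, n_leaves, rng):
    """One attempt: outdegrees in depth-first order, or None if rejected."""
    degrees = []
    pending = 1          # vertices that still have to receive offspring
    leaves = 0
    while pending > 0:
        d = offspring(rng)
        degrees.append(d)
        pending += d - 1
        if d == 0:
            leaves += 1
            if leaves > n_leaves:    # too many leaves already: reject early
                return None
    return degrees if leaves == n_leaves else None

def conditioned_gw(offspring, n_leaves, seed=0):
    """Rejection sampling of a Galton-Watson tree conditioned on n_leaves leaves."""
    rng = random.Random(seed)
    while True:
        tree = sample_gw_outdegrees(offspring, n_leaves, rng)
        if tree is not None:
            return tree

def binary_offspring(rng):
    """Hypothetical critical offspring law on {0, 2}."""
    return 2 if rng.random() < 0.5 else 0

tree = conditioned_gw(binary_offspring, 10, seed=1)
```

By (3.9) the acceptance probability decays like $n^{-3/2}$, so the expected number of attempts grows polynomially in $n$.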

4. The asymptotic global shape

In the present section we prove Theorems 1.2, 1.3, and 1.4.

The following observation is easy, but we are going to use it often enough to state it explicitly:

Proposition 4.1. — Let $d \ge 2$ be an integer. Then the number of vertices $|H|$ in a head structure $H \in \mathcal{H}[d]$ is bounded by $2(d + k)$.

Proof. — This follows from the fact that any simple network has at most $k$ reticulation vertices, and all other vertices either have outdegree 2 or 0:

We may build any simple network from a cherry network (consisting of a root vertex with two children) step by step, where in each step we either add two children to a leaf, or fuse two leaves together and add a single child to the newly created reticulation vertex. (Please note that not every network created in this way is a simple network.) The first kind of step increases the total number of vertices by two and increases the number of leaves by 1. The second kind of step leaves the total number of vertices invariant and decreases the number of leaves by 1. Letting $s_1$ and $s_2$ denote the number of steps of kind 1 and 2, the resulting network has $V := 3 + 2 s_1$ vertices and $L := 2 + s_1 - s_2$ leaves. Since we can have at most $k$ reticulation vertices, $s_2 \le k$ holds, and hence

$$2(L + k) \ge 2(2 + s_1) \ge V.$$

Since a head structure from $\mathcal{H}[d]$ has $d$ leaves, it follows that it has at most $2(d + k)$ vertices.

We are also going to need a bound for the maximal outdegree of $\tau_n$:

Proposition 4.2. —

There is a constant $C > 0$ such that the maximal outdegree $\Delta(\tau_n)$ of $\tau_n$ satisfies

$$\Delta(\tau_n) \le C \log n \tag{4.1}$$

with a probability that tends to 1 as $n \to \infty$.

Proof. — Since $\mathbb{P}(\xi = 1) = 0$, it follows that $\tau_n$ has (almost surely) no inner vertices with outdegree 1. Consequently, the number $|\tau_n|$ of vertices of $\tau_n$ is bounded by the number of vertices of a binary tree with $n$ leaves, that is,

$$|\tau_n| \le 2n - 1. \tag{4.2}$$

Using (3.9), it follows that for any $x > 0$

$$\mathbb{P}(\Delta(\tau_n) > x) = \frac{\mathbb{P}(L(\tau) = n, \Delta(\tau) > x)}{\mathbb{P}(L(\tau) = n)} \le \mathbb{P}(L(\tau) = n)^{-1} \, 2n \, \mathbb{P}(\xi > x) \le O(n^{5/2}) \, \mathbb{P}(\xi > x). \tag{4.3}$$

By (3.6) it follows that there are constants $C, c > 0$ with

$$\mathbb{P}(\xi > x) \le C \exp(-c x) \tag{4.4}$$

for all $x > 0$. Taking $x = C' \log n$ for a sufficiently large constant $C' > 0$, it follows that

$$\mathbb{P}(\Delta(\tau_n) > C' \log n) \to 0 \tag{4.5}$$

as $n \to \infty$.

Since $\mathbb{E}[\xi] = 1$, we may define the size-biased random positive integer $\hat{\xi}$ by

$$\mathbb{P}(\hat{\xi} = i) = i \, \mathbb{P}(\xi = i). \tag{4.6}$$

For each integer $\ell \ge 0$, we define the size-biased Galton–Watson tree $\hat{\tau}^{(\ell)}$ as a random finite plane tree with a marked vertex having height $\ell$. For $\ell = 0$, we let $\hat{\tau}^{(0)}$ denote an independent copy of the Galton–Watson tree $\tau$ with a marked vertex that coincides with its root vertex. Inductively, we define $\hat{\tau}^{(\ell)}$ for $\ell \ge 1$ as follows. The root receives offspring according to an independent copy of $\hat{\xi}$. A child of the root is selected uniformly at random and identified with an independent copy of $\hat{\tau}^{(\ell - 1)}$. Each of the remaining children of the root gets identified with an independent copy of $\tau$, that is, a fresh independent copy for each remaining child.

Thus, the vertices of $\hat{\tau}^{(\ell)}$ that receive offspring according to $\hat{\xi}$ (as opposed to $\xi$) together with the marked vertex of $\hat{\tau}^{(\ell)}$ form a path of length $\ell$ from the root to the marked vertex of $\hat{\tau}^{(\ell)}$.

It is elementary to verify that for each finite plane tree $T$ and each vertex $v$ of $T$ with height $\ell$ it holds that

$$\mathbb{P}\big(\hat{\tau}^{(\ell)} = (T, v)\big) = \prod_{u \in T} \mathbb{P}\big(\xi = d^+_T(u)\big) = \mathbb{P}(\tau = T). \tag{4.7}$$

Any locally finite rooted tree $T$ may be decorated in a canonical random way by choosing for each inner vertex $v$ a decoration from $\mathcal{H}[d^+_T(v)]$ uniformly at random, independently from the remaining decorations. Thus we may form canonical random decorations $(\tau, \delta)$ and $(\hat{\tau}^{(\ell)}, \hat{\delta}^{(\ell)})$. It follows from (4.7) that for any deterministic decoration $\gamma$ of $T$

$$\mathbb{P}\big((\hat{\tau}^{(\ell)}, \hat{\delta}^{(\ell)}) = ((T, v), \gamma)\big) = \mathbb{P}(\tau = T) \prod_{u \in T, \, d^+_T(u) > 0} \frac{1}{|\mathcal{H}[d^+_T(u)]|} = \mathbb{P}((\tau, \delta) = (T, \gamma)). \tag{4.8}$$
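The recursive construction of the size-biased tree $\hat{\tau}^{(\ell)}$ can be sketched in a few lines. The code below is an illustration under assumptions, not part of the paper: the function names and the dictionary `PMF` are hypothetical, and the offspring law $\mathbb{P}(\xi = 0) = \mathbb{P}(\xi = 2) = 1/2$ is a stand-in for the law (3.4). Vertices are integers, `children[v]` lists the ordered children of `v`, and `spine` records the path from the root to the marked vertex. Side trees are grown iteratively to avoid deep recursion.

```python
import random

def sample_offspring(pmf, rng):
    """Sample xi, where pmf maps i to P(xi = i)."""
    values = list(pmf)
    return rng.choices(values, weights=[pmf[i] for i in values])[0]

def sample_size_biased(pmf, rng):
    """Sample xi_hat with P(xi_hat = i) = i * P(xi = i), assuming E[xi] = 1."""
    values = [i for i in pmf if i > 0]
    return rng.choices(values, weights=[i * pmf[i] for i in values])[0]

def grow_gw(children, root, pmf, rng):
    """Attach an independent Galton-Watson tree below the vertex `root`."""
    stack = [root]
    while stack:
        v = stack.pop()
        for _ in range(sample_offspring(pmf, rng)):
            w = len(children)
            children.append([])
            children[v].append(w)
            stack.append(w)

def size_biased_tree(pmf, ell, seed=0):
    """Return (children, spine) for a sample of the size-biased tree of height ell."""
    rng = random.Random(seed)
    children = [[]]              # vertex 0 is the root
    spine = [0]
    for _ in range(ell):
        v = spine[-1]
        kids = []
        for _ in range(sample_size_biased(pmf, rng)):
            w = len(children)
            children.append([])
            children[v].append(w)
            kids.append(w)
        heir = rng.choice(kids)  # uniformly selected child continues the spine
        spine.append(heir)
        for w in kids:
            if w != heir:
                grow_gw(children, w, pmf, rng)   # fresh independent copy of tau
    grow_gw(children, spine[-1], pmf, rng)       # marked vertex roots a copy of tau
    return children, spine

PMF = {0: 0.5, 2: 0.5}           # hypothetical critical offspring law
children, spine = size_biased_tree(PMF, 4, seed=3)
```

The last entry of `spine` is the marked vertex, which has height $\ell$ by construction.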

Recall that the height $\mathrm{h}_N(u)$ of a vertex $u$ in a network $N$ is the length of a shortest directed path from the root of $N$ to $u$. Using the decorated trees $(\tau, \delta)$ and $(\hat{\tau}^{(\ell)}, \hat{\delta}^{(\ell)})$, we are now ready to prove the following:

Lemma 4.3. —

Let η denote the length of a shortest directed path from the root ofa uniformly selected network from H [ ˆ ξ ] to a leaf that is uniformly selected among its ˆ ξ leaves. With a probability that tends to as n → ∞ , any vertex v of τ n has theproperty, that the corresponding vertex u in N n = Λ − ( P n ) satisﬁes | h N n ( u ) − E [ η ]h τ n ( v ) | ≤ n / . (4.9) Proof . — Let us take a closer look at the network Λ − (ˆ τ ( ℓ ) , ˆ δ ( ℓ ) ). The path v , . . . , v ℓ from the root v of ˆ τ ( ℓ ) to its marked vertex v ℓ corresponds to vertices u , . . . , u ℓ in Λ − (ˆ τ ( ℓ ) , ˆ δ ( ℓ ) ). In particular, u coincides with the root of Λ − (ˆ τ ( ℓ ) , ˆ δ ( ℓ ) ). Everypath in Λ − (ˆ τ ( ℓ ) , ˆ δ ( ℓ ) ) from u to u ℓ must pass through u , . . . , u ℓ − and is entirelycontained in the subnetworks corresponding to ˆ δ ( ℓ ) ( v ) , . . . , ˆ δ ( ℓ ) ( v ℓ − ). The lengthof a shortest directed path from u to u ℓ is distributed like the sum η + . . . + η ℓ of ℓ independent copies η , . . . , η ℓ of η .It follows from Equation (3.6) and Proposition 4.1 that η has ﬁnite exponentialmoments, that is, E [(1 + ǫ ) η ] < ∞ . (4.10)for some ǫ >

0. Let ˜ η := η − E [ η ]. It follows that there is a constant c > λ > E [exp( λ ˜ η )] ≤ cλ and E [exp( − λ ˜ η )] ≤ cλ . (4.11)Using Markov’s inequality, it follows that for all x > P ( η + . . . + η ℓ − ℓ E [ η ] > x ) ≤ P exp λ ℓ X i =1 ( η i − E [ η ]) ! > exp( λx ) ! (4.12) ≤ E [exp( λ ˜ η )] ℓ exp( λx ) ≤ (1 + cλ ) ℓ exp( λx ) . Repeating the same argument for − ˜ η instead of ˜ η , we arrive at P ( | η + . . . + η ℓ − ℓ E [ η ] | > x ) ≤ cλ ) ℓ exp( λx ) . (4.13)Taking x = n / and λ = n − / , it follows that uniformly for all 1 ≤ ℓ ≤ √ n log n P ( | η + . . . + η ℓ − ℓ E [ η ] | > n / ) ≤ exp( − Θ( n / ))(4.14)for all ℓ ≥ ] entails that the height of the tree τ n admits a distributionallimit when rescaled by n − / . In particular, it is smaller than √ n log n with a BRANCHING PROCESS APPROACH TO LEVEL- K PHYLOGENETIC NETWORKS 15 probability that tends to 1 as n → ∞ . Thus, the probability that a “bad” vertexexists in ( τ n , δ n ) such that Inequality (4.9) fails is bounded by o (1) + √ n log n X ℓ =1 P (( τ n , δ n ) contains a “bad” vertex with height ℓ ) . (4.15)We know by (4.2) that τ n has at most 2 n vertices in total. Critically applyingEquation (4.8) and using Inequality (4.14) and Equation (3.9), it follows that thesum in (4.15) may be bounded by P ( L ( τ ) = n ) √ n log n X ℓ =1 P (( τ, δ ) contains a “bad” vertex with height ℓ )(4.16) ≤ O ( n / ) √ n log n X ℓ =1 (2 n ) P (the marked vertex of (ˆ τ ( ℓ ) , ˆ δ ( ℓ ) ) is bad) ≤ O ( n / ) √ n log n X ℓ =1 P ( | η + . . . + η ℓ − ℓ E [ η ] | > n / ) . By Inequality (4.14), it follows that this upper bound may be further bounded by O ( n log n ) exp( − Θ( n / )), which tends to zero as n → ∞ . This completes theproof.For ease of notation, in everything that follows we will simply consider a vertex v ∈ τ n of the tree τ N also as a vertex v ∈ N n of the network N n . 
This saves us from repeatedly writing “the vertex of $N_n$ that corresponds to $v \in \tau_n$”.

Note that $N_n$ is likely to have more vertices than $\tau_n$. Each head structure $\delta_n(v)$ for $v \in \tau_n$ may contribute a surplus of vertices that do not correspond to any vertices of $\tau_n$. These are precisely the non-root non-leaf vertices of $\delta_n(v)$. We will need fine-grained information on the growth of the surplus:

Lemma 4.4. —

Let v , . . . , v | τ n | denote the depth-ﬁrst-search ordered list of verticesof τ n . For each head structure H let S ( H ) denote the number of surplus vertices of H , that is, the number of non-root non-leaf vertices of H . Let κ denote the numberof surplus vertices of a uniform random head structure from H [ ξ ] . With a probabilitythat tends to as n → ∞ , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ℓ X i =1 S ( δ n ( v i )) − ℓ E [ κ ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n / (4.17) holds for all ≤ ℓ ≤ | τ n | .Proof . — Note that κ has ﬁnite exponential moments: ξ has ﬁnite exponential mo-ments by (3.6) and any head structure from H [ ξ ] has at most 2( ξ + k ) vertices byProposition 4.1. Hence κ ≤ ξ + k ), entailing that E [(1 + ǫ ) κ ] < ∞ (4.18)for some ǫ > BENEDIKT STUFLER

Let κ , κ , . . . denote independent copies of κ . The ﬁnite exponential momentsproperty (4.18) allows us to argue analogously as for Inequality (4.13), yielding thatthere is a constant c > λ > x > ℓ ≥ P ( | κ + . . . + κ ℓ − ℓ E [ κ ] | > x ) ≤ cλ ) ℓ exp( λx ) . (4.19)Taking x = n / and λ = n − / , it follows that uniformly for all 1 ≤ ℓ ≤ n P ( | κ + . . . + κ ℓ − ℓ E [ κ ] | > n / ) ≤ exp( − Θ( n / )) . (4.20)Recall that τ n has at most 2 n vertices by Inequality (4.2). Hence the probabilitythat there is a “bad” integer 1 ≤ ℓ ≤ | τ n | for which (4.17) fails is bounded by P ( L ( τ ) = n ) − n X m = n m X ℓ =1 P ( | τ | = m, ℓ is “bad”)(4.21)Let ξ , ξ , . . . denote independent copies of ξ , and for each i ≥ H ( ξ i ) be uni-formly selected from H [ ξ i ]. Thus, S ( H ( ξ )) , S ( H ( ξ )) , . . . are independent copiesof κ .The depth-ﬁrst-search ordered outdegrees of τ may be described by ( ξ , . . . , ξ L )with L denoting the ﬁrst integer for which P Li =1 ( ξ i −

1) = −

1. The decorations( δ ( v )) v ∈ τ may be described by H , H , . . . . Thus the event that | τ | = m and that ℓ is “bad” implies that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ℓ X i =1 S ( H i ) − ℓ E [ κ ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > n / (4.22)Using (4.20) and (3.9), it follows that the upper bound (4.21) may be boundedfurther by Θ( n / )2 n exp( − Θ( n / )) . (4.23)This bound tends to zero as n → ∞ . Hence the proof is complete.We are now ready to prove Theorems 1.2, 1.3, and 1.4. Proof of Thm. 1.2 . — Let v , . . . , v | τ n | denote the depth-ﬁrst-search ordered list ofvertices of τ n , which we will also consider as vertices of N n = Λ − ( P n ). For each1 ≤ i ≤ | τ n | let s i denote the size of the surplus S ( δ n ( v i )) and let v i, , . . . , v i,s i denotethe vertices of the surplus. Thus,( u , . . . , u | N n | ) := ( v , v , , . . . , v ,s , . . . , v | τ n | , v | τ n | , , . . . , v | τ n | ,s | τn | )(4.24)is an ordering of the vertices of N n .There is a constant C > n → ∞ ,it holds for all 1 ≤ i ≤ | τ n | and all 1 ≤ j ≤ s i that | h N n ( v i ) − h N n ( v i,j ) | ≤ C log n. (4.25) BRANCHING PROCESS APPROACH TO LEVEL- K PHYLOGENETIC NETWORKS 17

To see this, note that the diﬀerence of distances in (4.25) is bounded by the numberof vertices | δ n ( v i ) | of the head structure δ n ( v i ). By Proposition 4.1, this is at most2( d + τ n ( v i ) + k ). by Proposition 4.2 this may be further bounded by2( d + τ n ( v i ) + k ) ≤ τ n ) + k ) = O (log n ) . Hence (4.25) holds with high probability for all i and j .Note further that for each 1 ≤ r ≤ | N n | there is a unique index 1 ≤ i ( r ) ≤ | τ n | such that either u r = v i ( r ) or u r = v i ( r ) ,j for some 1 ≤ j ≤ s i ( r ) . By Inequality (4.25)it follows that | h N n ( u r ) − h N n ( v i ( r ) ) | ≤ C log n (4.26)holds for all 1 ≤ r ≤ | τ n | with a probability that tends to 1 as n → ∞ . By Lemma 4.3and the triangle inequality, it follows that | h N n ( u r ) − E [ η ]h τ n ( v i ( r ) ) | ≤ O ( n / ) . (4.27)Furthermore, Lemma 4.4 (where S ( δ n ( v i )) corresponds to the notation s i here)implies that with probability tending to 1 as n → ∞|| N n | − (1 + E [ κ ]) | τ n || ≤ n / (4.28)and for all 1 ≤ r ≤ | N n | | r − (1 + E [ κ ]) i ( r ) | ≤ O ( n / ) . (4.29)By the main result of [ ], there is a constant b > τ n ( v t | τ n | ) : 0 ≤ t ≤
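The bookkeeping behind the ordering (4.24) and the index map $r \mapsto i(r)$ can be made concrete with a small sketch. It takes the surplus sizes $s_1, \ldots, s_m$ as a plain list of toy values (not sampled from the model) and lists pairs $(i, j)$, with $j = 0$ standing for $v_i$ itself and $j \ge 1$ for the surplus vertex $v_{i,j}$. The function name is hypothetical.

```python
def interleaved_order(surplus):
    """Return the ordering (4.24) as a list of pairs (i, j): the pair (i, 0)
    stands for the tree vertex v_i and (i, j) with j >= 1 for its j-th
    surplus vertex v_{i,j}. Indices i start at 1, as in the text."""
    order = []
    for i, s in enumerate(surplus, start=1):
        order.append((i, 0))
        order.extend((i, j) for j in range(1, s + 1))
    return order

surplus = [2, 0, 1]                    # toy surplus sizes s_1, s_2, s_3
order = interleaved_order(surplus)
index_of = [i for (i, _) in order]     # i(r) for r = 1, ..., |N_n|
# Position of v_i in the list: r = i + s_1 + ... + s_{i-1}.
positions = [order.index((i, 0)) + 1 for i in range(1, len(surplus) + 1)]
print(order, index_of, positions)
```

Since $v_i$ sits at position $i + s_1 + \cdots + s_{i-1}$, a uniform bound on the partial sums of the surplus, as provided by Lemma 4.4, is what controls the difference between $r$ and $(1 + E[\kappa])\, i(r)$.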

1) (with linear interpolation between values h τ n ( v i ) for 1 ≤ i ≤| τ n | ) satisﬁes ( bn − / h τ n ( v t | τ n | ) : 0 ≤ t ≤ d −→ ( e ( s ) : 0 ≤ s ≤ C ([0 , , R ).Now it comes all together: By (4.27), (4.28), (4.29), and (4.30) (and the fact n ≤ | τ n | ≤ n from (4.2)) it follows that( b E [ η ] − ) n − / h N n ( v t | N n | ) : 0 ≤ t ≤ d −→ ( e ( s ) : 0 ≤ s ≤ . (4.31)Thus, Equation (1.2) holds with b k := b/ E [ η ] . (4.32)Let w , . . . , w n denote the depth-ﬁrst-search ordered list of leaves of τ n . Thus, w , . . . , w n also correspond to the leaves of N n . The convergence( b E [ η ] − ) n − / h N n ( w tn ) : 0 ≤ t ≤ d −→ ( e ( s ) : 0 ≤ s ≤ . (4.33)follows by analogous arguments, since the number of vertices between consecutiveleaves in the depth-ﬁrst-search ordered list of vertices of τ n may be controlled in anentirely analogous fashion as the surplus of the head structures, ensuring that( bn − / h τ n ( w tn ) : 0 ≤ t ≤ d −→ ( e ( s ) : 0 ≤ s ≤ . (4.34)This completes the proof of Theorem 1.2. BENEDIKT STUFLER

Proof of Theorem 1.3. — We want to show that there are constants $C, c > 0$ such that for all $x > 0$ and $n \ge 1$
$$P(\mathrm{H}(N_n) > x) \le C \exp(-c x^2/n). \tag{4.35}$$
By (4.2) we know that $|\tau_n| \le 2n$. By Proposition 4.1 it follows that the total number $|N_n|$ of vertices in $N_n$ satisfies
$$|N_n| = 1 + \sum_{v \in \tau_n} (|\delta_n(v)| - 1) \le 1 + 2 \sum_{v \in \tau_n} (d^+_{\tau_n}(v) + k) = 1 + 2(|\tau_n| - 1) + 2|\tau_n| k \le 4n(k+1).$$
Consequently, $\mathrm{H}(N_n) \le 4n(k+1)$.

Hence it suffices to show (4.35) for $x \le 4n(k+1)$. Moreover, we may always take $C$ large enough (depending on $c$) so that $C \exp(-c) > 1$, implying that (4.35) is automatically fulfilled for all $0 < x < \sqrt{n}$. Thus, it suffices to show the existence of $c, C > 0$ such that (4.35) holds for all $n \ge 1$ and all $x$ satisfying
$$\sqrt{n} \le x \le 4n(k+1). \tag{4.37}$$

The main theorem of [ ] establishes tail-bounds for the height of critical Galton–Watson trees conditioned on having $n$ vertices if the offspring distribution has finite variance. In [ , Lem. 6.61, Eq. (6.41)] this result and further observations from [ ] were used to deduce similar bounds for blow-ups of such trees. By [ ], [ , Sec. 6.1.7] we can view a Galton–Watson tree conditioned on having $n$ leaves as a blow-up of a different Galton–Watson tree conditioned on having $n$ vertices, allowing us to apply [ , Lem. 6.61, Eq. (6.41)] to deduce that there are constants $c_1, C_1 > 0$ such that for all $x > 0$ and $n \ge 1$
$$P(\mathrm{H}(\tau_n) > x) \le C_1 \exp(-c_1 x^2/n). \tag{4.38}$$
Let $\epsilon > 0$ be a constant whose value we fix later on. Inequality (4.38) entails that
$$P(\mathrm{H}(\tau_n) > \epsilon x) \le C_1 \exp(-c_1 \epsilon^2 x^2/n). \tag{4.39}$$

A vertex of $N_n$ with maximal height may either be a vertex that pertains to $\tau_n$ or a vertex that lies in the surplus of some head structure. (The latter case is possible since we look at directed paths. Thus a vertex with maximal height in $N_n$ is not necessarily a leaf of $N_n$.) If such a vertex $u$ lies in the surplus of some head structure $\delta_n(v)$ for $v \in \tau_n$, then $\mathrm{h}_{N_n}(u) \le \mathrm{h}_{N_n}(v) + |\delta_n(v)|$. Hence, if $\mathrm{h}_{N_n}(u) > x$, then $\mathrm{h}_{N_n}(v) > x/2$ or $|\delta_n(v)| > x/2$. It follows that
$$P(\mathrm{H}(\tau_n) \le \epsilon x,\ \mathrm{H}(N_n) > x) \le P\Big(\max_{v \in \tau_n} |\delta_n(v)| > x/2\Big) + P\Big(\mathrm{H}(\tau_n) \le \epsilon x,\ \max_{v \in \tau_n} \mathrm{h}_{N_n}(v) > x/2\Big). \tag{4.40}$$

By Proposition (4.1) and Equation (4.3) it follows that P (cid:18) max v ∈ τ n | δ n ( v ) | > x/ (cid:19) ≤ P (∆( τ n ) > x/ − k )(4.41) ≤ O ( n / ) P ( ξ > x/ − k ) . Recall that by (4.37) we assumed that √ n ≤ x ≤ n ( k + 1). Hence, using Inequal-ity (4.4) it follows that P (cid:18) max v ∈ τ n | δ n ( v ) | > x/ (cid:19) ≤ O ( n / ) exp( − Θ( x ))(4.42) ≤ exp( − Θ( x )) ≤ exp( − Θ( x /n )) . In order to bound the probability for the event that simultaneously H( τ n ) ≤ ǫx and max v ∈ τ n h N n ( v ) > x/

2, we may argue identically as for (4.16) to obtain P (H( τ n ) ≤ ǫx, max v ∈ τ n h N n ( v ) > x/ ≤ O ( n / ) ⌊ ǫx ⌋ X ℓ =1 P ( η + . . . + η ℓ > x/ . (4.43)Here η , η , . . . denote independent copies of η .Setting ǫ := 1 / (4 E [ η ]), it follows that x/ − E [ η ] ℓ ≥ x/ ≤ ℓ ≤ ǫx . Hence,by (4.13), it follows that there is a constant c > λ (independent of ℓ and x ) and all 1 ≤ ℓ ≤ ǫx P ( η + . . . + η ℓ > x/ ≤ c λ ) ℓ exp( λx/ ≤ c λ ) x/ (4 E [ η ]) exp( λx/ λ small enough, it follows that there are constants C , c > P ( η + . . . + η ℓ > x ) ≤ C exp( − c x ) . (4.45)Recall that we assumed √ n ≤ x ≤ n ( k + 1) by (4.37), hence it follows from (4.43)that P (H( τ n ) ≤ ǫx, max v ∈ τ n h N n ( v ) > x/ ≤ O ( n / ) exp( − c x )(4.46) ≤ exp( − Θ( x )) ≤ exp( − Θ( x /n )) . Combining (4.39), (4.40), (4.42) and (4.46), it follows that P (H( N n ) > x ) ≤ C exp( − c x /n )(4.47)for some constants C , c > x or n . Proof of Thm. 1.4 . — We deﬁne η ′ analogously to η , as the length of a shortest undirected path from the root of a uniformly selected network from H [ ˆ ξ ] to a leafthat is uniformly selected among its ˆ ξ leaves. That is, the path may cross edges inany direction, regardless of their orientation. BENEDIKT STUFLER

Lemma 4.3 holds analogously for undirected paths. That is, interpreting thevertices of τ n as part of G n ,sup v ∈ τ n | h G n ( v ) − E [ η ′ ]h τ n ( v ) | ≤ n / (4.48)holds with probability tending to 1 as n → ∞ . Here h G n ( v ) denotes the graphdistance d G n ( o, v ) from the root vertex o of G n to the vertex v . Hence Thm. 1.2 andall intermediate observations in its proof hold analogously for undirected paths, thatis, ( b ′ k n − / h G n ( v s | G n | ) : 0 ≤ s ≤ d −→ ( e ( s ) : 0 ≤ s ≤ b ′ k := b/ E [ η ′ ](4.50)as n → ∞ .Given two vertices v, w ∈ τ n there is a unique path Q in τ n that joins them in τ n ,but there may be several diﬀerent paths P in G n that join v and w in G n . However,we know that the vertices and edges of P are always entirely contained in the head-structures ( δ n ( u )) u ∈ Q . Let lca( u, v ) denote the lowest common ancestor of v and w in the tree τ n . The distance d τ n in the tree τ n satisﬁes d τ n ( v, w ) = h τ n ( v ) + h τ n ( w ) − τ n (lca( u, v )) . (4.51)A similar statement holds for the distance d G n in G n : If P v and P w are shortest paths in G n from v to the root o and from w to o , we may construct a shortest path P from v to w by following P v until we encounter for the ﬁrst time a vertex v ′ from δ n (lca( v, w )), walking from v ′ to the analogously deﬁned vertex w ′ from δ n (lca( v, w ))by a path that lies entirely in δ n (lca( v, w )), and then following P w (in its reversedirection) from w ′ back to w . This entails that d G n ( v, w ) = h G n ( v ) + h G n ( w ) − G n (lca( v, w )) + R ( v, w )(4.52)for an error term R ( v, w ) satisfying | R ( v, w ) | ≤ | δ n (lca( v, w )) | . (4.53)Recall that in Equation (4.24) we constructed a special ordering( u , . . . , u | N n | ) := ( v , v , , . . . , v ,s , . . . , v | τ n | , v | τ n | , , . . . , v | τ n | ,s | τn | )(4.54)of the vertices of N n . Here v , . . . 
, v | τ n | is a depth-ﬁrst-search ordering of the verticesof τ n , where, say, we always proceed along the left-most unvisited vertex. This way,it holds for all 1 ≤ i ≤ j ≤ | G n | thath G n (lca( v i , v j )) = min i ≤ r ≤ j h G n ( v r ) . (4.55)By Proposition 4.1 and Proposition 4.2, it follows analogously as for (4.26) thatthere is a constant C ′ > v ∈ τ n | δ n ( v ) | ≤ τ n ) + k ) ≤ C ′ log n (4.56) BRANCHING PROCESS APPROACH TO LEVEL- K PHYLOGENETIC NETWORKS 21 with a probability that tends to 1 as n → ∞ . It follows from Equation (4.52) thathence there is a constant C > n → ∞ (cid:12)(cid:12)(cid:12)(cid:12) d G n ( u i , u j ) − (cid:18) h G n ( u i ) + h G n ( u j ) − i ≤ r ≤ j h G n ( u r ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C log n (4.57)for all 1 ≤ i ≤ j ≤ | N n | . Multiplying both sides of (4.57) by b ′ k n − / , Thm. 1.4 nowfollows from (4.49) and the deﬁnition (1.11) of the metric of the Brownian continuumrandom tree.
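The tree identity (4.51) is easy to check on a toy example. The sketch below builds a small plane tree (an arbitrary example, not sampled from the model), computes heights along a leftmost-first depth-first order, and compares $\mathrm{h}(v) + \mathrm{h}(w) - 2\,\mathrm{h}(\mathrm{lca}(v, w))$ with distances obtained independently by breadth-first search. It also records how the minimum of the heights over the depth-first range between two vertices relates to the lca height in this plain vertex ordering. All names are illustrative.

```python
from collections import deque

def preorder_heights(children, root=0):
    """Depth-first (leftmost-first) order of the vertices and their heights."""
    order, height, stack = [], {root: 0}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in reversed(children.get(v, [])):  # leftmost child is popped first
            height[c] = height[v] + 1
            stack.append(c)
    return order, height

def lca(parent, height, v, w):
    """Lowest common ancestor, found by walking the deeper vertex upwards."""
    while v != w:
        if height[v] >= height[w]:
            v = parent[v]
        else:
            w = parent[w]
    return v

def bfs_distance(children, parent, v, w):
    """Graph distance in the tree, computed by breadth-first search."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        x = queue.popleft()
        nbrs = list(children.get(x, []))
        if x in parent:
            nbrs.append(parent[x])
        for y in nbrs:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist[w]

# Toy plane tree: 0 -> 1, 2, 3;  1 -> 4, 5;  3 -> 6;  6 -> 7
children = {0: [1, 2, 3], 1: [4, 5], 3: [6], 6: [7]}
parent = {c: v for v, cs in children.items() for c in cs}
order, height = preorder_heights(children)

for i in range(len(order)):
    for j in range(i, len(order)):
        v, w = order[i], order[j]
        a = lca(parent, height, v, w)
        range_min = min(height[u] for u in order[i:j + 1])
        print(v, w, height[v] + height[w] - 2 * height[a], range_min - height[a])
```

In this example the range minimum equals the lca height when one vertex is an ancestor of the other and exceeds it by one otherwise; an additive constant of this kind is negligible next to the $C \log n$ error in (4.57).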

5. The asymptotic local shape

Using the notation from Section 4.2, we may set
$$(\hat\tau, \hat\delta) := (\hat\tau^{(\infty)}, \hat\delta^{(\infty)}) \tag{5.1}$$
and
$$\hat N := \Lambda^{-1}(\hat\tau, \hat\delta). \tag{5.2}$$
Here, by a slight abuse of notation, we use $\Lambda^{-1}$ to denote the canonical extension of the blow-up procedure $\Lambda^{-1}$ (defined for finite decorated trees) to infinite locally finite decorated trees. The infinite network $\hat N$ is the local weak limit of $N_n$ as $n \to \infty$:

Proof of Thm. 1.5. — General results for the local convergence of conditioned Galton–Watson trees [ ] imply that
$$\tau_n \xrightarrow{d} \hat\tau \tag{5.3}$$
in the local topology. That is, for any fixed constant $\ell \ge 1$ the probability for the $\ell$-neighbourhood of the root of $\tau_n$ to assume a given shape converges as $n \to \infty$ to the probability for the $\ell$-neighbourhood of the root of $\hat\tau$ to assume that shape.

By Skorokhod’s representation theorem we may assume that $\hat\tau, \tau_1, \tau_2, \ldots$ are coupled such that
$$\tau_n \xrightarrow{a.s.} \hat\tau. \tag{5.4}$$
This entails
$$(\tau_n, \delta_n) \xrightarrow{a.s.} (\hat\tau, \hat\delta). \tag{5.5}$$
Hence
$$N_n = \Lambda^{-1}(\tau_n, \delta_n) \xrightarrow{a.s.} \Lambda^{-1}(\hat\tau, \hat\delta) = \hat N. \tag{5.6}$$
This proves the local weak convergence in (1.6).

Kersting [ , Thm. 5] described the asymptotic shape of $o(\sqrt{n})$-neighbourhoods of critical Galton–Watson trees conditioned on having $n$ vertices (if the offspring distribution has finite variance). Using a transformation from [12, 28], it follows that this also holds for Galton–Watson trees conditioned on having $n$ leaves, yielding
$$d_{\mathrm{TV}}\big(U_{\ell_n}(\tau_n), U_{\ell_n}(\hat\tau)\big) \to 0 \tag{5.7}$$
for any sequence $\ell_n$ of positive integers satisfying $\ell_n = o(\sqrt{n})$. Consequently, we may assume that $\hat\tau, \tau_1, \tau_2, \ldots$ are coupled such that
$$U_{\ell_n}(\tau_n) = U_{\ell_n}(\hat\tau) \tag{5.8}$$
holds with a probability that tends to 1 as $n \to \infty$. Consequently, we may construct the decorations in such a way that
$$\big(U_{\ell_n}(\tau_n), (\delta_n(v))_{v \in U_{\ell_n}(\tau_n)}\big) = \big(U_{\ell_n}(\hat\tau), (\hat\delta(v))_{v \in U_{\ell_n}(\hat\tau)}\big) \tag{5.9}$$
holds with a probability tending to 1 as $n \to \infty$. This entails that
$$U_{\ell_n}(N_n) = U_{\ell_n}(\Lambda^{-1}(\tau_n, \delta_n)) = U_{\ell_n}(\Lambda^{-1}(\hat\tau, \hat\delta)) = U_{\ell_n}(\hat N) \tag{5.10}$$
again holds with probability tending to 1 as $n \to \infty$. This implies Equation (1.13) and completes the proof.

The surplus vertices of the head structures influence the location of a uniformly selected vertex. To take them into account, we form the tree $\mathcal{T}_n$ by colouring the vertices of $\tau_n$ blue and adding to each vertex $v \in \tau_n$ additional $S(\delta_n(v))$ red children. We define $\mathcal{T}$ in the same way by adding the surplus vertices of $(\tau, \delta)$ as red children at the appropriate vertices, making $\mathcal{T}$ a two-type Galton–Watson tree with offspring distribution $(\xi, \kappa)$, with $\kappa$ defined as in Lemma 4.4. (This makes $\xi$ and $\kappa$ dependent on each other.) A uniformly selected vertex $u_n$ of $N_n$ hence corresponds to a uniformly selected vertex of $\mathcal{T}_n$.

By a general result [ , Thm. 3] for conditioned multi-type Galton–Watson trees, there is a random 2-type tree $\hat{\mathcal{T}}^*$ such that
$$(\mathcal{T}_n, u_n) \xrightarrow{d} \hat{\mathcal{T}}^* \tag{5.11}$$
in the local topology as $n \to \infty$. The tree $\hat{\mathcal{T}}^*$ has a marked vertex with a finite number of descendants, and an infinite number of ancestors, each having a random finite number of descendants in total. (This agrees with the intuition that most vertices of $\mathcal{T}_n$ are far from the root and have few descendants.)
We may assign random decorations $\hat\delta^*(v)$, $v \in \hat{\mathcal{T}}^*$, in the same way as before (for $\delta$, $\delta_n$, and $\hat\delta$), allowing us to define the infinite network
$$\hat N^* := \Lambda^{-1}(\hat{\mathcal{T}}^*, \hat\delta^*). \tag{5.12}$$
The network $\hat N^*$ describes the asymptotic shape of the vicinity of a random vertex of $N_n$, similarly as $\hat N$ describes the vicinity of the fixed root vertex of $N_n$:

Proof of Thm. 1.6. — Applying Skorokhod’s representation theorem to (5.11), we may assume that $\hat{\mathcal{T}}^*, (\mathcal{T}_1, u_1), (\mathcal{T}_2, u_2), \ldots$ are coupled such that
$$(\mathcal{T}_n, u_n) \xrightarrow{a.s.} \hat{\mathcal{T}}^*. \tag{5.13}$$
This entails
$$((\mathcal{T}_n, u_n), \delta_n) \xrightarrow{a.s.} (\hat{\mathcal{T}}^*, \hat\delta^*). \tag{5.14}$$
Hence
$$(N_n, u_n) = (\Lambda^{-1}(\mathcal{T}_n, \delta_n), u_n) \xrightarrow{a.s.} \hat N^*. \tag{5.15}$$

This proves the local weak convergence in (1.15). The proof of the extension (1.16) is entirely analogous to the proof of the corresponding statement (1.13) for the fixed root, by building on a limit [ , Eq. (6.44)] for the $o(\sqrt{n})$-neighbourhood of random points in conditioned sesqui-type trees.

In order to prove the quenched convergence in (1.17), note that stating (1.17) to hold for all $\ell \ge 1$ and all $G$ is equivalent to stating that the random probability measure $\mathcal{L}((N_n, u_n) \mid N_n)$ given by the uniform measure on the $|N_n|$ many vertex-marked versions of $N_n$ satisfies
$$\mathcal{L}((N_n, u_n) \mid N_n) \xrightarrow{d} \mathcal{L}(\hat N^*), \tag{5.16}$$
with $\mathcal{L}(\hat N^*)$ denoting the deterministic law of $\hat N^*$. Now, [ , Thm. 3] ensures such a quenched limit for the uniform measure $\mathcal{L}((\mathcal{T}_n, u_n) \mid \mathcal{T}_n)$ on the $|\mathcal{T}_n|$ many ($= |N_n|$ many) vertex-marked versions of $\mathcal{T}_n$, with the deterministic limit measure given by the law $\mathcal{L}(\hat{\mathcal{T}}^*)$ of $\hat{\mathcal{T}}^*$. That is,
$$\mathcal{L}((\mathcal{T}_n, u_n) \mid \mathcal{T}_n) \xrightarrow{d} \mathcal{L}(\hat{\mathcal{T}}^*). \tag{5.17}$$
Using the Chernoff bounds, this implies such a limit for the decorated versions:
$$\mathcal{L}(((\mathcal{T}_n, u_n), \delta_n) \mid \mathcal{T}_n) \xrightarrow{d} \mathcal{L}(\hat{\mathcal{T}}^*, \hat\delta^*). \tag{5.18}$$
The limit (5.16) now follows from (5.18) by applying the blow-up procedure $\Lambda^{-1}$ and the continuous mapping theorem. This completes the proof.

References

[1] L. van Iersel and V. Moulton, “Trinets encode tree-child and level-2 phylogenetic networks,” Journal of Mathematical Biology (2014) 1707–1729.
[2] C. Linder, B. Moret, L. Nakhleh, and T. Warnow, “Network (reticulate) evolution: Biology, models, and algorithms,” The Pacific Symposium on Biocomputing (01, 2004).
[3] M. Fuchs, G.-R. Yu, and L. Zhang, “Asymptotic enumeration and distributional properties of galled networks,” arXiv:2010.13324 [math.CO].
[4] F. Bienvenu, A. Lambert, and M. Steel, “Combinatorial and stochastic properties of ranked tree-child networks,” arXiv:2007.09701 [math.PR].
[5] M. Bouvel, P. Gambette, and M. Mansouri, “Counting phylogenetic networks of level 1 and 2,” Journal of Mathematical Biology (2020) 1357–1395.
[6] M. Fuchs, B. Gittenberger, and M. Mansouri, “Counting phylogenetic networks with few reticulation vertices: tree-child and normal networks,” Australas. J. Combin. (2019) 385–423.
[7] C. McDiarmid, C. Semple, and D. Welsh, “Counting phylogenetic networks,” Ann. Comb. no. 1, (2015) 205–224. https://doi.org/10.1007/s00026-015-0260-2
[8] G. Cardona and L. Zhang, “Counting and enumerating tree-child networks and their subclasses,” J. Comput. System Sci. (2020) 84–104. https://doi.org/10.1016/j.jcss.2020.06.001
[9] M. Fuchs, B. Gittenberger, and M. Mansouri, “Counting phylogenetic networks with few reticulation vertices: Exact enumeration and corrections,” arXiv:2006.15784 [math.CO].
[10] B. Bollobás, Modern graph theory, vol. 184 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1998. https://doi.org/10.1007/978-1-4612-0619-4
[11] I. Kortchemski, “Invariance principles for Galton-Watson trees conditioned on the number of leaves,” Stochastic Process. Appl. no. 9, (2012) 3126–3172. http://dx.doi.org/10.1016/j.spa.2012.05.013
[12] D. Rizzolo, “Scaling limits of Markov branching trees and Galton-Watson trees conditioned on the number of vertices with out-degree in a given set,” Ann. Inst. Henri Poincaré Probab. Stat. no. 2, (2015) 512–532. https://doi.org/10.1214/13-AIHP594
[13] L. Addario-Berry, L. Devroye, and S. Janson, “Sub-Gaussian tail bounds for the width and height of conditioned Galton-Watson trees,” Ann. Probab. no. 2, (2013) 1072–1087. https://doi.org/10.1214/12-AOP758
[14] D. Aldous, “The continuum random tree. I,” Ann. Probab. no. 1, (1991) 1–28. http://links.jstor.org/sici?sici=0091-1798(199101)19:1<1:TCRTI>2.0.CO;2-B&origin=MSN
[15] D. Aldous, “The continuum random tree. II. An overview,” in Stochastic analysis (Durham, 1990), vol. 167 of London Math. Soc. Lecture Note Ser., pp. 23–70. Cambridge Univ. Press, Cambridge, 1991. http://dx.doi.org/10.1017/CBO9780511662980.003
[16] D. Aldous, “The continuum random tree. III,” Ann. Probab. no. 1, (1993) 248–289. http://links.jstor.org/sici?sici=0091-1798(199301)21:1<248:TCRTI>2.0.CO;2-1&origin=MSN
[17] N. Curien, B. Haas, and I. Kortchemski, “The CRT is the scaling limit of random dissections,” Random Structures Algorithms no. 2, (2015) 304–327. http://dx.doi.org/10.1002/rsa.20554
[18] K. Panagiotou, B. Stufler, and K. Weller, “Scaling limits of random graphs from subcritical classes,” Ann. Probab. no. 5, (2016) 3291–3334.
[19] B. Stufler, “Limits of random tree-like discrete structures,” Probab. Surv. (2020) 318–477. https://doi.org/10.1214/19-PS338
[20] B. Stufler, “Random enriched trees with applications to random graphs,” Electronic Journal of Combinatorics no. 3, (2018).
[21] G. Miermont, “Tessellations of random maps of arbitrary genus,” Ann. Sci. Éc. Norm. Supér. (4) no. 5, (2009) 725–781. http://dx.doi.org/10.24033/asens.2108
[22] V. Kurauskas, “On local weak limit and subgraph counts for sparse random graphs,” ArXiv e-prints (Apr., 2015), arXiv:1504.08103 [math.PR].
[23] L. van Iersel, J. Keijsper, S. Kelk, L. Stougie, F. Hagen, and T. Boekhout, “Constructing level-2 phylogenetic networks from triplets,” in Research in computational molecular biology, vol. 4955 of Lecture Notes in Comput. Sci., pp. 450–462. Springer, Berlin, 2008. https://doi.org/10.1007/978-3-540-78839-3_40
[24] P. Gambette, V. Berry, and C. Paul, “The structure of level-k phylogenetic networks,” in Combinatorial pattern matching, vol. 5577 of Lecture Notes in Comput. Sci., pp. 289–300. Springer, Berlin, 2009. https://doi.org/10.1007/978-3-642-02441-2_26
[25] A. Joyal, “Une théorie combinatoire des séries formelles,” Adv. in Math. no. 1, (1981) 1–82. http://dx.doi.org/10.1016/0001-8708(81)90052-9
[26] F. Bergeron, G. Labelle, and P. Leroux, Combinatorial species and tree-like structures, vol. 67 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1998. Translated from the 1994 French original by Margaret Readdy, with a foreword by Gian-Carlo Rota.
[27] P. Flajolet and R. Sedgewick, Analytic combinatorics. Cambridge University Press, Cambridge, 2009. http://dx.doi.org/10.1017/CBO9780511801655
[28] R. Ehrenborg and M. Méndez, “Schröder parenthesizations and chordates,” J. Combin. Theory Ser. A no. 2, (1994) 127–139. http://dx.doi.org/10.1016/0097-3165(94)90008-6
[29] M. Drmota, Random trees. An interplay between combinatorics and probability. SpringerWienNewYork, Vienna, 2009. http://dx.doi.org/10.1007/978-3-211-75357-6
[30] R. Abraham and J.-F. Delmas, “Local limits of conditioned Galton-Watson trees: the infinite spine case,” Electron. J. Probab. (2014) no. 2, 19.
[31] G. Kersting, “On the height profile of a conditioned Galton-Watson tree,” arXiv:1101.3656 [math.PR].
[32] B. Stufler, “Rerooting multi-type branching trees: The infinite spine case,” Journal of Theoretical Probability (2021). https://link.springer.com/article/10.1007/s10959-020-01069-y
[33] B. Stufler, “Graphon convergence of random cographs,”