Invariants for level-1 phylogenetic networks under the Cavendar-Farris-Neyman Model
arXiv [q-bio.PE]
JOSEPH CUMMINGS, BENJAMIN HOLLERING, CHRISTOPHER MANON
Abstract.
Phylogenetic networks can model more complicated evolutionary phenomena that trees fail to capture, such as horizontal gene transfer and hybridization. The same Markov models that are used to model evolution on trees can also be extended to networks, and similar questions, such as the identifiability of the network parameter or the invariants of the model, can be asked. In this paper we focus on finding the invariants of the Cavendar-Farris-Neyman (CFN) model on level-1 phylogenetic networks. We do this by reducing the problem to finding invariants of sunlet networks, which are level-1 networks consisting of a single cycle with a leaf attached at each vertex. We then determine all quadratic invariants in the sunlet network ideal, which we conjecture generate the full ideal.

1. Introduction
The field of phylogenetics aims to determine the evolutionary relationships between species, which are often represented with trees. There are some evolutionary phenomena that trees are unable to capture, though. Non-treelike evolutionary processes include hybridization and horizontal gene transfer, where genetic material is passed laterally within a generation [32, 41]. Phylogenetic networks have emerged as a tool to model events in the evolutionary history of organisms that tree models are unable to represent. This has spurred an effort to study networks and develop methods to reconstruct them from data. Many results have already been obtained on the combinatorial properties of networks, and many current methods for constructing networks are combinatorial in nature [25, 36]. Other methods that have been used to infer trees have also been extended to networks, such as maximum parsimony [26], maximum likelihood [27], and neighbor joining [7].

Recently, there has been work on the algebraic structure of network models, motivated by the advances that algebraic methods achieved for tree models, which include many identifiability results [1, 2, 5, 30, 33] and descriptions of the phylogenetic invariants of many tree-based models [3, 12, 13, 31, 38]. Algebraic methods have also led to competitive methods for reconstructing trees, such as those described in [11, 14, 16], which all utilize invariants. Gross and Long began the study of the algebraic and geometric structure of network models in [19] and obtained some identifiability results for a certain class of Jukes-Cantor (JC) network models. Further identifiability results have since been obtained for networks using algebraic and combinatorial methods. These include level-1 networks under the coalescent model [4], large-cycle networks under the Kimura 2-Parameter (K2P) and Kimura 3-Parameter (K3P) models [24], and level-1 networks under the JC, K2P, and K3P models [21]. There have also been some results obtained on the invariants of network models, such as those in [9].

In this paper we focus on finding the invariants of the Cavendar-Farris-Neyman (CFN) model on level-1 phylogenetic networks. The discrete Fourier transform, which is used to simplify the parameterization of group-based models such as the CFN model, can also be applied to network models [19].
After applying this transform, CFN tree models become toric varieties, but the same is not true for CFN network models, which makes analyzing their algebraic structure more difficult. As observed in [19], the toric fiber product of [39] can still be applied to group-based network models. Our approach leverages this toric fiber product structure to reduce the problem to that of finding the invariants for sunlet networks, which consist of only a single cycle. While sunlet network varieties are still not toric, they do have a lower-dimensional torus action on them, meaning they are T-varieties [22]. We use this torus action to break up the ideal of invariants of an $n$-leaf sunlet network into homogeneous graded pieces we call gloves. As a result, we arrive at the following theorem.

Theorem. A quadratic $f$ is an invariant of the $n$-sunlet network if and only if it is an invariant for both of the underlying trees obtained by deleting a reticulation edge.

We then explicitly produce all quadratic generators of the sunlet network ideal that lie in a given graded piece, which gives a complete set of quadratic generators of the sunlet network ideal under the CFN model. We conjecture that the sunlet network ideal is generated by quadratics, which would imply that our set of quadratic generators actually generates the entire ideal.

We have also studied the 4- and 5-leaf sunlet networks in more detail. We have shown through explicit computation that their corresponding varieties are normal and Gorenstein. This means that any level-1 network that can be built by gluing together 4- and 5-leaf sunlets along trees is normal and Cohen-Macaulay, since these properties are preserved by the toric fiber product. Level-1 networks built from gluing 4- and 5-sunlets along leaves that are not adjacent to the reticulation vertex of the respective networks are also Gorenstein for the same reason, but this may not hold if networks are glued together along leaves adjacent to the reticulation vertex. Lastly, we compute the multigraded Hilbert function of the 4-leaf sunlet network. All of these computational results, along with an implementation of our algorithm to find quadratic generators and computational evidence for our conjectures, can be found at: https://github.com/bkholler/CFN Networks .

This paper is organized as follows. In Section 2, we provide some background on phylogenetic models with a particular emphasis on the CFN model and the ideal of invariants for CFN tree models. We also describe the toric fiber product. In Section 3, we show that studying the CFN model on level-1 networks can be reduced to understanding the CFN model on $n$-sunlets. In Section 4, we give a complete description of the quadratic invariants for any sunlet network. In Section 5, we focus on 4- and 5-leaf sunlet networks and describe some algebraic properties of their ideals. In Section 6, we discuss some open problems and conjectures concerning network ideals and give some possible directions for approaching them. In particular, we conjecture that the CFN sunlet network ideal is generated by quadratics and has dimension $2n$ when the network has $n$ leaves.

Date: March 2020.

Contents
1. Introduction
2. Preliminaries
3. Reduction to Sunlet Networks
4. Quadratic Invariants of Sunlet Networks
5. Algebraic Properties of Small Sunlet Networks
6. Open Problems
Acknowledgments
References

2. Preliminaries
In this section, we provide some background on phylogenetic networks and phylogenetic Markov models on them. We then discuss toric fiber products, which will be useful tools for describing the ideal of phylogenetic invariants for the CFN model on a phylogenetic network.

2.1. Phylogenetic Networks. In this section, we review the basics of phylogenetic networks and define some network structures that we will use throughout the paper. Our notation and terminology are adapted from [19, 20]. For additional information on the combinatorial properties of networks and the definitions associated to them, we refer the reader to [20, 36].
Definition 2.1. A phylogenetic network $N$ on leaf set $[n] = \{1, 2, \ldots, n\}$ is a rooted acyclic digraph with no edges in parallel and satisfying the following properties:
(1) the root has out-degree two;
(2) a vertex with out-degree zero has in-degree one, and the set of vertices with out-degree zero is $[n]$;
(3) all other vertices have either in-degree one and out-degree two, or in-degree two and out-degree one.

Vertices with in-degree one and out-degree two are called tree vertices, while vertices with in-degree two and out-degree one are called reticulation vertices. Edges directed into a reticulation vertex are called reticulation edges and all other edges are called tree edges. This paper focuses on the CFN model, which is group-based and hence time-reversible. This means that it is impossible to identify the location of the root under this model, so we are only interested in the underlying semi-directed network structure of the phylogenetic network. The underlying semi-directed network of a phylogenetic network is obtained by suppressing the root and undirecting all tree edges in the network. The reticulation edges remain directed. This is illustrated in Figure 1.

As the number of reticulation vertices in the network increases, the parameterization of the model becomes increasingly complicated. A common restriction is to limit the number of reticulation vertices in each biconnected component of the network. A network is called level-$k$ if there are at most $k$ reticulation vertices in each biconnected component of the network. In this paper we will focus on level-1 networks and a special subclass of these networks called sunlet networks, which were first studied in [19].

Definition 2.2. An $n$-sunlet network is a semi-directed network with one reticulation vertex whose underlying graph is obtained by adding a leaf to every vertex of an $n$-cycle. We denote with $S_n$ the $n$-sunlet network with reticulation vertex adjacent to the leaf 1 and the other leaves labelled clockwise from 1 in increasing order.

Note that any level-1 network can be constructed by gluing sunlets of possibly different sizes along trees. It was noted in [19] that this corresponds to a toric fiber product of their ideals. We develop this further in Section 3. We end this section with an example that corresponds to the 4-sunlet, $S_4$, which we will use throughout this paper.

Example 2.3. Consider the network pictured on the left in Figure 1. This is a 4-leaf, level-1 network. The reticulation edges are dashed and the reticulation vertex is the vertex adjacent to the leaf labelled 1. Its underlying semi-directed network is pictured on the right. This semi-directed network is a 4-sunlet with reticulation vertex adjacent to the leaf 1. Observe that deleting either of the reticulation edges in the sunlet network yields an unrooted binary tree with 4 leaves, but that these two trees are not the same.

2.2.
Phylogenetic Markov Models. In this section, we review the basics of phylogenetic Markov models for trees and networks. For additional information we refer the reader to [34, 36]. Phylogenetic Markov models on networks are determined by the trees that result from deleting reticulation edges in the network. This means we first need to describe phylogenetic Markov models on trees.

Figure 1. A four-leaf, level-1 network pictured on the left with all edges directed away from the root. On the right is the associated semi-directed network obtained by suppressing the root and undirecting all tree edges. The reticulation edges are implicitly assumed to be directed into the vertex adjacent to the leaf 1.

A $\kappa$-state phylogenetic Markov model on an $n$-leaf, leaf-labelled rooted binary tree $T$ gives us a joint distribution on the states of the leaves of $T$. This joint distribution is determined by associating a random variable $X_v$ with state space $[\kappa]$ to each internal vertex $v$ of $T$ and a $\kappa \times \kappa$ transition matrix $M^e$ to each directed edge $e = (u, v)$ of $T$ such that $M^e_{i,j} = P(X_v = j \mid X_u = i)$. We also associate a root distribution $\pi$ to the root $\rho$ of $T$. Let $X_i$ be the random variable associated to the leaf labelled $i$ for $i \in [n]$. Then the probability of observing a configuration $(x_1, \ldots, x_n) \in [\kappa]^n$ of states at the leaves is

$$P(X_1 = x_1, \ldots, X_n = x_n) = \sum_{j \in [\kappa]^{\mathrm{Int}(T)}} \pi_{j_\rho} \prod_{(u,v) \in E(T)} M^{(u,v)}_{j_u, j_v},$$

where the state at each leaf $i$ is fixed to $j_i = x_i$. Note that the joint distribution of $(X_1, \ldots, X_n)$ is given by polynomials in the entries of $\pi$ and the $M^e$. This means that the model can be thought of as a polynomial map

$$\psi_T : \Theta_T \to \Delta_{\kappa^n - 1},$$

where $\Theta_T$ is the stochastic parameter space of the model (the space of transition matrices and root distributions) and $\Delta_{\kappa^n - 1}$ is the probability simplex. Since this map is a polynomial map, tools from algebraic geometry can be used to study the model. This is one of the key takeaways from algebraic statistics and we refer the reader to [40] for additional information.

We ignore the restrictions of the stochastic parameter space, extend $\psi_T$ to a complex polynomial map, and study the variety $V_T = \overline{\mathrm{im}(\psi_T)}^{\mathrm{Zar}}$, which is called the phylogenetic variety associated to $T$. Polynomials in the vanishing ideal $I_T = I(V_T)$ are called phylogenetic invariants, and a major problem for any phylogenetic model is to describe this ideal.
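The sum over internal states that defines the leaf distribution can be evaluated directly on a small case. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical rooted 4-leaf tree with internal vertices named `rho`, `u`, `v`, symmetric two-state (CFN-style) transition matrices with arbitrarily chosen off-diagonal entries, and a uniform root distribution.

```python
from itertools import product

# Hypothetical rooted binary tree (an assumption for illustration):
# root rho with children u and v; u has leaves 1, 2 and v has leaves 3, 4.
INTERNAL = ["rho", "u", "v"]
LEAVES = [1, 2, 3, 4]
EDGES = [("rho", "u"), ("rho", "v"), ("u", 1), ("u", 2), ("v", 3), ("v", 4)]

def cfn_matrix(beta):
    """Symmetric 2x2 transition matrix with off-diagonal entry beta."""
    return [[1 - beta, beta], [beta, 1 - beta]]

def leaf_probability(x, pi, M):
    """P(X_1 = x_1, ..., X_4 = x_4): sum over all states of the internal vertices."""
    total = 0.0
    for states in product(range(2), repeat=len(INTERNAL)):
        j = dict(zip(INTERNAL, states))
        j.update(zip(LEAVES, x))          # leaf states are fixed to x
        term = pi[j["rho"]]
        for (a, b) in EDGES:
            term *= M[(a, b)][j[a]][j[b]]
        total += term
    return total

pi = [0.5, 0.5]                            # assumed uniform root distribution
betas = [0.1, 0.2, 0.05, 0.3, 0.15, 0.25]  # arbitrary illustrative values
M = {e: cfn_matrix(b) for e, b in zip(EDGES, betas)}
dist = {x: leaf_probability(x, pi, M) for x in product(range(2), repeat=4)}
```

Each entry of `dist` is a polynomial in the entries of `pi` and the matrices in `M`, which is the observation that lets the model be treated as a polynomial map.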
Characterizing the invariants of phylogenetic models began with [10, 29] and has been continued by many, including but not limited to [3, 12, 13, 31, 38].

We can now use the Markov models we have for trees to define phylogenetic Markov models on networks. Let $N$ be a network with reticulation vertices $v_1, \ldots, v_m$ and let $e_i^0$ and $e_i^1$ be the reticulation edges adjacent to $v_i$. Associate a transition matrix to each edge of $N$. Independently at random, we delete $e_i^0$ with probability $\lambda_i$ and otherwise delete $e_i^1$, and we record which edge is deleted with a vector $\sigma \in \{0, 1\}^m$, where $\sigma_i = 0$ indicates that $e_i^0$ was deleted. Each $\sigma$ corresponds to a different tree $T_\sigma$. Then the parameterization $\psi_N$ is given by

$$\psi_N = \sum_{\sigma \in \{0,1\}^m} \left( \prod_{i=1}^{m} \lambda_i^{1-\sigma_i} (1 - \lambda_i)^{\sigma_i} \right) \psi_{T_\sigma} \tag{1}$$

where $\psi_{T_\sigma}$ is the parameterization corresponding to the tree $T_\sigma$ with transition matrices inherited from the original network $N$. Note that this is similar to a mixture model but with many additional relations among the parameters. The parameterization $\psi_N$ is still a polynomial map, though, which means we can still consider the Zariski closure of the image of $\psi_N$ and the corresponding ideal of phylogenetic invariants, $I_N$. As mentioned previously, if the phylogenetic model is time-reversible then we get the same model by considering the Markov process on the underlying semi-directed network. We end this section with our running example.

[Figure 2: panels (a) $S_4$, (b) $T_1$, (c) $T_2$, each with edges labelled $e_i$.]

Figure 2.
A 4 leaf 4-cycle network N and the two trees T and T that are obtained by deleting thereticulation edges e and e respectively. Example 2.4.
Consider the 4-sunlet S pictured in figure 2 with reticulation vertex adjacent to the leaf 1and reticulation edges e and e . The trees T and T are obtained by deleting edges e and e respectively.Since there is only one reticulation vertex in S , the sum in Equation 1 simplifies to ψ S “ λψ T ` p ´ λ q ψ T . The transition matrices used in the parameterization maps ψ T i are inherited from the original network. Forinstance the edge e in the original network has a transition matrix M e associated to it and thus the edge e that appears in T and the edge e that appears in T both use the same transition matrix M e .2.3. The CFN Model.
In this section, we review the CFN model, sometimes called the binary Jukes-Cantor model, and some known results about the ideal of phylogenetic invariants of trees under this model. In particular, we describe the discrete Fourier transform, which turns the map $\psi_T$ into a monomial map in the transformed parameters, so that the ideal $I_T$ becomes a toric ideal [38]. This vastly simplifies the network parameterization as well and will make it much easier to define the parameterization explicitly. We begin with a description of general group-based models and then discuss the CFN model in particular.

Definition 2.5. Let $G$ be a finite abelian group of order $\kappa$ and $T$ a rooted binary tree. The state space of the random variables $X_v$ is identified with the elements of the group $G$. A group-based model on $T$ is a phylogenetic Markov model on $T$ such that for each transition matrix $M^e$, there exists a function $f_e : G \to \mathbb{R}$ such that $M^e_{g,h} = f_e(g - h)$.

The CFN model is a 2-state phylogenetic Markov model where the states are purine (adenine and guanine) and pyrimidine (thymine and cytosine); that is, the DNA bases are grouped into two groups corresponding to their chemical structure. It is a group-based model for the group $G = \mathbb{Z}/2\mathbb{Z}$ with the states purine and pyrimidine arbitrarily associated to the elements of $\mathbb{Z}/2\mathbb{Z}$. This means the transition matrices in the model have the form

$$M^e = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix}$$

and the associated function $f_e : \mathbb{Z}/2\mathbb{Z} \to \mathbb{R}$ is simply $f_e(0) = \alpha$ and $f_e(1) = \beta$.

Group-based models allow for a linear change of coordinates that makes $\psi_T$ a monomial map in the transformed parameters. This means many group-based models (such as the CFN, JC, K2P, and K3P models) are toric varieties in the transformed coordinates [38]. This change of coordinates is called the discrete Fourier transform and was first utilized in [15, 23]. The new image coordinates, commonly called the Fourier coordinates, are denoted with $q_{g_1, \ldots, g_n}$ for $g_1, \ldots, g_n \in G$. For the CFN model, that is, when $G = \mathbb{Z}/2\mathbb{Z}$, the parameterization $\psi_T$ can be given in terms of the edges of the tree and their corresponding splits, which we briefly describe first.

A split of $[n]$ is a set partition $A|B$ of the set $[n]$. A split $A|B$ is valid for an unrooted binary tree $T$ leaf-labelled by $[n]$ if it can be obtained as the leaf sets of the two connected components of $T \setminus e$ for some edge $e$ of $T$. So we let $\Sigma(T)$ be the set of edges of $T$, and to each edge $e$ we associate the split $A_e|B_e$ that deleting the edge $e$ yields. Now for each edge $e \in \Sigma(T)$ and each group element $g \in \mathbb{Z}/2\mathbb{Z}$ we have a parameter $a^e_g$. The parameterization of the model $\psi_T$ in the Fourier coordinates is given by

$$q_{g_1, \ldots, g_n} = \begin{cases} \displaystyle\prod_{A_e|B_e \in \Sigma(T)} a^{e}_{\sum_{i \in A_e} g_i} & \text{if } \sum_{i \in [n]} g_i = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Note that in the parameterization we are utilizing the natural identification between the edge $e$ and its associated split $A_e|B_e$. We can now think of the variety $V_T$ as being the closure of the image of the map given in Equation 2, where the parameters are allowed to range freely over the complex numbers.

We now introduce two different interpretations of the toric ideal $I_T$ that will be useful in building the sunlet ideal $I_{S_n}$. Sturmfels and Sullivant showed in [38] that the ideal of phylogenetic invariants for a tree $T$ under the CFN model can be constructed in the following way. Let $A|B$ be a split of $T$ and let $|A| = j$, so $|B| = n - j$. For each $i \in \mathbb{Z}/2\mathbb{Z}$ we form a matrix $M_i$ with rows indexed by sequences $r \in (\mathbb{Z}/2\mathbb{Z})^A$ and columns indexed by sequences $c \in (\mathbb{Z}/2\mathbb{Z})^B$ such that $\sum_{a \in A} r_a = \sum_{b \in B} c_b = i$. The entry of $M_i$ in row $r$ and column $c$ is $q_g$ such that $g|_A = r$ and $g|_B = c$. Then the ideal of phylogenetic invariants for the tree $T$ is generated by all of the $2 \times 2$ minors of the matrices $M_i$ as $A|B$ ranges over all the splits of $T$. The following example illustrates this construction.

Example 2.6.
Let $T$ be the unrooted binary tree determined by the split $12|34$. Then

$$M_0 = \begin{array}{c|cc} & 00 & 11 \\ \hline 00 & q_{0000} & q_{0011} \\ 11 & q_{1100} & q_{1111} \end{array} \qquad \text{and} \qquad M_1 = \begin{array}{c|cc} & 01 & 10 \\ \hline 01 & q_{0101} & q_{0110} \\ 10 & q_{1001} & q_{1010} \end{array}.$$

So the ideal of phylogenetic invariants for $T$ is $I_T = \langle q_{0000} q_{1111} - q_{0011} q_{1100},\; q_{0101} q_{1010} - q_{0110} q_{1001} \rangle$.

Essentially, their construction shows that the ideal $I_T$ is given by rank constraints on matrices that come from slicing and flattening the tensor $(q_g : g \in (\mathbb{Z}/2\mathbb{Z})^n)$ according to the splits of $T$. This determinantal representation is also amenable to computation, since determining whether or not a point is in the variety can be done by verifying that the rank of each associated matrix $M_i$ is at most one. Another representation of relations in $I_T$ was given by Buczyńska and Wiśniewski in [8]. They use systems of paths on the tree $T$ to describe these binomials instead. Note that any $g \in (\mathbb{Z}/2\mathbb{Z})^n$ defines a unique system of disjoint paths on $T$ that connects the leaves $\ell$ such that $g_\ell = 1$. An edge $e$ lies on one of these paths exactly when its associated split $A_e|B_e$ satisfies $\sum_{a \in A_e} g_a = \sum_{b \in B_e} g_b = 1$. The following example illustrates their construction.
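As a computational aside on the determinantal description, the two $2 \times 2$ minors obtained by flattening along the split $12|34$ can be checked numerically against the monomial parameterization of Equation 2. The sketch below is illustrative only: the edge ordering in `SPLITS`, the helper name `fourier_coordinates`, and the random parameter values are assumptions made for this example and do not come from the paper.

```python
import random
from itertools import product

# One side A_e of the split for each edge of the 4-leaf tree 12|34:
# four leaf edges and the single internal edge (ordering is an assumption).
SPLITS = [(1,), (2,), (3,), (4,), (1, 2)]

def fourier_coordinates(a):
    """Evaluate Equation 2: a[e][s] is the parameter of edge e at group element s."""
    q = {}
    for g in product(range(2), repeat=4):
        if sum(g) % 2 != 0:
            q[g] = 0.0                      # coordinate vanishes unless sum(g) = 0
            continue
        value = 1.0
        for e, A in enumerate(SPLITS):
            value *= a[e][sum(g[i - 1] for i in A) % 2]
        q[g] = value
    return q

random.seed(0)
a = [[random.uniform(0.5, 1.5) for _ in range(2)] for _ in SPLITS]
q = fourier_coordinates(a)

# The two minors of the flattening matrices M_0 and M_1 for the split 12|34.
f1 = q[0, 0, 0, 0] * q[1, 1, 1, 1] - q[0, 0, 1, 1] * q[1, 1, 0, 0]
f2 = q[0, 1, 0, 1] * q[1, 0, 1, 0] - q[0, 1, 1, 0] * q[1, 0, 0, 1]
```

Both `f1` and `f2` vanish up to floating-point rounding for any choice of parameters, reflecting that each flattening matrix has rank at most one on the image of the parameterization.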
Example 2.7.
Let T again be the 4 leaf tree defined by the single split 12 |
34. Note that each g P p Z { Z q corresponds to a unique system of disjoint paths between the leaves ℓ P r s such that g ℓ “
1. For instance q corresponds to the red path 12 34 . We saw in Example that q q ´ q q P I T . Using the interpretation of the variables as paths, wecan see this relation as encoding that two systems of paths are equivalent. The paths are pictured below inred. 12 34 q
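[Figure: the two equivalent systems of paths on the tree $12|34$, drawn in red.]

Since the discrete Fourier transform gives a linear change of coordinates, it can also be applied to group-based models on phylogenetic networks [19]. Writing $T_1$ and $T_2$ for the two trees obtained from $S_n$ by deleting a reticulation edge, this means the parameterization of $S_n$ is

$$q_{g_1, \ldots, g_n} = \begin{cases} \displaystyle\prod_{A_e|B_e \in \Sigma(T_1)} a^{e}_{\sum_{i \in A_e} g_i} + \prod_{A_e|B_e \in \Sigma(T_2)} a^{e}_{\sum_{i \in A_e} g_i} & \text{if } \sum_{i \in [n]} g_i = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Example 2.8.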
Let S n be the 4-sunlet pictured in Figure 2. As we saw in the previous example, the trees T and T that are also pictured in Figure 2 are obtained from S n by deleting the reticulation edges e and e respectively. We denote the Fourier parameter corresponding to the edge e i and group element g j by a ig j .The parameterization ψ S n in the Fourier coordinates is q g ,g ,g ,g “ a g a g a g a g a g a g ` g a g ` a g a g a g a g a g a g ` g a g if ř i Pr s g i “
0, and 0 otherwise.

The first term in the above parameterization comes from the parameterization $\psi_{T_1}$ in the Fourier coordinates and the second term comes from $\psi_{T_2}$.

This new parameterization is easier to work with than the previous parameterization, but $I_{S_n}$ is still not a toric ideal in the new coordinates. This means the techniques used to analyze the ideal $I_T$ cannot be directly used to analyze $I_{S_n}$. One of our goals in this paper is to develop new techniques to describe the invariants in $I_{S_n}$ that are reminiscent of the original constructions for trees.

2.4. Toric Fiber Products.
In this section we recall the toric fiber product operation on multigraded ideals first defined by Sullivant in [39].

We first consider a polynomial ring $\mathbb{C}[\bar{x}] := \mathbb{C}[x_1, \ldots, x_n]$ equipped with a grading by elements of a lattice $M$. This means that there is a linear map $\deg : \mathbb{N}^n \to M$, and a direct sum decomposition of $\mathbb{C}[\bar{x}]$ as a $\mathbb{C}$-vector space into isotypical components indexed by $M$:

$$\mathbb{C}[\bar{x}] = \bigoplus_{m \in M} \mathbb{C}[\bar{x}]_m,$$

where $\mathbb{C}[\bar{x}]_m$ has a basis consisting of the monomials $x^\alpha$ with $\deg(\alpha) = m$. The support semigroup $S(\deg) \subseteq M$ is defined to be the set of $m$ such that $\mathbb{C}[\bar{x}]_m \neq 0$. It is straightforward to show that $S(\deg)$ is closed under addition, contains $0 \in M$, and is generated by the set $\{d_1, \ldots, d_n\} \subset M$, where $d_i = \deg(e_i)$.

A polynomial $f \in \mathbb{C}[\bar{x}]$ is said to be $M$-homogeneous if $f \in \mathbb{C}[\bar{x}]_m$ for some $m \in S(\deg)$. Equivalently, $f$ is $M$-homogeneous if and only if each non-zero monomial term $C_\alpha x^\alpha$ appearing in $f$ satisfies $\deg(\alpha) = m$. A polynomial ideal $I \subseteq \mathbb{C}[\bar{x}]$ is $M$-homogeneous if it satisfies the following equivalent conditions:
(1) $I = \langle f_1, \ldots, f_\ell \rangle$ for $M$-homogeneous polynomials $f_i \in \mathbb{C}[\bar{x}]_{m_i}$,
(2) $I = \bigoplus_{m \in M} I_m$, where $I_m = I \cap \mathbb{C}[\bar{x}]_m$.

Next we consider two $M$-graded polynomial rings $\mathbb{C}[\bar{x}]$, $\mathbb{C}[\bar{y}]$ with homogeneous ideals $I \subset \mathbb{C}[\bar{x}]$ and $J \subset \mathbb{C}[\bar{y}]$. Let $\deg_1 : \mathbb{N}^n \to M$ and $\deg_2 : \mathbb{N}^m \to M$ be the linear maps corresponding to the $M$-gradings on $\mathbb{C}[\bar{x}]$ and $\mathbb{C}[\bar{y}]$, respectively. We make the technical assumption that the set of degrees $\mathcal{A} = \{d_1, \ldots, d_r\} \subset M$ obtained by applying the functions $\deg_1$, $\deg_2$ to the generators of $\mathbb{C}[\bar{x}]$ and $\mathbb{C}[\bar{y}]$ forms a linearly independent set in $M$, and we assume without loss of generality that $\mathrm{rank}(M) = r$. We also assume that each degree $d_i$ is realized by an element of $\bar{x}$ and an element of $\bar{y}$. These conditions are satisfied by the toric fiber products of cycle networks we consider in this paper, where $M = \mathbb{Z}^2$ and the degree set can be taken to be $\{(1, 0), (0, 1)\}$.

We define $S \subset \mathbb{C}[\bar{x}, \bar{y}]$ to be the subalgebra spanned by those monomials $x^\alpha y^\beta$ such that $\deg_1(\alpha) = \deg_2(\beta)$. It is a straightforward consequence of the linear independence assumption on $\mathcal{A}$ that $S$ is generated as an algebra by the monomials $x_i y_j$ where $x_i$ and $y_j$ have the same $M$-degree. We let $\mathbb{C}[\bar{z}]$ be the polynomial ring on variables $z_{ij}$, where $ij$ corresponds to a monomial $x_i y_j$ with this property. We let $\phi$ denote the composition of the following ring homomorphisms:

$$\mathbb{C}[\bar{z}] \to \mathbb{C}[\bar{x}, \bar{y}] \to \mathbb{C}[\bar{x}, \bar{y}] / \langle I, J \rangle, \qquad z_{ij} \mapsto x_i y_j \mapsto [x_i y_j].$$

The following is the main definition of [39].
Definition 2.9.
Let $I \subset \mathbb{C}[\bar{x}]$ and $J \subset \mathbb{C}[\bar{y}]$ be $M$-graded ideals as above. Then the toric fiber product $I \times_M J \subset \mathbb{C}[\bar{z}]$ is defined to be the kernel of $\phi$.

A key feature of the toric fiber product construction in the linearly independent case we consider here is that a Gröbner basis for $I \times_M J$ can be assembled from Gröbner bases for $I$ and $J$. Recall that a weight vector $w \in \mathbb{Q}^n$ defines an initial ideal $\mathrm{in}_w(I) \subset \mathbb{C}[\bar{x}]$ (see [37]). In particular, $\mathrm{in}_w(I)$ is generated by the initial forms $\mathrm{in}_w(f)$ for $f \in I$, where $\mathrm{in}_w(f)$ is the polynomial obtained from $f$ by taking only those monomial terms whose exponent vector minimizes the inner product with $w$. We say that $G \subset I$ is a Gröbner basis for $I$ with respect to $w$ if the initial forms $\{\mathrm{in}_w(g) \mid g \in G\} \subset \mathrm{in}_w(I)$ are a generating set.

The kernel of the map $\mathbb{C}[\bar{z}] \to \mathbb{C}[\bar{x}, \bar{y}]$ is a binomial ideal with a distinguished generating set $\mathrm{Quad}_M$. Following the description in [39, Proposition 2.6], suppose $x_{i_1}, x_{i_2}, y_{j_1}, y_{j_2}$ all have the same degree $d$. Then we get a relation

$$z_{i_1 j_1} z_{i_2 j_2} - z_{i_1 j_2} z_{i_2 j_1}.$$

Ranging over all $d \in \mathcal{A}$ we obtain a set of binomial quadratic relations $\mathrm{Quad}_M \subset I \times_M J$.

Let $w_1 \in \mathbb{Q}^n$ and $w_2 \in \mathbb{Q}^m$ be weights for $\mathbb{C}[\bar{x}]$ and $\mathbb{C}[\bar{y}]$, respectively. We obtain a weight $\phi^*(w_1, w_2)$ for $\mathbb{C}[\bar{z}]$ by setting $\phi^*(w_1, w_2)[z_{ij}] = w_1[x_i] + w_2[y_j]$. For an $M$-homogeneous polynomial $g \in \mathbb{C}[\bar{x}]$, the set $\mathrm{Lift}(g) \subset \mathbb{C}[\bar{z}]$ is obtained as follows. Let $g = \sum C_a x_1^{a_1} \cdots x_n^{a_n}$, where $\sum a_i \deg(x_i) = u \in M$ is fixed for all monomials with $C_a \neq 0$. Linear independence of $\mathcal{A}$ implies that for each $d_i \in \mathcal{A}$, the total contribution of $d_i$ in each monomial term is independent of $a$. Now, for each $x_i$ select $\kappa(i) \in [m]$ such that $\deg(x_i) = \deg(y_{\kappa(i)})$ and $\kappa(i) = \kappa(j)$ when $\deg(x_i) = \deg(x_j)$. This choice $\kappa$ defines a set of monomial generators $z_{1,\kappa(1)}, \ldots, z_{n,\kappa(n)} \in \mathbb{C}[\bar{z}]$. The $\kappa$-lift of $g$ is the polynomial $g_\kappa = \sum C_a z_{1,\kappa(1)}^{a_1} \cdots z_{n,\kappa(n)}^{a_n} \in \mathbb{C}[\bar{z}]$. The set $\mathrm{Lift}(g)$ is then defined to be the set of all such lifts. The lift of an $M$-homogeneous polynomial in $\mathbb{C}[\bar{y}]$ and the lift of a set of polynomials are defined similarly. The following is [39, Theorem 2.8].

Proposition 2.10.
Let $G_1 \subset I$ and $G_2 \subset J$ be Gröbner bases with respect to $w_1$ and $w_2$, respectively. Then $\{\mathrm{Lift}(G_1), \mathrm{Lift}(G_2), \mathrm{Quad}_M\}$ is a Gröbner basis with respect to $\phi^*(w_1, w_2)$, and $\mathrm{in}_{w_1}(I) \times_M \mathrm{in}_{w_2}(J) = \mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$.

Corollary 2.11. If $I$ and $J$ have weights $w_1$, $w_2$, respectively, with Gröbner bases with degrees bounded above by $k$, then there is a Gröbner basis of $I \times_M J$ with respect to $\phi^*(w_1, w_2)$ with degrees bounded above by $\max(k, 2)$. If the initial ideals $\mathrm{in}_{w_1}(I)$, $\mathrm{in}_{w_2}(J)$ are toric, then $\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$ is toric.

Proof. If $\mathrm{in}_{w_1}(I)$, $\mathrm{in}_{w_2}(J)$ are toric, then Proposition 2.10 implies that $\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$ is the kernel of a map to a domain and possesses a binomial Gröbner basis. □

The assumption that $I$ and $J$ are $M$-homogeneous ideals implies that their factor rings $A = \mathbb{C}[\bar{x}]/I$ and $B = \mathbb{C}[\bar{y}]/J$ are $M$-graded as well:

$$A = \bigoplus_{m \in M} A_m, \qquad B = \bigoplus_{m \in M} B_m.$$

The linear independence of the set $\mathcal{A} \subset M$ implies that the factor ring $\mathbb{C}[\bar{z}] / (I \times_M J)$ is isomorphic to the subalgebra $\bigoplus_{m \in M} A_m \otimes B_m \subset A \otimes_{\mathbb{C}} B$. We let $(A \otimes_{\mathbb{C}} B)^{T_M}$ denote this subalgebra. This notation is explained as follows. The spectrum of the group algebra $\mathbb{C}[M]$ is an algebraic torus $T_M$, and the $M$-grading on $A$ and $B$ naturally corresponds to an action by $T_M$, where the graded components $A_m$ and $B_m$ are the $m$-isotypical spaces of $A$ and $B$, respectively, when these rings are regarded as $T_M$-representations. Consequently, we can define an "anti-diagonal" $T_M$-action on the tensor product $A \otimes_{\mathbb{C}} B$ by giving $B_m$ isotypical degree $-m$. The subring $(A \otimes_{\mathbb{C}} B)^{T_M} \subset A \otimes_{\mathbb{C}} B$ is the ring of invariants with respect to the anti-diagonal action. In the following we use the invariant-theoretic interpretation of the toric fiber product.

Proposition 2.12.
With $I$, $J$, $\mathcal{A}$, $A$, and $B$ as above, if $A$ and $B$ are normal, then $(A \otimes_{\mathbb{C}} B)^{T_M}$ is normal. If there exist $w_1$ and $w_2$ such that $\mathbb{C}[\bar{x}]/\mathrm{in}_{w_1}(I)$ and $\mathbb{C}[\bar{y}]/\mathrm{in}_{w_2}(J)$ are normal toric algebras, then $(A \otimes_{\mathbb{C}} B)^{T_M}$ is normal and Cohen-Macaulay.

Proof. The invariant ring of a normal algebra is normal, and the invariant ring of a Cohen-Macaulay ring is Cohen-Macaulay. If $\mathbb{C}[\bar{x}]/\mathrm{in}_{w_1}(I)$ and $\mathbb{C}[\bar{y}]/\mathrm{in}_{w_2}(J)$ are normal toric algebras, then both are normal and Cohen-Macaulay. It follows that the algebras $A$ and $B$ are normal and Cohen-Macaulay, and also that $\mathbb{C}[\bar{z}]/\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$ is normal and Cohen-Macaulay. We conclude that $(A \otimes B)^{T_M}$ is normal and Cohen-Macaulay as well. □

We recall the characterization of Gorenstein normal toric algebras [6, Corollary 6.3.8]. Let $P \subseteq \mathbb{R}^n$ be a polyhedral cone with affine semigroup $S_P = P \cap \mathbb{Z}^n$ and relative interior $\mathrm{int}(P)$. For $w \in P$ let $[w]$ denote the associated element of the affine semigroup algebra $K[S_P]$. The canonical module of $K[S_P]$ is isomorphic to the ideal $\Omega_P = \langle [w] \mid w \in \mathrm{int}(P) \cap \mathbb{Z}^n \rangle \subset K[S_P]$. The algebra $K[S_P]$ is Gorenstein if and only if $\Omega_P = [w] K[S_P]$ for some $w \in \mathrm{int}(P)$.

Proposition 2.13.
Let $I$, $J$, $\mathcal{A}$, $A$, and $B$ be as above. Suppose there exist $w_1$ and $w_2$ such that $K[\bar{x}]/\mathrm{in}_{w_1}(I)$ and $K[\bar{y}]/\mathrm{in}_{w_2}(J)$ are Gorenstein normal toric algebras isomorphic to $K[S_P]$ and $K[S_Q]$ for cones $P \subseteq \mathbb{R}^n$ and $Q \subseteq \mathbb{R}^m$, respectively. Finally, suppose that the canonical module generators $u \in P$ and $v \in Q$ have the same $M$-degree. Then $K[\bar{z}]/\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$ and $(A \otimes B)^{T_M}$ are normal Gorenstein algebras.

Proof. If $(p, q) \in S_P \times_M S_Q$ is not of the form $(p', q') + (u, v)$ for $p' \in S_P$ and $q' \in S_Q$, then it follows that $p \in S_P$ or $q \in S_Q$ is not an interior point of $P$ or $Q$, respectively. Say $p$ is not an interior point of $P$. It follows that for some linear function $\ell : \mathbb{R}^n \to \mathbb{R}$ that is nonnegative on $P$, we have $\ell(p) = 0$ and $\ell(u) > 0$. We extend $\ell$ to $\ell : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$. It follows that $\ell(p, q) = 0$ and $\ell(u, v) > 0$, so that $(p, q)$ must be on the relative boundary of $P \times_M Q$. By the contrapositive, if $(p, q)$ is a relative interior point of $P \times_M Q$, then $(p, q) = (p', q') + (u, v)$ for some $(p', q') \in S_P \times_M S_Q$. This implies that $K[\bar{z}]/\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$ is normal and Gorenstein. It follows that $(A \otimes B)^{T_M}$ is normal and Cohen-Macaulay, with the same Hilbert function as its initial algebra $K[\bar{z}]/\mathrm{in}_{\phi^*(w_1, w_2)}(I \times_M J)$. Now [35, Theorem 4.4] implies that $(A \otimes B)^{T_M}$ is Gorenstein. □

3. Reduction to Sunlet Networks
In this section, we show that gluing level-1 networks together along a leaf corresponds to a toric fiber product of their corresponding ideals. This was pointed out in [19] but the authors do not prove it. We include a more detailed discussion and the proof here for completeness. This means that the ideal of invariants for any level-1 network can be constructed by taking toric fiber products of the ideals of sunlet networks and trees.

Let $N$ be a level-1 network and observe that either we can find an edge $e$ such that cutting $e$ splits $N$ into two level-1 networks $N^-$ and $N^+$ with fewer leaves, or no such $e$ exists, in which case $N$ is a sunlet network or a 3-leaf tree. We can of course recover the network $N$ by gluing $N^-$ and $N^+$ along the edge $e$, which is a leaf of both new networks. We denote the operation of gluing these networks along a leaf edge as $N = N^- * N^+$. This operation is pictured in Figure 3.

We now assume $N$ does admit a decomposition $N = N^- * N^+$ and denote the ambient polynomial rings of these networks with $\mathbb{C}[q]$, $\mathbb{C}[q]^-$, $\mathbb{C}[q]^+$. Note that their corresponding ideals $I_N$, $I_{N^-}$, $I_{N^+}$ are all homogeneous in the grading determined by $\deg(q_g) = e_{g_e}$, where $g_e$ is the entry of $g$ at the edge $e$ and $e_{g_e}$ is the corresponding standard basis vector.

Example 3.1.
Let N ´ be the corresponding network pictured in Figure 3 then C r q s ´ “ C r q g | g “ p g , g , g , g e q P Z and g ` g ` g ` g e “ s and one can compute explicitly that I N ´ “ x q q ´ q q ` q q ´ q q y Ď K r q ´ s . We can clearly see that this polynomial is homogeneous of degree e ` e “ ˆ ˙ by simply examining thelast entry of the label sequence of each monomial. Proposition 3.2.
Assume $N$ is not a sunlet network or 3-leaf tree and let $N = N^- * N^+$ be a decomposition of $N$ into two smaller level-1 networks. Let each variable $q_g$ in $\mathbb{C}[q]$, $\mathbb{C}[q]^-$, $\mathbb{C}[q]^+$ have degree $e_{g_e}$. Then $I_N$ is the toric fiber product
\[ I_N = I_{N^-} \times_{\mathcal{A}} I_{N^+} \]
with $\mathcal{A} = \{e_0, e_1\}$ linearly independent.

Proof. We prove this by slightly modifying the parameterization $\psi_N$ and then factoring it, which is a standard technique introduced in [39]. Recall that for a tree $T$, $I_T$ can be thought of as the kernel of the map $\psi_T : \mathbb{C}[q] \to \mathbb{C}[a^i_g : g \in \mathbb{Z}/2\mathbb{Z},\ i \in E(N)]$ given by Equation 2, and $I_N$ is then the kernel of the map
\[ \psi_N = \sum_{\sigma \in \{0,1\}^m} \left( \prod_{i=1}^{m} \lambda_i^{1-\sigma_i} (1-\lambda_i)^{\sigma_i} \right) \psi_{T_\sigma}. \]
Note that squaring the variables associated to the edge $e$, which are $a^e_{g_e}$, everywhere they appear does not change the parameterization. Furthermore, the edge $e$ which we have glued along is an edge in every tree $T_\sigma$, and so we can also split each $T_\sigma$ along this edge to get two new trees $T^+_\sigma$ and $T^-_\sigma$. Then we have from [39, Theorem 3.10] that
\[ \psi_{T_\sigma}(q_g) = \psi_{T^-_\sigma}(q_g)\, \psi_{T^+_\sigma}(q_g). \tag{4} \]
That is, the parameterization for the tree $T_\sigma$ factors as a product of the parameterizations for the trees $T^+_\sigma$ and $T^-_\sigma$.

Without loss of generality, let $v_1, \dots, v_\ell$ be the reticulation vertices of $N$ that lie in $N^-$ and $v_{\ell+1}, \dots, v_m$ be those that lie in $N^+$. Then we can substitute Equation 4 into $\psi_N$ and regroup to get
\[
\begin{aligned}
\psi_N(q_g) &= \sum_{\sigma \in \{0,1\}^m} \left[ \left( \prod_{i=1}^{\ell} \lambda_i^{1-\sigma_i} (1-\lambda_i)^{\sigma_i} \right) \psi_{T^-_\sigma}(q_g) \right] \left[ \left( \prod_{i=\ell+1}^{m} \lambda_i^{1-\sigma_i} (1-\lambda_i)^{\sigma_i} \right) \psi_{T^+_\sigma}(q_g) \right] \\
&= \left( \sum_{\sigma \in \{0,1\}^{\ell}} \left( \prod_{i=1}^{\ell} \lambda_i^{1-\sigma_i} (1-\lambda_i)^{\sigma_i} \right) \psi_{T^-_\sigma}(q_g) \right) \left( \sum_{\sigma \in \{0,1\}^{m-\ell}} \left( \prod_{i=\ell+1}^{m} \lambda_i^{1-\sigma_i} (1-\lambda_i)^{\sigma_i} \right) \psi_{T^+_\sigma}(q_g) \right) \\
&= \psi_{N^-}(q_g)\, \psi_{N^+}(q_g),
\end{aligned}
\]
since the trees $T^-_\sigma$ and $T^+_\sigma$ are exactly the trees that appear in the parameterizations $\psi_{N^-}$ and $\psi_{N^+}$, respectively.

[Figure 3: panels (a) $N^-$, (b) $N^+$, (c) $N$.]
Figure 3. We can glue two four-leaf networks along identified leaves to get a six-leaf network. This corresponds to taking a toric fiber product of the corresponding ideals.
This implies that $\psi_N$ factors through the map
\[ \phi : \mathbb{C}[q] \to \mathbb{C}[q]^- \otimes \mathbb{C}[q]^+, \qquad q_g \mapsto q_{g^-} \otimes q_{g^+}, \]
and thus $I_N$ is the desired toric fiber product. $\square$

Remark 3.3.
The exact same proof can be used to extend the above proposition to all group-based models on level-1 phylogenetic networks. We present it in terms of the CFN model here since that is the main focus of our paper.

The above proposition gives an immediate algorithm for constructing the ideal $I_N$ if the ideals for all sunlet networks and trees are known. The original network $N$ is recursively decomposed into sunlet networks and trees. One then builds the ideal back up by taking toric fiber products of the sunlet network ideals and tree ideals. Since the ideals corresponding to trees are completely known, the problem of finding the ideal $I_N$ now amounts to understanding the sunlet network ideals $I_{\mathcal{S}_n}$. This is our main focus for the remainder of this paper.

4. Quadratic Invariants of Sunlet Networks
4.1. Sunlet Networks are Graded.
Let $\mathcal{S}_n$ be the $n$-sunlet network. The leaf edges are labelled $e_1, \dots, e_n$, the reticulation edges are $e_{n+1}$ and $e_{2n}$, and all the other edges are listed sequentially around the cycle clockwise. Let
\[ R_n = \mathbb{C}\Big[ q_{g_1,\dots,g_n} \;\Big|\; (g_1, \dots, g_n) \in (\mathbb{Z}/2\mathbb{Z})^n \text{ and } \sum_{i=1}^{n} g_i = 0 \Big] \]
and let $S_n = \mathbb{C}[a^i_g \mid g \in \mathbb{Z}/2\mathbb{Z} \text{ and } 1 \leq i \leq 2n]$. Then the defining ideal of $\mathcal{S}_n$ is given by the kernel of $\psi_n : R_n \to S_n$ defined by
\[ q_{g_1,\dots,g_n} \mapsto \prod_{j=1}^{n} a^{j}_{g_j} \left( \prod_{j=1}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} + \prod_{j=2}^{n} a^{n+j}_{\sum_{\ell=2}^{j} g_\ell} \right). \]
We grade $R_n$ by $\mathbb{Z}^{n+1}$ as follows: $\deg(q_{g_1,\dots,g_n}) = (1, h_1, \dots, h_n)$, where $h_i = 1$ if $g_i = 1$ and $h_i = 0$ if $g_i = 0$. Similarly, we grade $S_n$ by $\mathbb{Z}^{n+1}$ as follows:
\[ \deg(a^{j}_{g_j}) = \begin{cases} 0 & \text{if } j > n, \\ (1, h_1, 0, \dots, 0) & \text{if } j = 1, \\ (0, 0, \dots, h_j, \dots, 0) & \text{if } 2 \leq j \leq n, \end{cases} \]
where the $h_j$'s are defined as above and $h_j$ occurs in the $(j+1)$st position. In this way, we see that $\psi_n$ is a $\mathbb{Z}^{n+1}$-graded $\mathbb{C}$-algebra homomorphism; thus, the kernel of $\psi_n$ inherits the grading on $R_n$.

Remark 4.1.
We have shown that the coordinate ring of the $n$-sunlet variety is graded by $\mathbb{Z}^{n+1}$. In particular, this makes the variety into a $T$-variety: there is a $T \cong (\mathbb{C}^\times)^{n+1}$-action on the variety. We note that $\mathcal{S}_n$ does not yield a toric variety since in general $\dim(T) < \dim \mathcal{S}_n$.

4.2. Quadratic phylogenetic invariants for sunlet networks.
In this subsection, we will leverage the grading from Section 4.1 to find all quadratic invariants of $\mathcal{S}_n$. At first glance, this procedure might feel slightly unnatural; however, as we will see in Section 4.3, our approach produces all quadratic invariants for any phylogenetic tree. Since the ideals of trees are generated by quadratics, this completely describes all invariants for trees, and so we argue that this is a natural procedure to try on networks. At the end of this subsection, we will also give a visual representation of the quadratic invariants in terms of paths in the network.

Throughout this section, let $\psi_n : R_n \to S_n$ be the parameterization of the network variety $\mathcal{S}_n$ as defined in Section 4.1, and let $J_n = \ker \psi_n$. We begin with a definition.

Definition 4.2.
Fix $F \subseteq [n]$ and $a \in (\mathbb{Z}/2\mathbb{Z})^{F}$. The glove, $G(n, F, a)$, is the $\mathbb{C}$-vector space spanned by all quadratic monomials $q_g q_h$ in $R_n$ so that $g|_F = h|_F = a$ and $g|_{F^c} + h|_{F^c} = \mathbf{1}$, where $\mathbf{1}$ is the all ones vector in $(\mathbb{Z}/2\mathbb{Z})^{F^c}$. If $F = \emptyset$, then we simply write $G(n, \emptyset)$.

Remark 4.3.
It is not efficient to consider all possible gloves since for some choices of $F$ and $a$, the corresponding glove intersects $J_n$ trivially. In fact, given a glove $G(n, F, a) \subseteq R_n$, if $G(n, F, a) \cap J_n \neq \{0\}$, then $|[n] \setminus F| \geq 4$. First, if $|[n] \setminus F|$ is odd, then $G(n, F, a) = \{0\}$. Indeed, if one considers a monomial $q_g q_h \in G(n, F, a)$, then it is not possible for $\sum_{i=1}^{n} g_i$ and $\sum_{i=1}^{n} h_i$ to both be 0; hence, no such monomial exists. Now, suppose that $|[n] \setminus F| = 2$, so that $\dim_{\mathbb{C}}(G(n, F, a)) = 1$. Then as $\psi_n(q_g) \neq 0$ for all $g$ and as $S_n$ is an integral domain, all non-trivial polynomials from $G(n, F, a)$ lie outside the kernel of $\psi_n$.

Remark 4.4.
Note that when $n \geq 4$ is even, $\dim_{\mathbb{C}}(G(n, \emptyset)) = 2^{n-2}$. One way to see this is to note that the index pairs $\{g, h\}$ that appear in the chosen basis for $G(n, \emptyset)$ are exactly the cosets of $\langle \mathbf{1} \rangle \leq \{ g \in (\mathbb{Z}/2\mathbb{Z})^n \mid \sum_{i=1}^{n} g_i = 0 \}$.

With respect to the $\mathbb{Z}^{n+1}$ grading from Section 4.1, each glove $G(n, F, a)$ is the graded piece $(R_n)_c$, where $c_1 = 2$, $c_{i+1} = 1$ if $i \notin F$, and $c_{i+1} = 2a_i$ when $i \in F$. Moreover, this encompasses all graded components whose total degree is 2. Therefore, in order to describe all quadratic phylogenetic invariants of $\mathcal{S}_n$, it is enough to find a basis for $G(n, F, a) \cap J_n$ for each choice of $F$ and $a$ where $|[n] \setminus F| = 2k$ for all $k$ in $\{2, \dots, \lfloor n/2 \rfloor\}$.

In order to state the main result of this section, we need to define two linear maps obtained from a glove $G(n, F, a)$. Consider the following two subsets of $[n]$:
\[ E(n, F) = \{ i \mid |[i] \setminus F| \text{ is even and } 2 \leq i \leq n-1 \}, \qquad O(n, F) = \{ i \mid |[i] \setminus F| \text{ is odd and } 2 \leq i \leq n-1 \}. \]
When $n$ and $F$ are clear from context, we will just write $E$ and $O$, respectively. Using these subsets of $\{2, \dots, n-1\}$, we color the monomials lying in $G(n, F, a)$ in two ways. If we have a monomial lying in $G(n, F, a)$, and we know that one of the factors is $q_g$, then the other factor is determined by $g$. Thus, it is convenient for us to only record "half" of each term, so we set
\[ L(n, F, a) = \{ g \mid q_g q_h \in G(n, F, a) \text{ and } g <_{\mathrm{lex}} h \}. \]
If $q_g q_h \in G(n, F, a)$ and $g \in L(n, F, a)$, we define our two colorings as follows:
\[ c_E(q_g q_h) = \Big( \sum_{i=1}^{j} g_i \Big)_{j \in E} \in (\mathbb{Z}/2\mathbb{Z})^{E}, \qquad c_O(q_g q_h) = \Big( \sum_{i=1}^{j} g_i \Big)_{j \in O} \in (\mathbb{Z}/2\mathbb{Z})^{O}. \]
Now we can define our two maps $M^{n,F,a}_E : G(n, F, a) \to \mathbb{C}^{(\mathbb{Z}/2\mathbb{Z})^E}$ and $M^{n,F,a}_O : G(n, F, a) \to \mathbb{C}^{(\mathbb{Z}/2\mathbb{Z})^O}$. With respect to the bases $\{ q_g q_h \mid g \in L(n, F, a) \}$, $\{ e_c \mid c \in (\mathbb{Z}/2\mathbb{Z})^E \}$, and $\{ e_c \mid c \in (\mathbb{Z}/2\mathbb{Z})^O \}$, these maps have the following matrix representations:
\[ (M^{n,F,a}_E)_{(c,\, q_g q_h)} = \begin{cases} 1 & \text{if } c = c_E(q_g q_h), \\ 0 & \text{otherwise,} \end{cases} \qquad (M^{n,F,a}_O)_{(c,\, q_g q_h)} = \begin{cases} 1 & \text{if } c = c_O(q_g q_h), \\ 0 & \text{otherwise.} \end{cases} \]
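The definitions above can be explored computationally. The following is a minimal sketch, restricted to the case $F = \emptyset$ with $n$ even, that enumerates $L(n, \emptyset)$, computes the two colorings, stacks $M_E$ on top of $M_O$, and computes the dimension of the common kernel by exact Gaussian elimination. The function names (`glove_halves`, `colorings`, `coloring_matrix`, `kernel_dimension`) are hypothetical helpers introduced only for this illustration, and the index conventions (partial sums starting at $g_1$, and $g_1 = 0$ as the representative of each pair) are assumptions read off from the definitions above.

```python
from fractions import Fraction
from itertools import product


def glove_halves(n):
    """L(n, {}): all g in (Z/2Z)^n with even coordinate sum and g_1 = 0.

    For F = {} the partner is h = g + (1, ..., 1), so g <_lex h iff g_1 = 0.
    """
    if n % 2:
        return []  # h would have odd coordinate sum, so the glove is trivial
    return [g for g in product((0, 1), repeat=n)
            if g[0] == 0 and sum(g) % 2 == 0]


def colorings(g):
    """Return (c_E, c_O): partial sums g_1 + ... + g_j for even / odd j in 2..n-1."""
    n = len(g)
    c_even = tuple(sum(g[:j]) % 2 for j in range(2, n) if j % 2 == 0)
    c_odd = tuple(sum(g[:j]) % 2 for j in range(2, n) if j % 2 == 1)
    return c_even, c_odd


def coloring_matrix(n):
    """Stack M_E on top of M_O; columns are indexed by glove_halves(n)."""
    halves = glove_halves(n)
    size = n // 2 - 1  # |E| = |O| = n/2 - 1 when F = {} and n is even
    matrix = []
    for index in (0, 1):  # 0 selects the E-coloring, 1 the O-coloring
        for c in product((0, 1), repeat=size):
            matrix.append([Fraction(int(colorings(g)[index] == c)) for g in halves])
    return matrix


def rank(matrix):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor:
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def kernel_dimension(n):
    """Dimension of ker(M_E) intersected with ker(M_O) on G(n, {})."""
    matrix = coloring_matrix(n)
    return len(matrix[0]) - rank(matrix)
```

For example, `len(glove_halves(6))` counts the monomials spanning $G(6, \emptyset)$, and `kernel_dimension(n)` can be compared against the closed formula for $\dim_{\mathbb{C}}(J_n \cap G(n, \emptyset))$ given later in this section. Exact `Fraction` arithmetic is used so that the rank computation is not affected by floating-point error.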
At this point, we are fully equipped to state the main theorem of this section; however, we will delay theproof until Section 4.4.
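Before stating the theorem, the link between the colorings and the parameterization can be checked by brute force on small cases. The sketch below applies $\psi_n$ from Section 4.1 to a quadratic form using exact polynomial arithmetic, with polynomials stored as dictionaries from monomials to integer coefficients. The names `psi`, `multiply`, and `psi_of_quadric` are hypothetical helpers for this illustration only, and the encoding of the variable $a^i_s$ as the pair `(i, s)` is an assumption made for convenience.

```python
from collections import Counter


def psi(g):
    """Image of q_g under psi_n as {monomial: coefficient}.

    A monomial is a sorted tuple of pairs (i, s), each standing for a^i_s.
    """
    n = len(g)
    leaves = [(j, g[j - 1]) for j in range(1, n + 1)]
    # First summand: cycle edges e_{n+1}, ..., e_{2n-1}, subscripts g_1 + ... + g_j.
    first = [(n + j, sum(g[:j]) % 2) for j in range(1, n)]
    # Second summand: cycle edges e_{n+2}, ..., e_{2n}, subscripts g_2 + ... + g_j.
    second = [(n + j, sum(g[1:j]) % 2) for j in range(2, n + 1)]
    poly = Counter()
    poly[tuple(sorted(leaves + first))] += 1
    poly[tuple(sorted(leaves + second))] += 1
    return poly


def multiply(p, q):
    """Product of two polynomials given as {monomial: coefficient}."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


def psi_of_quadric(terms):
    """Apply psi_n to sum(coeff * q_g * q_h), with terms = [(coeff, g, h), ...]."""
    total = Counter()
    for coeff, g, h in terms:
        for mono, c in multiply(psi(g), psi(h)).items():
            total[mono] += coeff * c
    return {m: c for m, c in total.items() if c != 0}


# A four-term element of G(4, {}) whose E-colors and O-colors both cancel.
f = [
    (1, (0, 0, 0, 0), (1, 1, 1, 1)),
    (-1, (0, 1, 1, 0), (1, 0, 0, 1)),
    (1, (0, 1, 0, 1), (1, 0, 1, 0)),
    (-1, (0, 0, 1, 1), (1, 1, 0, 0)),
]
print(psi_of_quadric(f) == {})      # the full four-term form vanishes under psi_4
print(psi_of_quadric(f[:2]) == {})  # a two-term piece on its own does not
```

In this sketch the four-term form is sent to zero, while the binomial formed by its first two terms is not: those two monomials share an $O$-coloring but have different $E$-colorings, so only one of the two colorings cancels.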
Theorem 4.5.
Let $G(n, F, a)$ be a glove so that either $1$ is not in $F$ or $1$ is in $F$ but $a_1 = 1$. Then
\[ J_n \cap G(n, F, a) = \ker M^{n,F,a}_E \cap \ker M^{n,F,a}_O. \]
On the other hand, if $1$ is in $F$ and $a_1 = 0$, then
\[ J_n \cap G(n, F, a) = \ker M^{n,F,a}_E. \]

Remark 4.6.
As we shall see in Section 4.3, Theorem 4.5 can be reformulated as follows: f P J n X G p n, F , a q if and only if f is a phylogenetic invariant for both underlying trees. If we let I T and I T be the definingideals for the two underlying trees, then it is always true that J n is contained in the intersection of I T and I T ; however, in general, J n is not the intersection of these two toric ideals as can be seen even when n “ I T “ x q q ´ q q , q q ´ q q y I T “ x q q ´ q q , q q ´ q q y . However, I T X I T is generated by one quadratic and one quartic, while J is generated by just the quadratic. Proposition 4.7. If n is at least 4 and is even, then dim C p J n X G p n, Hqq “ p n { ´ ´ q . Moreover, aslong as R F or P F but a “ , J n X G p n, F , a q – J n ´| F | X G p n, Hq .Proof. For the first claim, note that n must be even; otherwise, G p n, Hq is trivial. By Theorem 4.5, J n X G p n, Hq is the intersection of ker M n, H E and ker M n, H O . Let M n, H be the map M n, H E ‘ M n, H O : G p n, Hq Ñ C p Z { Z q E ‘ C p Z { Z q O , so J n X G p n, Hq “ ker M n, H .We will demonstrate that dim C p J n X G p n, Hqq “ p n { ´ ´ q by showing that the rank of M n, H is 2 n { ´ C p G p n, Hqq “ n ´ , we will see by rank-nullity that dim C p J n X G p n, Hqq “ n ´ ´ n { ` “p n { ´ ´ q .Note that | E | “ | O | “ n ´
1. If we think of M n, H as a matrix, its columns are indexed by monomials q g q h P G p n, Hq , and its first 2 n { ´ rows are indexed by the elements of p Z { Z q E and the last 2 n { ´ rowsare indexed by p Z { Z q O . We claim that the matrix for M n, H takes the following form: (1) every columnis of the form e c ` e c where c P p Z { Z q E and c P p Z { Z q O , (2) each column is distinct, and (3) everypossible combination of e c ` e c occurs. he first point is clear by the definition of the maps M n, H E and M n, H O . For the second and third points, wewill show that for any c P p Z { Z q E and c P p Z { Z q O there is a unique q g q h P G p n, Hq so that c E p q g q h q “ c and c O p q g q h q “ c . Note that uniqueness will follow immediately since if there were two monomials whosecolors are c and c , then they must be the same since c and c record all the partial sums of each of theircorresponding group elements. We will build up g P L p n, F , a q whose partial sums are given by c and c .Let c P p Z { Z q t ,...,n ´ u be the unique vector with c | E “ c and c | O “ c . If we let r c “ p , c , q P p Z { Z q n ,then we set g i “ r c i ` r c i ´ for i ě g “
0. One can see that ř ji “ g j “ c j for any 2 ď j ď n ´
1. Inorder to get a monomial in the glove, we consider q g q ` g P G p n, Hq . By construction, c E p q g q ` g q “ c and c O p q g q ` g q “ c .Now, we can show that the row rank of M n, H is one less than the number of rows. Up to scaling there isonly one linear relation among the rows which is given by adding up the first 2 n { ´ rows and subtracting offthe last 2 n { ´ rows. Points (2) and (3) above guarantee that this is the only relation among the rows. Sincethe rank of M n, H is 2 n { ´ C G p n, Hq “ n ´ , we have that dim C p J n X G p n, Hqq “ p n { ´ ´ q .For the second statement fix a glove G p n, F , a q . First, suppose that ř i P F a i “
0. Then for any g Pp Z { Z q n ´| F | , define g p F , a q P p Z { Z q n as g p F , a q| F “ a and g p F , a q| F c “ g . Then define a linear map T : G p n, Hq Ñ G p n, F , a q defined by T p q g q h q “ q g p F , a q q ` g p F , a q . T is an isomorphism, and it is not hard tosee that there is a map which makes the diagram commute and is an isomorphism when restricted to theimages of the horizontal maps. G p n ´ | F | , Hq C E p n ´| F | , Hq ‘ C O p n ´| F | , Hq G p n, F , a q C E p n, F q ‘ C O p n, F q M n ´| F | , H T M n, F , a It then follows that J n X G p n, F , a q – J n ´| F | X G p n, Hq in this case. The other case, when ř i P F a i “
1, isexactly the same except g p F , a q is defined as g p F , a q| F “ a and g p F , a q| F c “ g ` e n ´| F | . (cid:3) By the propsoition, in order to find a basis for J n X G p n, F , a q when 1 is not in F or 1 is in F but a “
1, itis enough to find a basis for J n ´| F | X G p n ´ | F | , Hq and then apply the map T . In the next proposition, weprovide an explicit basis for J n X G p n, Hq for any even n greater than or equal to 4. Theorem 4.8.
Fix an even integer n P Z ě , and a group element c P p Z { Z q t ,...,n ´ u so that c | E p n, Hq ‰ and c | O p n, Hq ‰ . Then we define the polynomial f c “ q g p , q q h p , q ´ q g p c | E , q q h p c | E , q ` q g p c | E , c | O q q h p c | E , c | O q ´ q g p , c | O q q h p , c | O q in J n X G p n, Hq . Here g p c , c q is defined by setting g “ and for i ě we have that g i “ c i ´ ` c i where c P p Z { Z q n has c “ c n “ and c | E “ c and c | O “ c , and h p c , c q “ ` g p c , c q . Then B n “ t f c | c P p Z { Z q t ,...,n ´ u and c | E ‰ , c | O ‰ u is a basis for J n X G p n, Hq .Proof. Note that by definition f c P G p n, F , a q . To see that f c P J n , note that M n, H p f c q “ e | E ` e | O ´ e c | E ´ e | O ` e c | E ` e c | O ´ e | E ´ e c | O “ . By Theorem 4.5, f c P J n .Since | B n | is p n { ´ ´ q , it is enough show that B n is independent. Consider any linear combination ofthe elements of B n “ ÿ c a c f c . rojecting ř c a c f c onto span C t q g p c | E , c | O q q h p c | E , c | O q u , yields a c q g p c | E , c | O q q h p c | E , c | O q from which it follows that a c “ c . (cid:3) Remark 4.9.
Let I n be the ideal generated by all quadratics in J n . Then Propositions 4.7 and 4.8 give arecipe for obtaining generators of J n X G p n, F , a q where either 1 R F or 1 P F but a “
0. The case when1 P F and a “ T in S n where all the edges containing the reticulation vertex are deleted. Of course, this tree only has n ´ q g ÞÑ q p , g q . These facts along withPropositions 4.7 and 4.8 allow us to find all quadratic generators of the sunlet network ideal very quickly.Our implementation of this can be found in the macaulay2 file sunletQuadGens.m2 .Similar to the tree case, each variable q g can be thought of as a system of paths on the network. The pathsconnecting the vertices ℓ such that g ℓ “ q g , we consider all the edgesin the network which are supported in either of these two path systems. Now, we fix a glove G p n, F , a q sothat 1 R F . For any monomial q g q h P G p n, F , a q , we take the symmetric difference of the collection of edgesobtained from each monomial. Below is an example with q q P G pt , u , p , qq Ă R . ˆ =Note that in this example, the leaves which are omitted correspond to F “ t , u . Note E “ t , u , O “ t , u , c E p q q q “ p , q P p Z { Z q t , u , and c O p q q q “ p , q P p Z { Z q t , u . Puttingthese two colorings together gives us p , , , q P p Z { Z q t , , , u . We see that the 1 in the coloring indicatesthat e ` should be removed while the zeros in positions 2 ,
4, and 5 indicate that the edges e ` , e ` ,and e ` should remain in the resulting diagram. In fact, these observations hold true as long as 1 R F .Therefore, we define the diagram for q g q h P G p n, F , a q (for any F ) by omitting any leaves which are in F and any edge e n ` k when the coloring of the monomial in position k is 1. These diagrams gives us a visualinterpretation of the colorings c E and c O . Example 4.10.
These diagrams give us an easy way to tell if an element f P G p n, F , a q is in an invariant.For example, take f “ q q ´ q q ` q q ´ q q P G pt , u , p , qq Ă R .Here E “ t u and O “ t , , u . Then c E and c O on each monomial is as follows. c E p q q q “ c O p q q q “ p , , q c E p q q q “ c O p q q q “ p , , q c E p q q q “ c O p q q q “ p , , q c E p q q q “ c O p q q q “ p , , q Pictorially, this is as follows: ψ p q q ` q q q “ ψ p q q ` q q q` “ ` We can tell that f P J X G pt , u , p , qq by noting that the odd colors, p , , q and p , , q , and the evencolors, 0 and 1, appear once on each side of the equation, i.e. M t , u , p , q E p f q and M t , u , p , q O p f q are both 0.On the other hand, one can also see that J X G pt , u , p , qq contains no binomials of the form z q g q h ´ z q g q h or any suitable group elements and complex numbers z i P C zt u . The reason being that if this were tovanish under ψ that would mean that z “ z and the colorings of each q g i q h i would need to be identical,but this would imply that g “ g and h “ h . Example 4.11.
Let n “ F “ H . In this case, G p , Hq is spanned by the following 16 monomials. q q , q q , q q , q q ,q q , q q , q q , q q ,q q , q q , q q , q q ,q q , q q , q q , q q We have the following matrices where the columns are indexed by the monomials above and the rows areindexed by elements of p Z { Z q lexicographically. M , H E “ ¨˚˚˝ ˛‹‹‚ M , H O “ ¨˚˚˝ ˛‹‹‚ Then J X G p , Hq is a 9 dimensional C -vector space spanned by the following polynomials. q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q ` q q ´ q q Let us consider the colorings of the monomials in the last polynomial. Note E “ t , u and O “ t , u c E p q q q “ p , q c O p q q q “ p , q c E p q q q “ p , q c O p q q q “ p , q c E p q q q “ p , q c O p q q q “ p , q c E p q q q “ p , q c O p q q q “ p , q This relation can be viewed pictorially as ψ p q q ` q q q “ ψ p q q ` q q q + = +We also note that the dimension of S is 12, its codimension is 20, and J is minimally generated by 79polynomials; thus, contrary to say the 4-leaf case, S is not a complete intersection even set-theoretically. .3. Quadratic phylogenetic invariants of trees.
Let T be a binary tree with leaf set r n s , and let I T bethe defining ideal for the corresponding variety. As was discussed in Section 2.3, the phylogenetic invariantsfor this model are given purely in terms of 2 ˆ I T is also graded by Z n ` ; hence, the quadratic generatorscan be described by the C -vector spaces I T X G p n, F , a q where we can again restrict to when r n sz F is evenhas cardinality at least 4. Recall that for any edge e P Σ p T q , the edge induces a split of the tree A e | B e . Inthis section, given a glove G p n, F , a q , we define E T p F q “ t e P E p T q|| A e z F | is even u . When it is clear from context, we will simply write E T . Similarly, we let the linear map M F , a E T : G p n, F , a q Ñ C p Z { Z q E T be defined by the following matrix as in the previous subsection. p M n, F , a E T q c ,q g q h “ e P E T , c e “ ř i P A e g i g ă lex h . Then we have the following theorem which is analogous to Theorem 4.5, but for trees. Theorem 4.12.
Given a glove G p n, F , a q and a phylogenetic tree T , the Z n ` -graded piece I T X G p n, F , a q is the kernel of M n, F , a E T .Proof. Let S T “ C r a eg | g P Z { Z and e P E p T qs . Recall that I T is the kernel of ψ T : R n Ñ S T defined by q g ÞÑ ź A e | B e P Σ p T q a e ř i P Ae g i Now, fix a glove G p n, F , a q , and note that if q g q h P G p n, F , a q , then ř i P A e g i “ ř i P A e h i if and only if e P E T .Consider any polynomial f “ ř g P L p n, F , a q c g q g q h P G p n, F , a q . If we apply ψ T , we get the following. ψ T p f q “ ÿ g P L p n, F , a q c g ¨˝ ź e P E p T q a e ř i P Ae g i ˛‚¨˝ ź e P E p T q a e ř i P Ae h i ˛‚ “ ÿ g P L p n, F , a q c g ˜ź e R E a e a e ¸ ˜ ź e P E T p a e ř i P Ae g i q ¸ “ ˜ ź e R E T a e a e ¸ ÿ g P L p n, F , a q c g ˜ź e P E p a e ř i P Ae g i q ¸ The monomials, ś e P E T p a e ř i P Ae g i q , can be identified as standard basis vectors in C p Z { Z q E T . After makingthis identification, it becomes evident that ψ T p f q “ M F , a E T p f q “ (cid:3) Consider S n and its two underlying trees T and T , and fix any glove G p n, F , a q where either 1 is not in F or1 is in F but a “
1. Recall that T is obtained by deleting the reticulation edge that lies between the leaves e and e , and T is obtained by deleting the reticulation edge that lies between the leaves e and e n . Thedefining ideals for T and T are generated by quadratic binomials. Here we will show that the polynomials f c from Proposition 4.8 are either sums or differences of binomials coming from I T and I T . In the followingproposition, we only consider the case when n is even, at least 4, and F “ H since any other glove of theform stated can be obtained from this case. Proposition 4.13.
Let n P Z ě be even, and consider any polynomial f c “ q g p , q q h p , q ´ q g p c | E , q q h p c | E , q ` q g p c | E , c | O q q h p c | E , c | O q ´ q g p , c | O q q h p , c | O q in J n X G p n, Hq from Proposition 4.8. Then q g p c | E , q q h p c | E , q ´ q g p c | E , c | O q q h p c | E , c | O q g p , q q h p , q ´ q g p , c | O q q h p , c | O q are in I T X G p n, Hq , and q g p , q q h p , q ´ q g p c | E , q q h p c | E , q q g p c | E , c | O q q h p c | E , c | O q ´ q g p , c | O q q h p , c | O q are in I T X G p n, Hq . E is the set of even numbers between 2 and n ´ and O is the odd numbers in thesame range.Proof. Note that E “ E T and O “ E T . Then the claim follows by Theorem 4.12. (cid:3) Proof of Theorem 4.5.
Let $f = \sum_{q_g q_h \in G(n,F,a)} c_{g,h}\, q_g q_h$ with $c_{g,h} \in \mathbb{C}$. We want to identify necessary and sufficient conditions on the coefficients $c_{g,h}$ for $f \in J_n$. We analyze the gloves in three cases.
(1) $1 \notin F$
(2) $1 \in F$ and $a_1 = 1$
(3) $1 \in F$ and $a_1 = 0$

Case 1: $1 \notin F$. First, note that for each monomial $q_g q_h$ in $G(n, F, a)$, either $g_1$ or $h_1$ is 0. We will always assume that $g_1 = 0$, and we also remark that $h$ is completely determined by $g$; therefore, we will write $c_g$ instead of $c_{g,h}$. We have the set $L(n, F, a)$ as defined in Section 4, which in this case simplifies to
\[ L(n, F, a) = \{ g \in (\mathbb{Z}/2\mathbb{Z})^n \mid \text{there exists } h \text{ so that } q_g q_h \in G(n, F, a) \text{ and } g_1 = 0 \}. \]
Now, we compute $\psi_n(f)$. Writing $L = L(n, F, a)$ for brevity,
\[
\begin{aligned}
\psi_n(f) &= \sum_{g \in L} c_g\, \psi_n(q_g q_h) \\
&= \sum_{g \in L} c_g \Big( \prod_{j=1}^{n} a^{j}_{g_j} \Big) \Big( \prod_{j=1}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} + \prod_{j=2}^{n} a^{n+j}_{\sum_{\ell=2}^{j} g_\ell} \Big) \Big( \prod_{j=1}^{n} a^{j}_{h_j} \Big) \Big( \prod_{j=1}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} h_\ell} + \prod_{j=2}^{n} a^{n+j}_{\sum_{\ell=2}^{j} h_\ell} \Big) \\
&= \sum_{g \in L} c_g \Big( \prod_{j \in F} (a^{j}_{g_j})^2 \Big) \Big( \prod_{j \notin F} a^{j}_{0} a^{j}_{1} \Big) \Big( \prod_{j=1}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} + \prod_{j=2}^{n} a^{n+j}_{\sum_{\ell=2}^{j} g_\ell} \Big) \Big( \prod_{j=1}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} h_\ell} + \prod_{j=2}^{n} a^{n+j}_{\sum_{\ell=2}^{j} h_\ell} \Big).
\end{aligned}
\]
The monomial $\big( \prod_{j \in F} (a^{j}_{g_j})^2 \big) \big( \prod_{j \notin F} a^{j}_{0} a^{j}_{1} \big)$ depends only on $F$ and $a$, so it can be factored out and denoted by $m_{F,a}$. Since $\sum_{\ell=2}^{n} g_\ell = g_1$ and $\sum_{\ell=2}^{n} h_\ell = h_1$, and since $g_1 = 0$ and $h_1 = 1$, we obtain
\[
\begin{aligned}
\psi_n(f) &= m_{F,a} \sum_{g \in L} c_g \Big( a^{n+1}_{g_1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} + a^{2n}_{g_1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=2}^{j} g_\ell} \Big) \Big( a^{n+1}_{h_1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} h_\ell} + a^{2n}_{h_1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=2}^{j} h_\ell} \Big) \\
&= m_{F,a} \sum_{g \in L} c_g \Big( a^{n+1}_{0} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} + a^{2n}_{0} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=2}^{j} g_\ell} \Big) \Big( a^{n+1}_{1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=1}^{j} h_\ell} + a^{2n}_{1} \prod_{j=2}^{n-1} a^{n+j}_{\sum_{\ell=2}^{j} h_\ell} \Big).
\end{aligned}
\]
We proceed by multiplying these two binomials and make the following observations about the various sums in the subscripts.
- $\sum_{\ell=1}^{j} g_\ell = \sum_{\ell=2}^{j} g_\ell$ since $g_1 = 0$.
- $\sum_{\ell=1}^{j} g_\ell = \sum_{\ell=1}^{j} h_\ell$ if and only if $[j] \setminus F$ has even cardinality.
- $\sum_{\ell=1}^{j} g_\ell = \sum_{\ell=2}^{j} h_\ell$ if and only if $[j] \setminus F$ has odd cardinality.
This yields the following.
\[
\begin{aligned}
\psi_n(f) = m_{F,a} \sum_{g \in L} c_g \Big(
& a^{n+1}_{0} a^{n+1}_{1} \prod_{j \in E} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \prod_{j \in O} a^{n+j}_{0} a^{n+j}_{1}
+ a^{n+1}_{0} a^{2n}_{1} \prod_{j \in E} a^{n+j}_{0} a^{n+j}_{1} \prod_{j \in O} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \\
&+ a^{n+1}_{1} a^{2n}_{0} \prod_{j \in E} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \prod_{j \in O} a^{n+j}_{0} a^{n+j}_{1}
+ a^{2n}_{0} a^{2n}_{1} \prod_{j \in E} a^{n+j}_{0} a^{n+j}_{1} \prod_{j \in O} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \Big).
\end{aligned}
\]
Note that the following products depend only on $F$. We make these substitutions and proceed.
\[ m_{F,E} := \prod_{j \in E} a^{n+j}_{0} a^{n+j}_{1}, \qquad m_{F,O} := \prod_{j \in O} a^{n+j}_{0} a^{n+j}_{1}. \]
\[
\begin{aligned}
\psi_n(f) &= m_{F,a} \sum_{g \in L} c_g \Big( \big( a^{n+1}_{0} a^{n+1}_{1} + a^{n+1}_{1} a^{2n}_{0} \big)\, m_{F,O} \prod_{j \in E} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 + \big( a^{n+1}_{0} a^{2n}_{1} + a^{2n}_{0} a^{2n}_{1} \big)\, m_{F,E} \prod_{j \in O} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \Big) \\
&= m_{F,a}\, m_{F,O} \big( a^{n+1}_{0} a^{n+1}_{1} + a^{n+1}_{1} a^{2n}_{0} \big) \sum_{g \in L} c_g \prod_{j \in E} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2 \\
&\quad + m_{F,a}\, m_{F,E} \big( a^{n+1}_{0} a^{2n}_{1} + a^{2n}_{0} a^{2n}_{1} \big) \sum_{g \in L} c_g \prod_{j \in O} \big( a^{n+j}_{\sum_{\ell=1}^{j} g_\ell} \big)^2.
\end{aligned}
\]
In the last line, we note that the superscripts appearing in the two sums are completely disjoint. Since $c_g \in \mathbb{C}$ for every $g \in L(n, F, a)$, the only way for $\psi_n(f) = 0$ is for each of the two sums to vanish. Recall the maps $M^{n,F,a}_E$ and $M^{n,F,a}_O$ from Section 4. By the definition of $M^{n,F,a}_E$, the first sum vanishes if and only if $f \in \ker M^{n,F,a}_E$, and similarly the second sum vanishes if and only if $f \in \ker M^{n,F,a}_O$. It then follows that $f \in J_n \cap G(n, F, a)$ if and only if $f$ lies in the intersection of these two kernels.

Case 2: $1 \in F$ and $a_1 = 1$. First, note that for each monomial $q_g q_h$ in $G(n, F, a)$, both $g_1$ and $h_1$ are 1. We will always assume that $g <_{\mathrm{lex}} h$, i.e. $g \in L(n, F, a)$. Again, we remark that $h$ is completely determined by $g$; therefore, we will write $c_g$ instead of $c_{g,h}$. Now, we compute $\psi_n(f)$.
ψ n p f q “ ÿ g P L p n, F , a q c g ψ n p q g q h q“ ÿ g P L p n, F , a q c g ˜ n ź j “ a jg j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ź j “ a jh j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ “ ÿ g P L p n, F , a q c g ˜ź j P F p a jg j q ¸ ˜ź j R F a j a j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ The monomial ´ś j P F p a jg j q ¯ ´ś j R F a j a j ¯ depends only on F and a , so it can be factored out of the sum,and it will be denoted as m F , a . ψ n p f q “ m F , a ÿ g P L p n, F , a q c g ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` g n ´ ź j “ a n ` j ř jℓ “ g ℓ ` a ng n ´ ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ a n ` h n ´ ź j “ a n ` j ř jℓ “ h ℓ ` a nh n ´ ź j “ a n ` j ř jℓ “ h ℓ ¸ “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` n ´ ź j “ a n ` j ř jℓ “ g ℓ ` a n n ´ ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ a n ` n ´ ź j “ a n ` j ř jℓ “ h ℓ ` a n n ´ ź j “ a n ` j ř jℓ “ h ℓ ¸ Now, we will proceed by multiplying all these terms out and regrouping using the following observationsabout the various sums in the subscripts. ‚ ř jℓ “ g ℓ “ ` ř jℓ “ g ℓ since g “ ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has even cardinality. ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has even cardinality. ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has odd cardinality. ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has odd cardinality.Then we get the following. 
ψ n p f q “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` a n ` ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ` a n ` a n ź j P E a n ` j a n ` j ź j P O p a n ` j ř jℓ “ g ℓ q ` a n ` a n ź j P E a n ` j a n ` j ź j P O p a n ` j ř jℓ “ g ℓ q ` a n a n ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ¸ The following products depend only on F , so we give them names. m F , E : “ ź j P E a n ` j a n ` j m F , O : “ ź j P O a n ` j a n ` j hen we have the following. ψ n p f q “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` a n ` m F , O ź j P E p a n ` j ř jℓ “ g ℓ q ` a n ` a n m F , E ź j P O p a n ` j ř jℓ “ g ℓ q ` a n ` a n m F , E ź j P O p a n ` j ř jℓ “ g ℓ q ` a n a n m F , O ź j P E p a n ` j ř jℓ “ g ℓ q ¸ “ m F , a m F , O p a n ` q ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q ` m F , a m F , O p a n q ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q ` m F , a m F , E a n ` a n ÿ g P L p n, F , a q c g ź j P O p a n ` j ř jℓ “ g ℓ q ` m F , a m F , E a n ` a n ÿ g P L p n, F , a q c g ź j P O p a n ` j ř jℓ “ g ℓ q In the final expression of the equation above, there are four sums. The monomials in the first two sums havethe same superscripts, and the monomials in the second two sums have the same superscripts. Moreover,these two sets of supersctipts are disjoint, so there can be no cancellation among these pairs of sums. Thus, ψ n p f q “ “ p a n ` q ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q ` p a n q ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q (1) 0 “ ÿ g P L p n, F , a q c g ź j P O p a n ` j ř jℓ “ g ℓ q ` ÿ g P L p n, F , a q c g ź j P O p a n ` j ř jℓ “ g ℓ q (2)In (1), there can be no cancellation among these two sums because of the coefficients p a n ` q and p a n q infront of the sums. The subscripts in each of these sums are all off by exactly 1; therefore, the first term is 0if and only if the second term is 0. In (2), the subscripts in each sum are also again off by exactly 1. 
In orderto show there is no cancellation among these sums, we will show that the monomials appearing in each sumare distinct. Lemma 4.14.
There are no distinct g , g P L p n, F , a q so that ř jℓ “ g ℓ “ ř jℓ “ g ℓ for all ď j ď n ´ sothat r j sz F has odd cardinality. In other words, in (2), the monomials in the two sums above are disjoint.Proof. Let t i , . . . , i m u “ r n ´ sz F . Suppose g , g P L p n, F , a q and ř jℓ “ g ℓ “ ř jℓ “ g ℓ for all j P t i , . . . , i m u .Since g “
1, we have ř jℓ “ g ℓ “ ` ř jℓ “ g ℓ for all j P t i , . . . , i m u . Since g | a “ g | a , we see that g i “ ` g i .However, this contradicts that g P L p n, F , a q . Since L p n, F , a q “ t g | q g q h P G p n, F , a q and g ă lex h u ,there is some h so that q g q h P G p n, F , a q , and since i R F , h i “ h ă lex g and g R L p n, F , a q . (cid:3) All this is to show that equations (1) and (2) reduce to the following equations. Thus, ψ n p f q “ “ ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q “ ÿ g P L p n, F , a q c g ź j P O p a n ` j ř jℓ “ g ℓ q ecalling the definitions of M n, F , a E and M n, F , a O , we see that f P J n X G p n, F , a q if and only if it lies in theintersection of ker M n, F , a E and ker M n, F , a O . Case 3: P F and a “ . Note that for each monomial, q g q h , in G p n, F , a q , g and h are 0. We willalways assume that g P L p n, F , a q .Now, we compute ψ n p f q . ψ n p f q “ ÿ g P L p n, F , a q c g ψ n p q g q h q“ ÿ g P L p n, F , a q c g ˜ n ź j “ a jg j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ź j “ a jh j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ “ ÿ g P L p n, F , a q c g ˜ź j P F p a jg j q ¸ ˜ź j R F a j a j ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ The monomial ´ś j P F p a jg j q ¯ ´ś j R F a j a j ¯ depends only on F and a , so we note this can be factored outand we denote it by m F , a . 
ψ n p f q “ m F , a ÿ g P L p n, F , a q c g ˜ n ´ ź j “ a n ` j ř jℓ “ g ℓ ` n ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ n ´ ź j “ a n ` j ř jℓ “ h ℓ ` n ź j “ a n ` j ř jℓ “ h ℓ ¸ “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` g n ´ ź j “ a n ` j ř jℓ “ g ℓ ` a ng n ´ ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ a n ` h n ´ ź j “ a n ` j ř jℓ “ h ℓ ` a nh n ´ ź j “ a n ` j ř jℓ “ h ℓ ¸ “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` n ´ ź j “ a n ` j ř jℓ “ g ℓ ` a n n ´ ź j “ a n ` j ř jℓ “ g ℓ ¸ ˜ a n ` n ´ ź j “ a n ` j ř jℓ “ h ℓ ` a n n ´ ź j “ a n ` j ř jℓ “ h ℓ ¸ Now, we will go through the tedious task of multiplying these two binomials. In order to simplify thecomputation, we make the following obsevations about the various sums in the subscripts. ‚ ř jℓ “ g ℓ “ ř jℓ “ g ℓ since g “ ‚ ř jℓ “ h ℓ “ ř jℓ “ h ℓ since h “ ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has even cardinality. ‚ ř jℓ “ g ℓ “ ř jℓ “ h ℓ if and only if r j sz F has even cardinality.With these observations, we get the following: ψ n p f q “ m F , a ÿ g P L p n, F , a q c g ˜ a n ` a n ` ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ` a n ` a n ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ` a n ` a n ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ` a n a n ź j P E p a n ` j ř jℓ “ g ℓ q ź j P O a n ` j a n ` j ¸ Note that the product ś j P O a n ` j a n ` j depends only on F and O , so we set it equal to m F , O . Then we havethe following. ψ n p f q “ m F , a m F , O p a n ` ` a n q ÿ g P L p n, F , a q c g ź j P E p a n ` j ř jℓ “ g ℓ q ecalling the definition of M n, F , a E , we see that ψ n p f q “ f P ker M n, F , a E .5. Algebraic Properties of Small Sunlet Networks
The -Sunlet Network. In this section, we use a toric initial ideal of J to show that S is normaland Gorenstein.We consider a monomial weighting w “ p w , w , w , w , w , w , w , w q , w ijkl P Z ofthe generators of the polynomial ring R “ C r q , q , q , q , q , q , q , q s which satisfiesthe following equalities and inequalities: w ` w “ w ` w ą w ` w , w ` w The associated initial ideal of J “ x q q ´ q q ` q q ´ q q y is generated by thebinomial q q ´ q q . Definition 5.1.
Let ∆ Ă R be the convex hull of the points p , , , , , q , p , , , , , q , p , , , , , q , p , , , , , q , p , , , , , q , p , , , , , q , p , , , , , q , and p , , , , , q . Let G Ă Z ` be thegraded semigroup obtained by taking the integral points in the cone P Ă R ` over ∆ ˆ t u Ă R ` . Proposition 5.2.
The initial algebra $R/\operatorname{in}_w(J_4)$ is isomorphic to the affine semigroup algebra $\mathbb{C}[G]$. The latter is normal and Gorenstein with $a$-invariant equal to $-6$. Proof.
The algebra C r G s is a polynomial ring in four variables t , t , t , and t over thesubalgebra A “ K r t , t , t , t s . The relations among the generators of the algebra A are generated by the relation t t ´ t t . It follows that K r G s is normal and Gorenstein.The canonical module of C r G s is isomorphic to the ideal generated by G X int p P q . In turn, this ideal isprincipal and generated by the degree 6 element t “ t t t t t t .We define a map φ : R Ñ C r G s as follows: q Ñ t q Ñ t q Ñ t q Ñ t q Ñ t q Ñ t q Ñ t q Ñ t The kernel of φ is seen to be in w p J q “ x q q ´ q q y . (cid:3) We can compute the weight of each generator of G along each edge of the four leaf network by mapping itto a monomial in R { in w p J q with φ . Let π i : G Ñ Z ě e ` Z ě e be the map which assigns an element u P G the weight along the i -th edge. The generator of the canonical module of K r G s corresponds to themonomial q q q q q q . This monomial has weight 3 e ` e on each edge in the 4-cycle.The algebra R { J is multigraded by the group p Z ě e ` Z ě e q . The multigrading is shared by thedegeneration C r G s , where it corresponds to the linear projection¯ π “ p π , π , π , π q : G Ñ p Z ě e ` Z ě e q . The image of ¯ π is the set Q Ă p Z ě e ` Z ě e q of p A e ` A e , B e ` B e , C e ` C e , D e ` D e q where A ` A “ B ` B “ C ` C “ D ` D and A ` B ` C ` D P Z . emark 5.3. Note that the multigrading by p Z ě e ` Z ě e q coincides with the grading by Z describedin Section 4.1 by sending p A e ` A e , B e ` B e , C e ` C e , D e ` D e q to p A ` A , A , B , C , D q .Fix p P Q , then the number of elements of G which map to p under ¯ π coincides with the value h R { J p p q of the multigraded Hilbert function of R { J . This value can be computed as follows. 
Let A “ A ` MIN t , p C ` D ´ A ´ B qu , B “ B ` MIN t , p C ` D ´ A ´ B qu , C “ C ´ MIN t , p C ` D ´ A ´ B qu , D “ D ´ MIN t , p C ` D ´ A ´ B qu , and E “ A ` A ` MIN t´ A ´ B , ´ C ´ D u , then h R { J p p q “ p MIN t A , B , C , D u ` MIN t , E u ` qp t A , B , C , D u ´ MIN t , E u ` q . The Hilbert series is given by H R { J p T q “ ` T p ´ T q . Now fix a 4-valent tree T , and let N be the network optained by gluing 4-sunlet networks together accordingto T . Let G T be the toric fiber product of E p T q according to the topology of T . The next propositionestablishes the basic properties of the semigroup algebra C r G T s and the network algebra C r q s{ I N . Proposition 5.4.
The semigroup $G_T$ is generated in degree $1$. Its generators are the lattice points in a normal polytope $\Delta_T$ obtained as a fiber product polytope of $E(T)$ copies of $\Delta$ over the topology of $T$. With these generators, the semigroup algebra $\mathbb{C}[G_T]$ is presented by a quadratic ideal, and is Gorenstein with $a$-invariant equal to $-6$. Moreover, the algebra $\mathbb{C}[q]/I_N$ is normal, presented by quadratics, and Gorenstein with $a$-invariant equal to $-6$, and its Hilbert function agrees with the Ehrhart polynomial of $\Delta_T$.
Proof. This is a consequence of Propositions 2.12 and 2.13. □
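The statements about the 4-sunlet can be sanity-checked numerically. The Python sketch below is a minimal illustration and not the authors' Macaulay2 code. It assumes the parameterization from Section 4 in the form $q_g \mapsto \prod_{j=1}^{n} a^{j}_{g_j} \big( \prod_{j=1}^{n-1} a^{n+j}_{g_1+\cdots+g_j} + \prod_{j=2}^{n} a^{n+j}_{g_2+\cdots+g_j} \big)$, and it assumes that the quadric generating $J_4$ pairs each index $g$ with $g + 1111$, with alternating signs in lexicographic order. All function names, the prime $P$, and the random seeds are hypothetical choices made for this example. The Jacobian rank is computed at a random point of a prime field, so it is a randomized check rather than a proof.

```python
import itertools
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo P


def index_vectors(n):
    """All g in (Z/2)^n with g_1 + ... + g_n = 0, in lexicographic order."""
    return [g for g in itertools.product((0, 1), repeat=n) if sum(g) % 2 == 0]


def psi(n, a, g):
    """Value of the Fourier coordinate q_g modulo P (assumed form of psi_n).

    a[e] = [a^e_0, a^e_1] for edges e = 1..2n (leaf edges 1..n, cycle edges n+1..2n).
    """
    leaf = 1
    for j in range(1, n + 1):
        leaf = leaf * a[j][g[j - 1]] % P
    first = 1  # product over cycle edges n+1 .. 2n-1, subscript g_1 + ... + g_j
    for j in range(1, n):
        first = first * a[n + j][sum(g[:j]) % 2] % P
    second = 1  # product over cycle edges n+2 .. 2n, subscript g_2 + ... + g_j
    for j in range(2, n + 1):
        second = second * a[n + j][sum(g[1:j]) % 2] % P
    return leaf * (first + second) % P


def random_parameters(n, rng):
    """A random nonzero value for each of the 4n parameters."""
    return {e: [rng.randrange(1, P), rng.randrange(1, P)] for e in range(1, 2 * n + 1)}


def four_sunlet_quadric(a):
    """Assumed generator of J_4, evaluated on the image of the point a."""
    q = {g: psi(4, a, g) for g in index_vectors(4)}
    return (q[0, 0, 0, 0] * q[1, 1, 1, 1] - q[0, 0, 1, 1] * q[1, 1, 0, 0]
            + q[0, 1, 0, 1] * q[1, 0, 1, 0] - q[0, 1, 1, 0] * q[1, 0, 0, 1]) % P


def jacobian(n, a):
    """Rows are the partial derivatives of (q_g)_g with respect to each a^e_i.

    Each q_g has degree at most 1 in any single parameter, so the partial
    derivative equals q_g(parameter = 1) - q_g(parameter = 0).
    """
    gs = index_vectors(n)
    rows = []
    for e in range(1, 2 * n + 1):
        for i in (0, 1):
            saved = a[e][i]
            a[e][i] = 1
            at_one = [psi(n, a, g) for g in gs]
            a[e][i] = 0
            at_zero = [psi(n, a, g) for g in gs]
            a[e][i] = saved
            rows.append([(x - y) % P for x, y in zip(at_one, at_zero)])
    return rows


def rank_mod_p(rows):
    """Rank over the field with P elements, by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], P - 2, P)  # inverse via Fermat's little theorem
        rows[rank] = [x * inv % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][c]:
                f = rows[r][c]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def sunlet_jacobian_rank(n, seed=0):
    """Rank of the Jacobian of psi_n at a pseudo-random point of the prime field."""
    a = random_parameters(n, random.Random(seed))
    return rank_mod_p(jacobian(n, a))
```

Under these assumptions, `four_sunlet_quadric` should vanish at every parameter point, and `sunlet_jacobian_rank(4)` is expected to be 7, which is the dimension of a hypersurface in 8 coordinates. Working modulo a large prime keeps the arithmetic exact while avoiding the coefficient growth of a symbolic computation.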
The 5-Sunlet Network.
In this section, we focus on the 5-sunlet network $S_5$ and its corresponding ideal $J_5$. We describe the structure of its generating set and also discuss some properties of the ideal. All computations for this section can be found in the Macaulay2 file sunlet5.m2.
We first computed the ideal $J_5$ by elimination with a degree bound. We computed a Gröbner basis for the elimination ideal up to degree 2 and then verified that the result was prime and of the correct dimension, which is 10. The dimension is obtained by computing the rank of the Jacobian of $\psi_{S_5}$ symbolically. As a result we get that $J_5$ is generated by eleven quadrics: nine four-term polynomials of the form $q_{g_1} q_{h_1} - q_{g_2} q_{h_2} + q_{g_3} q_{h_3} - q_{g_4} q_{h_4}$ and two binomials of the form $q_{g_1} q_{h_1} - q_{g_2} q_{h_2}$.
We also computed the tropical variety explicitly. It has 252 maximal cones. Using sunlet5.m2, we found that 116 of these maximal cones give prime toric initial ideals [18]. The toric varieties corresponding to these 116 cones are all normal, which was checked using normaliz. The following example showcases one of these toric degenerations. Example 5.5.
Consider the weight vector w “ p w , w , w , w , w , w , w , w ,w , w , w , w , w , w , w , w q“ p , , , , , , ´ , ´ , , ´ , ´ , ´ , ´ , ´ , ´ , ´ q Using gfan, we found that with respect to this weight vector, the polynomials in the left column form aGr¨obner basis for J , and the terms with the lowest weights are underlined. The polynomials in the rightcolumn are the corresponding initial forms which generate in w p J q . q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q ` q q ´ q q q q ´ q q q q ´ q q q q ´ q q q q ´ q q q q ´ q q The ideal, in w p J q , defines a toric variety which is parameterized by monomials whose exponent vectors arethe columns in the matrix below. This matrix was found using [28, Theorem 4]. In particular, the Fouriercoordinate generators of S are a Khovanskii basis of a valuation associated to the cone containing w , andthe convex hull of the columns of A in R is a Newton-Okounkov body of the sunlet variety V Ă P . A “ ¨˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˚˝ ˛‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‹‚ Using normaliz, we were able to show that the semigroup generated by the columns of A is saturated withrespect to the rank 10 sublattice of Z that they span; hence, this is a normal toric variety from which we an conclude that S is normal and Cohen-Macaulay. Moreover, the Hilbert series is given by H R { J p T q “ ` T ` T ` T ` T p ´ T q . Since the numerator is symmetric and since it is Cohen-Macaulay, [35, Theorem 4.4] shows that S isGorenstein. These computations can be found in sunlet5.m2 . One can also check S is Gorenstein bynoting that that the canonical module of C r N A s – R { in w p J q is generated by the following vector: p , , , , , , , , , , , , , , , q t . 
This exponent vector corresponds to a monomial of degree 6 in the Fourier coordinates.
In this last proposition, we record all the algebraic properties of $S_5$ that we investigated in the previous example, and we record that level-1 networks built from 4- and 5-sunlets are Cohen-Macaulay.
Proposition 5.6. $S_5$ is a normal, Gorenstein variety. Its tropicalization has 252 maximal cones, 116 of which yield prime binomial initial ideals.
Corollary 5.7.
Any level-1 network built out of 4- and 5-sunlet networks is a normal Cohen-Macaulay variety.
Proof. Since 4- and 5-sunlet varieties are normal and Cohen-Macaulay, combining Proposition 2.12 with Proposition 2.10 shows that any level-1 network built from 4- and 5-sunlet networks is normal and Cohen-Macaulay. □

6. Open Problems
In this section, we discuss some conjectures for which we have computational evidence and suggest some possible techniques for solving them. We also provide some interesting open problems surrounding sunlet network ideals.
One of the main drawbacks of the techniques used in Section 4 is that they only yield quadratic generators for $J_n$. For $n$-sunlet networks with $4 \le n \le 7$, we have verified that their ideals are quadratically generated. This was done in Macaulay2 by showing that over $\mathbb{Q}$, $\ker \psi_n = I_n$ for $n = 4, 5, 6,$ and $7$. Since we had equality over $\mathbb{Q}$, the ideals must still be equal after extending to the complex numbers. While we have verified that $J_n$ is generated by quadratics for $4 \le n \le 7$, it remains open whether the quadratics generate $J_n$ for $n \ge 8$. For the CFN model, the ideals for trees are always generated by quadratics, and as we have seen the quadratic invariants obtained for the sunlet ideals are built from invariants of the underlying trees; hence, we suspect that $J_n$ is always quadratically generated.
Conjecture 6.1. Let $I_n$ be the ideal generated by all quadratic invariants in $J_n$. Then $I_n = J_n$ for all $n \ge 4$.
In order to prove Conjecture 6.1, it would be enough to show that $I_n$ is prime and of the correct dimension. To this end, we have the following conjecture which would prove Conjecture 6.1.
Conjecture 6.2.
For $n \ge 5$, $\dim J_n = 2n = \dim I_n$ and $I_n$ is prime.
A possible approach to proving that $I_n$ is prime is that taken in [31]. The main workhorse of their technique is the following lemma which was originally stated in [17, Proposition 23].
Lemma 6.3. [31, Lemma 2.5] Let $k$ be a field and $J \subset k[x_1, \ldots, x_n]$ be an ideal containing a polynomial $f = g x_1 + h$ with $g, h$ not involving $x_1$ and $g$ a non-zero divisor modulo $J$. Let $J_1 = J \cap k[x_2, \ldots, x_n]$ be the elimination ideal. Then $J$ is prime if and only if $J_1$ is prime.
This lemma can be used to create a descending chain of ideals, each one involving one less variable. As long as a polynomial $f$ of the required form can be found, one can prove that the original ideal is prime by verifying that the last ideal in the chain is prime. For small values of $n \ge 4$, we have carried this out for $I_n$ by repeatedly eliminating variables in reverse lexicographic order until we are left with an ideal in only the variables $q_g$ such that $g_1 = 0$. That is, we build a chain
\[ I_n \supset I_n^{(1)} \supset \cdots \supset I_n^{(2^{n-2})} \]
where $I_n^{(j)}$ is obtained by eliminating the $j$th variable in reverse lexicographic order from $I_n^{(j-1)}$, and at each step we ensure that a polynomial $f$ of the form described in Lemma 6.3 exists. Typically one would then need to verify that $I_n^{(2^{n-2})}$ is prime, but the following lemma shows there is no need for this. Our implementation of this can be found in the Macaulay2 file primeDescent.m2.
Lemma 6.4. Let $I_n^{(2^{n-2})} = I_n \cap \mathbb{C}[q_g : g_1 = 0]$. Then $I_n^{(2^{n-2})} \cong I_T$ where $T$ is the tree obtained by deleting the reticulation vertex of $S_n$ and all adjacent edges.
This lemma implies that if one can always find a polynomial $f$ of the desired form in each of the intermediate elimination ideals $I_n^{(j)}$, then $I_n$ is prime since the last ideal $I_n^{(2^{n-2})}$ is isomorphic to a tree ideal; thus, it must be prime.
For the question of the dimension of $J_n$, we have the following bound.
Proposition 6.5.
For $n \ge 4$ it holds that $2n - 1 \le \dim(J_n) \le 2n + 1$.
Proof. First we note that $J_n$ is properly contained in the ideals $I_{T_1}$ and $I_{T_2}$ for the trees $T_1$ and $T_2$ that are obtained from $S_n$ by deleting reticulation edges. It is well known that each of these ideals has $\dim(I_{T_i}) = 2n - 2$. Since $J_n$ is a prime ideal properly contained in these two prime ideals which are not equal, we get the lower bound $2n - 1 \le \dim(J_n)$. For the other bound recall that $V_{S_n}$ can also be thought of as a projective variety, and the map $\psi_{S_n}$ parameterizing $J_n$ can be thought of as a map
\[ \psi_{S_n} : \prod_{e \in E(S_n)} \mathbb{P}^1 \to \mathbb{P}^{2^{n-1}-1} \]
where each copy of $\mathbb{P}^1$ in the domain corresponds to an edge of $S_n$. This immediately implies that the projective variety corresponding to $S_n$ has dimension at most $|E(S_n)| = 2n$ and so $\dim(J_n) \le 2n + 1$. □
We also have that $\dim J_n \le \dim I_n$ as $I_n \subseteq J_n$. Moreover, using the rank of the Jacobian of $\psi_{S_n}$, we have shown for $5 \le n \le 8$ that the dimension of $J_n$ is $2n$. We have also computed the rank of the Jacobian with random values substituted in for the parameters for $n$ up to 17. In each case we found that the rank is also $2n$, which means that $\dim(J_n) = 2n$ with probability 1 for $9 \le n \le$
17. These computations can be found in thefile sunletDim.m2 .As we have seen in Section 5, the 4- and 5-sunlet networks are normal, Gorenstein varieties. We have notbeen able to show that S is Gorenstein; however, we have computed its Hilbert series which suggests it isindeed Gorenstein. H R { J p T q “ ` T ` T ` T ` T ` T ` T ` T ` T p ´ T q Therefore, to show that S has the Gorenstein property, it would be enough to show that it is Cohen-Macaulayby [35, Theorem 4.4]. Question 6.6. Is S n normal, Cohen-Macaulay, and Gorenstein for n ě ? n Example 5.5, we also saw that the generator of the canonical module had degree 3 e ` e for eachnon-reticulation leaf, while at the reticulation edge, it had degree 2 e ` e . Then Propositions 2.12 and2.13 imply the following proposition. Proposition 6.7.
Let $N$ be a level-1 network obtained by gluing 4- and 5-sunlets along trees under the condition that nothing is glued to a reticulation edge in a 5-sunlet. Then the phylogenetic variety $V_N$ is Gorenstein.
The fact that the reticulation edge for a 5-sunlet has a different degree than in the 4-sunlet case does not mean that other level-1 networks built out of 4- and 5-sunlets are not Gorenstein. It just means that some other proof would be needed to show the Gorenstein property.
As we have seen in Section 5, there are very well-behaved toric degenerations of $S_4$ and $S_5$. In the case when $n = 5$, there are 116 cones in the tropical variety which yield normal toric varieties; however, most of them are somewhat less well-behaved than the one shown. For example, using the weight given in Example 5.5, one sees that the quadratic invariants produced in Section 4 actually form a Gröbner basis with respect to this weight. This property does not hold for most of the weights in the tropical variety. Moreover, the initial forms of these quadratic invariants are always invariants for at least one of the underlying trees $T_1$ or $T_2$. To this end, we ask the following.
Question 6.8. For $n \ge 6$, is there a weight vector $w$ on $R_n$ for which $\operatorname{in}_w(J_n)$ is a prime binomial ideal? If so, can it be shown that there is a combinatorial rule for finding such a $w$ where a Gröbner basis of $J_n$ with respect to $w$ can be deduced combinatorially?
This question is interesting even in the case when $n = 6$. If one were able to find a toric degeneration of $S_6$ to a normal toric variety, then since the numerator of the Hilbert series is symmetric, one would also know that $S_6$ is Gorenstein.

Acknowledgments
Joseph Cummings and Christopher Manon were partially supported by Simons Collaboration Grant (587209). Benjamin Hollering was partially supported by the US National Science Foundation (DMS 1615660) and would like to thank Seth Sullivant for many helpful conversations.
References
[1] E. S. Allman and J. A. Rhodes. The identifiability of covarion models in phylogenetics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(1):76–88, 2009.
[2] Elizabeth S Allman, Sonia Petrovic, John A Rhodes, and Seth Sullivant. Identifiability of two-tree mixtures for group-based models. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):710–722, 2010.
[3] Elizabeth S. Allman and John A. Rhodes. Phylogenetic ideals and varieties for the general Markov model. Adv. in Appl. Math., 40(2):127–148, 2008.
[4] Hector Baños. Identifying species network features from gene tree quartets under the coalescent model. Bull. Math. Biol., 81(2):494–534, 2019.
[5] Hector Baños, Nathaniel Bushek, Ruth Davidson, Elizabeth Gross, Pamela E. Harris, Robert Krone, Colby Long, Allen Stewart, and Robert Walker. Dimensions of group-based phylogenetic mixtures. Bull. Math. Biol., 81(2):316–336, 2019.
[6] W. Bruns and J. Herzog. Cohen-Macaulay rings, volume 39 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1993.
[7] David Bryant and Vincent Moulton. Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks. Molecular Biology and Evolution, 21(2):255–265, 02 2004.
[8] Weronika Buczyńska and Jarosław A. Wiśniewski. On geometry of binary symmetric models of phylogenetic trees. J. Eur. Math. Soc. (JEMS), 9(3):609–635, 2007.
[9] Marta Casanellas and Jesús Fernández-Sánchez. Rank conditions on phylogenetic networks, 2020.
[10] James A Cavender and Joseph Felsenstein. Invariants of phylogenies in a simple case with discrete states. Journal of Classification, 4(1):57–71, 1987.
[11] Julia Chifman and Laura Kubatko. Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics, 30(23):3317–3324, 08 2014.
[12] Jane Ivy Coons and Seth Sullivant. Toric geometry of the Cavender-Farris-Neyman model with a molecular clock. Adv. in Appl. Math., 123:102119, 54, 2021.
[13] Jan Draisma and Jochen Kuttler. On the ideals of equivariant tree models. Math. Ann., 344(3):619–644, 2009.
[14] Nicholas Eriksson. Tree construction using singular value decomposition. In Algebraic statistics for computational biology, pages 347–358. Cambridge Univ. Press, New York, 2005.
[15] Steven N. Evans and T. P. Speed. Invariants of some probability models used in phylogenetic inference. Ann. Statist., 21(1):355–377, 1993.
[16] Jesús Fernández-Sánchez and Marta Casanellas. Invariant Versus Classical Quartet Inference When Evolution is Heterogeneous Across Sites and Lineages. Systematic Biology, 65(2):280–291, 11 2015.
[17] Luis David Garcia, Michael Stillman, and Bernd Sturmfels. Algebraic geometry of Bayesian networks. J. Symbolic Comput., 39(3-4):331–355, 2005.
[18] Daniel R. Grayson and Michael E. Stillman. Macaulay2, a software system for research in algebraic geometry.
[19] Elizabeth Gross and Colby Long. Distinguishing phylogenetic networks. SIAM Journal on Applied Algebra and Geometry, 2(1):72–93, 2018.
[20] Elizabeth Gross, Colby Long, and Joseph Rusinko. Phylogenetic Networks. arXiv e-prints, page arXiv:1906.01586, Jun 2019.
[21] Elizabeth Gross, Leo van Iersel, Remie Janssen, Mark Jones, Colby Long, and Yukihiro Murakami. Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes, 2020.
[22] Jürgen Hausen, Christoff Hische, and Milena Wrobel. On torus actions of higher complexity. Forum Math. Sigma, 7:Paper No. e38, 81, 2019.
[23] Michael D Hendy and David Penny. Complete families of linear invariants for some stochastic models of sequence evolution, with and without the molecular clock assumption. Journal of Computational Biology, 3(1):19–31, 1996.
[24] Benjamin Hollering and Seth Sullivant. Identifiability in phylogenetics using algebraic matroids. J. Symbolic Comput., 104:142–158, 2021.
[25] Daniel H. Huson and Celine Scornavacca. A Survey of Combinatorial Methods for Phylogenetic Networks. Genome Biology and Evolution, 3:23–35, 11 2010.
[26] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Inferring Phylogenetic Networks by the Maximum Parsimony Criterion: A Case Study. Molecular Biology and Evolution, 24(1):324–337, 10 2006.
[27] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Maximum likelihood of phylogenetic networks. Bioinformatics, 22(21):2604–2611, 08 2006.
[28] Kiumars Kaveh and Christopher Manon. Khovanskii bases, higher rank valuations, and tropical geometry. SIAM J. Appl. Algebra Geom., 3(2):292–336, 2019.
[29] James A Lake. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Molecular Biology and Evolution, 4(2):167–191, 1987.
[30] Colby Long and Seth Sullivant. Identifiability of 3-class Jukes-Cantor mixtures. Adv. in Appl. Math., 64:89–110, 2015.
[31] Colby Long and Seth Sullivant. Tying up loose strands: defining equations of the strand symmetric model. J. Algebr. Stat., 6(1):17–23, 2015.
[32] Wayne P Maddison. Gene trees in species trees. Systematic Biology, 46(3):523–536, 1997.
[33] John A. Rhodes and Seth Sullivant. Identifiability of large phylogenetic mixture models. Bull. Math. Biol., 74(1):212–231, 2012.
[34] Charles Semple, Mike Steel, et al. Phylogenetics, volume 24. Oxford University Press on Demand, 2003.
[35] R. P. Stanley. Hilbert functions of graded algebras. Advances in Math., 28(1):57–83, 1978.
[36] Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
[37] Bernd Sturmfels. Gröbner bases and convex polytopes, volume 8 of University Lecture Series. American Mathematical Society, Providence, RI, 1996.
[38] Bernd Sturmfels and Seth Sullivant. Toric ideals of phylogenetic invariants. Journal of Computational Biology, 12(2):204–228, 2005.
[39] Seth Sullivant. Toric fiber products. J. Algebra, 316(2):560–577, 2007.
[40] Seth Sullivant. Algebraic statistics, volume 194 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2018.
[41] Michael Syvanen. Horizontal gene transfer: evidence and possible consequences. Annual Review of Genetics, 28(1):237–261, 1994.