Reciprocal Maximum Likelihood Degrees of Brownian Motion Tree Models

Abstract

We give an explicit formula for the reciprocal maximum likelihood degree of Brownian motion tree models. To achieve this, we connect them to certain toric (or log-linear) models, and express the Brownian motion tree model of an arbitrary tree as a toric fiber product of star tree models.


TOBIAS BOEGE - JANE IVY COONS - CHRISTOPHER EUR - AIDA MARAJ - FRANK RÖTTGER


Keywords: Brownian motion tree model, maximum likelihood degree, toric fiber product.

T.B. is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 314838170, GRK 2297 MathCoRe. J.I.C. is partially supported by the US National Science Foundation (DGE 1746939). C.E. is partially supported by the US National Science Foundation (DMS-2001854).

Acknowledgements: We thank Piotr Zwiernik and Carlos Améndola for suggesting the problem, and Tim Seynnaeve and Carlos Améndola for helpful discussions. We also thank the organizers of the Linear Spaces of Symmetric Matrices working group at MPI MiS Leipzig.

arXiv preprint, subject class math.ST.

1. Introduction

Let $T$ be a rooted tree on leaves $0, \dots, n$ with the leaf labeled 0 as the root and with all edges directed away from the root. We denote the set of leaves of $T$ by $\mathrm{Lv}(T) = \{0, \dots, n\}$ and the set of internal vertices of $T$ by $\mathrm{Int}(T)$. The out-degree of a vertex $v$, denoted $\mathrm{outdeg}(v)$, is the number of edges directed out of $v$. For two leaves $i$ and $j$, denote their most recent common ancestor by $\mathrm{lca}(i,j)$. We assume that $T$ does not have any vertices of degree two.

The Brownian motion tree model on $T$ identifies the non-root leaves of the tree with random variables that are jointly distributed according to a multivariate Gaussian distribution with mean 0. To each vertex $v$, it assigns a parameter $t_v$ such that the covariance of two non-root leaves $i$ and $j$ is $t_{\mathrm{lca}(i,j)}$. In other words, this model is a linear Gaussian covariance model $\mathcal{M}_T = \mathcal{L}_T \cap \mathbb{S}^n_{>0}$, where $\mathbb{S}^n_{>0}$ is the set of $n \times n$ positive-definite matrices and $\mathcal{L}_T$ is the subspace of the space of symmetric $n \times n$ matrices $\mathbb{S}^n$ defined by

$$\mathcal{L}_T = \{ \Sigma \in \mathbb{S}^n \mid \sigma_{ij} = \sigma_{kl} \text{ if } \mathrm{lca}(i,j) = \mathrm{lca}(k,l) \}.$$

An example tree and its induced covariance pattern are shown in Figure 1. This model is a Wiener process along $T$, and was first introduced by Felsenstein [3] to model trait evolution along phylogenetic trees. For background material on this model and other methods for comparative phylogenetics, see [5]. See [10] for a detailed analysis of the geometry of this model.

Figure 1: The given Brownian motion tree model has reciprocal ML-degree 16.

In this paper we study properties of the reciprocal maximum likelihood estimation problem for Brownian motion tree models. The log-likelihood function of a linear Gaussian covariance model with an empirical covariance $S$ is the function $\ell_S : \mathbb{S}^n_{>0} \to \mathbb{R}$ defined by

$$\ell_S(\Sigma) = -\log\det(\Sigma) - \mathrm{trace}(S\Sigma^{-1}).$$

The maximum likelihood estimator is obtained by maximizing this log-likelihood function, which is equivalent to minimizing the Kullback-Leibler divergence $\mathrm{KL}(\Sigma, S)$. To this optimization problem, one can associate a reciprocal problem which minimizes the "wrong" KL divergence $\mathrm{KL}(S, \Sigma)$. This is equivalent to maximizing the reciprocal log-likelihood function:

$$\ell^\vee_S(\Sigma) = \log\det(\Sigma) - \mathrm{trace}(S^{-1}\Sigma).$$
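The covariance pattern $\sigma_{ij} = t_{\mathrm{lca}(i,j)}$ can be made concrete with a short Python sketch. The tree below (internal vertex 6 with children 1, 2, 7, and internal vertex 7 with children 3, 4, 5) and the parameter values are hypothetical choices for illustration; the labeling need not coincide with Figure 1, and the helper names are not from the paper.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, starting at v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(parent, i, j):
    """Most recent common ancestor of i and j; note lca(i, i) = i."""
    seen = set(ancestors(parent, i))
    for v in ancestors(parent, j):
        if v in seen:
            return v
    raise ValueError("vertices are not in the same tree")


def covariance(parent, t, n):
    """n x n matrix whose (i, j) entry is t[lca(i, j)] for leaves 1..n."""
    return [[t[lca(parent, i, j)] for j in range(1, n + 1)]
            for i in range(1, n + 1)]


# Hypothetical tree (an assumption): root leaf 0, internal vertices 6 and 7.
PARENT = {6: 0, 1: 6, 2: 6, 7: 6, 3: 7, 4: 7, 5: 7}
# Hypothetical parameters t_v, one per non-root vertex.
T_PARAMS = {1: 5.0, 2: 6.0, 3: 7.0, 4: 8.0, 5: 9.0, 6: 1.0, 7: 2.0}
SIGMA = covariance(PARENT, T_PARAMS, 5)
```

In this sketch, leaves 3 and 4 share the parameter of vertex 7, while leaves 1 and 3 share the parameter of vertex 6, which is exactly the equality pattern defining $\mathcal{L}_T$.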

Definition 1.1 (ML degree). The maximum likelihood degree of the model $\mathcal{M}_T$, denoted $\mathrm{mld}(\mathcal{M}_T)$, is the number of non-singular complex critical points of $\ell_S$ in parameters from the model $\mathcal{M}_T$, counted with multiplicity, for generic symmetric $S$. The reciprocal maximum likelihood degree, denoted $\mathrm{rmld}(\mathcal{M}_T)$, is defined analogously using the reciprocal likelihood $\ell^\vee_S$ in place of $\ell_S$.

Remark 1.2.

In a Brownian motion tree model, the space of covariance matrices has a linear structure in the model parameters. Linear concentration models, where the space of inverse covariance matrices is linear, are also popular in applications; for instance, see [4]. This differing choice of perspective interchanges the notions of mld and rmld. As a result, our terminology of mld and rmld agrees with [8], and is the opposite of [4].

Knowledge of the ML-degree is useful for numerical methods in maximum likelihood estimation [7, 8]. Our main result is a formula for the reciprocal ML-degree for Brownian motion tree models.

Theorem 1.3.

The reciprocal ML-degree of the Brownian motion tree model $\mathcal{M}_T$ is

$$\mathrm{rmld}(\mathcal{M}_T) = \prod_{v \in \mathrm{Int}(T)} \left( 2^{\mathrm{outdeg}(v)} - \mathrm{outdeg}(v) - 1 \right).$$

For example, the reciprocal ML-degree of the tree model in Figure 1 is 16, since the out-degrees of its two internal vertices are both 3 and $2^3 - 3 - 1 = 4$.

Our proof of Theorem 1.3 broadly consists of three steps. In Section 2, we give preliminary definitions and theorems regarding toric models and the toric structure of the Brownian motion tree model as described in [10]. Then we show that the reciprocal maximum likelihood estimation problem in a Brownian motion tree model is equivalent to the standard maximum likelihood estimation problem of a toric model. In Section 3 we show that this toric model has a toric fiber product structure as described in [11], which implies that its ML-degree is the product of the ML-degrees of the models associated to two subtrees [2]. In Section 4 we show that the reciprocal ML-degree of the Brownian motion tree model on a star tree with $n + 1$ leaves is $2^n - n - 1$, which serves as the base case for the induction that completes the proof of Theorem 1.3.
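The formula of Theorem 1.3 is easy to evaluate by machine. The following is a minimal Python sketch, assuming the tree is given as a dictionary from each internal vertex to the list of its children; this encoding, the function name, and the two example trees are illustrative assumptions rather than notation from the paper.

```python
from math import prod


def reciprocal_ml_degree(children):
    """Evaluate the product over internal vertices v of
    2**outdeg(v) - outdeg(v) - 1, as in Theorem 1.3.

    `children` maps each internal vertex to the list of its children;
    the root leaf 0 is not a key, since it is not an internal vertex.
    """
    return prod(2 ** len(kids) - len(kids) - 1 for kids in children.values())


# Hypothetical tree with two internal vertices of out-degree 3.
TWO_INTERNAL = {6: [1, 2, 7], 7: [3, 4, 5]}
# Hypothetical star tree whose internal vertex has out-degree 4.
STAR_4 = {5: [1, 2, 3, 4]}

RMLD_TWO_INTERNAL = reciprocal_ml_degree(TWO_INTERNAL)
RMLD_STAR_4 = reciprocal_ml_degree(STAR_4)
```

For the first tree each factor is $2^3 - 3 - 1 = 4$, giving $4 \cdot 4 = 16$; for the star tree the single factor is $2^4 - 4 - 1 = 11$.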

2. Toric Models

A toric model, also known as a log-linear model, is a discrete statistical model whose Zariski closure is a toric variety [12, Definition 6.2.1]. As such, it has a monomial parametrization, which is represented by an integral matrix $A \in \mathbb{Z}^{d \times m}$ called its design matrix. Its columns $a_1, \dots, a_m$ define the monomial map

$$\varphi_A : \mathbb{C}[p_1, \dots, p_m] \to \mathbb{C}[t_1^{\pm}, \dots, t_d^{\pm}] \quad \text{which sends } p_i \mapsto t^{a_i}. \tag{1}$$

We denote by $I(A) \subset \mathbb{C}[p]$ the kernel of this map, and write $V(I(A)) \subseteq \mathbb{C}^m$ for the toric affine subvariety defined by $I(A)$.

The maximum likelihood degree of a discrete statistical model is the number of complex critical points of the log-likelihood function counted with multiplicity [1]. In the case of toric models, it is the number of intersection points of the toric variety $V(I(A))$ with a specific affine linear space of complementary dimension.

Proposition 2.1. [1, Proposition 7] The maximum likelihood degree of a toric model $\mathcal{M}(A)$ with the design matrix $A \in \mathbb{Z}^{d \times m}$ is the number of solutions $p \in V(I(A)) \setminus V\!\left(p_1 \cdots p_m \left(\sum_{i=1}^m p_i\right)\right)$ satisfying $Ap = Au$ for generic data $u \in \mathbb{C}^m$, counted with multiplicity.

In this section, we show that the reciprocal ML-degree of a Brownian motion tree model is equal to the ML-degree of a toric model. Let $\mathcal{L}_T^{-1}$ be the Zariski closure of $\{ \Sigma^{-1} \in \mathbb{S}^n \mid \Sigma \in \mathcal{L}_T \text{ invertible} \}$. Our starting point is a result from [10] which states that $\mathcal{L}_T^{-1}$ is toric under a linear change of coordinates.

Let $\mathcal{L}_T^{-1} \subset \mathbb{S}^n$ with coordinates $K = (k_{ij})_{1 \le i \le j \le n}$. Define new coordinates $p = (p_{ij})_{0 \le i < j \le n}$ with change of coordinates $p(K)$ given by

$$p_{ij} = -k_{ij} \ \text{ for } 1 \le i < j \le n, \quad \text{and} \quad p_{0i} = \sum_{j=1}^{n} k_{ij} \ \text{ for } 1 \le i \le n. \tag{2}$$

Let $A_T \in \mathbb{Z}^{(|\mathrm{Vert}(T)| - 1) \times \binom{n+1}{2}}$ be the matrix with rows corresponding to non-root vertices of $T$ and columns to pairs of leaves in $T$, defined by

$$A_T(v, \{i,j\}) = \begin{cases} 1 & \text{if } v = i \text{ or } v = j, \\ 1 & \text{if } v = \mathrm{lca}(i,j), \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Theorem 2.2. [10, Theorem 1.2 & Equation (11)]

Let $\mathcal{L}_T^{-1}$ be the Zariski closure of $\{ \Sigma^{-1} \in \mathbb{S}^n \mid \Sigma \in \mathcal{L}_T \text{ invertible} \}$. After the linear change of coordinates $p(K)$, the variety $\mathcal{L}_T^{-1}$ is toric with defining matrix $A_T$.

Example 3.2 shows the matrix $A_T$ of the tree $T$ in Figure 1. We can now state the main result of this section. Here and below, $I_T$ denotes the toric ideal $I(A_T)$.

Theorem 2.3.

For a rooted tree $T$, the reciprocal ML-degree of the Brownian motion tree model on $T$ and the ML-degree of the toric model $\mathcal{M}(A_T)$ are both equal to the degree of $V(I_T) \cap V(\langle A_T p - A_T u \rangle)$ for a generic choice of $u$.

We prepare the proof with two lemmas. The first lemma is a standard computation in the maximum likelihood estimation of linear covariance models. For a proof, see [8, Proposition 3.3] or [9, Equation (11)]. Endow the space of symmetric matrices $\mathbb{S}^n$ with the standard inner product $\langle A, B \rangle = \mathrm{trace}(AB)$. For a linear subspace $\mathcal{L} \subseteq \mathbb{S}^n$, denote by $\mathcal{L}^\perp$ its orthogonal complement.

Lemma 2.4.

The reciprocal ML-degree of the linear covariance model specified by $\mathcal{L}$ is the number of solutions, counted with multiplicity, to the equations

$$\Sigma \in \mathcal{L}, \quad \Sigma K = \mathrm{Id}, \quad \text{and} \quad K - S^{-1} \in \mathcal{L}^\perp$$

in the $2 \cdot \binom{n+1}{2}$ entries of $\Sigma$ and $K$, for a generic choice of a sample concentration matrix $S^{-1}$.

The next lemma is a general geometric observation.

Lemma 2.5.

Let $X \subset \mathbb{C}^n$ be a $d$-dimensional affine subscheme such that no $d$-dimensional irreducible component of $X$ is contained in a hypersurface $H \subset \mathbb{C}^n$. Let $L \subset \mathbb{C}^n$ be a linear subspace of dimension $n - d$. Then, for a general $w \in \mathbb{C}^n / L$, the intersection $X \cap (L + w)$ lies in $X \setminus H$.

Proof.

Since no $d$-dimensional component of $X$ is contained in $H$, we have $\dim(X \cap H) < d$. The algebraic subset

$$Z := \{ w \in \mathbb{C}^n / L \mid (X \cap H) \cap (L + w) \neq \emptyset \}$$

is the image of the restriction $\pi|_{X \cap H}$ of the projection map $\pi : \mathbb{C}^n \to \mathbb{C}^n / L$ to $X \cap H$, since $\pi|_{X \cap H}$ maps $x \in X \cap H$ to the $w \in \mathbb{C}^n / L$ satisfying $x \in (X \cap H) \cap (L + w)$. Hence, we have $\dim Z \le \dim(X \cap H) < d = \dim(\mathbb{C}^n / L)$. Thus, the set $(\mathbb{C}^n / L) \setminus Z$ is a nonempty Zariski dense subset of $\mathbb{C}^n / L$, and any $w \in (\mathbb{C}^n / L) \setminus Z$ satisfies $X \cap (L + w) \subset X \setminus H$.

Proof of Theorem 2.3.

Lemma 2.4 states that the reciprocal ML-degree of $\mathcal{M}_T$ is the number of invertible matrices $K$ such that $K \in \mathcal{L}_T^{-1}$ and $K - W \in \mathcal{L}_T^\perp$ for a fixed generic $W \in \mathbb{S}^n$. By Theorem 2.2, the first condition $K \in \mathcal{L}_T^{-1}$ is equivalent to $p(K) \in V(I_T)$. The second condition $K - W \in \mathcal{L}_T^\perp$ is equivalent to

$$\sum_{\substack{1 \le i \le j \le n \\ \mathrm{lca}(i,j) = v}} (k_{ij} - w_{ij}) = 0 \quad \text{for all } v \in \mathrm{Vert}(T) \setminus \{0\}.$$

Let $u = p(W)$. This linear system is equivalent to

$$\sum_{\substack{1 \le i < j \le n \\ \mathrm{lca}(i,j) = v}} (p_{ij} - u_{ij}) = 0 \ \text{ for all } v \in \mathrm{Int}(T), \quad \text{and} \quad \sum_{\substack{j = 0 \\ j \neq i}}^{n} (p_{ij} - u_{ij}) = 0 \ \text{ for all } i \in \mathrm{Lv}(T) \setminus \{0\}. \tag{4}$$

This can be written as $A_T p - A_T u = 0$ with $A_T$ as defined in Equation (3). Therefore the reciprocal ML-degree of the Brownian motion tree model on $T$ is the degree of the subscheme

$$\left( V(I_T) \cap V(\langle A_T p - A_T u \rangle) \right) \setminus V(\det K) \subset \mathbb{C}^{\binom{n+1}{2}}$$

for a generic $u$, where $\det K$ is written as a polynomial in the $p$ coordinates. Similarly, writing $H$ for the union of hyperplanes $V\!\left( \left(\sum_{i,j} p_{ij}\right) \prod_{i,j} p_{ij} \right)$, we have from Proposition 2.1 that the ML-degree of the toric model $\mathcal{M}(A_T)$ is the degree of the subscheme

$$\left( V(I_T) \cap V(\langle A_T p - A_T u \rangle) \right) \setminus H \subset \mathbb{C}^{\binom{n+1}{2}}.$$

Note that $V(I_T)$ is contained in neither $V(\det K)$ nor $H$. Indeed, the matrix of all ones is in $V(I_T) \setminus H$ and the identity matrix is in $V(I_T) \setminus V(\det K)$. Lemma 2.5 thus implies that for a generic $u$, the hypersurfaces $V(\det K)$ and $H$ do not intersect $V(I_T) \cap V(\langle A_T p - A_T u \rangle)$. Therefore the reciprocal ML-degree of the Brownian motion tree model of $T$ and the ML-degree of $\mathcal{M}(A_T)$ are both equal to the degree of $V(I_T) \cap V(\langle A_T p - A_T u \rangle)$.
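The passage from the condition on $K$ to the linear system in the $p$ coordinates can be checked on a small example. The Python sketch below builds a design matrix with entry 1 whenever $v = i$, $v = j$ or $v = \mathrm{lca}(i,j)$ (this reading of Equation (3) is an assumption where the source is illegible), applies the change of coordinates (2) to an arbitrary symmetric integer matrix, and evaluates $A_T\,p(K)$. The tree, the matrix `K`, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical tree (an assumption): root leaf 0, internal vertices 6 and 7.
PARENT = {6: 0, 1: 6, 2: 6, 7: 6, 3: 7, 4: 7, 5: 7}
N = 5  # number of non-root leaves


def path_to_root(parent, v):
    """Vertices on the path from v up to the root, starting at v."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(parent, i, j):
    """Most recent common ancestor; every path ends at the root 0."""
    seen = set(path_to_root(parent, i))
    return next(v for v in path_to_root(parent, j) if v in seen)


def design_matrix(parent, n):
    """Rows: non-root vertices. Columns: pairs {i, j} with 0 <= i < j <= n."""
    rows = sorted(parent)
    cols = list(combinations(range(n + 1), 2))
    matrix = [[int(v in pair or v == lca(parent, *pair)) for pair in cols]
              for v in rows]
    return rows, cols, matrix


def change_of_coordinates(K):
    """Equation (2): p_ij = -k_ij for 1 <= i < j, p_0i = i-th row sum of K."""
    n = len(K)
    p = {(i, j): -K[i - 1][j - 1]
         for i, j in combinations(range(1, n + 1), 2)}
    for i in range(1, n + 1):
        p[(0, i)] = sum(K[i - 1])
    return p


# Arbitrary symmetric integer matrix used only as test data.
K = [[(i + 1) * (j + 1) + (i == j) for j in range(N)] for i in range(N)]
ROWS, COLS, A_T = design_matrix(PARENT, N)
P = change_of_coordinates(K)
# One linear form per non-root vertex: the entries of A_T * p(K).
IMAGE = {v: sum(a * P[c] for a, c in zip(row, COLS))
         for v, row in zip(ROWS, A_T)}
```

In this example the leaf rows of $A_T\,p(K)$ return the diagonal entries $k_{ii}$, and each internal row returns minus the sum of the $k_{ij}$ with $\mathrm{lca}(i,j) = v$, so $A_T p = A_T u$ encodes the same conditions as $K - W \in \mathcal{L}_T^\perp$.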

3. Toric Fiber Products

To compute the ML-degree of the toric model $\mathcal{M}(A_T)$, we show in this section that $I_T$ can be written as a toric fiber product of the ideals of two smaller trees, and consequently deduce that the ML-degree of $\mathcal{M}(A_T)$ is a product of the ML-degrees of the toric models on these subtrees. For background on the toric fiber product construction, see [11].

We start by introducing a new parametrization of $I_T$ that makes the toric fiber product structure more apparent. This parametrization is given by the matrix $B_T$ defined as follows. Since every vertex of $T$ except for the root has in-degree 1, we label each edge of $T$ by $e(v)$ where $v$ is the vertex of $T$ that $e(v)$ is directed into. Let $E(T)$ denote the edge set of $T$, and let $P(i,j) \subset E(T)$ denote the set of edges in the unique shortest path in $T$ between two leaves $i$ and $j$. Define the matrix $B_T \in \mathbb{Z}^{E(T) \times \binom{n+1}{2}}$ by

$$B_T(e, \{i,j\}) = \begin{cases} 1 & \text{if } e \in P(i,j), \\ 0 & \text{otherwise.} \end{cases}$$

Proposition 3.1.

For a rooted tree $T$, one has $\mathrm{rowspan}(A_T) = \mathrm{rowspan}(B_T)$. In particular, the ideals $I(A_T)$ and $I(B_T)$ are equal.

Proof. We show that the matrix $B_T$ can be obtained by applying elementary row operations to $A_T$. Let $a^v_T$ denote the row of $A_T$ corresponding to vertex $v$, and let $b^{e(v)}_T$ be the row in $B_T$ for edge $e(v)$. For a vertex $v$, let $\mathrm{desLv}(v)$ be the set of all leaves descended from $v$, and let $\mathrm{desInt}(v)$ be the set of internal vertices descended from $v$, where each vertex counts as a descendant of itself. The following holds:

$$b^{e(v)}_T = \sum_{k \in \mathrm{desLv}(v)} a^k_T - 2 \sum_{k \in \mathrm{desInt}(v)} a^k_T. \tag{5}$$

Note that when $v$ is a leaf, $b^{e(v)}_T = a^v_T$. The reader may wish to consult Example 3.2 at this time.

Indeed, the edge $e(v)$ is in the unique shortest path between leaves $i$ and $j$ if and only if exactly one of these leaves is a descendant of $v$. Without loss of generality, let $i$ be this leaf. Then $i$ is in fact the only vertex descended from $v$ with nonzero $ij$-coordinate in the row vectors $a^k_T$ appearing in Equation (5). So the $ij$-coordinate of the right-hand side of Equation (5) is equal to 1. Now, suppose that $e(v)$ is not in the unique shortest path between leaves $i$ and $j$. There are two cases to consider: either both $i$ and $j$ are descended from $v$, or neither of them is. In the former case, the vertices $k$ descended from $v$ with nonzero entries in the $ij$-coordinate of $a^k_T$ are $i$, $j$ and $\mathrm{lca}(i,j)$. Hence, the $ij$-coordinate of the right-hand side of Equation (5) is $1 + 1 - 2 = 0$. In the latter case, neither $i$ nor $j$ is descended from $v$, so their most recent common ancestor is not in $\mathrm{desInt}(v)$. Hence, the right-hand side of Equation (5) is 0.

Example 3.2.

The matrix $A_T$ for the tree in Figure 1 has rows indexed by the non-root vertices $1, \dots, 7$ and columns indexed by the pairs

01 02 03 04 05 12 13 14 15 23 24 25 34 35 45

[matrix entries omitted]

The matrix $B_T$ for the tree in Figure 1 has rows indexed by the edges $e(1), \dots, e(7)$ and the same column labels

01 02 03 04 05 12 13 14 15 23 24 25 34 35 45

[matrix entries omitted]

The following are the linear combinations of Equation (5). For the leaves, $b^{e(i)}_T = a^i_T$ for $i = 1, \dots, 5$. For the internal vertex $\ell$ whose three children $i, j, k$ are all leaves,

$$b^{e(\ell)}_T = a^i_T + a^j_T + a^k_T - 2a^\ell_T = b^{e(i)}_T + b^{e(j)}_T + b^{e(k)}_T - 2a^\ell_T,$$

and for the other internal vertex $h$, whose children are $\ell$ and two leaves $i', j'$,

$$b^{e(h)}_T = a^1_T + \cdots + a^5_T - 2a^h_T - 2a^\ell_T = b^{e(i')}_T + b^{e(j')}_T + b^{e(\ell)}_T - 2a^h_T.$$

In our computation of toric fiber products, it will be necessary to consider the ideal $I(B_T) \subset \mathbb{C}[p_{ij} \mid 0 \le i < j \le n]$ in a ring with one extra variable. More precisely, let $B^\star_T$ be the matrix with rows indexed by $E \cup \{\star\}$ and columns indexed by pairs of elements of $\{0, \dots, n\}$ and the symbol $\star$, whose entries are given by $B^\star_T(e, \{i,j\}) = B_T(e, \{i,j\})$, $B^\star_T(e, \star) = 0$ for all $e \in E$, $B^\star_T(\star, \{i,j\}) = 1$ for all $\{i,j\} \subset \{0, \dots, n\}$, and $B^\star_T(\star, \star) = 1$. In other words, $B^\star_T$ is obtained from $B_T$ by adding a column of all zeros and then a row of all ones.

Remark 3.3.

Since the all-ones row vector is in $\mathrm{rowspan}(B_T)$, the all-ones row $b^\star_T$ in $B^\star_T$ can be replaced by the row consisting of all zeros except for the 1 in the $\star$ column without changing the ideal $I(B^\star_T)$. Thus, the ideal $I(B^\star_T)$ is the extension of the ideal $I(B_T) \subset \mathbb{C}[p_{ij} \mid i, j \in \mathrm{Lv}(T)]$ in the ring with one extra variable $\mathbb{C}[p_\star, p_{ij} \mid i, j \in \mathrm{Lv}(T)]$. Consequently, the ML-degree of $I(B^\star_T)$ is equal to that of $I(B_T)$.

Let us now consider a rooted tree $T$ built from two smaller trees in the following way. Let $S_m$ be the rooted star tree; that is, $S_m$ is a tree with a unique internal vertex on $m + 1$ leaves. Let $T'$ be an arbitrary rooted tree. Let $T$ be obtained from $T'$ and $S_m$ by identifying a distinguished leaf edge of $T'$ with the root edge of $S_m$. More precisely, let $\ell$ be a distinguished leaf of $T'$ with direct ancestor $h$. Label the root leaf of $S_m$ by $h$ and let $\ell$ label the unique internal vertex of $S_m$. We obtain $T$ from $T'$ and $S_m$ by identifying the vertices labeled $h$ and $\ell$ and the edge between them. Figure 2 illustrates such a procedure. By identifying vertices 6 and 7 in the two trees, one obtains the tree in Figure 1.

Figure 2: Identifying vertices 6 and 7 in these trees produces the tree in Figure 1.

Let $\mathbb{C}[p] = \mathbb{C}[p_{i,j} \mid i, j \in (\mathrm{Lv}(T') \cup \mathrm{Lv}(S_m)) \setminus \{h, \ell\},\ i \neq j]$, $\mathbb{C}[q] = \mathbb{C}[q_{i,j} \mid i, j \in \mathrm{Lv}(T'),\ i \neq j]$ and $\mathbb{C}[r] = \mathbb{C}[r_{i,j} \mid i, j \in \mathrm{Lv}(S_m),\ i \neq j]$. We will show that the ideal $I(B_T) \subset \mathbb{C}[p]$ is a toric fiber product of the two ideals $I(B^\star_{T'}) \subset \mathbb{C}[q_\star, q]$ and $I(B^\star_{S_m}) \subset \mathbb{C}[r_\star, r]$. Following the definition of the toric fiber product in [11], we assign a multigrading to the indeterminates of the polynomial rings associated to $T'$ and $S_m$ as follows. Assign the following multidegrees to the variables of $\mathbb{C}[q_\star, q]$:

$$\deg(q_\star) = [0,1,0], \qquad \deg(q_{i,j}) = \begin{cases} [1,0,0] & \text{if } i, j \neq \ell, \\ [0,0,1] & \text{if } i = \ell \text{ or } j = \ell. \end{cases}$$

Similarly, let

$$\deg(r_\star) = [1,0,0], \qquad \deg(r_{i,j}) = \begin{cases} [0,1,0] & \text{if } i, j \neq h, \\ [0,0,1] & \text{if } i = h \text{ or } j = h. \end{cases}$$

Finally, let

$$\deg(p_{i,j}) = \begin{cases} [1,0,0] & \text{if } i, j \in \mathrm{Lv}(T'), \\ [0,1,0] & \text{if } i, j \in \mathrm{Lv}(S_m), \\ [0,0,1] & \text{otherwise.} \end{cases}$$

Then the matrix $\mathcal{A}$ whose rows are these multigrading vectors is the $3 \times 3$ identity matrix. Let $R_{T'} = \mathbb{C}[q_\star, q] / I(B^\star_{T'})$ and let $R_{S_m} = \mathbb{C}[r_\star, r] / I(B^\star_{S_m})$. With respect to these multigradings, the toric fiber product of $I(B^\star_{T'})$ and $I(B^\star_{S_m})$, denoted $I(B^\star_{T'}) \times_{\mathcal{A}} I(B^\star_{S_m})$, is the kernel of the map

$$\psi_{T', S_m} : \mathbb{C}[p] \to R_{T'} \otimes_{\mathbb{C}} R_{S_m}, \qquad p_{i,j} \mapsto \begin{cases} q_{i,j} \otimes r_\star & \text{if } i, j \in \mathrm{Lv}(T') \setminus \{\ell\}, \\ q_\star \otimes r_{i,j} & \text{if } i, j \in \mathrm{Lv}(S_m) \setminus \{h\}, \\ q_{i,\ell} \otimes r_{h,j} & \text{if } i \in \mathrm{Lv}(T') \setminus \{\ell\} \text{ and } j \in \mathrm{Lv}(S_m) \setminus \{h\}. \end{cases}$$

Remark 3.4.

Combinatorially, this operation corresponds to including paths between leaves of the smaller trees $T'$ and $S_m$ into $T$. Paths whose leaves are both in $T'$ or both in $S_m$ remain the same, whereas we glue together paths in $T'$ and $S_m$ with endpoints $\ell$ and $h$ respectively along their common edge.

Theorem 3.5.

With the notation as above, we have $I(B_T) = I(B^\star_{T'}) \times_{\mathcal{A}} I(B^\star_{S_m})$.

Proof.

We may rewrite the map defining the toric fiber product as

$$\psi_{T', S_m} : \mathbb{C}[p] \to \mathbb{C}[t_\star, t_e \mid e \in E(T)], \qquad p_{i,j} \mapsto t_\star \Big( \prod_{e \in P(i,j) \cap E(T')} t_e \Big)\, t_\star \Big( \prod_{e \in P(i,j) \cap E(S_m)} t_e \Big).$$

Note that $t_\star$ and $t_{e(\ell)}$ are always squared in the image of this map. Indeed, $t_\star^2$ is a factor of the image of each $p_{ij}$. The parameter $t_{e(\ell)}$ does not appear as a factor of the image of $p_{ij}$ when the path $P(i,j)$ lies entirely within $T'$ or $S_m$. When $i$ is a leaf of $T'$ and $j$ is a leaf of $S_m$ (or vice versa), $t_{e(\ell)}^2$ divides the image of $p_{ij}$. So we may replace the parameters $t_\star$ and $t_{e(\ell)}$ with their square roots without changing the kernel of $\psi_{T', S_m}$. Since the row of all ones is in $\mathrm{rowspan}(B_T)$, this is equal to the kernel of the map $\varphi_{B_T}$ associated to $B_T$ as in Equation (1).

Corollary 3.6.

The ML-degree of $I(B_T)$ is equal to the product of the ML-degrees for $I(B_{T'})$ and $I(B_{S_m})$.

Proof. From [2, Theorem 5.5], the ML-degree of the toric fiber product of two toric models is the product of the ML-degrees of the models. Thus, Theorem 3.5 implies that the ML-degree of $I(B_T)$ is equal to the product of the ML-degrees of $I(B^\star_{T'})$ and $I(B^\star_{S_m})$. This is equal to the product of the ML-degrees of $I(B_{T'})$ and $I(B_{S_m})$ by Remark 3.3.
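The row relation behind Proposition 3.1 can be tested on a small tree. The Python sketch below rests on two assumptions that the source does not show legibly: that $A_T$ has entry 1 whenever $v = i$, $v = j$ or $v = \mathrm{lca}(i,j)$, and that Equation (5) reads $b^{e(v)}_T = \sum_{k \in \mathrm{desLv}(v)} a^k_T - 2\sum_{k \in \mathrm{desInt}(v)} a^k_T$ with each vertex counted as its own descendant. The tree and all function names are hypothetical. Rows of $B_T$ are computed directly from paths, independently of the relation being checked.

```python
from itertools import combinations

# Hypothetical tree (an assumption): root leaf 0, internal vertices 6 and 7.
PARENT = {6: 0, 1: 6, 2: 6, 7: 6, 3: 7, 4: 7, 5: 7}
N = 5
PAIRS = list(combinations(range(N + 1), 2))
INTERNAL = set(PARENT.values()) - {0}


def path_to_root(v):
    """Vertices on the path from v up to the root, starting at v."""
    path = [v]
    while v in PARENT:
        v = PARENT[v]
        path.append(v)
    return path


def lca(i, j):
    seen = set(path_to_root(i))
    return next(v for v in path_to_root(j) if v in seen)


def edges_to_root(v):
    """Edges on the path from v to the root, each labeled by its lower vertex."""
    return set(path_to_root(v)[:-1])


def a_row(v):
    """Row of A_T for vertex v (entries assumed to be 0 or 1)."""
    return [int(v in pair or v == lca(*pair)) for pair in PAIRS]


def b_row(v):
    """Row of B_T for edge e(v): 1 iff e(v) lies on the path from i to j,
    i.e. iff e(v) lies on exactly one of the two paths to the root."""
    return [int((v in edges_to_root(i)) != (v in edges_to_root(j)))
            for i, j in PAIRS]


def equation_5_rhs(v):
    """Leaf rows below v minus twice the internal rows below v (v included)."""
    below = [u for u in PARENT if v in path_to_root(u)]
    total = [0] * len(PAIRS)
    for u in below:
        weight = -2 if u in INTERNAL else 1
        total = [t + weight * a for t, a in zip(total, a_row(u))]
    return total


CHECK = all(b_row(v) == equation_5_rhs(v) for v in PARENT)
```

Under these assumptions the relation holds for every edge of the example tree, which illustrates why each row of $B_T$ lies in the row span of $A_T$.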

4. Reciprocal ML-degree of star tree models

A star tree $S_n$ is a tree on leaves $\{0, \dots, n\}$ with a unique internal vertex. We compute the reciprocal ML-degree of star tree models in the following theorem. This serves as the basis of induction in the proof of the main theorem.

Theorem 4.1.

The reciprocal maximum likelihood degree of the Brownian motion star tree model on $n + 1$ leaves is equal to $2^n - n - 1$.

In preparation for the proof, let $I_n$ be the defining ideal of the toric variety $\mathcal{L}^{-1}_{S_n}$ in the $p$ coordinates as given in Equation (2). By Proposition 3.1, the ideal $I_n$ is equal to the ideal $I(B_{S_n})$, where the matrix $B_{S_n} \in \mathbb{Z}^{(n+1) \times \binom{n+1}{2}}$ as defined in Section 3 has columns $\{ e_i + e_j \in \mathbb{Z}^{n+1} \mid 0 \le i < j \le n \}$. In other words, the ideal $I_n$ is the toric ideal of the second hypersimplex, for which the following facts are well-known.

Theorem 4.2.

The following hold for the toric ideal $I_n$.
(a) [6, Theorem 2.1] The ideal $I_n \subset \mathbb{C}[p]$ is generated by the quadrics $p_{ij} p_{kl} - p_{ik} p_{jl}$, for distinct $i, j, k, l \in \{0, \dots, n\}$.
(b) [6, Theorem 2.3] The degree of $V(I_n)$, as a projective variety in $\mathbb{P}^{\binom{n+1}{2} - 1}$, is equal to $2^n - n - 1$.

Along with the above Theorem 4.2, the following will be a key step in the proof of Theorem 4.1.

Lemma 4.3.

The varieties $\mathcal{L}^\perp_{S_n}$ and $\mathcal{L}^{-1}_{S_n}$ in $\mathbb{S}^n$ intersect only at the zero matrix.

Proof. Let $K \in \mathbb{S}^n$ be in the intersection $\mathcal{L}^\perp_{S_n} \cap \mathcal{L}^{-1}_{S_n}$, and write $(p_{ij})_{0 \le i < j \le n}$ for the resulting coordinates after the change of coordinates in Equation (2). Let $P$ be the $n \times n$ symmetric matrix with diagonal entries $p_{01}, \dots, p_{0n}$ and off-diagonal entries $p_{ij}$ for $1 \le i < j \le n$.

The equations for $K \in \mathcal{L}^\perp_{S_n}$ in terms of coordinates in $P$, as previously computed in Equation (4), are equivalent to

$$p_{01} + \cdots + p_{0n} = 0 \quad \text{and} \quad \sum_{\substack{i = 0 \\ i \neq j}}^{n} p_{ij} = 0 \ \text{ for } j = 1, \dots, n.$$

In other words, the trace of $P$ and every row sum of $P$ are zero.

The condition $K \in \mathcal{L}^{-1}_{S_n}$ is equivalent to $P \in V(I_n)$, again by Theorem 2.2. The explicit set of generators for $I_n$ given in Theorem 4.2(a) can be described in the following way. For $1 \le i < j \le n$, define $Q_{ij}$ to be the $2 \times (n-1)$ matrix obtained by
(i) taking the $i$-th and $j$-th row of $P$ to make a $2 \times n$ matrix,
(ii) then converting the square submatrix $\begin{bmatrix} p_{0i} & p_{ij} \\ p_{ij} & p_{0j} \end{bmatrix}$ to $\begin{bmatrix} p_{0i} & p_{ij} \\ p_{0j} & p_{ij} \end{bmatrix}$,
(iii) and then erasing the column $\begin{bmatrix} p_{ij} \\ p_{ij} \end{bmatrix}$.

The generators for $I_n$ in Theorem 4.2 are the $2 \times 2$ minors of $Q_{ij}$ as $ij$ ranges over all $1 \le i < j \le n$. Since the row sums of $P$ must be zero, both row sums of $Q_{ij}$ are equal to $-p_{ij}$. Thus, the fact that the rank of $Q_{ij}$ is at most 1 implies that if $p_{ij} \neq 0$, then the two rows of $Q_{ij}$ agree, that is, $p_{il} = p_{jl}$ for all $l \in \{0, \dots, n\} \setminus \{i, j\}$. As a result, if we consider the graph $G$ on vertices $\{1, \dots, n\}$ where $(i,j)$ is an edge in $G$ if and only if $p_{ij} \neq 0$, we have:
1. connected components of $G$ are complete graphs, and
2. for any $i \neq j$ belonging to a common connected component of $G$, all the $p_{ij}$ share a common value.

Thus, after relabeling, the matrix $P$ is a block diagonal matrix, each block having the form of an $(m+1) \times (m+1)$ matrix:

$$\begin{bmatrix} -ma & a & \cdots & a \\ a & -ma & \cdots & a \\ \vdots & \vdots & \ddots & \vdots \\ a & a & \cdots & -ma \end{bmatrix}.$$

Suppose there are several blocks. Take two distinct blocks, of sizes $m_a + 1$ and $m_b + 1$ with off-diagonal values $a$ and $b$ respectively, and choose $i$ in the first block and $j$ in the second. Then, up to reordering columns,

$$Q_{ij} = \begin{bmatrix} -m_a a & a & \cdots & a & 0 & \cdots & 0 & 0 & \cdots & 0 \\ -m_b b & 0 & \cdots & 0 & b & \cdots & b & 0 & \cdots & 0 \end{bmatrix}.$$

For $Q_{ij}$ to have all vanishing $2 \times 2$ minors, at least one of $a$ and $b$ must be zero. Hence, there can be at most one block with non-zero entries. If there is one such block, of size $(m+1) \times (m+1)$ with $m \ge 1$, then $\mathrm{trace}(P) = -m(m+1)a = 0$ implies that $a = 0$. We thus conclude that $P$ is the zero matrix.

Proof of Theorem 4.1.

For $T = S_n$, Theorem 2.3 states that the reciprocal ML-degree of $\mathcal{M}_{S_n}$ is equal to the degree of $V(I_n) \cap V(\langle A_T p - A_T u \rangle)$ as an affine subscheme of $\mathbb{C}^{\binom{n+1}{2}}$ for a generic $u$. Let us consider the intersection of their respective projective closures. That is, we homogenize the ideals $I_n$ and $\langle A_T p - A_T u \rangle \subset \mathbb{C}[p_{ij} \mid 0 \le i < j \le n]$ by an extra variable $p_\star$. As the ideal $I_n$ is already homogeneous, the resulting homogenization $\overline{I}_n$ is the extension of $I_n$ in $\mathbb{C}[p_\star, p_{ij} \mid 0 \le i < j \le n]$, and $\langle A_T p - A_T u \rangle$ homogenizes to $\langle A_T p - p_\star A_T u \rangle$. As projective varieties in $\mathbb{P}^{\binom{n+1}{2}}$, the degree of the intersection of $V(\overline{I}_n)$ with the linear subvariety $V(\langle A_T p - p_\star A_T u \rangle)$ is the degree of $V(\overline{I}_n)$. Since $V(\overline{I}_n)$ is the projective cone over $V(I_n)$ considered as a projective variety in $\mathbb{P}^{\binom{n+1}{2} - 1}$, we thus conclude from Theorem 4.2(b) that the degree of the intersection $V(\overline{I}_n) \cap V(\langle A_T p - p_\star A_T u \rangle)$ is $2^n - n - 1$.

It remains to show that $V(\overline{I}_n) \cap V(\langle A_T p - p_\star A_T u \rangle)$ has no point in the hyperplane at infinity $\{ p_\star = 0 \}$. When $p_\star = 0$, the equations defining the intersection are exactly the ones defining the intersection $\mathcal{L}^{-1}_{S_n} \cap \mathcal{L}^\perp_{S_n}$, which only consists of the zero matrix by Lemma 4.3. Hence, the intersection $V(\overline{I}_n) \cap V(\langle A_T p - p_\star A_T u \rangle)$ is empty if $p_\star = 0$, as desired.

We can now prove the main result of the paper.
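Before the induction, the star tree ingredients can be sanity-checked numerically. The Python sketch below assumes the monomial parametrization $p_{ij} = t_i t_j$ read off from the columns $e_i + e_j$ of $B_{S_n}$ and verifies that the quadrics of Theorem 4.2(a) vanish on it; it also evaluates the count $2^n - n - 1$ of Theorem 4.1. The parameter values and function names are hypothetical.

```python
from itertools import combinations, permutations


def star_parametrization(t):
    """Monomial map p_ij = t_i * t_j for 0 <= i < j <= n, with n = len(t) - 1."""
    return {frozenset(pair): t[pair[0]] * t[pair[1]]
            for pair in combinations(range(len(t)), 2)}


def quadrics_vanish(p, n):
    """Check p_ij * p_kl == p_ik * p_jl for all distinct i, j, k, l in 0..n."""
    for i, j, k, l in permutations(range(n + 1), 4):
        lhs = p[frozenset((i, j))] * p[frozenset((k, l))]
        rhs = p[frozenset((i, k))] * p[frozenset((j, l))]
        if lhs != rhs:
            return False
    return True


def star_reciprocal_ml_degree(n):
    """The value 2**n - n - 1 stated in Theorem 4.1."""
    return 2 ** n - n - 1


# Hypothetical parameter values for a star tree with n = 4.
T_VALUES = [2, 3, 5, 7, 11]
ON_VARIETY = star_parametrization(T_VALUES)

# Perturbing one coordinate should leave the toric variety.
OFF_VARIETY = dict(ON_VARIETY)
OFF_VARIETY[frozenset((0, 1))] += 1
```

A point in the image of the parametrization satisfies every quadric, while the perturbed point does not, which is consistent with the quadrics cutting out $V(I_n)$.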

Proof of Theorem 1.3.

We induct on the number of internal vertices of $T$. When $T$ has one internal vertex $v$, it is a star tree. So by Theorem 4.1, the reciprocal ML-degree of $\mathcal{M}_T$ is $2^{\mathrm{outdeg}(v)} - \mathrm{outdeg}(v) - 1$.

Now let $T$ be a tree with at least two internal vertices. Choose $\ell$ to be one of the internal vertices of $T$ that has only leaves as direct descendants. Let $h$ be the unique direct ancestor of $\ell$. Take $S_{\mathrm{outdeg}(\ell)}$ to be the rooted star tree with internal vertex $\ell$, root leaf $h$, and remaining leaves exactly the descendants of $\ell$ in $T$. Take $T'$ to be the rooted tree obtained by removing from $T$ all leaves descended from $\ell$. Identifying $h$ and $\ell$ in $S_{\mathrm{outdeg}(\ell)}$ and $T'$ gives back the tree $T$. Moreover, we have that $\mathrm{Int}(T) = \mathrm{Int}(T') \cup \{\ell\}$. By Corollary 3.6 and the inductive hypothesis, the reciprocal ML-degree of $\mathcal{M}_T$ is

$$\mathrm{rmld}(\mathcal{M}_T) = \left( 2^{\mathrm{outdeg}(\ell)} - \mathrm{outdeg}(\ell) - 1 \right) \prod_{v \in \mathrm{Int}(T')} \left( 2^{\mathrm{outdeg}(v)} - \mathrm{outdeg}(v) - 1 \right) = \prod_{v \in \mathrm{Int}(T)} \left( 2^{\mathrm{outdeg}(v)} - \mathrm{outdeg}(v) - 1 \right),$$

as desired.

REFERENCES

[1] C. Améndola, N. Bliss, I. Burke, C. R. Gibbons, M. Helmer, S. Hoşten, E. D. Nash, J. I. Rodriguez, and D. Smolkin. The maximum likelihood degree of toric varieties. Journal of Symbolic Computation, 92:222-242, May 2019.
[2] C. Améndola, D. Kosta, and K. Kubjas. Maximum likelihood estimation of toric Fano varieties. arXiv:1905.07396, 2019.
[3] J. Felsenstein. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25(5):471, 1973.
[4] C. Fevola, Y. Mandelshtam, and B. Sturmfels. Pencils of quadrics: Old and new. arXiv:2009.04334, 2020.
[5] P. H. Harvey and M. D. Pagel. The Comparative Method in Evolutionary Biology. Oxford University Press, 1991.
[6] J. A. De Loera, B. Sturmfels, and R. R. Thomas. Gröbner bases and triangulations of the second hypersimplex. Combinatorica, 15(3):409-424, 1995.
[7] A. J. Sommese, C. W. Wampler, et al. The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. World Scientific, 2005.
[8] B. Sturmfels, S. Timme, and P. Zwiernik. Estimating linear covariance models with numerical nonlinear algebra. arXiv:1909.00566, 2019.
[9] B. Sturmfels and C. Uhler. Multivariate Gaussian, semidefinite matrix completion, and convex algebraic geometry. Annals of the Institute of Statistical Mathematics, 62(4):603-638, 2010.
[10] B. Sturmfels, C. Uhler, and P. Zwiernik. Brownian motion tree models are toric. arXiv:1902.09905, 2019.
[11] S. Sullivant. Toric fiber products. Journal of Algebra, 316(2):560-577, 2007.
[12] S. Sullivant. Algebraic Statistics, volume 194 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2018.

TOBIAS BOEGE
Fakultät für Mathematik
Otto-von-Guericke-Universität Magdeburg
e-mail: [email protected]

JANE IVY COONS
Department of Mathematics
North Carolina State University
e-mail: [email protected]

CHRISTOPHER EUR
Department of Mathematics
Stanford University
e-mail: [email protected]

AIDA MARAJ
Max Planck Institute for Mathematics in the Sciences
e-mail: [email protected]