Maximum Likelihood for Dual Varieties

Abstract

Maximum likelihood estimation (MLE) is a fundamental computational problem in statistics. In this paper, MLE for statistical models with discrete data is studied from an algebraic statistics viewpoint. A reformulation of the MLE problem in terms of dual varieties and conormal varieties will be given. With this description, the dual likelihood equations and the dual MLE problem are defined. We show that solving the dual MLE problem yields solutions to the MLE problem, so we can solve the MLE problem without ever determining the defining equations of the model.


arXiv [math.ST]

Jose Israel Rodriguez ∗

19 May 2014


Maximum likelihood estimation (MLE) is a fundamental problem in statistics that has been extensively studied from an algebraic viewpoint [3, 4, 5, 9, 10, 12]. We continue to follow an algebraic approach to MLE in this paper, considering statistical models for discrete data in the probability simplex as irreducible varieties $X$ in complex projective space $\mathbb{P}^n$.

An algebraic statistical model $X$ in $\mathbb{P}^n$ will be defined by the vanishing of homogeneous polynomials in the unknowns $p_0, p_1, \ldots, p_n$. We assume that $X$ is an irreducible generically reduced variety. When the coordinates $p_0, p_1, \ldots, p_n$ of a point $p$ in $X$ are positive and sum to one, we interpret $p$ as a probability distribution, where the probability of observing event $i$ is $p_i$. We let $u = (u_0, u_1, \ldots, u_n) \in (\mathbb{C}^*)^{n+1}$ be a vector of length $n+1$ called data. When each entry $u_i$ of the vector is a positive integer we interpret $u_i$ as the number of observations of event $i$. We use the notation $u_+ := u_0 + \cdots + u_n$ and $p_+ := p_0 + \cdots + p_n$, always assuming $u_+ \neq 0$.

∗ Department of Mathematics, University of California at Berkeley, Berkeley, CA 94720; [email protected].

The likelihood function for the data $u$ is defined as

$$l_u(p) := p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} / p_+^{u_+}.$$

When $u$ and $p$ are interpreted as data and a probability distribution respectively, the likelihood of observing $u$ with respect to the distribution $p$ is $l_u(p)$ multiplied by a multinomial coefficient depending only on $u$.

For fixed data $u$, to determine local maxima of $l_u(p)$ on a statistical model and give a solution to the MLE problem, we determine all complex critical points of $l_u(p)$ restricted to $X$. Of these critical points, we find the one with positive coordinates and greatest likelihood to determine the maximum likelihood estimator $\hat{p}$. The (algebraic) maximum likelihood estimation problem is solved by determining all critical points of $l_u(p)$ on $X$ and maximizing $l_u(p)$ on this set.

To find the complex critical points, we determine when the gradient of $l_u(p)$ is orthogonal to the tangent space of $X$ at $p$. So the set of critical points is

$$\{ p \in X_{\mathrm{reg}} \text{ such that } \nabla l_u(p) \perp T_p X \}.$$

The gradient of the likelihood function equals

$$\left[ \frac{u_0}{p_0} - \frac{u_+}{p_+},\ \frac{u_1}{p_1} - \frac{u_+}{p_+},\ \ldots,\ \frac{u_n}{p_n} - \frac{u_+}{p_+} \right],$$

up to scaling by $l_u(p)$. So the critical points of $l_u(p)$ are the points $p \in X_{\mathrm{reg}}$ such that

$$\left[ \frac{u_0}{p_0} - \frac{u_+}{p_+},\ \frac{u_1}{p_1} - \frac{u_+}{p_+},\ \ldots,\ \frac{u_n}{p_n} - \frac{u_+}{p_+} \right] \perp T_p(X),$$

implicitly forcing the condition $p_0 p_1 \cdots p_n (p_0 + \cdots + p_n) \neq 0$.

Definition 1. Given an algebraic statistical model $X$ in $\mathbb{P}^n$, the maximum likelihood degree (ML degree) of $X$ is the number of critical points of $l_u(p)$ restricted to $X$ for generic choices of data $u$,

$$\operatorname{MLdegree}(X) = \#\{ p \in X : \nabla l_u(p) \perp T_p(X) \}.$$

The main result of this paper is a formulation that relates maximum likelihood estimation to a conormal variety derived from $X$ [Theorem 4]. With this perspective, we use the dual likelihood equations [Theorem 8] to solve the MLE problem for $X$ when only given the defining equations of its dual variety $X^*$.

The computations in this paper were done using Bertini [1] and Macaulay2 [7].
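The gradient formula from the introduction can be sanity-checked numerically. The following Python sketch is an illustration only: the data vector and the test point are made-up values, and the function names are hypothetical. It compares the closed form, the likelihood value times $[u_i/p_i - u_+/p_+]$, against central finite differences.

```python
import math

def likelihood(u, p):
    """l_u(p) = prod_i p_i^{u_i} / (p_0 + ... + p_n)^{u_0 + ... + u_n}."""
    return math.prod(pi ** ui for pi, ui in zip(p, u)) / sum(p) ** sum(u)

def gradient_direction(u, p):
    """The vector [u_i/p_i - u_+/p_+], i.e. the gradient divided by l_u(p)."""
    u_plus, p_plus = sum(u), sum(p)
    return [ui / pi - u_plus / p_plus for ui, pi in zip(u, p)]

def numeric_gradient(u, p, h=1e-6):
    """Central finite differences of l_u at p."""
    grad = []
    for i in range(len(p)):
        up = list(p); up[i] += h
        dn = list(p); dn[i] -= h
        grad.append((likelihood(u, up) - likelihood(u, dn)) / (2 * h))
    return grad

u = [2, 3, 5]            # illustrative counts (assumption)
p = [0.5, 0.3, 0.2]      # illustrative test point (assumption)

scale = likelihood(u, p)
closed_form = [scale * g for g in gradient_direction(u, p)]
numeric = numeric_gradient(u, p)

# l_u is homogeneous of degree 0, so the gradient is orthogonal to p itself.
euler_sum = sum(g * pi for g, pi in zip(gradient_direction(u, p), p))
```

Because $l_u$ is homogeneous of degree zero, `euler_sum` should vanish up to rounding, which is why the gradient defines a point of the dual projective space.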

In this section, we consider an algebraic statistical model $X$ in $\mathbb{P}^n$ and define $X'$ to be an embedding of $X$ in $\mathbb{P}^{n+1}$. We present our first result in Theorem 4, giving a formulation of the MLE problem in terms of conormal varieties and dual varieties. In Corollary 1 we give a bijection between critical points of the likelihood function on two different varieties. In Corollary 2 we give equations to solve the MLE problem if we have equations that define a conormal variety. We also recall how to compute conormal varieties and dual varieties of $X$ and $X'$.

Let $X \subset \mathbb{P}^n$ be a codimension $c$ algebraic statistical model defined by homogeneous polynomials $f_1, f_2, \ldots, f_k$. We let $\operatorname{Jac}(X)$ denote the $k \times (n+1)$ matrix of partial derivatives of $f_1, \ldots, f_k$ with respect to $p_0, \ldots, p_n$, and we say this is the Jacobian of $X$.

To keep track of the sum of the coordinates $p_0, p_1, \ldots, p_n$ we introduce the coordinate $p_s$ and a hyperplane in $\mathbb{P}^{n+1}$ defined by the vanishing of the polynomial

$$H(p) := -p_0 - p_1 - \cdots - p_n + p_s. \qquad (1)$$

If $X$ is defined by $f_1, \ldots, f_k$, then $X'$ in the coordinates $p_0, p_1, \ldots, p_n, p_s$ is defined by the vanishing of $f_1, \ldots, f_k$ and $H$. With this definition, we get the following proposition.

Proposition 1. If $X$ is defined by the homogeneous polynomials $f_1, f_2, \ldots, f_k$, then the Jacobian of $X'$ is given by the $(k+1) \times (n+2)$-matrix

$$\operatorname{Jac}(X') = \begin{bmatrix} -1 \ \ {-1} \ \ \cdots \ \ {-1} & 1 \\ \operatorname{Jac}(X) & 0 \end{bmatrix},$$

where the $0$ in the last column stands for a column of $k$ zeros.

The important fact about the construction of $X'$ is that there is a bijection between the critical points of the function $l_u(p)$ on $X$ and the critical points of the monomial

$$l'_u(p) := p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} p_s^{-u_+}$$

on $X'$, given by Lemma 2.

By a slight abuse of notation, the "$p$" in $l_u(p)$ and the "$p$" in $l'_u(p)$ represent two different things. The first $p$ represents a point $[p_0 : p_1 : \cdots : p_n] \in X$, while the second represents a point $[p_0 : p_1 : \cdots : p_n : p_s] \in X'$.

Lemma 2. There is a bijection between the critical points of the function $l_u(p)$ on $X$ and the critical points of $l'_u(p)$ on $X'$. Under this bijection, $[p_0 : p_1 : \cdots : p_n] \in \mathbb{P}^n$ is a critical point of $l_u(p)$ on $X$ if and only if $[p_0 : p_1 : \cdots : p_n : p_s] \in \mathbb{P}^{n+1}$ is a critical point of $l'_u(p)$ on $X'$.

Proof. We need to show that $[p_0 : \cdots : p_n : p_s] \in X'_{\mathrm{reg}}$ satisfies $\nabla l'_u(p) \perp T_p X'$ if and only if $[p_0 : \cdots : p_n] \in X_{\mathrm{reg}}$ satisfies $\nabla l_u(p) \perp T_p X$.

By Proposition 1, it follows that $[p_0 : \cdots : p_n : p_s] \in X'_{\mathrm{reg}}$ if and only if $[p_0 : \cdots : p_n] \in X_{\mathrm{reg}}$, so it remains to show that $\nabla l'_u(p) \perp T_p X'$ if and only if $\nabla l_u(p) \perp T_p X$. So we need to show that $\nabla l'_u(p)$ being in the row space of $\operatorname{Jac}(X')$ implies that $\nabla l_u(p)$ is in the row space of $\operatorname{Jac}(X)$, and vice versa. To see this, observe that

$$\begin{bmatrix} \nabla l'_u(p) \\ \operatorname{Jac}(X') \end{bmatrix} \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ 1 & \cdots & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{u_0}{p_0} - \frac{u_+}{p_s} \ \ \ \frac{u_1}{p_1} - \frac{u_+}{p_s} \ \ \cdots \ \ \frac{u_n}{p_n} - \frac{u_+}{p_s} & -\frac{u_+}{p_s} \\ 0 \ \ \ 0 \ \ \cdots \ \ 0 & 1 \\ \operatorname{Jac}(X) & 0 \end{bmatrix}.$$

Since $p_s = p_+$, we have completed the proof, because the top row in the matrix above is $\left[ \nabla l_u(p),\ -\frac{u_+}{p_+} \right]$.

The conormal variety of $X$ is defined to be the Zariski closure in $\mathbb{P}^n \times \mathbb{P}^n$ of the set

$$N_X := \{ (p, q) : q \perp T_p X_{\mathrm{reg}} \}.$$

To determine the defining equations of $N_X$, we let $M$ denote a $(k+1) \times (n+1)$ matrix that is an extended Jacobian whose top row is $[q_0, q_1, \ldots, q_n]$ and whose bottom rows are $\operatorname{Jac}(X)$. The defining equations of the conormal variety can be computed by taking the ideal generated by $f_1, \ldots, f_k$ and the $(c+1) \times (c+1)$-minors of $M$ and saturating by the $c \times c$-minors of $\operatorname{Jac}(X)$.

The dual variety $X^*$ is the projection of the conormal variety $N_X$ to the dual projective space $\mathbb{P}^n$ associated to the $q$-coordinates. To compute the equations of the dual variety, one eliminates the unknowns $p_0, p_1, \ldots, p_n$ from the equations defining $N_X$. For additional information on computing conormal varieties and dual varieties see [13].

Since $X'$ is contained in the hyperplane defined by $H$, the dual variety of $X'$ is known to be a cone of $X^*$ over the point $h = [-1 : -1 : \cdots : -1 : 1]$ [6] (Proposition 1.1). So $X'^{*}$ in $\mathbb{P}^{n+1}$ is given by

$$X'^{*} = \{ [q_0 - b_s : q_1 - b_s : \cdots : q_n - b_s : b_s] : [q_0 : \cdots : q_n] \in X^* \}.$$

It is easy to go between the coordinates of $X$ and the coordinates of $X'$ because there is a birational map between these two varieties. But there does not have to be a birational map between $X^*$ and $X'^{*}$. For this reason, the coordinates of the former are in $q_0, \ldots, q_n$, and the coordinates of the latter are in $b_0, \ldots, b_n, b_s$. Our notation is to let $q$ denote a point $[q_0 : q_1 : \cdots : q_n] \in X^*$ and to let $b$ denote a point $[b_0 : b_1 : \cdots : b_n : b_s] \in X'^{*}$.

The next proposition shows that given the defining equations of $X^*$ in the unknowns $q_0, \ldots, q_n$, we can determine the defining equations of $X'^{*}$ in the unknowns $b_0, \ldots, b_n, b_s$ using the relations

$$q_0 = b_0 + b_s,\quad q_1 = b_1 + b_s,\quad \ldots,\quad q_n = b_n + b_s. \qquad (2)$$

That is, if $g(q_0, q_1, \ldots, q_n)$ vanishes on $X^*$, then $g(b_0 + b_s, b_1 + b_s, \ldots, b_n + b_s)$ vanishes on $X'^{*}$. Moreover, given the Jacobian of $X^*$, we can easily determine the Jacobian of $X'^{*}$ as well using the relations in (2).

Proposition 2. If $g_1(q), \ldots, g_l(q)$ define the variety $X^* \subset \mathbb{P}^n$ in coordinates $q_0, q_1, \ldots, q_n$, then the defining equations of $X'^{*}$ in coordinates $b_0, b_1, \ldots, b_n, b_s$ are

$$g_1(b_0 + b_s, b_1 + b_s, \ldots, b_n + b_s) = 0,\quad \ldots,\quad g_l(b_0 + b_s, b_1 + b_s, \ldots, b_n + b_s) = 0.$$

Moreover, the Jacobian of $X'^{*}$ is given by

$$\operatorname{Jac}(X'^{*}) = \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)} \begin{bmatrix} 1 & & & 1 \\ & \ddots & & \vdots \\ & & 1 & 1 \end{bmatrix}.$$

Proof. The first part of the proposition follows immediately from the relations in (2). By $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)}$ we mean the Jacobian of $X^*$ evaluated at $(b_0 + b_s, \ldots, b_n + b_s)$. Since the defining equations of $X'^{*}$ are obtained by evaluating each $g_i(q)$ at $(b_0 + b_s, \ldots, b_n + b_s)$, it follows by the chain rule that $\operatorname{Jac}(X'^{*})$ equals the desired matrix product.

Example 3.

Consider $X$ in $\mathbb{P}^3$, a variety defined by

$$f = 2 p_0 p_1 p_2 + p_1^2 p_2 + p_1 p_2^2 - p_0^2 p_3 + p_1 p_2 p_3.$$

The Jacobian of $X$ is

$$\operatorname{Jac}(X) = \left[\, 2 p_1 p_2 - 2 p_0 p_3,\ \ 2 p_0 p_2 + 2 p_1 p_2 + p_2^2 + p_2 p_3,\ \ 2 p_0 p_1 + p_1^2 + 2 p_1 p_2 + p_1 p_3,\ \ -p_0^2 + p_1 p_2 \,\right],$$

and the defining polynomial $g(q)$ of the dual variety $X^*$ is

g ( q ) = q − q q q + 16 q q − q q +16 q q q + 16 q q q − q q q q .

The variety $X'$ is defined by the two equations $f(p) = 0$ and $p_s = p_0 + p_1 + \cdots + p_n$, but the dual variety $X'^{*}$ is defined by one equation,

g ( b + b s , b + b s , b + b s , b + b s ) = ( b + b s ) − b + b s ) ( b + b s )( b + b s )+16( b + b s ) ( b + b s ) − b + b s ) ( b + b s )+16( b + b s ) ( b + b s )( b + b s )+16( b + b s ) ( b + b s )( b + b s ) − b + b s )( b + b s )( b + b s )( b + b s ) .

The Jacobian of $X^*$ is

[ q − q q q − q ( q − q q − q q + q q ) − q q + 32 q q + 16 q q − q q q − q q + 32 q q + 16 q q − q q q − q + 16 q q + 16 q q − q q q ] T .

We get the Jacobian of $X'^{*}$ by evaluating $\operatorname{Jac}(X^*)$ at $(b_0 + b_s, \ldots, b_n + b_s)$ and multiplying the evaluated $\operatorname{Jac}(X^*)$ on the right by the matrix

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}.$$

Now we are ready to state our first result.
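The chain-rule step in Proposition 2 can be checked on a small case. The Python sketch below is illustrative only: as a stand-in dual equation it uses the conic $g(q) = q_0 q_2 - q_1^2$ (an assumption chosen for brevity, not the quartic of Example 3), and the test point is arbitrary. It compares the row obtained from Proposition 2, namely $\operatorname{Jac}(g)$ evaluated at $b_i + b_s$ with the sum of its entries appended, against exact central differences of the substituted polynomial.

```python
from fractions import Fraction

def g(q):
    """Stand-in dual equation (assumption): g(q) = q0*q2 - q1^2."""
    q0, q1, q2 = q
    return q0 * q2 - q1 * q1

def jac_g(q):
    """Partial derivatives of g with respect to q0, q1, q2."""
    q0, q1, q2 = q
    return [q2, -2 * q1, q0]

def g_substituted(b):
    """g evaluated at q_i = b_i + b_s, where b = [b0, b1, b2, b_s]."""
    *b_head, b_s = b
    return g([bi + b_s for bi in b_head])

def jac_by_chain_rule(b):
    """Jac(g) at b_i + b_s times [I | 1]: the last entry is the sum of the others."""
    *b_head, b_s = b
    row = jac_g([bi + b_s for bi in b_head])
    return row + [sum(row)]

def jac_by_differences(b, h=Fraction(1, 2)):
    """Central differences; exact here because g is quadratic."""
    out = []
    for i in range(len(b)):
        up = list(b); up[i] += h
        dn = list(b); dn[i] -= h
        out.append((g_substituted(up) - g_substituted(dn)) / (2 * h))
    return out

b = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]  # arbitrary test point
chain = jac_by_chain_rule(b)
diff = jac_by_differences(b)
```

Using `Fraction` keeps every step exact, so the two rows can be compared with `==` rather than with a tolerance.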

Theorem 4. Fix an algebraic statistical model $X$. A point

$$\big( [p_0 : p_1 : \cdots : p_n : p_s],\ [b_0 : b_1 : \cdots : b_n : b_s] \big) \in N_{X'}$$

satisfies the relation

$$[p_0 b_0 : p_1 b_1 : \cdots : p_n b_n : p_s b_s] = [u_0 : u_1 : \cdots : u_n : -u_+]$$

if and only if $[p_0 : p_1 : \cdots : p_n : p_s]$ is a critical point of $l'_u(p) = p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} p_s^{-u_+}$ on $X'$.

Proof. To determine critical points of $l'_u(p)$ on $X'$ we find when

$$\nabla l'_u(p) = \left[ \partial l'_u / \partial p_0 : \cdots : \partial l'_u / \partial p_s \right]$$

is orthogonal to the tangent space of $X'$ at the point $p$. This is the same as determining when $\big( [p_0 : p_1 : \cdots : p_s],\ \nabla l'_u(p) \big) \in N_{X'}$. As a point in projective space, we have

$$\nabla l'_u(p) = \left[ \frac{u_0}{p_0} : \cdots : \frac{u_n}{p_n} : -\frac{u_+}{p_s} \right]$$

whenever $p_0 p_1 \cdots p_s \neq 0$. So we immediately have that a critical point of $l'_u(p)$ satisfies the desired relations when we take the coordinate-wise product of $[p_0 : p_1 : \cdots : p_s]$ and $\nabla l'_u(p)$.

With Lemma 2, Theorem 4 says that if $[p, b] \in N_{X'}$ and the coordinate-wise product of $p$ and $b$ is

$$[p_0 b_0 : \cdots : p_n b_n : p_s b_s] = [u_0 : \cdots : u_n : -u_+], \qquad (3)$$

then $[p_0 : \cdots : p_n]$ is a critical point of $l_u(p)$ on $X$.

Definition 5. The likelihood locus of $X$ for the data $u$ is defined as the set of points in $N_{X'}$ satisfying the relations in (3), denoted $L_X(u)$. We define $P_X(u)$ and $B_X(u)$ to be

$$P_X(u) := \{ p : (p, b) \in L_X(u) \} \quad \text{and} \quad B_X(u) := \{ b : (p, b) \in L_X(u) \}.$$

For additional clarification, note that points in $L_X(u)$ are contained in the conormal variety $N_{X'} \subset \mathbb{P}^{n+1} \times \mathbb{P}^{n+1}$. These points are expressed as

$$(p, b) = \big( [p_0 : p_1 : \cdots : p_s],\ [b_0 : b_1 : \cdots : b_s] \big) \in L_X(u).$$

In regards to ML degree, we have for generic choices of $u$

$$\operatorname{MLdegree}(X) = \# L_X(u) = \# P_X(u) = \# B_X(u).$$

There are two corollaries to Theorem 4. The first corollary gives a bijection between critical points of $l'_u(p)$ on $X'$ and critical points of $l'_u(b)$ on $X'^{*}$. The second corollary gives equations to determine critical points of $l'_u(p)$ on $X'$.

Corollary 1. There is a bijection between critical points of $l'_u(p)$ on $X'$ and critical points of $l'_u(b)$ on $X'^{*}$ given by

$$[p_0 b_0 : p_1 b_1 : \cdots : p_n b_n : p_s b_s] = [u_0 : u_1 : \cdots : u_n : -u_+].$$

Moreover, the product $l'_u(p)\, l'_u(b)$ remains constant over the set of critical points.

Proof. The first part follows by noticing that the relation forces

$$[p_0 : p_1 : \cdots : p_s] = \left[ \frac{u_0}{b_0} : \frac{u_1}{b_1} : \cdots : -\frac{u_+}{b_s} \right],$$

which is also the gradient of $l'_u(b)$. The second part follows as

$$l'_u(p)\, l'_u(b) = u_0^{u_0} u_1^{u_1} \cdots u_n^{u_n} (-u_+)^{-u_+}.$$

When $u_0, \ldots, u_n$ are positive integers, the bijection in Corollary 1 pairs positive critical points of $l'_u(p)$ ordered by increasing likelihood with positive critical points of $l'_u(b)$ ordered by decreasing likelihood!

Example 6.

We will compute the ML degree of $X$ in Example 3 to be 3. We fix the data vector $(u_0, u_1, u_2, u_3)$ = (2 , , , , and determine the points of $L_X(u)$:

p p p p p s . . . . − . . . . − . . . − .

b b b b b s . . . . − − . . . . − − . . . − . − .

The eliminants for $p_0$, $p_1$, $p_2$, and $p_3$ are

(100 p + 290 p + 74 p − ) , (62700 p − p + 314358 p − ) , (1900 p − p + 4886 p − ) , (62700 p + 447650 p − p + 136125) .

The eliminants for $b_0$, $b_1$, $b_2$, $b_3$ of $L_X(u)$ are

(1680 b − b − b − ) , (34151040 b − b + 27271868 b − ) , (28800 b − b + 25100 b − ) , (272250 b − b + 223825 b + 15675) .

Note that we are not saying that the ML degree of $X$ equals the ML degree of $X^*$. In general, $\operatorname{MLdegree}(X) \neq \operatorname{MLdegree}(X^*)$. The linear form $b_0 + b_1 + \cdots + b_n - b_s$ does not vanish on $X'^{*}$, so there is no analogue of Lemma 2 involving $X'^{*}$ and $X^*$.

In terms of previous literature, one should think of Corollary 1 as a generalization of Theorem 2 of [4].

Corollary 2.

Fix a point $[p, b]$ of $N_{X'}$ such that $p_s b_s \neq 0$. The following are equivalent:

1. The point $[p, b]$ is in $L_X(u)$.

2. For $i = 0, 1, 2, \ldots, n$, the point $[p, b]$ satisfies $u_i p_s b_s = -u_+ p_i b_i$.

3. There exists $[q_0 : \cdots : q_n] \in X^*$ such that for $i = 0, 1, \ldots, n$, $u_i p_s b_s = -u_+ p_i (q_i - b_s)$.

Proof. It is immediate that parts 1 and 2 are equivalent. To see that 2 and 3 are equivalent, recall $q_i = b_i + b_s$ for $i = 0, 1, \ldots, n$, from the definition of $X'^{*}$.

A consequence of these equations is that they remove the need for saturation by $p_0 p_1 \cdots p_n$ in Gröbner basis computations that involve the likelihood equations whenever the $u_i$ are nonzero. In addition, if we restrict to the affine charts defined by $p_s = 1$ and $b_s = -u_+$, then the condition $p_s b_s \neq 0$ is immediately satisfied.

In this section we will define a system of equations whose solutions are precisely

$$B_X(u) = \{ b : (p, b) \in L_X(u) \}.$$

Once we know the set $B_X(u)$, we determine the critical points of $l_u(p) = p_0^{u_0} \cdots p_n^{u_n} / p_+^{u_+}$ on $X$ using Lemma 2 and Corollary 2. Concretely, if $b \in B_X(u)$ then

$$[p_0 : \cdots : p_n] = [u_0 / b_0 : \cdots : u_n / b_n]$$

is a critical point of $l_u(p)$ on $X$. For this reason we make the following definition.

Definition 7.

The dual maximum likelihood estimation problem for the algebraic statistical model $X$ and data $u$ is to determine $B_X(u)$, the set of critical points of $l'_u(b)$ on $X'^{*}$.

By Corollary 1, we find the critical points of $l'_u(b) = b_0^{u_0} b_1^{u_1} \cdots b_n^{u_n} b_s^{-u_+}$ on $X'^{*}$ to determine the set $B_X(u)$. That is, we determine the points $b \in X'^{*}$ such that the gradient

$$\nabla l'_u(b) = \left[ \frac{u_0}{b_0} : \frac{u_1}{b_1} : \cdots : \frac{u_n}{b_n} : -\frac{u_+}{b_s} \right]$$

is orthogonal to the tangent space of $X'^{*}$ at $b$.

If $X^*$ has codimension $c$, which also means $X'^{*}$ has codimension $c$, then the dual likelihood equations are obtained by taking the sum of the ideals generated by

• the polynomials defining $X'^{*}$, and

• the $(c+1) \times (c+1)$ minors of an extended Jacobian multiplied by a diagonal matrix with entries $b_0, b_1, \ldots, b_n, b_s$,

$$\begin{bmatrix} \nabla l'_u(b) \\ \operatorname{Jac}(X'^{*}) \end{bmatrix} \begin{bmatrix} b_0 & & \\ & \ddots & \\ & & b_s \end{bmatrix}, \qquad (4)$$

and saturating by the product of two ideals,

• the principal ideal generated by $b_0 b_1 \cdots b_n b_s$, and

• the ideal generated by the $c \times c$-minors of $\operatorname{Jac}(X'^{*})$.

This gives us a formulation of the dual likelihood equations. Now we make some simplifications to these equations to get Theorem 8.

By Euler's relations of partial derivatives, the columns of the matrix product in (4) are linearly dependent. Indeed, the columns sum to zero, so we may drop the last column of the product without changing the rank.

By Proposition 2, if $g_1(q), \ldots, g_l(q)$ define the variety $X^*$, then the defining equations of $X'^{*}$ are

$$g_1(b_0 + b_s, b_1 + b_s, \ldots, b_n + b_s) = 0,\quad \ldots,\quad g_l(b_0 + b_s, b_1 + b_s, \ldots, b_n + b_s) = 0,$$

and the Jacobian of $X'^{*}$ is

$$\operatorname{Jac}(X'^{*}) = \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)} \begin{bmatrix} 1 & & & 1 \\ & \ddots & & \vdots \\ & & 1 & 1 \end{bmatrix}.$$

Since the last column of $\operatorname{Jac}(X'^{*})$ is the sum of the first columns, it follows that the dual likelihood equations can be reformulated by the next theorem.

Theorem 8. Let $g_1(q), \ldots, g_l(q)$ define $X^* \subset \mathbb{P}^n$ with codimension $c$. Then $B_X(u)$ is the variety of the ideal calculated by taking the sum of the ideals generated by

• $g_1(b_0 + b_s, \ldots, b_n + b_s), \ldots, g_l(b_0 + b_s, \ldots, b_n + b_s)$, and

• the $(c+1) \times (c+1)$ minors of

$$\begin{bmatrix} \frac{u_0}{b_0} \ \ \frac{u_1}{b_1} \ \ \cdots \ \ \frac{u_{n-1}}{b_{n-1}} \ \ \frac{u_n}{b_n} \\ \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)} \end{bmatrix} \begin{bmatrix} b_0 & & \\ & \ddots & \\ & & b_n \end{bmatrix}, \qquad (5)$$

and saturating by the product of two ideals,

• the principal ideal generated by $b_0 b_1 \cdots b_n b_s$, and

• the ideal of $c \times c$-minors of $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)}$.

The point of Theorem 8 is that the dual likelihood equations define a homogeneous ideal in the polynomial ring $\mathbb{C}[b_0, b_1, \ldots, b_n, b_s]$ whose variety is $B_X(u)$, the set of critical points of $l'_u(b)$ on $X'^{*}$. Theorem 8 can be used to determine the ML degree of $X$ because $\# L_X(u) = \# B_X(u)$.

Since Theorem 8 is constructive, we express it below as an algorithm.

Algorithm 9.

Suppose $X^*$ in $\mathbb{P}^n$ has codimension $c$.

• Input: Polynomials $g_1(q), g_2(q), \ldots, g_l(q)$ defining $X^*$, and a vector $u \in \mathbb{N}^{n+1}$.

• Output: The ML degree of $X$.

• Procedure:

Step 1. Let $G_q$ be the ideal generated by $g_1(q), \ldots, g_l(q)$, and let $G_b$ be the ideal obtained by substituting $b_0 + b_s, \ldots, b_n + b_s$ for $q_0, \ldots, q_n$, respectively, in the ideal $G_q$.

Step 2. Let $M_{b,u}$ denote the ideal generated by the $(c+1) \times (c+1)$-minors of (5).

Step 3. Let $S_b$ be the ideal generated by the $c \times c$ minors of $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \ldots, b_n + b_s)}$.

Step 4. Let $W_{b,u}$ be the saturation $(M_{b,u} + G_b) : (b_0 b_1 \cdots b_n b_s \cdot S_b)^\infty$.

Step 5. Return the degree of $W_{b,u}$.

Example 10. Let $X$ be defined by $f(p) = 4 p_0 p_2 - p_1^2$ in $\mathbb{P}^2$. Then $X^*$ is defined by $g(q) = q_0 q_2 - q_1^2$ in the dual $\mathbb{P}^2$. So

$$f(p) = \det \begin{bmatrix} 2 p_0 & p_1 \\ p_1 & 2 p_2 \end{bmatrix} \quad \text{and} \quad g(q) = \det \begin{bmatrix} q_0 & q_1 \\ q_1 & q_2 \end{bmatrix}.$$

The ML degree of $X$ is computed by taking the ideal generated by

• $g(b_0 + b_s, b_1 + b_s, b_2 + b_s) = (b_0 + b_s)(b_2 + b_s) - (b_1 + b_s)^2$, and

• the $2 \times 2$ minors of

$$\begin{bmatrix} \frac{u_0}{b_0} & \frac{u_1}{b_1} & \frac{u_2}{b_2} \\ (b_2 + b_s) & -2(b_1 + b_s) & (b_0 + b_s) \end{bmatrix} \begin{bmatrix} b_0 & & \\ & b_1 & \\ & & b_2 \end{bmatrix}$$

and saturating by the product of two ideals,

• the principal ideal $(b_0 b_1 b_2 b_s)$, and

• the $1 \times 1$ minors of $\left[ (b_2 + b_s) \ \ {-2}(b_1 + b_s) \ \ (b_0 + b_s) \right]$.

We find that there is a unique critical point of $l'_u(b)$ on $X'^{*}$ whose coordinates can be derived from the matrix equality

b s [ b b ; b b ] = [ u u + (2 u + u ) u u + u +2 u )(2 u + u )4 u u + u +2 u )(2 u + u ) 4 u u + (2 u + u ) ] .

So by Corollary 2, the critical point of $l_u(p)$ on $X$ is given by

p s [ p p ; p p ] = 12 u [ (2 u + u )( u + 2 u ) ] [ (2 u + u )( u + 2 u ) ] T .

In this section, we compare the standard formulation of solving the likelihood equations, Algorithm 6 of [10], to the dual formulation presented here, Algorithm 9. All computations in this subsection were done on a 2.8 GHz Intel Core i7 MacBook Pro using Macaulay2.
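Returning to Example 10, the passage between $p$ and $b$ can be checked with exact rational arithmetic. The Python sketch below rests on stated assumptions: it works in the affine charts $p_s = 1$ and $b_s = -u_+$ mentioned after Corollary 2, it uses the classical closed form for the critical point on the conic $4 p_0 p_2 = p_1^2$ (the Hardy–Weinberg estimator) rather than running Algorithm 9, and the data vector is made up. It verifies that $b_i = u_i / p_i$ gives a point whose shifted coordinates $q_i = b_i + b_s$ satisfy $g(q) = q_0 q_2 - q_1^2 = 0$.

```python
from fractions import Fraction

def critical_point(u):
    """Closed-form critical point of l_u on 4*p0*p2 = p1^2, scaled so p0 + p1 + p2 = 1."""
    u0, u1, u2 = u
    a, c = 2 * u0 + u1, u1 + 2 * u2      # note a + c = 2 * (u0 + u1 + u2)
    denom = (a + c) ** 2
    return [Fraction(a * a, denom), Fraction(2 * a * c, denom), Fraction(c * c, denom)]

def f(p):
    """Defining equation of X."""
    return 4 * p[0] * p[2] - p[1] ** 2

def g(q):
    """Defining equation of the dual conic X*."""
    return q[0] * q[2] - q[1] ** 2

u = [2, 3, 5]                            # hypothetical counts
u_plus = sum(u)

p = critical_point(u)                    # chart p_s = 1
b = [Fraction(ui) / pi for ui, pi in zip(u, p)]   # relation p_i * b_i = u_i
b_s = -u_plus                            # chart b_s = -u_+, so p_s * b_s = -u_+
q = [bi + b_s for bi in b]               # relations (2): q_i = b_i + b_s

# q should be proportional to the gradient of f at p.
grad_f = [4 * p[2], -2 * p[1], 4 * p[0]]
parallel = all(q[i] * grad_f[j] == q[j] * grad_f[i]
               for i in range(3) for j in range(3))
```

Recovering $p$ from $b$ is the reverse step, $p_i = u_i / b_i$, which is how a solution of the dual MLE problem yields a critical point on $X$.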
I = h q + 2 q + 3 q + 5 q i I = h q − q q , q q − q q , q − q q i I = h q + q + q + q i I = h q + 15 q + 10 q + 6 q i I = h q q − q q − q q + 18 q q q q − q q i I = h q + q q q − q i I = 2 × of  q , q , q q , q , q q , q , q  I = 2 × of  q , q , q q , q , q q , q , q  I = * det  q , q , q q , q , q q , q , q + The second column in the table below is a list of ML degrees of varieties whose dualvariety is given by the ﬁrst column. The third column is the time (in seconds) it takes tocalculate the ML degree using the standard formulation, while the fourth column is thetime (in seconds) it takes to calculate the ML degree using the dual likelihood equations. X ⋆ ML degree Standard Dual I

14 0 .

008 0 . I .

062 0 . I

57 166.872 1.447 I

14 0 .

038 0 . I .

017 0 . I

22 32 . . I

13 4 .

808 33 . I .

349 13 . I . . The most notable discrepancy is in row in bold. In this case, the ideal of X ∗ isgenerated by a cubic, but X is generated by a degree polynomial with terms. To calculate new ML degrees when X ∗ is not a complete intersection [Computation 12],we will work with an adjusted formulation of the dual likelihood equation. This for-mulation introduces codimension X ∗ auxiliary unknowns (Lagrange multipliers). Also,instead of working with every generator of the ideal of X ∗ , we work with codim( X ∗ ) X ∗ . Example 11.

Consider × × -tensors of the form [ p ijk ] with i, j, k, ∈ { , } . If X is the hyperdeterminant of these tensors, then X ∗ is deﬁned by the × minors of allﬂattenings of the tensor [ q ijk ] . The codimension of X ∗ is . The ﬂattenings belowdeﬁne X ∗ after saturating by q . g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q So by introducing auxiliary unknowns λ , λ , . . . , λ we create a square system of equations in the homogeneous variable groups ( b , . . . , b , b s ) and ( λ , . . . , λ ) . g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s )[ λ , λ , λ , . . . , λ ] (cid:20) ∇ l ′ u ( b )Jac( g ) (cid:21)  b . . . b  = 0 . The solutions with λ b s = 0 give the critical points. We ﬁnd that there are criticalpoints of l ′ u ( b ) on X ∗ , agreeing with [5], page 53.The next example is a new computational result to determine the ML degree of ahyperdeterminant. Computation 12.

Let X denote the hyperdeterminant of × × tensors of the form [ p ijk ] for i ∈ { , } , j ∈ { , } , k ∈ { , , } . Then the ML degree of X is . Proof.

The variety X is dual to the variety X ∗ deﬁned by the × -minors of theﬂattenings of the × × tensor [ q ijk ] with i ∈ { , } , j ∈ { , } , k ∈ { , , } . Thevariety X ∗ has codimension , degree , and generators. We consider of the generators, g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q q we recover the dual variety X ∗ . We solve the followingsquare system of equations: the seven equations g ( b + b s , . . . , b n + b s ) = · · · = g ( b + b s , . . . , b n + b s ) = 0 and the equations [1 , λ , λ , . . . , λ ] M  b . . . b  = 0 , with M being (6) where ( q , . . . , q ) = ( b + b s , . . . , b + b s ) and u consisting ofrandom complex numbers (random positive integers) to determine the ML degree of X numerically (symbolically).  u b u b u b u b u b u b u b u b u b u b u b u b − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q  (6)The ML degree was obtained using exact methods in Macaulay2 in 10,943 seconds andusing numerical methods in

Bertini in 5,796 seconds. Both computations were performed on the UC Berkeley server apppsa, which has four 16-core 2.3 GHz AMD Opteron 6276 CPUs. The Bertini computation was done in parallel using 20 of the 64 cores.

One could have attempted to compute the number using Algorithm 6 of [10]. However, to do so, we must have the defining equations of $X$. We were not able to compute these equations ourselves, but the hyperdeterminant of $2 \times 2 \times 3$ tensors is listed on page 7 of [2]. This is a degree 6 polynomial with 66 terms. We were unable to determine the ML degree with the standard likelihood equations or with the Lagrange likelihood equations (page 4 of [8]).

The next interesting case is when $X$ is the hyperdeterminant of $2 \times 2 \times 2 \times 2$ tensors. In this case, $X$ is defined by a polynomial of degree 24 in 16 unknowns that has 2,894,276 terms [11]. There is no way we will be able to effectively write down the standard likelihood equations for $X$. However, its dual variety $X^*$ is defined by a binomial ideal consisting of the $2 \times 2$-minors of all of its flattenings, and we may have a chance of solving the dual likelihood equations both numerically and symbolically.

The Dual MLE Problem vs Maximum Likelihood Duality

In this section we introduce an example and show how the results presented in this paper fit in context with previous work on maximum likelihood duality. In [4, 9] the notion of maximum likelihood duality (ML-duality) was introduced. ML-duality gave a bijection between critical points of $l_u(p)$ on two different statistical models.

Definition 13. A pair of algebraic statistical models $X$ and $Y$ in $\mathbb{P}^n$ are said to be ML-dual if for generic $u$ there is an involutive bijection between points of $L_X(u)$ and points of $L_Y(u)$. Moreover, this bijection pairs points of $L_X(u)$ with points of $L_Y(u)$ such that the coordinate-wise product of each pair can be expressed in terms of the data $u$ alone.

Example 14.

Suppose $r \le m \le n$, and let $V_{m,n,r}$ denote the Zariski closure in $\mathbb{P}^{mn-1}$ of rank $r$ matrices of the form

$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{m1} & \cdots & & p_{mn} \end{bmatrix}.$$

Then $V^*_{m,n,r}$ is known to be the Zariski closure in $\mathbb{P}^{mn-1}$ of rank $m - r$ matrices of the form

$$\begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1n} \\ q_{21} & q_{22} & & \vdots \\ \vdots & & \ddots & \\ q_{m1} & \cdots & & q_{mn} \end{bmatrix}.$$

Fix a choice of $m, n, r$. If we take $X = V_{m,n,r}$, then points in $X'$ will be represented as $[p_{ij} : p_s] \in X' \subset \mathbb{P}^{mn}$ and points in $X'^{*}$ will be represented as $[b_{ij} : b_s] \in X'^{*} \subset \mathbb{P}^{mn}$. With Corollary 1, it follows that there is a bijection between $P_X(u)$ and $B_X(u)$. On the other hand, by Theorem 1 in [4] we know that $V_{m,n,r}$ and $V_{m,n,m-r}$ are ML-dual. This means that if we take $Y$ to be $V_{m,n,m-r}$, there is an involutive bijection between critical points $L_X(u)$ and $L_Y(u)$ for generic choices of $u$. In particular, the bijection is such that the coordinate-wise product of the paired points is

( [ u i + u + j u ij u : 1 ] , [ u ij u ++ u i + u + j : 1 ] ) ∈ $\mathbb{P}^{mn} \times \mathbb{P}^{mn}$.

Here $u_{++} := \sum_{i,j} u_{ij}$, $u_{i+} := \sum_j u_{ij}$, and $u_{+j} := \sum_i u_{ij}$, and likewise for $p_{++}$, $p_{i+}$, $p_{+j}$.

Conclusion

In this paper we have given an elegant formulation of the MLE problem involving conormal varieties. This formulation allows us to avoid the computation of saturation by the product of unknowns. We also define the dual likelihood equations, which allow us to compute critical points on $X$ even if we only know the defining equations of its dual $X^*$ [Algorithm 9]. The important feature of the dual likelihood equations comes from the fact that the defining equations of $X$ may be too difficult to work with. In addition, we showed that if we solve the dual equations, we can recover the critical points on $X$ [Theorem 4]. More broadly, we showed that if there is a bijection between critical points of a function restricted to a variety and critical points of a monomial restricted to a different variety, then we can formulate a new set of "dual" equations to determine these points.

Acknowledgements

The author would like to thank Elizabeth Gross, Kim Laine, Zvi Rosen, and Bernd Sturmfels for their comments and suggestions to improve earlier versions of the paper.

References

[1] D. J. Bates, J. D. Hauenstein, A. J. Sommese, and C. W. Wampler. Bertini: Software for numerical algebraic geometry. Available at https://bertini.nd.edu/.

[2] M. R. Bremner. A hyperdeterminant for 2 x 2 x 3 arrays. arXiv:1106.2988, 2011.

[3] F. Catanese, S. Hoşten, A. Khetan, and B. Sturmfels. The maximum likelihood degree. Amer. J. Math., 128(3):671–697, 2006.

[4] J. Draisma and J. I. Rodriguez. Maximum likelihood duality for determinantal varieties. To appear in International Mathematics Research Notices, 2013.

[5] M. Drton, B. Sturmfels, and S. Sullivant. Lectures on algebraic statistics, volume 39 of Oberwolfach Seminars. Birkhäuser Verlag, Basel, 2009.

[6] L. Ein. Varieties with small dual varieties, I. Inventiones mathematicae

To appear in the Journal of Algebraic Statistics, 2013.

[10] S. Hoşten, A. Khetan, and B. Sturmfels. Solving the likelihood equations. Found. Comput. Math., 5(4):389–407, 2005.

[11] P. Huggins, B. Sturmfels, J. Yu, and D. S. Yuster. The hyperdeterminant and triangulations of the 4-cube. Math. Comput., 77(263):1653–1679, 2008.

[12] J. Huh. The maximum likelihood degree of a very affine variety. Compos. Math., 149(8):1245–1266, 2013.

[13] P. Rostalski and B. Sturmfels. Dualities. In Semidefinite optimization and convex algebraic geometry, volume 13 of