arXiv [math.ST]
Maximum Likelihood for Dual Varieties
Jose Israel Rodriguez ∗
19 May 2014
Abstract
Maximum likelihood estimation (MLE) is a fundamental computational problem in statistics. In this paper, MLE for statistical models with discrete data is studied from an algebraic statistics viewpoint. A reformulation of the MLE problem in terms of dual varieties and conormal varieties will be given. With this description, the dual likelihood equations and the dual MLE problem are defined. We show that solving the dual MLE problem yields solutions to the MLE problem, so we can solve the MLE problem without ever determining the defining equations of the model.
∗ Department of Mathematics, University of California at Berkeley, Berkeley, CA 94720; [email protected].

Maximum likelihood estimation (MLE) is a fundamental problem in statistics that has been extensively studied from an algebraic viewpoint [3, 4, 5, 9, 10, 12]. We continue to follow an algebraic approach to MLE in this paper, considering statistical models for discrete data in the probability simplex as irreducible varieties $X$ in complex projective space $\mathbb{P}^n$.

An algebraic statistical model $X$ in $\mathbb{P}^n$ will be defined by the vanishing of homogeneous polynomials in the unknowns $p_0, p_1, \dots, p_n$. We assume that $X$ is an irreducible generically reduced variety. When the coordinates $p_0, p_1, \dots, p_n$ of a point $p$ in $X$ are positive and sum to one, we interpret $p$ as a probability distribution, where the probability of observing event $i$ is $p_i$. We let $u = (u_0, u_1, \dots, u_n) \in (\mathbb{C}^*)^{n+1}$ be a vector of length $n+1$ called data. When each entry $u_i$ of the vector is a positive integer, we interpret $u_i$ as the number of observations of event $i$. We use the notation $u_+ := u_0 + \cdots + u_n$ and $p_+ := p_0 + \cdots + p_n$, always assuming $u_+ \neq 0$.

The likelihood function for the data $u$ is defined as
$$l_u(p) := p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} / p_+^{u_+}.$$
When $u$ and $p$ are interpreted as data and a probability distribution respectively, the likelihood of observing $u$ with respect to the distribution $p$ is $l_u(p)$ multiplied by a multinomial coefficient depending only on $u$.

For fixed data $u$, to determine local maxima of $l_u(p)$ on a statistical model and give a solution to the MLE problem, we determine all complex critical points of $l_u(p)$ restricted to $X$. Of these critical points, we find the one with positive coordinates and greatest likelihood to determine the maximum likelihood estimator $\hat{p}$.
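As a small illustration of the definition of $l_u(p)$, the following sketch evaluates the likelihood function exactly with rational arithmetic. The counts `u` and the distribution `p` are hypothetical values chosen only for this example, and the helper name `likelihood` is not from the paper. The second evaluation shows that rescaling $p$ leaves $l_u(p)$ unchanged, which is why $l_u$ makes sense on points of projective space.

```python
from fractions import Fraction
from math import prod


def likelihood(u, p):
    """Return l_u(p) = prod(p_i**u_i) / (p_0 + ... + p_n)**(u_0 + ... + u_n)."""
    u_plus = sum(u)
    p_plus = sum(p)
    numerator = prod(Fraction(p_i) ** u_i for p_i, u_i in zip(p, u))
    return numerator / Fraction(p_plus) ** u_plus


# Hypothetical data: event 0 observed twice, events 1 and 2 once each.
u = (2, 1, 1)
p = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))

value = likelihood(u, p)

# Multiplying every coordinate of p by the same nonzero constant
# does not change the value, since l_u has degree zero in p.
rescaled = likelihood(u, tuple(3 * p_i for p_i in p))
```

Because the numerator has degree $u_+$ and the denominator is $p_+^{u_+}$, the two evaluations agree for any nonzero scaling factor.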
The (algebraic) maximum likelihood estimation problem is solved by determining all critical points of $l_u(p)$ on $X$ and maximizing $l_u(p)$ on this set.

To find the complex critical points, we determine when the gradient of $l_u(p)$ is orthogonal to the tangent space of $X$ at $p$. So the set of critical points is
$$\{ p \in X_{\mathrm{reg}} \text{ such that } \nabla l_u(p) \perp T_pX \}.$$
The gradient of the likelihood function equals
$$\left[ \frac{u_0}{p_0} - \frac{u_+}{p_+},\ \frac{u_1}{p_1} - \frac{u_+}{p_+},\ \dots,\ \frac{u_n}{p_n} - \frac{u_+}{p_+} \right],$$
up to scaling by $l_u(p)$. So the critical points of $l_u(p)$ are the points $p \in X_{\mathrm{reg}}$ such that
$$\left[ \frac{u_0}{p_0} - \frac{u_+}{p_+},\ \frac{u_1}{p_1} - \frac{u_+}{p_+},\ \dots,\ \frac{u_n}{p_n} - \frac{u_+}{p_+} \right] \perp T_p(X),$$
implicitly forcing the condition $p_0 p_1 \cdots p_n (p_0 + \cdots + p_n) \neq 0$.

Definition 1.
Given an algebraic statistical model $X$ in $\mathbb{P}^n$, the maximum likelihood degree (ML degree) of $X$ is the number of critical points of $l_u(p)$ restricted to $X$ for generic choices of data $u$,
$$\operatorname{MLdegree}(X) = \#\{ p \in X : \nabla l_u(p) \perp T_p(X) \}.$$

The main result of this paper is a formulation that relates maximum likelihood estimation to a conormal variety derived from $X$ [Theorem 4]. With this perspective, we use the dual likelihood equations [Theorem 8] to solve the MLE problem for $X$ when only given the defining equations of its dual variety $X^*$.

The computations in this paper were done using Bertini [1] and Macaulay2 [7].
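The gradient formula behind the critical point condition can be sanity-checked numerically. The sketch below is only an illustration with hypothetical data `u` and a hypothetical point `p`; the function names are not from the paper. It compares $l_u(p)\,[u_i/p_i - u_+/p_+]$ with central finite differences, and also checks that the vector $[u_i/p_i - u_+/p_+]$ pairs to zero with $p$, as expected for a function of degree zero.

```python
from math import prod


def likelihood(u, p):
    """l_u(p) = prod(p_i**u_i) / p_+**u_+ in floating point."""
    return prod(p_i ** u_i for p_i, u_i in zip(p, u)) / sum(p) ** sum(u)


def score(u, p):
    """The vector [u_i/p_i - u_+/p_+] appearing in the critical point condition."""
    u_plus, p_plus = sum(u), sum(p)
    return [u_i / p_i - u_plus / p_plus for u_i, p_i in zip(u, p)]


def gradient(u, p):
    """Analytic gradient: the score vector scaled by l_u(p)."""
    scale = likelihood(u, p)
    return [scale * s_i for s_i in score(u, p)]


def finite_difference_gradient(u, p, h=1e-6):
    """Central differences in each coordinate, used only as a cross-check."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((likelihood(u, forward) - likelihood(u, backward)) / (2 * h))
    return grad


# Hypothetical data and point, chosen only for illustration.
u = [3, 1, 2]
p = [0.2, 0.5, 0.3]

analytic = gradient(u, p)
numeric = finite_difference_gradient(u, p)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))

# Euler-type identity: sum_i p_i * (u_i/p_i - u_+/p_+) = u_+ - u_+ = 0.
euler = sum(p_i * s_i for p_i, s_i in zip(p, score(u, p)))
```

Since only the direction of the gradient matters for orthogonality to $T_pX$, the scalar factor can be dropped when writing the likelihood equations.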
In this section, we consider an algebraic statistical model $X$ in $\mathbb{P}^n$ and define $X'$ to be an embedding of $X$ in $\mathbb{P}^{n+1}$. We present our first result in Theorem 4, giving a formulation of the MLE problem in terms of conormal varieties and dual varieties. In Corollary 1 we give a bijection between critical points of the likelihood function on two different varieties. In Corollary 2 we give equations to solve the MLE problem if we have equations that define a conormal variety. We will also recall how to compute conormal varieties and dual varieties of $X$ and $X'$.

Let $X \subset \mathbb{P}^n$ be a codimension $c$ algebraic statistical model defined by homogeneous polynomials $f_1, f_2, \dots, f_k$. We let $\operatorname{Jac}(X)$ denote the $k \times (n+1)$ matrix of partial derivatives of $f_1, \dots, f_k$ with respect to $p_0, \dots, p_n$, and we say this is the Jacobian of $X$.

To keep track of the sum of the coordinates $p_0, p_1, \dots, p_n$ we introduce the coordinate $p_s$ and a hyperplane in $\mathbb{P}^{n+1}$ defined by the vanishing of the polynomial
$$H(p) := -p_0 - p_1 - \cdots - p_n + p_s. \tag{1}$$
If $X$ is defined by $f_1, \dots, f_k$, then $X'$ in the coordinates $p_0, p_1, \dots, p_n, p_s$ is defined by the vanishing of $f_1, \dots, f_k$ and $H$. With this definition, we get the following proposition.

Proposition 1. If $X$ is defined by the homogeneous polynomials $f_1, f_2, \dots, f_k$, then the Jacobian of $X'$ is given by the $(k+1) \times (n+2)$ matrix
$$\operatorname{Jac}(X') = \begin{bmatrix} -1 & \cdots & -1 & 1 \\ & & & 0 \\ & \operatorname{Jac}(X) & & \vdots \\ & & & 0 \end{bmatrix}.$$

The important fact about the construction of $X'$ is that there is a bijection between the critical points of the function $l_u(p)$ on $X$ and the critical points of the monomial
$$l'_u(p) := p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} p_s^{-u_+}$$
on $X'$, given by Lemma 2.

By a slight abuse of notation, the "$p$" in $l_u(p)$ and the "$p$" in $l'_u(p)$ represent two different things. The first $p$ represents a point $[p_0 : p_1 : \cdots : p_n] \in X$, while the second represents a point $[p_0 : p_1 : \cdots : p_n : p_s] \in X'$.

Lemma 2.
There is a bijection between the critical points of the function $l_u(p)$ on $X$ and the critical points of $l'_u(p)$ on $X'$. Under this bijection, $[p_0 : p_1 : \cdots : p_n] \in \mathbb{P}^n$ is a critical point of $l_u(p)$ on $X$ if and only if $[p_0 : p_1 : \cdots : p_n : p_s] \in \mathbb{P}^{n+1}$ is a critical point of $l'_u(p)$ on $X'$.

Proof. To prove this we need to show that $[p_0 : \cdots : p_n : p_s] \in X'_{\mathrm{reg}}$ satisfies $\nabla l'_u(p) \perp T_pX'$ if and only if $[p_0 : \cdots : p_n] \in X_{\mathrm{reg}}$ satisfies $\nabla l_u(p) \perp T_pX$.
By Proposition 1, it follows that $[p_0 : \cdots : p_n : p_s] \in X'_{\mathrm{reg}}$ if and only if $[p_0 : \cdots : p_n] \in X_{\mathrm{reg}}$, so it remains to show that $\nabla l'_u(p) \perp T_pX'$ if and only if $\nabla l_u(p) \perp T_pX$. So we need to show that $\nabla l'_u(p)$ being in the row space of $\operatorname{Jac}(X')$ implies that $\nabla l_u(p)$ is in the row space of $\operatorname{Jac}(X)$ and vice versa. To see this, observe that
$$\begin{bmatrix} \nabla l'_u(p) \\ \operatorname{Jac}(X') \end{bmatrix} \begin{bmatrix} 1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \\ 1 & \cdots & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{u_0}{p_0} - \frac{u_+}{p_s} & \frac{u_1}{p_1} - \frac{u_+}{p_s} & \cdots & \frac{u_n}{p_n} - \frac{u_+}{p_s} & -\frac{u_+}{p_s} \\ 0 & 0 & \cdots & 0 & 1 \\ & & \operatorname{Jac}(X) & & \mathbf{0} \end{bmatrix}.$$
Since $p_s = p_+$, we have completed the proof because the top row in the matrix above is $\left[ \nabla l_u(p),\ -\frac{u_+}{p_+} \right]$.

The conormal variety of $X$ is defined to be the Zariski closure in $\mathbb{P}^n \times \mathbb{P}^n$ of the set
$$N_X := \{ (p, q) : q \perp T_pX_{\mathrm{reg}} \}.$$
To determine the defining equations of $N_X$, we let $M$ denote a $(k+1) \times (n+1)$ matrix that is an extended Jacobian whose top row is $[q_0, q_1, \dots, q_n]$ and whose bottom rows are $\operatorname{Jac}(X)$. The defining equations of the conormal variety can be computed by taking the ideal generated by $f_1, \dots, f_k$ and the $(c+1) \times (c+1)$ minors of $M$ and saturating by the $c \times c$ minors of $\operatorname{Jac}(X)$.

The dual variety $X^*$ is the projection of the conormal variety $N_X$ to the dual projective space $\mathbb{P}^n$ associated to the $q$-coordinates. To compute the equations of the dual variety, one eliminates the unknowns $p_0, p_1, \dots, p_n$ from the equations defining $N_X$. For additional information on computing conormal varieties and dual varieties see [13].

Since $X'$ is contained in a hyperplane defined by $H$, the dual variety of $X'$ is known to be a cone over $X^*$ with vertex the point $h = [-1 : -1 : \cdots : -1 : 1]$ [6, Proposition 1.1]. So $X'^*$ in $\mathbb{P}^{n+1}$ is given by
$$X'^* = \{ [q_0 - b_s : q_1 - b_s : \cdots : q_n - b_s : b_s] : [q_0 : \cdots : q_n] \in X^* \}.$$

It is easy to go between the coordinates of $X$ and the coordinates of $X'$ because there is a birational map between these two varieties. But there does not have to be a birational map between $X^*$ and $X'^*$. For this reason, the coordinates of the former are
in $q_0, \dots, q_n$, and the coordinates of the latter are in $b_0, \dots, b_n, b_s$. Our notation is to let $q$ denote a point $[q_0 : q_1 : \cdots : q_n] \in X^*$ and to let $b$ denote a point $[b_0 : b_1 : \cdots : b_n : b_s] \in X'^*$.

The next proposition shows that given the defining equations of $X^*$ in the unknowns $q_0, \dots, q_n$, we can determine the defining equations of $X'^*$ in the unknowns $b_0, \dots, b_n, b_s$ using the relations
$$q_0 = b_0 + b_s,\quad q_1 = b_1 + b_s,\quad \dots,\quad q_n = b_n + b_s. \tag{2}$$
That is, if $g(q_0, q_1, \dots, q_n)$ vanishes on $X^*$, then $g(b_0 + b_s, b_1 + b_s, \dots, b_n + b_s)$ vanishes on $X'^*$. Moreover, given the Jacobian of $X^*$, we can easily determine the Jacobian of $X'^*$ as well using the relations in (2).

Proposition 2. If $g_1(q), \dots, g_l(q)$ define the variety $X^* \subset \mathbb{P}^n$ in coordinates $q_0, q_1, \dots, q_n$, then the defining equations of $X'^*$ in coordinates $b_0, b_1, \dots, b_n, b_s$ are
$$g_1(b_0 + b_s, b_1 + b_s, \dots, b_n + b_s) = 0,\quad \dots,\quad g_l(b_0 + b_s, b_1 + b_s, \dots, b_n + b_s) = 0.$$
Moreover, the Jacobian of $X'^*$ is given by
$$\operatorname{Jac}(X'^*) = \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)} \begin{bmatrix} 1 & & & 1 \\ & \ddots & & \vdots \\ & & 1 & 1 \end{bmatrix}.$$

Proof. The first part of the proposition follows immediately from the relations in (2). By $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)}$ we mean the Jacobian of $X^*$ evaluated at $(b_0 + b_s, \dots, b_n + b_s)$. Since the defining equations of $X'^*$ are obtained by evaluating each $g_i(q)$ at $(b_0 + b_s, \dots, b_n + b_s)$, it follows by the chain rule that $\operatorname{Jac}(X'^*)$ equals the desired matrix product.

Example 3.
Consider X in P , a variety defined by f = 2 p p p + p p + p p − p p + p p p . The Jacobian of X and the defining polynomial g ( q ) of the dual variety X ∗ are Jac( X ) = [2 p p − p p , p p + 2 p p + p + p p , p p + p + 2 p p + p p , − p + p p ] g ( q ) = q − q q q + 16 q q − q q +16 q q q + 16 q q q − q q q q . The variety X ′ is defined by the two equations, f ( p ) = 0 and p s = p + p + · · · + p n , but the dual variety X ′∗ is defined by one equation g ( b + b s , b + b s , b + b s , b + b s ) =( b + b s ) − b + b s ) ( b + b s )( b + b s )+16( b + b s ) ( b + b s ) − b + b s ) ( b + b s )+16( b + b s ) ( b + b s )( b + b s )+16( b + b s ) ( b + b s )( b + b s ) − b + b s )( b + b s )( b + b s )( b + b s ) . The Jacobian of X ∗ is q − q q q − q ( q − q q − q q + q q ) − q q + 32 q q + 16 q q − q q q − q q + 32 q q + 16 q q − q q q − q + 16 q q + 16 q q − q q q T . We get the Jacobian of X ′∗ by evaluating Jac( X ∗ ) at ( b + b s , . . . b n + b s ) and multiplying the evaluated Jac( X ∗ ) on the right by the matrix . Now we are ready to state our first result.
Theorem 4.
Fix an algebraic statistical model $X$. A point
$$\big( [p_0 : p_1 : \cdots : p_n : p_s],\ [b_0 : b_1 : \cdots : b_n : b_s] \big) \in N_{X'}$$
satisfies the relation
$$[p_0 b_0 : p_1 b_1 : \cdots : p_n b_n : p_s b_s] = [u_0 : u_1 : \cdots : u_n : -u_+]$$
if and only if $[p_0 : p_1 : \cdots : p_n : p_s]$ is a critical point of $l'_u(p) = p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} p_s^{-u_+}$ on $X'$.

Proof. To determine critical points of $l'_u(p)$ on $X'$ we find when
$$\nabla l'_u(p) = \big[ \partial l'_u / \partial p_0 : \cdots : \partial l'_u / \partial p_s \big]$$
is orthogonal to the tangent space of $X'$ at the point $p$. This is the same as determining when $\big( [p_0 : p_1 : \cdots : p_s],\ \nabla l'_u(p) \big) \in N_{X'}$. As a point in projective space, we have
$$\nabla l'_u(p) = \left[ \frac{u_0}{p_0} : \cdots : \frac{u_n}{p_n} : -\frac{u_+}{p_s} \right]$$
whenever $p_0 p_1 \cdots p_s \neq 0$. So we immediately have that a critical point of $l'_u(p)$ satisfies the desired relations when we take the coordinate-wise product of $[p_0 : p_1 : \cdots : p_s]$ and $\nabla l'_u(p)$.

With Lemma 2, Theorem 4 says that if $[p, b] \in N_{X'}$ and the coordinate-wise product of $p$ and $b$ is
$$[p_0 b_0 : \cdots : p_n b_n : p_s b_s] = [u_0 : \cdots : u_n : -u_+], \tag{3}$$
then $[p_0 : \cdots : p_n]$ is a critical point of $l_u(p)$ on $X$.

Definition 5.
The likelihood locus of $X$ for the data $u$ is defined as the set of points in $N_{X'}$ satisfying the relations in (3), denoted $L_X(u)$. We define $P_X(u)$ and $B_X(u)$ to be
$$P_X(u) := \{ p : (p, b) \in L_X(u) \} \quad\text{and}\quad B_X(u) := \{ b : (p, b) \in L_X(u) \}.$$

For additional clarification, note that points in $L_X(u)$ are contained in the conormal variety $N_{X'} \subset \mathbb{P}^{n+1} \times \mathbb{P}^{n+1}$. These points are expressed as
$$(p, b) = \big( [p_0 : p_1 : \cdots : p_s],\ [b_0 : b_1 : \cdots : b_s] \big) \in L_X(u).$$
In terms of the ML degree, for generic choices of $u$ we have
$$\operatorname{MLdegree}(X) = \# L_X(u) = \# P_X(u) = \# B_X(u).$$

There are two corollaries to Theorem 4. The first corollary gives a bijection between critical points of $l'_u(p)$ on $X'$ and critical points of $l'_u(b)$ on $X'^*$. The second corollary gives equations to determine critical points of $l'_u(p)$ on $X'$.

Corollary 1.
There is a bijection between critical points of $l'_u(p)$ on $X'$ and critical points of $l'_u(b)$ on $X'^*$ given by
$$[p_0 b_0 : p_1 b_1 : \cdots : p_n b_n : p_s b_s] = [u_0 : u_1 : \cdots : u_n : -u_+].$$
Moreover, the product $l'_u(p)\, l'_u(b)$ remains constant over the set of critical points.

Proof. The first part follows by noticing that the relation forces us to have
$$[p_0 : p_1 : \cdots : p_s] = [u_0 / b_0 : u_1 / b_1 : \cdots : -u_+ / b_s],$$
which is also the gradient of $l'_u(b)$. By biduality, $(p, b) \in N_{X'}$ exactly when $(b, p) \in N_{X'^*}$, so this gradient is orthogonal to the tangent space of $X'^*$ at $b$. The second part follows as
$$l'_u(p)\, l'_u(b) = u_0^{u_0} u_1^{u_1} \cdots u_n^{u_n} (-u_+)^{-u_+}.$$

When $u_0, \dots, u_n$ are positive integers, the bijection in Corollary 1 pairs positive critical points of $l'_u(p)$ ordered by increasing likelihood with positive critical points of $l'_u(b)$ ordered by decreasing likelihood!

Example 6.
We will compute the ML degree of X in Example 3 to be . We fix thedata vector ( u , u , u , u ) = (2 , , , , and determine the points of L X ( u ) p p p p p s . . . . − . . . . − . . . − . b b b b b s . . . . − − . . . . − − . . . − . − . The eliminants for p , p , p , and p are (100 p + 290 p + 74 p − , (62700 p − p + 314358 p − , (1900 p − p + 4886 p − , (62700 p + 447650 p − p + 136125) . The eliminants for b , b , b , b of L X ( u ) are (1680 b − b − b − , (34151040 b − b + 27271868 b − , (28800 b − b + 25100 b − , (272250 b − b + 223825 b + 15675) . Note that we are not saying that the ML degree of X equals the ML degree of X ∗ .In general, MLdegree( X ) = MLdegree( X ∗ ) . b + b + · · · + b n − b s does not vanish on X ′∗ . So there is no analogue of Lemma 2 involving X ′∗ and X ∗ .In terms of previous literature, one should think of Corollary 1 as a generalization ofTheorem 2 of [4]. Corollary 2.
Fix a point $[p, b]$ of $N_{X'}$ such that $p_s b_s \neq 0$. The following are equivalent:
1. The point $[p, b]$ is in $L_X(u)$.
2. For $i = 0, 1, 2, \dots, n$, the point $[p, b]$ satisfies $u_i p_s b_s = -u_+ p_i b_i$.
3. There exists $[q_0 : \cdots : q_n] \in X^*$ such that for $i = 0, 1, \dots, n$, $u_i p_s b_s = -u_+ p_i (q_i - b_s)$.

Proof.
It is immediate that parts 1 and 2 are equivalent. To see that 2 and 3 are equivalent, recall that $q_i = b_i + b_s$ for $i = 0, 1, \dots, n$, from the definition of $X'^*$.

A consequence of these equations is that they remove the need to saturate by $p_0 p_1 \cdots p_n$ in Gröbner basis computations that involve the likelihood equations whenever the $u_i$ are nonzero. In addition, if we restrict to the affine charts defined by $p_s = 1$ and $b_s = -u_+$, then the condition $p_s b_s \neq 0$ is immediately satisfied.

In this section we will define a system of equations whose solutions are precisely
$$B_X(u) = \{ b : (p, b) \in L_X(u) \}.$$
Once we know the set $B_X(u)$, we determine the critical points of $l_u(p) = p_0^{u_0} \cdots p_n^{u_n} / p_+^{u_+}$ on $X$ using Lemma 2 and Corollary 2. Concretely, if $b \in B_X(u)$ then
$$[p_0 : \cdots : p_n] = [u_0 / b_0 : \cdots : u_n / b_n]$$
is a critical point of $l_u(p)$ on $X$. For this reason we make the following definition.

Definition 7.
The dual maximum likelihood estimation problem for the algebraic statistical model $X$ and data $u$ is to determine $B_X(u)$, the set of critical points of $l'_u(b)$ on $X'^*$.

By Corollary 1, we find the critical points of $l'_u(b) = b_0^{u_0} b_1^{u_1} \cdots b_n^{u_n} b_s^{-u_+}$ on $X'^*$ to determine the set $B_X(u)$. That is, we determine the points $b \in X'^*$ such that the gradient
$$\nabla l'_u(b) = \left[ \frac{u_0}{b_0} : \frac{u_1}{b_1} : \cdots : \frac{u_n}{b_n} : -\frac{u_+}{b_s} \right]$$
is orthogonal to the tangent space of $X'^*$ at $b$.

If $X^*$ has codimension $c$, which also means $X'^*$ has codimension $c$, then the dual likelihood equations are obtained by taking the sum of the ideals generated by
• the polynomials defining $X'^*$, and
• the $(c+1) \times (c+1)$ minors of an extended Jacobian multiplied by a diagonal matrix with entries $b_0, b_1, \dots, b_n, b_s$,
$$\begin{bmatrix} \nabla l'_u(b) \\ \operatorname{Jac}(X'^*) \end{bmatrix} \begin{bmatrix} b_0 & & \\ & \ddots & \\ & & b_s \end{bmatrix}, \tag{4}$$
and saturating by the product of two ideals,
• the principal ideal generated by $b_0 b_1 \cdots b_n b_s$, and
• the ideal generated by the $c \times c$ minors of $\operatorname{Jac}(X'^*)$.

This gives us a formulation of the dual likelihood equations. Now we make some simplifications to these equations to get Theorem 8.

By Euler's relations of partial derivatives, the columns of the matrix product in (4) are linearly dependent. Indeed the columns sum to zero, so we may drop the last column of the product without changing the rank.

By Proposition 2, if $g_1(q), \dots, g_l(q)$ define the variety $X^*$, then the defining equations of $X'^*$ are
$$g_1(b_0 + b_s, b_1 + b_s, \dots, b_n + b_s) = 0,\quad \dots,\quad g_l(b_0 + b_s, b_1 + b_s, \dots, b_n + b_s) = 0,$$
and the Jacobian of $X'^*$ is
$$\operatorname{Jac}(X'^*) = \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)} \begin{bmatrix} 1 & & & 1 \\ & \ddots & & \vdots \\ & & 1 & 1 \end{bmatrix}.$$
Since the last column of $\operatorname{Jac}(X'^*)$ is the sum of the first $n+1$ columns, it follows that the dual likelihood equations can be reformulated by the next theorem.

Theorem 8. Let $g_1(q), \dots, g_l(q)$ define $X^* \subset \mathbb{P}^n$ with codimension $c$. Then $B_X(u)$ is the variety of the ideal calculated by taking the sum of the ideals generated by
• $g_1(b_0 + b_s, \dots, b_n + b_s), \dots, g_l(b_0 + b_s, \dots, b_n + b_s)$, and
• the $(c+1) \times (c+1)$ minors of
$$\begin{bmatrix} \frac{u_0}{b_0} & \frac{u_1}{b_1} & \cdots & \frac{u_{n-1}}{b_{n-1}} & \frac{u_n}{b_n} \\ & & \operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)} & & \end{bmatrix} \begin{bmatrix} b_0 & & \\ & \ddots & \\ & & b_n \end{bmatrix}, \tag{5}$$
and saturating by the product of two ideals,
• the principal ideal generated by $b_0 b_1 \cdots b_n b_s$, and
• the ideal of $c \times c$ minors of $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)}$.

The point of Theorem 8 is that the dual likelihood equations define a homogeneous ideal in the polynomial ring $\mathbb{C}[b_0, b_1, \dots, b_n, b_s]$ whose variety is $B_X(u)$, the set of critical points of $l'_u(b)$ on $X'^*$. Theorem 8 can be used to determine the ML degree of $X$ because $\# L_X(u) = \# B_X(u)$.

Since Theorem 8 is constructive, we express it below as an algorithm.

Algorithm 9.
Suppose $X^*$ in $\mathbb{P}^n$ has codimension $c$.
• Input: Polynomials $g_1(q), g_2(q), \dots, g_l(q)$ defining $X^*$, and a vector $u \in \mathbb{N}^{n+1}$.
• Output: The ML degree of $X$.
• Procedure:
Step 1. Let $G_q$ be the ideal generated by $g_1(q), \dots, g_l(q)$, and let $G_b$ be the ideal obtained by substituting $b_0 + b_s, \dots, b_n + b_s$ for $q_0, \dots, q_n$, respectively, in the ideal $G_q$.
Step 2. Let $M_{b,u}$ denote the ideal generated by the $(c+1) \times (c+1)$ minors of (5).
Step 3. Let $S_b$ be the ideal generated by the $c \times c$ minors of $\operatorname{Jac}(X^*)\big|_{(b_0 + b_s, \dots, b_n + b_s)}$.
Step 4. Let $W_{b,u}$ be the saturation $(M_{b,u} + G_b) : (b_0 b_1 \cdots b_n b_s \cdot S_b)^\infty$.
Step 5. Return the degree of $W_{b,u}$.

Example 10. Let $X$ be defined by $f(p) = 4 p_0 p_2 - p_1^2$ in $\mathbb{P}^2$. Then $X^*$ is defined by $g(q) = q_0 q_2 - q_1^2$ in the dual $\mathbb{P}^2$. So
$$f(p) = \det \begin{bmatrix} 2p_0 & p_1 \\ p_1 & 2p_2 \end{bmatrix} \quad\text{and}\quad g(q) = \det \begin{bmatrix} q_0 & q_1 \\ q_1 & q_2 \end{bmatrix}.$$
The ML degree of $X$ is computed by taking the ideal generated by
• $g(b_0 + b_s, b_1 + b_s, b_2 + b_s) = (b_0 + b_s)(b_2 + b_s) - (b_1 + b_s)^2$, and
• the $2 \times 2$ minors of
$$\begin{bmatrix} \frac{u_0}{b_0} & \frac{u_1}{b_1} & \frac{u_2}{b_2} \\ (b_2 + b_s) & -2(b_1 + b_s) & (b_0 + b_s) \end{bmatrix} \begin{bmatrix} b_0 & & \\ & b_1 & \\ & & b_2 \end{bmatrix}$$
and saturating by the product of two ideals,
• the principal ideal $(b_0 b_1 b_2 b_s)$, and
• the $1 \times 1$ minors of $\begin{bmatrix} (b_2 + b_s) & -2(b_1 + b_s) & (b_0 + b_s) \end{bmatrix}$.

We find that there is a unique critical point of $l'_u(b)$ on $X'^*$, whose coordinates can be derived from the matrix equality
$$-\frac{1}{b_s} \begin{bmatrix} b_0 & 2b_1 \\ 2b_1 & b_2 \end{bmatrix} = \begin{bmatrix} \frac{4 u_0 u_+}{(2u_0 + u_1)^2} & \frac{4 u_1 u_+}{(u_1 + 2u_2)(2u_0 + u_1)} \\ \frac{4 u_1 u_+}{(u_1 + 2u_2)(2u_0 + u_1)} & \frac{4 u_2 u_+}{(2u_2 + u_1)^2} \end{bmatrix}.$$
So by Corollary 2, the critical point of $l_u(p)$ on $X$ is given by
$$\frac{1}{p_s} \begin{bmatrix} 2p_0 & p_1 \\ p_1 & 2p_2 \end{bmatrix} = \frac{1}{2u_+^2} \begin{bmatrix} 2u_0 + u_1 \\ u_1 + 2u_2 \end{bmatrix} \begin{bmatrix} 2u_0 + u_1 \\ u_1 + 2u_2 \end{bmatrix}^T.$$

In this section, we compare the standard formulation of solving the likelihood equations, Algorithm 6 of [10], to the dual formulation presented here, Algorithm 9. All computations in this subsection were done on a 2.8 GHz Intel Core i7 MacBook Pro using Macaulay2.
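Returning to the conic of Example 10, the dual likelihood equations can be checked by hand on a small instance. The sketch below is only an illustration, not the Macaulay2 computation used in the paper. It assumes the conic is $f = 4p_0p_2 - p_1^2$ with dual $g = q_0q_2 - q_1^2$, and that the critical point on $X$ is proportional to $[(2u_0+u_1)^2 : 2(2u_0+u_1)(u_1+2u_2) : (u_1+2u_2)^2]$, which is what the closed form in the example amounts to. The data `u` and all function names are hypothetical. It then builds $b$ from the relation of Corollary 1 and checks, in exact arithmetic, that $b$ satisfies the two families of equations in Theorem 8 for $c = 1$.

```python
from fractions import Fraction


def conic_critical_point(u):
    """Assumed closed form for the critical point on 4*p0*p2 = p1**2; p_s = p_+."""
    a = 2 * u[0] + u[1]
    c = u[1] + 2 * u[2]
    p = [Fraction(a * a), Fraction(2 * a * c), Fraction(c * c)]
    return p, sum(p)


def dual_point(u, p, p_s):
    """Corollary 1 with the scaling p_i * b_i = u_i and p_s * b_s = -u_+."""
    b = [Fraction(u_i) / p_i for u_i, p_i in zip(u, p)]
    b_s = Fraction(-sum(u)) / p_s
    return b, b_s


def g(q):
    """Assumed defining polynomial of the dual conic X*."""
    return q[0] * q[2] - q[1] ** 2


def jac_g(q):
    """Partial derivatives of g with respect to q0, q1, q2."""
    return [q[2], -2 * q[1], q[0]]


def minors_2x2(row1, row2):
    """All 2x2 minors of the 2 x n matrix with the given rows."""
    n = len(row1)
    return [row1[i] * row2[j] - row1[j] * row2[i]
            for i in range(n) for j in range(i + 1, n)]


u = [2, 3, 5]  # hypothetical data
p, p_s = conic_critical_point(u)
b, b_s = dual_point(u, p, p_s)

# Relations (2): q_i = b_i + b_s.
q = [b_i + b_s for b_i in b]

# Rows of matrix (5) after multiplying by diag(b_0, b_1, b_2):
# the top row [u_i / b_i] becomes u, the bottom row becomes [dg/dq_i * b_i].
bottom_row = [dg_i * b_i for dg_i, b_i in zip(jac_g(q), b)]
rank_minors = minors_2x2(u, bottom_row)

# Recover p from b as in the text: [p_0 : p_1 : p_2] = [u_0/b_0 : u_1/b_1 : u_2/b_2].
recovered = [Fraction(u_i) / b_i for u_i, b_i in zip(u, b)]
```

For this data the dual point lies on $X'^*$ and all $2 \times 2$ minors vanish, so $b$ is a solution of the dual likelihood equations, and dividing $u$ by $b$ coordinate-wise returns the point on the conic.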
I = h q + 2 q + 3 q + 5 q i I = h q − q q , q q − q q , q − q q i I = h q + q + q + q i I = h q + 15 q + 10 q + 6 q i I = h q q − q q − q q + 18 q q q q − q q i I = h q + q q q − q i I = 2 × of q , q , q q , q , q q , q , q I = 2 × of q , q , q q , q , q q , q , q I = * det q , q , q q , q , q q , q , q + The second column in the table below is a list of ML degrees of varieties whose dualvariety is given by the first column. The third column is the time (in seconds) it takes tocalculate the ML degree using the standard formulation, while the fourth column is thetime (in seconds) it takes to calculate the ML degree using the dual likelihood equations. X ⋆ ML degree Standard Dual I
14 0 .
008 0 . I .
062 0 . I
57 166.872 1.447 I
14 0 .
038 0 . I .
017 0 . I
22 32 . . I
13 4 .
808 33 . I .
349 13 . I . . The most notable discrepancy is in row in bold. In this case, the ideal of X ∗ isgenerated by a cubic, but X is generated by a degree polynomial with terms. To calculate new ML degrees when X ∗ is not a complete intersection [Computation 12],we will work with an adjusted formulation of the dual likelihood equation. This for-mulation introduces codimension X ∗ auxiliary unknowns (Lagrange multipliers). Also,instead of working with every generator of the ideal of X ∗ , we work with codim( X ∗ ) X ∗ . Example 11.
Consider × × -tensors of the form [ p ijk ] with i, j, k, ∈ { , } . If X is the hyperdeterminant of these tensors, then X ∗ is defined by the × minors of allflattenings of the tensor [ q ijk ] . The codimension of X ∗ is . The flattenings belowdefine X ∗ after saturating by q . g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q So by introducing auxiliary unknowns λ , λ , . . . , λ we create a square system of equations in the homogeneous variable groups ( b , . . . , b , b s ) and ( λ , . . . , λ ) . g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s ) g = ( b + b s )( b + b s ) − ( b + b s )( b + b s )[ λ , λ , λ , . . . , λ ] (cid:20) ∇ l ′ u ( b )Jac( g ) (cid:21) b . . . b = 0 . The solutions with λ b s = 0 give the critical points. We find that there are criticalpoints of l ′ u ( b ) on X ∗ , agreeing with [5], page 53.The next example is a new computational result to determine the ML degree of ahyperdeterminant. Computation 12.
Let X denote the hyperdeterminant of × × tensors of the form [ p ijk ] for i ∈ { , } , j ∈ { , } , k ∈ { , , } . Then the ML degree of X is . Proof.
The variety X is dual to the variety X ∗ defined by the × -minors of theflattenings of the × × tensor [ q ijk ] with i ∈ { , } , j ∈ { , } , k ∈ { , , } . Thevariety X ∗ has codimension , degree , and generators. We consider of the generators, g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q g ( q ) = q q − q q q we recover the dual variety X ∗ . We solve the followingsquare system of equations: the seven equations g ( b + b s , . . . , b n + b s ) = · · · = g ( b + b s , . . . , b n + b s ) = 0 and the equations [1 , λ , λ , . . . , λ ] M b . . . b = 0 , with M being (6) where ( q , . . . , q ) = ( b + b s , . . . , b + b s ) and u consisting ofrandom complex numbers (random positive integers) to determine the ML degree of X numerically (symbolically). u b u b u b u b u b u b u b u b u b u b u b u b − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q − q q q − q (6)The ML degree was obtained using exact methods in Macaulay2 in 10,943 seconds andusing numerical methods in
Bertini in 5,796 seconds. Both computations were performed on the UC Berkeley server apppsa, which has four 16-core 2.3GHz AMD Opteron 6276 CPUs. The Bertini computation was done in parallel using 20 of the 64 cores.

One could have attempted to compute the number using Algorithm 6 of [10]. However, to do so, we must have the defining equations of $X$. We were not able to compute these equations ourselves, but the hyperdeterminant of $2 \times 2 \times 3$ tensors is listed on page 7 of [2]. This is a degree 6 polynomial with 66 terms. We were unable to determine the ML degree with the standard likelihood equations and with the Lagrange likelihood equations (page 4 of [8]).

The next interesting case is when $X$ is the hyperdeterminant of $2 \times 2 \times 2 \times 2$ tensors. In this case, $X$ is defined by a polynomial of degree 24 in 16 unknowns that has 2,894,276 terms [11]. There is no way we will be able to effectively write down the standard likelihood equations for $X$. However, its dual variety $X^*$ is defined by a binomial ideal generated by the $2 \times 2$ minors of all of its flattenings, and we may have a chance of solving the dual likelihood equations both numerically and symbolically.

The Dual MLE Problem vs Maximum Likelihood Duality
In this section we introduce an example and show how the results presented in this paper fit in context with previous work on maximum likelihood duality. In [4, 9] the notion of maximum likelihood duality (ML-duality) was introduced. ML-duality gives a bijection between critical points of $l_u(p)$ on two different statistical models.

Definition 13.
A pair of algebraic statistical models X and Y in P n are said to be ML-dual if for generic u there is an involutive bijection between points of L X ( u ) andpoints of L Y ( u ) . Moreover, this bijection pairs points of L X ( u ) with points of L Y ( u ) such that the coordinate-wise product of each pair can be expressed in terms of the data u alone. Example 14.
Suppose $r \le m \le n$, and let $V_{m,n,r}$ denote the Zariski closure in $\mathbb{P}^{mn-1}$ of rank $r$ matrices of the form
$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{m1} & \cdots & & p_{mn} \end{bmatrix}.$$
Then $V^*_{m,n,r}$ is known to be the Zariski closure in $\mathbb{P}^{mn-1}$ of rank $m - r$ matrices of the form
$$\begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1n} \\ q_{21} & q_{22} & & \vdots \\ \vdots & & \ddots & \\ q_{m1} & \cdots & & q_{mn} \end{bmatrix}.$$
Fix a choice of $m, n, r$. If we take $X = V_{m,n,r}$, then points in $X'$ will be represented as
$$[p_{ij} : p_s] \in X' \subset \mathbb{P}^{mn}$$
and points in $X'^*$ will be represented as
$$[b_{ij} : b_s] \in X'^* \subset \mathbb{P}^{mn}.$$
With Corollary 1, it follows that there is a bijection between $P_X(u)$ and $B_X(u)$. On the other hand, by Theorem 1 in [4] we know that $V_{m,n,r}$ and $V_{m,n,m-r}$ are ML-dual. This means that if we take $Y$ to be $V_{m,n,m-r}$, there is an involutive bijection between the critical points $L_X(u)$ and $L_Y(u)$ for generic choices of $u$. In particular, the bijection is such that the coordinate-wise product of the paired points is
$$\left( \left[ \frac{u_{i+} u_{+j} u_{ij}}{u_{++}^3} : 1 \right],\ \left[ \frac{u_{ij} u_{++}}{u_{i+} u_{+j}} : 1 \right] \right) \in \mathbb{P}^{mn} \times \mathbb{P}^{mn}.$$
Here $u_{++} := \sum_{i,j} u_{ij}$, $u_{i+} := \sum_j u_{ij}$, and $u_{+j} := \sum_i u_{ij}$, and likewise for $p_{++}$, $p_{i+}$, $p_{+j}$.

Conclusion
In this paper we have given an elegant formulation of the MLE problem involving conormal varieties. This formulation allows us to avoid the computation of saturation by the product of unknowns. We also define the dual likelihood equations that allow us to compute critical points on $X$ even if we only know the defining equations of its dual $X^*$ [Algorithm 9]. The important feature of the dual likelihood equations comes from the fact that the defining equations of $X$ may be too difficult to work with. In addition, we showed that if we solve the dual equations, we can recover the critical points on $X$ [Theorem 4]. More broadly, we showed that if there is a bijection between critical points of a function restricted to a variety and critical points of a monomial restricted to a different variety, then we can formulate a new set of "dual" equations to determine these points.

Acknowledgements
The author would like to thank Elizabeth Gross, Kim Laine, Zvi Rosen, and Bernd Sturmfels for their comments and suggestions to improve earlier versions of the paper.
References
[1] D. J. Bates, J. D. Hauenstein, A. J. Sommese, and C. W. Wampler. Bertini: Software for numerical algebraic geometry. Available at https://bertini.nd.edu/.
[2] M. R. Bremner. A hyperdeterminant for 2 x 2 x 3 arrays. arXiv:1106.2988, 2011.
[3] F. Catanese, S. Hoşten, A. Khetan, and B. Sturmfels. The maximum likelihood degree. Amer. J. Math., 128(3):671–697, 2006.
[4] J. Draisma and J. I. Rodriguez. Maximum likelihood duality for determinantal varieties. To appear in International Mathematics Research Notices, 2013.
[5] M. Drton, B. Sturmfels, and S. Sullivant. Lectures on algebraic statistics, volume 39 of Oberwolfach Seminars. Birkhäuser Verlag, Basel, 2009.
[6] L. Ein. Varieties with small dual varieties, I. Inventiones mathematicae
To appear in the Journal of Algebraic Statistics, 2013.
[10] S. Hoşten, A. Khetan, and B. Sturmfels. Solving the likelihood equations. Found. Comput. Math., 5(4):389–407, 2005.
[11] P. Huggins, B. Sturmfels, J. Yu, and D. S. Yuster. The hyperdeterminant and triangulations of the 4-cube. Math. Comput., 77(263):1653–1679, 2008.
[12] J. Huh. The maximum likelihood degree of a very affine variety. Compos. Math., 149(8):1245–1266, 2013.
[13] P. Rostalski and B. Sturmfels. Dualities. In Semidefinite optimization and convex algebraic geometry, volume 13 of