Commutative Algebra of Statistical Ranking

Abstract

A model for statistical ranking is a family of probability distributions whose states are orderings of a fixed finite set of items. We represent the orderings as maximal chains in a graded poset. The most widely used ranking models are parameterized by rational functions in the model parameters, so they define algebraic varieties. We study these varieties from the perspective of combinatorial commutative algebra. One of our models, the Plackett-Luce model, is non-toric. Five others are toric: the Birkhoff model, the ascending model, the Csiszár model, the inversion model, and the Bradley-Terry model. For these models we examine the toric algebra, its lattice polytope, and its Markov basis.


BERND STURMFELS AND VOLKMAR WELKER

1. Introduction

A statistical model for ranked data is a family $\mathcal{M}$ of probability distributions on the symmetric group $S_n$. Each distribution $p(\theta)$ in $\mathcal{M}$ depends on some model parameters $\theta$, and it associates a probability $p_\pi(\theta)$ to each permutation $\pi$ of $[n] = \{1, 2, \ldots, n\}$. Thus the model $\mathcal{M}$ is a parametrized subset of the $(n!-1)$-dimensional standard simplex $\Delta_{S_n}$.

In algebraic statistics, one assumes that the probabilities $p_\pi(\theta)$ are rational functions in the model parameters $\theta$, so that $\mathcal{M}$ is a semi-algebraic set in $\Delta_{S_n}$, and one aims to characterize the prime ideal $I_{\mathcal{M}}$ of polynomials that vanish on $\mathcal{M}$. In fact, one of the origins of the field was the spectral analysis for permutation data described by Diaconis and Sturmfels in [12, §6.1]. The corresponding Birkhoff model $\mathcal{M}$ is the toric variety of the Birkhoff polytope. This polytope consists of all bistochastic matrices, and it is the convex hull of all $n \times n$ permutation matrices. There has been a considerable amount of research on the geometric invariants of the Birkhoff model $\mathcal{M}$. The simplest such invariant is its dimension, $\dim(\mathcal{M}) = (n-1)^2$. The degree of $\mathcal{M}$ is the normalized volume of the Birkhoff polytope, a topic of independent interest in combinatorics [6]. Diaconis and Eriksson [11] conjectured that the Markov basis of the Birkhoff model consists of binomials of degree $\le 3$.

Besides the Birkhoff model, there are many other models for ranked data that are both relevant for statistical analysis and have an interesting algebraic structure. It is the objective of this article to conduct a comparative study of such models from the perspectives of commutative algebra and geometric combinatorics. Both toric models and non-toric models are of interest. The former include the models introduced by Csiszár [9, 10], and the latter include the Plackett-Luce model [8, 24, 29] and the generalized Bradley-Terry models [21].

The organization of this paper is as follows. In Section 2 we give an informal introduction to all our models.
We write out formulas for the probabilities for the six permutations of $n = 3$ items, and we discuss the subsets they parametrize in the $5$-dimensional simplex $\Delta_{S_3}$.

Precise formal definitions for the four toric models are given in Section 3. We represent the states as maximal chains in a graded poset $Q$. Typically, $Q$ is the distributive lattice induced by some order constraints on the $n$ items to be ranked. If there are no such constraints then $Q = 2^{[n]}$ is the Boolean lattice whose maximal chains are all $n!$ permutations in $S_n$. Non-trivial order constraints arise frequently in applications of ranking models, for instance in computational biology [4] and machine learning [8]. Our algebraic framework based on graded posets $Q$ is well-suited for such contemporary applications of statistical ranking.

While the Birkhoff model has already received a lot of attention in the literature, we here focus on the Csiszár model (Section 4), the ascending model (Section 5) and the inversion model (Section 6). For each of these toric varieties, we characterize the corresponding lattice polytope and its Markov bases, that is, binomials that generate the toric ideal.

Section 7 is concerned with the Plackett-Luce model, which is not a toric model, but is parametrized by certain conditional probabilities that are not monomials. In algebraic geometry language, this model is obtained by blowing up the projective space $\mathbb{P}^{n-1}$ along a family of linear subspaces of codimension $2$, and we study its coordinate ring. We also examine marginalizations of our models, including the widely used Bradley-Terry model.

2. Toric Models: A Sneak Preview

A toric model for complete permutation data is specified by a non-negative integer matrix $A$ with $n!$ columns that all have the same sum $S$. These column vectors $A_\pi$ are indexed by permutations $\pi \in S_n$ and they represent the sufficient statistics of the model. The article [17] serves as our general reference for toric models in statistics, their relationship with exponential families, and the role of the matrix $A$. For an introduction to algebraic statistics in general, and for further reading on toric models, we refer to the books [13, 28].

If $r = \mathrm{rank}(A)$ then the convex hull of the column vectors $A_\pi$ is a lattice polytope of dimension $r-1$. We refer to it as the model polytope. The toric model can be identified with the non-negative points on the projective toric variety associated with the model polytope. Each data set is summarized as a function $u : S_n \to \mathbb{N}$, where $u(\pi)$ is the number of times the permutation $\pi$ has been observed. Thinking of $u$ as a column vector, we can form the matrix-vector product $Au$, whose entries are the sufficient statistics of the data $u$. Then the sum of the entries in the vector $Au$ equals $S \cdot N$, where $N = \sum_{\pi \in S_n} u(\pi)$ is the sample size.

In subsequent sections we will generalize to the situation where $S_n$ is replaced by a proper subset, in which case $A$ has fewer than $n!$ columns, but still labeled by permutations. These will be the linear extensions of a given partial order on $[n] = \{1, 2, \ldots, n\}$. In fact, for some models we can even take the set of maximal chains in an arbitrary ranked poset. But for a first look we confine ourselves to the situation described above, where $A$ has $n!$ columns.

We now define four toric models for probability distributions on $S_n$. We do this by way of a verbal description of the sufficient statistics in each model.
These sufficient statistics are numerical functions on the permutations $\pi$ of the given set $[n]$ of items to be ranked.

(a) In the ascending model, the sufficient statistics $Au$ record, for each subset $I \subset [n]$, the number of samples $\pi$ in the data $u$ that have the set $I$ at the bottom. Here, the set $I$ being at the bottom means that ($i \in I$ and $j \notin I$) implies $\pi(i) < \pi(j)$.
(b) In the Csiszár model, the sufficient statistics $Au$ count, for each $i \in I \subset [n]$, the number of samples that have $I$ at the bottom but with $i$ as winner in the group $I$. This is the model studied by Villő Csiszár [9, 10] under the name "L-decomposable".
(c) In the Birkhoff model of [12, §6.1], the sufficient statistics $Au$ of a data set $u$ record, for each $i, j \in [n]$, the number of samples $\pi$ in which object $i$ is ranked in place $j$.
(d) In the inversion model, the sufficient statistics $Au$ count, for each ordered pair $i < j$ in $[n]$, the number of samples $\pi$ in which that pair is an inversion, meaning $\pi^{-1}(i) > \pi^{-1}(j)$. This model can be seen as a multivariate version of the Mallows model [25].

To illustrate the differences between these models let us consider the simplest case $n = 3$. In each case the toric ideal of the model is the kernel of a square-free monomial map from the polynomial ring $K[p_{123}, p_{132}, p_{213}, p_{231}, p_{312}, p_{321}]$ representing the probabilities to another polynomial ring $K[a, b, \ldots]$ that represents the model parameters.
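The model polytope is the convex hull of the six 0-1 vectors corresponding to the square-free monomials:

$$
\begin{array}{l|cccccc}
 & p_{123} & p_{132} & p_{213} & p_{231} & p_{312} & p_{321} \\ \hline
\text{Birkhoff} & a_{11}a_{22}a_{33} & a_{11}a_{23}a_{32} & a_{12}a_{21}a_{33} & a_{12}a_{23}a_{31} & a_{13}a_{21}a_{32} & a_{13}a_{22}a_{31} \\
\text{inversion} & b_{12}b_{13}b_{23} & b_{12}b_{13}q_{23} & q_{12}b_{13}b_{23} & q_{12}q_{13}b_{23} & b_{12}q_{13}q_{23} & q_{12}q_{13}q_{23} \\
\text{ascending} & c_{1}c_{12}c_{123} & c_{1}c_{13}c_{123} & c_{2}c_{12}c_{123} & c_{3}c_{13}c_{123} & c_{2}c_{23}c_{123} & c_{3}c_{23}c_{123} \\
\text{Csiszár} & d_{1|1}d_{2|12}d_{3|123} & d_{1|1}d_{3|13}d_{2|123} & d_{2|2}d_{1|12}d_{3|123} & d_{3|3}d_{1|13}d_{2|123} & d_{2|2}d_{3|23}d_{1|123} & d_{3|3}d_{2|23}d_{1|123}
\end{array}
$$

The toric ideals record the algebraic relations among these square-free monomials:

$I_{\rm inv} = \langle\, p_{123}p_{321} - p_{132}p_{231},\; p_{123}p_{321} - p_{213}p_{312} \,\rangle$ has codimension $2$,
$I_{\rm birk} = I_{\rm asc} = \langle\, p_{123}p_{231}p_{312} - p_{132}p_{213}p_{321} \,\rangle$ has codimension $1$,
$I_{\rm csi} = \langle 0 \rangle$ has codimension $0$.

For each model, the matrix $A$ has six columns, indexed by $S_3$, and its rows are labeled by the model parameters. For example, for the ascending model, the matrix has seven rows:

$$
\mathrm{As} \;=\;
\begin{array}{c|cccccc}
 & p_{123} & p_{132} & p_{213} & p_{231} & p_{312} & p_{321} \\ \hline
c_{1} & 1 & 1 & 0 & 0 & 0 & 0 \\
c_{2} & 0 & 0 & 1 & 0 & 1 & 0 \\
c_{3} & 0 & 0 & 0 & 1 & 0 & 1 \\
c_{12} & 1 & 0 & 1 & 0 & 0 & 0 \\
c_{13} & 0 & 1 & 0 & 1 & 0 & 0 \\
c_{23} & 0 & 0 & 0 & 0 & 1 & 1 \\
c_{123} & 1 & 1 & 1 & 1 & 1 & 1
\end{array}
$$

Here we use the same notation for both the matrix and the model polytope, which is the convex hull of the columns. From the equality of ideals, $I_{\rm birk} = I_{\rm asc}$, we infer that the polytope $\mathrm{As}$ is affinely isomorphic to the $3 \times 3$-Birkhoff polytope, which is a cyclic $4$-polytope with six vertices. The ideal $I_{\rm inv}$ reveals that the model polytope for the inversion model is a regular octahedron, while the polytope for the Csiszár model is the full $5$-simplex.

To see that no two of our four models agree, we need to go to $n \ge 4$.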
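The polytope dimensions discussed in this section can be checked numerically, since the dimension of each model polytope is $\mathrm{rank}(A) - 1$. The following is a minimal sketch, not the authors' code: the function names `rank`, `statistics` and `polytope_dimensions` are illustrative, and a permutation is stored as a tuple `pi` with `pi[i]` the place of item `i` (0-based), which is an assumption about how to read the condition $\pi(i) < \pi(j)$.

```python
from fractions import Fraction
from itertools import permutations


def rank(rows):
    """Rank of an integer matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def statistics(pi):
    """Supports of the four square-free monomials attached to pi.

    Assumption: pi[i] is the place of item i, so order[k] is the item in place k.
    """
    n = len(pi)
    order = sorted(range(n), key=lambda i: pi[i])
    birk = {("a", i, pi[i]) for i in range(n)}
    # The pair i < j is an inversion when pi^{-1}(i) > pi^{-1}(j).
    inv = {("q" if order[i] > order[j] else "b", i, j)
           for i in range(n) for j in range(i + 1, n)}
    # Bottom sets: the items occupying the first k places.
    asc = {("c", frozenset(order[:k])) for k in range(1, n + 1)}
    # Bottom set together with its most recently added item.
    csi = {("d", order[k - 1], frozenset(order[:k])) for k in range(1, n + 1)}
    return {"Birkhoff": birk, "inversion": inv, "ascending": asc, "Csiszar": csi}


def polytope_dimensions(n):
    """Dimension rank(A) - 1 of each model polytope for the Boolean lattice."""
    stats = [statistics(pi) for pi in permutations(range(n))]
    dims = {}
    for model in ("Birkhoff", "inversion", "ascending", "Csiszar"):
        params = list({x for s in stats for x in s[model]})
        # One row per permutation: the 0-1 vector A_pi of the model matrix.
        matrix = [[1 if x in s[model] else 0 for x in params] for s in stats]
        dims[model] = rank(matrix) - 1
    return dims


print(polytope_dimensions(3))
print(polytope_dimensions(4))
```

Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance; for $n = 3$ the computed dimensions agree with the cyclic $4$-polytope, the octahedron and the $5$-simplex named above.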

Example 2.1. Let $n = 4$. Then all four model polytopes have $24$ vertices but their dimensions are different. The Birkhoff model has dimension $9$, the inversion model has dimension $6$, the ascending model has dimension $11$, and the Csiszár model has dimension $17$. Theorem 3.1 will explain the precise relationships and inclusions among the four models. □

Our work on this project started by trying to understand a certain model whose toric closure is the ascending model. Here toric closure refers to the smallest toric model containing a given model. That non-toric model for ranking is the Plackett-Luce model [8, 24, 29]. It can be obtained from the ascending model by the following specialization of parameters:

$$ c_i \mapsto \theta_i, \quad c_{ij} \mapsto \theta_i + \theta_j, \quad c_{ijk} \mapsto \theta_i + \theta_j + \theta_k, \quad \ldots $$

The prime ideal of algebraic relations among the $p_\pi$ is a non-toric ideal which contains the toric ideal $I_{\rm asc}$. The case $n = 3$ is worked out explicitly in Example 7.1. Geometrically, that smallest Plackett-Luce model corresponds to blowing up $\mathbb{P}^2$ at the nine points in (19).

3. Toric Models: Definitions and General Results

Let $Q$ be a poset on a finite ground set $\Omega$. A $Q$-ranking is a maximal chain $a_0 < \cdots < a_n$ in $Q$. A chain $a_0 < \cdots < a_n$ being maximal means that $a_0$ is minimal in $Q$, $a_n$ is maximal, and $a_i < a_{i+1}$ is a cover relation for $0 \le i \le n-1$. We write $\mathrm{M}(Q)$ for the set of maximal chains in $Q$ and $\mathrm{Cov}(Q)$ for the set of cover relations in $Q$. If $Q = 2^{[n]}$ is the Boolean lattice of all subsets of $[n]$ ordered by inclusion then the maximal chains in $Q$ are in bijection with the permutations in $S_n$, and the models below coincide with the ones described in Section 2.

We shall define four toric models whose states are the maximal chains $\pi \in \mathrm{M}(Q)$. The probability of $\pi$ is represented by an indeterminate $p_\pi$. Each toric model for $Q$-rankings is defined by a non-negative integer matrix $A$ whose columns are indexed by $\mathrm{M}(Q)$ and have a fixed coordinate sum $S$. The matrix $A$ represents a monomial map from the polynomial ring $K[p]$ in the unknowns $p_\pi$, $\pi \in \mathrm{M}(Q)$, to a suitably chosen second polynomial ring.

Any data set gives a function $u : \mathrm{M}(Q) \to \mathbb{N}$, where $u(\pi)$ is the number of times the maximal chain $\pi$ has been observed. Thinking of $u$ as a column vector, we can form the matrix-vector product $Au$, whose entries are the sufficient statistics of the data set $u$. The coordinate sum of the vector $Au$ is equal to $S$ times the sample size $N = \sum_{\pi \in \mathrm{M}(Q)} u(\pi)$.

(a) In the ascending model, the sufficient statistic $Au$ records, for any given poset element $a \in Q$, the number of observed maximal chains $\pi$ that pass through $a$. The model parameters are represented by unknowns $c_a$, and the monomial map is
$$ p_\pi \mapsto c_{a_0} c_{a_1} \cdots c_{a_n} \quad \text{for } \pi = (a_0 < \cdots < a_n). $$
[…]
(d) […] $\pi^{-1}(i) > \pi^{-1}(j)$. The model parameters are represented by unknowns $u_{ij}$ and $v_{ij}$. The monomial map is
$$ p_\pi \mapsto \prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) < \pi^{-1}(j)}} u_{ij} \prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) > \pi^{-1}(j)}} v_{ij} \quad \text{for } \pi \in \mathcal{L}(P). $$

In general, we have the following inclusions among the four toric models (a)-(d). These inclusions of toric varieties correspond to linear projections among the model polytopes.

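Theorem 3.1. (i) The ascending model and the Csiszár model on a poset $Q$ satisfy $\mathcal{M}_{\rm asc} \subseteq \mathcal{M}_{\rm csi}$, provided $Q$ has either a unique minimal element $\hat{0}$ or a unique maximal element $\hat{1}$.
(ii) If $Q = \mathcal{O}(P)$ is a distributive lattice, then the Birkhoff model $\mathcal{M}_{\rm birk}$, the inversion model $\mathcal{M}_{\rm inv}$, the ascending model $\mathcal{M}_{\rm asc}$ and the Csiszár model $\mathcal{M}_{\rm csi}$ satisfy $\mathcal{M}_{\rm inv} \subseteq \mathcal{M}_{\rm csi}$ and $\mathcal{M}_{\rm birk} \subseteq \mathcal{M}_{\rm asc} \subseteq \mathcal{M}_{\rm csi}$.
(iii) The inclusions in (ii) are strict in general. Moreover, if $n \ge 4$ and $Q = 2^{[n]}$ then $\mathcal{M}_{\rm inv} \not\subset \mathcal{M}_{\rm asc}$ and $\mathcal{M}_{\rm birk} \not\subset \mathcal{M}_{\rm inv}$.

Proof.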

We begin by establishing (iii). The fact that the inclusions in (ii) are strict follows from Example 2.1. For the second part of (iii) consider $n = 4$. A direct computation as in Section 6 reveals that the inversion model $\mathcal{M}_{\rm inv}$ is a projective toric variety of dimension $6$ and degree […] in $\mathbb{P}^{23}$. The Markov basis of $I_{\rm inv}$ consists of […] quadrics. Since $\mathcal{M}_{\rm birk}$ has dimension $9$, we conclude that $\mathcal{M}_{\rm birk} \not\subset \mathcal{M}_{\rm inv}$. An explicit point $p$ in $\mathcal{M}_{\rm birk} \setminus \mathcal{M}_{\rm inv}$ is the uniform distribution on the nine derangements. This arises by setting $a_{ii} = 0$ for all $i$ and $a_{ij} = 1/\sqrt{3}$ for all $i \ne j$. The quadric $p_{[\ldots]}\,p_{[\ldots]} - p_{[\ldots]}\,p_{[\ldots]} \in I_{\rm inv}$ does not vanish for this particular distribution.

The ascending model $\mathcal{M}_{\rm asc}$ has dimension $11$ and degree […]. The Markov basis of its toric ideal $I_{\rm asc}$ consists of six quadrics, […] cubics and […] quartics. One of the cubics is

(1) $p_{[\ldots]}\,p_{[\ldots]}\,p_{[\ldots]} - p_{[\ldots]}\,p_{[\ldots]}\,p_{[\ldots]} \in I_{\rm asc}$.

An example of a point in $\mathcal{M}_{\rm inv} \setminus \mathcal{M}_{\rm asc}$ is obtained by taking the parameter values

$u_{[\ldots]} = u_{[\ldots]} = u_{[\ldots]} = 0$, $\;u_{[\ldots]} = u_{[\ldots]} = u_{[\ldots]} = v_{[\ldots]} = v_{[\ldots]} = v_{[\ldots]} = v_{[\ldots]} = 1$, $\;v_{[\ldots]} = 2$, $\;v_{[\ldots]} = 1/[\ldots]$.

The resulting distribution is supported on the six permutations in (1). Its coordinates are

$p_{[\ldots]} = p_{[\ldots]} = p_{[\ldots]} = 2/[\ldots]$ and $p_{[\ldots]} = p_{[\ldots]} = p_{[\ldots]} = 1/[\ldots]$.

This distribution is not a zero of (1), and hence it is not in the ascending model $\mathcal{M}_{\rm asc}$.

The two probability distributions on permutations seen above can be lifted to similar counterexamples for $n \ge 5$, and we conclude that the non-inclusions are valid for all $n \ge 4$.

The inclusion $\mathcal{M}_{\rm asc} \subset \mathcal{M}_{\rm csi}$ in (i) is seen by the specialization of parameters that sends $d_{a<b}$ […] $v_{ij}$. This shows that the inversion model $\mathcal{M}_{\rm inv}$ is a subvariety of the Csiszár model $\mathcal{M}_{\rm csi}$.

It remains to show that $\mathcal{M}_{\rm birk} \subset \mathcal{M}_{\rm asc}$. To do this, we let $A$ denote the model matrix for $\mathcal{M}_{\rm birk}$ and $B$ the model matrix for $\mathcal{M}_{\rm asc}$. Both matrices have their entries in $\{0, 1\}$ and they have $|\mathcal{L}(P)|$ columns. The rows $A_{ij}$ of $A$ are indexed by ordered pairs $(i, j) \in [n] \times [n]$, and the rows $B_I$ of $B$ are indexed by subsets of $[n]$.
We have the identity

$$ A_{ij} \;=\; \sum \Big\{ B_I : I \in \tbinom{[n]}{j} \text{ and } i \in I \Big\} \;-\; \sum \Big\{ B_I : I \in \tbinom{[n]}{j-1} \text{ and } i \in I \Big\}. $$

This shows that every row of $A$ is a $\mathbb{Z}$-linear combination of the rows of $B$. Hence, the kernel of $A$ contains the kernel of $B$, and this implies that the toric ideal $I_A = I_{\rm birk}$ contains the toric ideal $I_B = I_{\rm asc}$. We conclude that $\mathcal{M}_{\rm birk}$ is a submodel of $\mathcal{M}_{\rm asc}$. □

In the rest of this paper we consider the ascending and Csiszár models only in the graded situation, that is, when the monomial images of all the unknowns $p_c$, $c \in \mathrm{M}(Q)$, have the same total degree. The latter is equivalent to requiring that all maximal chains in $Q$ have the same cardinality, which in turn is equivalent to $Q$ being graded. For a graded poset $Q$ we denote by $\mathrm{rk} : Q \to \mathbb{N}$ its rank function and write $Q_i$ for the set of its elements of rank $i$. By $\mathrm{rk}(Q)$ we denote the rank of $Q$, which is the maximal rank of any of its elements.

In the next three sections we undertake a detailed study of the models (b), (a) and (d), in this order. The Birkhoff model (c) has already received considerable attention in the literature [11, 12], at least for $\mathcal{L}(P) = S_n$, and we content ourselves with a few brief remarks. Its model polytope, the Birkhoff polytope of doubly stochastic matrices, is a key player in combinatorial optimization, and it is linked to many fields of pure mathematics.

The restriction of the Birkhoff model and its polytope to proper subsets $\mathcal{L}(P)$ of $S_n$ has been studied only in some special cases. For example, Chan, Robbins and Yuen [7] considered this polytope for the constraint poset $P$ given by the transitive closure of $j > j - [\ldots]$ and $j > j - [\ldots]$ for $[\ldots] \le j \le n$. They stated a conjecture on its volume which was proved by Zeilberger [34]. We close by noting a formula for the dimension of these polytopes.

Proposition 3.2.

Let $P$ be an arbitrary constraint poset on $[n] = \{1, 2, \ldots, n\}$. Set

$$ Z = \{ (i,j) \in [n] \times [n] \mid \pi(i) \ne j \text{ for all } \pi \in \mathcal{L}(P) \} $$

and

$$ C = \Big\{ (i,j) \in [n] \times [n] \;\Big|\; (i,j) \notin Z \text{ and } \big( (i,j') \in Z \text{ for all } j' > j \ \text{ or } \ (i',j) \in Z \text{ for all } i' > i \big) \Big\}. $$

The model polytope $\mathrm{Bi}$ of the Birkhoff model, expressed using coordinates $x_{ij}$ on $\mathbb{R}^{n \times n}$, equals the face of the classical Birkhoff polytope of bistochastic $n \times n$-matrices defined by

(2) $x_{ij} = 0$ for all $(i,j) \in Z$.

In particular, the dimension of the Birkhoff model polytope is $\dim(\mathrm{Bi}) = n^2 - |Z| - |C|$.

Proof. Clearly, the model polytope $\mathrm{Bi}$ of the Birkhoff model is contained in the classical Birkhoff polytope. Equally obvious is that all equations (2) are valid for the model polytope. Hence $\mathrm{Bi}$ is contained in the polytope cut out from the classical Birkhoff polytope by (2). Following the lines of the Birkhoff-von Neumann Theorem (see e.g. [1, (5.2)]), we note that the vertices of the polytope cut out by (2) from the classical Birkhoff polytope are the permutation matrices for the permutations $\pi \in \mathcal{L}(P)$. The first assertion now follows.

The linear relations on the Birkhoff polytope state that all row and column sums are $1$. We set $x_{ij} = 0$ for $(i,j) \in Z$. In the resulting linear relations precisely the variables $x_{ij}$ for $(i,j) \in C$ are the leading terms with respect to the order of the variables induced by the lexicographic order on the index tuples. This proves the dimension statement. □

We illustrate Proposition 3.2 with two simple examples. If $P$ is an $n$-element antichain then $Z = \emptyset$ and $C = \{ (1,n), (2,n), \ldots, (n,n), (n,n-1), \ldots, (n,1) \}$. Here our formula gives the dimension $n^2 - 0 - (2n-1) = (n-1)^2$ of the classical Birkhoff polytope. If $P$ is the $n$-chain $1 < 2 < \cdots$ […]
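The dimension formula of Proposition 3.2 can be tested on small constraint posets by brute force. The sketch below is illustrative only: the names `linear_extensions` and `birkhoff_dimension_check` are hypothetical, a poset is passed as a list of pairs `(a, b)` meaning that item `a` must be ranked before item `b` (0-based), and the set $C$ is computed as the cells that are last outside $Z$ in their row or column, which is the reading under which the antichain example above gives $|C| = 2n-1$.

```python
from fractions import Fraction
from itertools import permutations


def rank(rows):
    """Rank of an integer matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def linear_extensions(n, relations):
    """All pi (pi[i] = place of item i) with pi[a] < pi[b] for each (a, b)."""
    return [pi for pi in permutations(range(n))
            if all(pi[a] < pi[b] for a, b in relations)]


def birkhoff_dimension_check(n, relations):
    """Return (n^2 - |Z| - |C|, actual dimension of the model polytope Bi)."""
    exts = linear_extensions(n, relations)
    attained = {(i, pi[i]) for pi in exts for i in range(n)}
    cells = [(i, j) for i in range(n) for j in range(n)]
    Z = {c for c in cells if c not in attained}
    C = {(i, j) for (i, j) in cells if (i, j) not in Z and (
        all((i, j2) in Z for j2 in range(j + 1, n)) or
        all((i2, j) in Z for i2 in range(i + 1, n)))}
    formula = n * n - len(Z) - len(C)
    # Affine dimension of the hull of the permutation matrices: rank of the
    # vertex vectors with a constant coordinate appended, minus one.
    vertices = [[1 if pi[i] == j else 0 for (i, j) in cells] + [1] for pi in exts]
    return formula, rank(vertices) - 1


print(birkhoff_dimension_check(3, []))                # antichain on 3 items
print(birkhoff_dimension_check(3, [(0, 1), (1, 2)]))  # 3-chain
print(birkhoff_dimension_check(3, [(0, 1)]))          # a single relation
```

Both entries of each returned pair agree in these three cases, which include the antichain and the chain used as examples in the text.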
4. The Csiszár model

The Csiszár model for the Boolean lattice $Q = 2^{[n]}$ was studied by Villő Csiszár in [9, 10]. She calls it the L-decomposable model, where the letter "L" refers to Luce [24]. Indeed, the model can be seen as the generic model satisfying Luce-decomposability (see [25]). We prefer to call it the Csiszár model, to credit her work for introducing this model into algebraic statistics. We note that the Csiszár model for $Q = 2^{[n]}$ also appears in work on multiple testing procedures by Hommel et al. [20], but with a different coordinatization of its model polytope. Throughout this section, we fix a graded poset $Q$ of positive rank.

We begin by defining a 0-1-matrix $A = \mathrm{Ci}$ that represents the Csiszár model. Our construction is based on the technique employed for $Q = 2^{[n]}$ in Csiszár's proof of [9, Theorem 1]. The columns of $\mathrm{Ci}$ are indexed by the unknown probabilities $p_\pi$ where $\pi \in \mathrm{M}(Q)$, and the rows of $\mathrm{Ci}$ are indexed by the model parameters $d_{a<b}$ […]
Theorem 4.1. Let $Q$ be a graded poset of rank $\ge 1$ and $\mathrm{Ci} \subseteq \mathbb{R}^{\mathrm{Cov}(Q)}$ the model polytope of its Csiszár model, with coordinates $x_{a<b}$ […]
Theorem 4.3. A Gröbner basis for the toric ideal $I_{\rm csi}$ of the Csiszár model on a graded poset $Q$ is given by all quadratic binomials of the form

(6) $p_{\pi_1 \pi_2} \cdot p_{\pi_1' \pi_2'} - p_{\pi_1 \pi_2'} \cdot p_{\pi_1' \pi_2}$,

where the chains $\pi_1$ and $\pi_1'$ have the same ending point and both $\pi_2$ and $\pi_2'$ start there.

Proof. It is easy to check that the binomial quadrics that lie in the ideal $I_{\rm csi}$ are precisely the quadrics (6). These are inherited from the conditional independence statements valid for the $n$-chain graphical model $G$. These statements translate into a quadratic Gröbner basis for the toric ideal of the matrix $A_G$. The leading terms of that Gröbner basis are squarefree, so by [31, Corollary 8.9] they define a regular unimodular triangulation of the convex hull of the columns of $A_G$. Since $\mathrm{Ci} = A''_G$ is a face of that polytope, that face inherits the regular unimodular triangulation from $A_G$. We conclude that the Gröbner basis which specifies this regular triangulation of $\mathrm{Ci}$ consists precisely of the quadrics (6). □

The Gröbner basis (6) reveals that the Csiszár model has desirable algebraic properties:

Corollary 4.4.

The coordinate ring $K[p]/I_{\rm csi}$ of the Csiszár model over any field $K$ is Cohen-Macaulay and Koszul. Its Krull dimension equals $|\mathrm{Cov}(Q)| - |Q| + |Q_n| + |Q_0|$.

Proof. Since $I_{\rm csi}$ has a quadratic Gröbner basis, by Theorem 4.3, it follows that $K[p]/I_{\rm csi}$ is Koszul. Again by Theorem 4.3 there is a squarefree initial ideal of $I_{\rm csi}$. Hence by [31, Proposition 13.15] the semigroup algebra $K[p]/I_{\rm csi}$ is normal, and hence Cohen-Macaulay by Hochster's Theorem [19, Theorem 1]. The dimension of this semigroup algebra is one more than the dimension of its polytope, given in Theorem 4.1. □

For computations it is convenient to represent the quadrics in (6) as the $2 \times 2$-minors of certain natural matrices $M_q$ that are indexed by the elements $q$ of the poset $Q$. The row labels of the matrix $M_q$ are the maximal chains in the order ideal $Q_{\le q} = \{ a \in Q : a \le q \}$ and the column labels of $M_q$ are the maximal chains in the filter $Q_{\ge q} = \{ b \in Q : q \le b \}$. Thus $M_q$ is a matrix of format $|\mathrm{M}(Q_{\le q})| \times |\mathrm{M}(Q_{\ge q})|$. We define $M_q$ as follows. The entry of $M_q$ in the row labeled $\pi_1 \in \mathrm{M}(Q_{\le q})$ and the column labeled $\pi_2 \in \mathrm{M}(Q_{\ge q})$ is the unknown $p_\pi$ where $\pi$ denotes the maximal chain of $Q$ that is obtained by concatenating $\pi_1$ and $\pi_2$.

Corollary 4.5.

The Markov basis of the Csiszár ideal $I_{\rm csi}$ consists of the $2 \times 2$-minors of the matrices $M_q$, where $q$ runs over $Q$. This Markov basis is also a Gröbner basis.

Proof. Each $2 \times 2$-minor of $M_q$ has the form required in (6), and, conversely, each binomial in (6) occurs as a $2 \times 2$-minor of $M_q$ for some $q$. Note that this element $q \in Q$ is generally not unique for a given binomial. The Gröbner basis statement is a part of Theorem 4.3. □

We illustrate our results for the case when $Q = 2^{[n]}$ is the Boolean lattice, with $n \le 6$. For $n = 3$, the ideal $I_{\rm csi}$ is zero as seen in Section 2. For $n = 4$, the ideal $I_{\rm csi}$ is the complete intersection of six quadrics, namely, the determinants of the six $2 \times 2$-matrices $M_{\{i,j\}}$. Geometrically, these correspond to the six square faces of the $3$-dimensional permutahedron:

$$ I_{\rm csi} = \langle\, \det M_{\{1,2\}},\ \det M_{\{1,3\}},\ \det M_{\{1,4\}},\ \det M_{\{2,3\}},\ \det M_{\{2,4\}},\ \det M_{\{3,4\}} \,\rangle. $$

We conclude that the Csiszár model for $n = 4$ has dimension $17$, as predicted by Theorem 4.1. As a projective variety, this model has degree $2^6 = 64$ since it is a complete intersection. For $n = 5$, the Markov basis consists of the $2 \times 2$-minors of the ten $2 \times 6$-matrices $M_{\{1,2\}}, M_{\{1,3\}}, \ldots, M_{\{4,5\}}$ and ten $6 \times 2$-matrices $M_{\{1,2,3\}}, M_{\{1,2,4\}}, \ldots, M_{\{3,4,5\}}$. For example,

$$ M_{\{1,2\}} = \begin{pmatrix} p_{12345} & p_{12354} & p_{12435} & p_{12453} & p_{12534} & p_{12543} \\ p_{21345} & p_{21354} & p_{21435} & p_{21453} & p_{21534} & p_{21543} \end{pmatrix}. $$

Altogether, these matrices have $300$ maximal minors but $30$ of the minors occur in two matrices, so the total number of distinct Markov basis elements is $270$. The dimension of this model is $49$, and its degree equals $50493797160$. The Hilbert series of $K[p]/I_{\rm csi}$ equals

$(1 + 70t + 2215t^2 + 42020t^3 + 534635t^4 + 4837694t^5 + 32227985t^6 + 161529320t^7 + 617560160t^8 + 1816401720t^9 + 4129171068t^{10} + 7265606880t^{11} + 9880962560t^{12} + 10337876480t^{13} + 8250364160t^{14} + 4953798656t^{15} + 2189864960t^{16} + 688455680t^{17} + 145162240t^{18} + 18350080t^{19} + 1048576t^{20}) / (1-t)^{50}$.
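The counts of $2 \times 2$-minors of the matrices $M_q$ for the Boolean lattice can be reproduced by direct enumeration. The following is a small sketch under stated assumptions: the function name `csiszar_minors` is illustrative, a maximal chain of $2^{[n]}$ is encoded as the word listing the items in the order in which they are added, and two minors are regarded as the same Markov basis element when they involve the same two monomials.

```python
from itertools import combinations, permutations


def csiszar_minors(n):
    """Count the 2x2 minors of all matrices M_q for Q = 2^[n].

    Returns (number of minors counted with repetition, set of distinct binomials).
    """
    items = range(1, n + 1)
    total = 0
    distinct = set()
    for k in range(1, n):
        for q in combinations(items, k):
            rest = [x for x in items if x not in q]
            rows = list(permutations(q))      # maximal chains below q
            cols = list(permutations(rest))   # maximal chains above q
            for a, b in combinations(rows, 2):
                for c, d in combinations(cols, 2):
                    total += 1
                    # The minor p_{ac} p_{bd} - p_{ad} p_{bc}, stored up to sign.
                    plus = frozenset([a + c, b + d])
                    minus = frozenset([a + d, b + c])
                    distinct.add(frozenset([plus, minus]))
    return total, distinct


for n in (4, 5, 6):
    total, distinct = csiszar_minors(n)
    print(n, total, len(distinct))
```

Storing each binomial as an unordered pair of unordered monomials makes the duplicate detection independent of which element $q$ produced the minor.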
For $n = 6$, the Markov basis is represented by the fifteen $2 \times 24$-matrices $M_{\{i,j\}}$, the twenty $6 \times 6$-matrices $M_{\{i,j,k\}}$ and the fifteen $24 \times 2$-matrices $M_{\{i,j,k,l\}}$. Altogether, these matrices have $12780$ minors of size $2 \times 2$ but only $10980$ of the binomial quadrics are distinct.

A systematic way of understanding our matrices $M_q$ is furnished by Sullivant's theory of toric fiber products [32]. This method will become crucial when studying the ascending model in the next section, and we will explain at the end of the section how toric fiber products can also be used to give an alternative proof of Theorem 4.3.

5. The ascending model

At the end of [9, p. 233] it is asserted that a Markov basis for the ascending model on $Q = 2^{[n]}$ can be obtained in a similar way as was done for the standard Csiszár model, but no details are given. However, simple examples show that it does not suffice to consider quadratic binomials for the generating set, and it is not clear from [9] which properties the defining ideals of the ascending and Csiszár model have in common. The defining ideal and the model polytope of the ascending model seem to be more complicated and more interesting than those of the Csiszár model. These are the structures to be explored in this section.

Generalizing the notation introduced in the preceding section, for any subset $A \subseteq Q$, we consider the set of elements of $Q$ that cover an element from $A$:

$$ \nabla A := \{ b \in Q \mid (a < b) \in \mathrm{Cov}(Q) \text{ for some } a \in A \}. $$

We also consider the set of elements covered by an element from $A$:

$$ \Delta A := \{ b \in Q \mid (b < a) \in \mathrm{Cov}(Q) \text{ for some } a \in A \}. $$

Theorem 5.1.

Fix a graded poset $Q$ of rank $n$. The model polytope $\mathrm{As}$ of the ascending model is the set of solutions in the space $\mathbb{R}^{|Q|}$, with coordinates $x_a$ for $a \in Q$, of the equations

(7) $\sum_{a \in Q_i} x_a = 1$, $\quad 0 \le i \le n$,

and the inequalities

(8) $x_a \ge 0$, $\quad a \in Q$,
(9) $-\sum_{a \in A} x_a + \sum_{a \in \nabla A} x_a \ge 0$, $\quad A \subseteq Q_i$, $\; 0 \le i \le n-1$.

Proof. Equations (7) are valid on every vertex of $\mathrm{As}$ because every maximal chain in $Q$ has exactly one element of rank $i$ for all $0 \le i \le n$. The inequalities (9) express the fact that if a maximal chain passes through an element of $A \subseteq Q_i$ then it must also pass through a unique element of $\nabla A$. Inequalities (8) are obviously valid for $\mathrm{As}$. Hence $\mathrm{As}$ is contained in the intersection of the linear spaces defined by (7) and the halfspaces defined by (8) and (9).

For the converse we proceed by induction on $n$. If $n = 0$ then $\mathrm{As}$ is a simplex of dimension $|Q| - 1$, defined by (7) and (8). If $n = 1$ then the result is identical to [26, Corollary 1.8 (b)]. Assume $n \ge 2$. Let $x = (x_a)_{a \in Q} \in \mathbb{R}^Q$ be any vector satisfying (7), (8) and (9). Let $x'$ be the projection of $x$ onto the coordinates in $Q' = Q_0 \cup \cdots \cup Q_{n-1}$ and $x''$ the projection of $x$ onto $Q'' = Q_{n-1} \cup Q_n$. By induction, $x'$ and $x''$ lie in the model polytopes of the ascending model for $Q'$ and $Q''$. Hence we can write $x'$ and $x''$ as convex linear combinations:

$$ x' = \sum_{c' \in \mathrm{M}(Q')} \lambda_{c'}\, c' \quad \text{and} \quad x'' = \sum_{c'' \in \mathrm{M}(Q'')} \lambda_{c''}\, c''. $$

Here we identify $c'$ and $c''$ with the 0/1-vector that has support $c'$ and $c''$ respectively.

Consider a fixed element $a \in Q_{n-1}$. Let $c'_1, \ldots, c'_r$ be the chains from the above expansion of $x'$ that contain $a$ and for which $\lambda_{c'_i} > 0$. Let $c''_1, \ldots, c''_s$ be the chains from the above expansion of $x''$ that contain $a$ and for which $\lambda_{c''_i} > 0$. The coordinate $x'_a$ of $x'$ then equals $\sum_i \lambda_{c'_i}$ and the coordinate $x''_a$ of $x''$ equals $\sum_i \lambda_{c''_i}$. Since $x'_a$ and $x''_a$ coincide with the coordinate $x_a$ of $x$, we have $\sum_i \lambda_{c'_i} = \sum_i \lambda_{c''_i}$. After relabeling (and possibly swapping $x'$ and $x''$) we may assume that $\lambda_{c'_1}$ is the minimum of $\{ \lambda_{c'_1}, \ldots, \lambda_{c'_r}, \lambda_{c''_1}, \ldots, \lambda_{c''_s} \}$. Then we replace $\lambda_{c''_1}$ by $\lambda_{c''_1} - \lambda_{c'_1}$. Let $c \in \mathrm{M}(Q)$ be the concatenation of $c'_1$ and $c''_1$. Now set $\lambda_c = \lambda_{c'_1}$ and proceed with the new coefficients and the chains $c'_2, \ldots, c'_r$ and $c''_1, \ldots, c''_s$. Clearly the sums of the coefficients of $c'_2, \ldots, c'_r$ and $c''_1, \ldots, c''_s$ still coincide. Proceeding by induction and summing over all $a \in Q_{n-1}$ for which $x_a > 0$, one constructs an expansion $\sum_i \lambda_i c_i$ in terms of chains in $\mathrm{M}(Q)$ whose projection onto $\mathrm{M}(Q')$ equals $x'$ and whose projection onto $\mathrm{M}(Q'')$ equals $x''$. Hence $x = \sum_i \lambda_i c_i$, and we have $\lambda_i \ge 0$ and $\sum_i \lambda_i = \sum_{a \in Q_{n-1}} x_a = 1$ by (7). This proves that $x \in \mathrm{As}$. □

In the preceding proof, when showing that any $x$ satisfying (7)–(9) lies in $\mathrm{As}$, we use (9) only in the induction base $n = 1$. The equations (7) are complete and independent when $Q = 2^{[n]}$ is the Boolean lattice, so in that case the dimension of the model polytope $\mathrm{As}$ is equal to $2^n - n - 1$. In general the dimension is more subtle to calculate and we do not know any good description. For example, if the induced subposet of $Q$ on the elements of two consecutive ranks $i$ and $i+1$ is disconnected then $\mathrm{As}$ is contained in each hyperplane defined by the equality of the sum over the variables of rank $i$ and $i+1$ in a component.

Now we turn to the toric ideal $I_{\rm asc}$ of the ascending model.
It is the kernel of the map

(10) $K[p] \to K[t]$, $\quad p_\pi \mapsto t_{a_0} t_{a_1} \cdots t_{a_n}$ for $\pi = (a_0 < \cdots < a_n) \in \mathrm{M}(Q)$.

If $\mathrm{rk}(Q) = 0$ then this map is injective and $I_{\rm asc} = \{0\}$, so we assume $\mathrm{rk}(Q) \ge 1$ from now on. The case $\mathrm{rk}(Q) = 1$ serves as the base case for our inductive constructions. Here the poset $Q$ is identified with a bipartite graph on $Q_0$ and $Q_1$, and the monomial map $p_\pi \mapsto t_{a_0} t_{a_1}$ defines the toric ring associated with a bipartite graph in commutative algebra. A generating set of the kernel of this map was determined in [27, Lemma 1.1] and shown to be a universal Gröbner basis in [33, Proposition 8.1.10]. This result has already proven to be useful in algebraic statistics (see e.g. [14]).

Lemma 5.2 (Ohsugi-Hibi [27], Villarreal [33]). Let $Q$ be a graded poset of rank $1$. Then a universal Gröbner basis of the toric ideal $I_{\rm asc}$ consists of all cycles in $Q$, expressed as binomials […]
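Since $I_{\rm asc}$ is the kernel of the monomial map (10), a binomial $\prod p_{\pi} - \prod p_{\bar\pi}$ lies in $I_{\rm asc}$ exactly when both monomials have the same image, that is, when the two collections of chains pass through the same multiset of poset elements. The sketch below illustrates this test; the names `in_ascending_ideal` and `chain_of_word` are hypothetical, and chains are encoded as tuples of poset elements.

```python
from collections import Counter


def in_ascending_ideal(plus, minus):
    """Test whether prod p_c (c in plus) - prod p_c (c in minus) lies in I_asc.

    Each chain is a tuple of poset elements. Both monomials map to the same
    monomial in the t_a exactly when the multisets of elements agree.
    """
    def image(chains):
        return Counter(a for chain in chains for a in chain)
    return image(plus) == image(minus)


def chain_of_word(word):
    """Maximal chain of the Boolean lattice: start empty, add items in order."""
    return tuple(frozenset(word[:k]) for k in range(len(word) + 1))


# Rank 1: a 4-cycle in a bipartite graph on {x1, x2} and {y1, y2}.
cycle_plus = [("x1", "y1"), ("x2", "y2")]
cycle_minus = [("x1", "y2"), ("x2", "y1")]
print(in_ascending_ideal(cycle_plus, cycle_minus))

# Boolean lattice on 3 items: the cubic built from the even and odd words.
even = [chain_of_word(w) for w in [(1, 2, 3), (2, 3, 1), (3, 1, 2)]]
odd = [chain_of_word(w) for w in [(1, 3, 2), (2, 1, 3), (3, 2, 1)]]
print(in_ascending_ideal(even, odd))

# A quadric that does not lie in I_asc.
q_plus = [chain_of_word(w) for w in [(1, 2, 3), (3, 2, 1)]]
q_minus = [chain_of_word(w) for w in [(1, 3, 2), (2, 3, 1)]]
print(in_ascending_ideal(q_plus, q_minus))
```

The test only decides membership of a single binomial; it does not by itself produce a generating set or a Gröbner basis.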
A Gröbner basis for the toric ideal I asc of the ascending model on a gradedposet Q of rank n is given by two classes of binomials. The ﬁrst class consists of the quadrics (11) p π · p π − p ¯ π · p ¯ π , where π , ¯ π , π , ¯ π are distinct chains of at least three elements, such that π ∪ π = ¯ π ∪ ¯ π as multisets and π ∩ π = ¯ π ∩ ¯ π is nonempty. The second class consists of all binomials (12) p π p π · · · p π s − p ¯ π p ¯ π · · · p ¯ π s , where π , ¯ π , . . . , π s , ¯ π s are constructed as follows: Choose i ∈ { , , . . . , n − } and take anycycle γ = ( a a < · · · a m = a ) in the subposet Q i,i +1 of all elements havingrank i or i + 1 in Q . Then the maximal chains π j , ¯ π j for ≤ j ≤ s are chosen such that π j = ( u j, = ¯ u j, < · · · < u j,i = ¯ u j,i = a j < a j +1 = u j,i +1 < · · · < u j,n ) and ¯ π j = ( u j, = ¯ u j, < · · · < u j,i = ¯ u j,i = a j < a j − = ¯ u j,i +1 < · · · < ¯ u j,n ) and the multisets { u j,(cid:96) | ≤ j ≤ s, i ≤ (cid:96) ≤ n } and { ¯ u j,(cid:96) | ≤ j ≤ s, i ≤ (cid:96) ≤ n } coincide. In Figure 1 we give a visual description of the binomial (12).For the proof of this result we shall employ Sullivant’s theory of toric ﬁber products from [32]. We brieﬂy review that theory. Consider two polynomial rings K [ p (cid:48) ] and K [ p (cid:48)(cid:48) ] and a surjective multigrading φ : { p (cid:48) } ∪ { p (cid:48)(cid:48) } → A ⊆ R d , called the A -grading . Then choosenew variables z π,τ for all π ∈ { p (cid:48) } and τ ∈ { p (cid:48)(cid:48) } such that φ ( π ) = φ ( τ ) . For ideals I in K [ p (cid:48) ] and J in K [ p (cid:48)(cid:48) ] that are A -homogeneous, we let I × A J denote the kernel of the map z π,τ (cid:55)→ p (cid:48) π ⊗ p (cid:48)(cid:48) τ from K [ z ] to the tensor product K [ p (cid:48) ] /I ⊗ K [ p (cid:48)(cid:48) ] /J . u n ¯ u n u n ¯ u n u sn ¯ u sn a = a s a a s − a s − a a s − u = ¯ u u = ¯ u u s = ¯ u s Figure 1.

A binomial in the Gröbner basis of the ascending model

In order to describe a Gröbner basis of $I \times_A J$ in terms of Gröbner bases of $I$ and $J$, the concept of lifting monomials turns out to be crucial [32, p. 567]. A lift of a variable $p'_\pi$ is $z_{\pi\tau}$ for some $\tau$ with $\phi(\pi) = \phi(\tau)$. Now assume that $A$ is linearly independent. Let $f \in K[p']$ be an $A$-homogeneous polynomial. Each monomial $m$ in $f$ factors as $m_{a_1} \cdots m_{a_r}$ where $A = \{a_1, \ldots, a_r\}$ and $\phi(m_{a_i}) = \deg(m_{a_i})\, a_i$. Moreover, since $A$ is linearly independent, each monomial $m$ in $f$ gives the same number $d_i := \deg(m_{a_i})$ of variables of degree $a_i$ (counted with multiplicity). Now choose, for each $i$, a multiset of $d_i$ variables $p''$ of degree $a_i$. A lift of $f$ is then any polynomial obtained from the above choices when lifting the variables in each monomial of $f$ in such a way that, for all monomials, the chosen multisets are exhausted.

Proof.

We proceed by induction on $n = \mathrm{rank}(Q)$. If $n = 1$ then (11) describes an empty set of binomials and the set in (12) coincides with the Gröbner basis given in Lemma 5.2. Now assume $n \geq 2$. As in the proof of Theorem 5.1 we split $Q$ into the subposet $Q' = Q_0 \cup \cdots \cup Q_{n-1}$ consisting of ranks $0, \ldots, n-1$ and the bipartite poset $Q'' = Q_{n-1} \cup Q_n$ consisting of ranks $n-1$ and $n$. Assume $Q_{n-1} = \{a_1, \ldots, a_r\}$. Any chain in $\mathrm{M}(Q')$ ends in an element from $Q_{n-1}$, and any chain from $\mathrm{M}(Q'')$ starts in an element from $Q_{n-1}$.

We consider the polynomial ring $K[p']$ with variables $p'_\pi$ for $\pi \in \mathrm{M}(Q')$ and $K[p'']$ with variables $p''_\pi$ for $\pi \in \mathrm{M}(Q'')$. Then we grade $p'_\pi$ by $e_i \in \mathbb{R}^r$ if $\pi$ ends in $a_i$, and $p''_\pi$ by $e_i \in \mathbb{R}^r$ if $\pi$ begins in $a_i$. Note that the set of degrees $A = \{e_1, \ldots, e_r\}$ is linearly independent. We write $I'_{\mathrm{asc}}$ for the ideal of the ascending model of $Q'$ and $I''_{\mathrm{asc}}$ for the ideal of the ascending model of $Q''$. The toric ideal of interest to us is the fiber product $I_{\mathrm{asc}} = I'_{\mathrm{asc}} \times_A I''_{\mathrm{asc}}$.

Since $A$ is linearly independent, we can apply [32, Theorem 12] and the induction hypothesis to prove the claim. Sullivant's result tells us that a Gröbner basis of $I_{\mathrm{asc}}$ can be found by lifting Gröbner bases of the ideals $I'_{\mathrm{asc}}$ and $I''_{\mathrm{asc}}$ and by adding some quadratic relations. By induction, $I'_{\mathrm{asc}}$ has a Gröbner basis $G'$ consisting of elements (11) and (12). We shall lift these to binomials in $I_{\mathrm{asc}}$. Likewise, $I''_{\mathrm{asc}}$ has a Gröbner basis $G''$ consisting of elements (12). There are no binomials of type (11) in $I''_{\mathrm{asc}}$ because the poset $Q''$ has only rank $1$.

Lifting (11):

Let $p_{\pi_1} p_{\pi_2} - p_{\bar\pi_1} p_{\bar\pi_2}$ be a quadric (11) in $G'$. Since it is $A$-homogeneous, the multisets of endpoints of $\pi_1, \pi_2$ and of $\bar\pi_1, \bar\pi_2$ coincide. Suppose $\pi_1$ and $\bar\pi_1$ have the same endpoint. In the lifting described above we need to distinguish two cases.

Case 1: $\pi_1$ and $\pi_2$ end in different endpoints. Then, for any two maximal chains $\tau_1, \tau_2$ in $Q''$ starting in the endpoints of $\pi_1$ and $\pi_2$ respectively, the unique lift for these choices is

(13) $p_{\pi_1\tau_1} \cdot p_{\pi_2\tau_2} - p_{\bar\pi_1\tau_1} \cdot p_{\bar\pi_2\tau_2} \in I_{\mathrm{asc}}$.

Case 2: $\pi_1$ and $\pi_2$ end in the same endpoint. Then, for any two chains $\tau_1, \tau_2$ in $Q''$ starting in the common endpoint of $\pi_1$ and $\pi_2$, besides the lift (13) we also have the lift

(14) $p_{\pi_1\tau_1} \cdot p_{\pi_2\tau_2} - p_{\bar\pi_1\tau_2} \cdot p_{\bar\pi_2\tau_1} \in I_{\mathrm{asc}}$.

One easily checks that the binomials from (13) and (14) satisfy the conditions from (11).
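Whether a given binomial lies in $I_{\mathrm{asc}}$ can be tested directly from the monomial map (10): the two monomials must have the same image, that is, the same multiset of poset elements. The following is a minimal sketch for the Boolean lattice $2^{[n]}$. It assumes that a ranking is written as a word and that its maximal chain is the chain of initial segments of that word; the function names are illustrative and not taken from the paper.

```python
from collections import Counter

def chain_of(word):
    """Maximal chain of 2^[n] attached to a word (assumed convention):
    (3, 1, 2) gives {} < {3} < {1,3} < {1,2,3}."""
    return [frozenset(word[:k]) for k in range(len(word) + 1)]

def monomial_image(words):
    """Image of the monomial prod p_word under the map (10),
    recorded as a multiset of poset elements."""
    image = Counter()
    for w in words:
        image.update(chain_of(w))
    return image

def in_ascending_ideal(lhs, rhs):
    """A binomial lies in the toric ideal iff both monomials have the same image."""
    return monomial_image(lhs) == monomial_image(rhs)

# The cubic for n = 3: the unique cycle in the hexagon of ranks 1 and 2.
cubic = in_ascending_ideal([(1, 2, 3), (2, 3, 1), (3, 1, 2)],
                           [(1, 3, 2), (2, 1, 3), (3, 2, 1)])

# A quadric for n = 4: two chains through {1,2} exchange their upper parts.
quadric = in_ascending_ideal([(1, 2, 3, 4), (2, 1, 4, 3)],
                             [(1, 2, 4, 3), (2, 1, 3, 4)])

# A binomial that is not in the ideal.
non_member = in_ascending_ideal([(1, 2, 3), (3, 2, 1)],
                                [(1, 3, 2), (3, 1, 2)])
```

The quadric in this sketch has the shape of the lifts above: the two chains share the element $\{1,2\}$ and swap their continuations beyond it.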

Lifting (12):

First consider a binomial $p_{\pi_1} \cdots p_{\pi_s} - p_{\bar\pi_1} \cdots p_{\bar\pi_s}$ of type (12) in the Gröbner basis $G'$. Since it is $A$-homogeneous, the multisets $\{\phi(\pi_1), \ldots, \phi(\pi_s)\}$ and $\{\phi(\bar\pi_1), \ldots, \phi(\bar\pi_s)\}$ coincide. Now choose maximal chains $\pi''_1, \ldots, \pi''_s$ from $Q''$ with the same multiset of $A$-degrees $\{\phi(\pi''_1), \ldots, \phi(\pi''_s)\}$. Note that the $\pi''_i$ are just single cover relations. For any $\gamma \in S_s$ such that $\phi(\bar\pi_j) = \phi(\pi''_{\gamma(j)})$, the binomial

$p_{\pi_1\pi''_1} \cdots p_{\pi_s\pi''_s} - p_{\bar\pi_1\pi''_{\gamma(1)}} \cdots p_{\bar\pi_s\pi''_{\gamma(s)}}$

lies in $I_{\mathrm{asc}}$ and is of type (12).

We next consider a binomial $p_{\pi_1} \cdots p_{\pi_s} - p_{\bar\pi_1} \cdots p_{\bar\pi_s}$ of type (12) in the Gröbner basis $G''$. The proof is analogous to the previous case, but the multiset of $A$-degrees $\{\phi(\pi_1), \ldots, \phi(\pi_s)\} = \{\phi(\bar\pi_1), \ldots, \phi(\bar\pi_s)\}$ here is actually a set. Choosing a set $\{\pi'_1, \ldots, \pi'_s\}$ of maximal chains from $Q'$ for which $\{\phi(\pi_1), \ldots, \phi(\pi_s)\}$ and $\{\phi(\pi'_1), \ldots, \phi(\pi'_s)\}$ coincide leads to a unique lift

$p_{\pi'_1\pi_1} \cdots p_{\pi'_s\pi_s} - p_{\pi'_1\bar\pi_1} \cdots p_{\pi'_s\bar\pi_s}$

in $I_{\mathrm{asc}}$ of type (12). All the binomials constructed by these liftings from $G'$ and $G''$ are among the binomials described in (11) and (12) for the ideal $I_{\mathrm{asc}}$ we seek to generate.

Finally, we add the quadratic binomials

$p_{\pi'_1\pi''_1}\, p_{\pi'_2\pi''_2} - p_{\pi'_1\pi''_2}\, p_{\pi'_2\pi''_1}$

for all maximal chains $\pi'_1, \pi'_2 \in \mathrm{M}(Q')$ and $\pi''_1, \pi''_2 \in \mathrm{M}(Q'')$ whose $A$-degrees coincide. These binomials lie in $I_{\mathrm{asc}}$ and they have type (11).

We have shown that the liftings of the Gröbner bases $G'$ for $I'_{\mathrm{asc}}$ and $G''$ for $I''_{\mathrm{asc}}$, together with the additional quadrics, form a subset of the binomials described in (11) and (12). Using [32, Theorem 12], we conclude that the binomials from (11) and (12) form a Gröbner basis of $I_{\mathrm{asc}}$. Actually, the following converse is true as well: all binomials (11) and (12) in $I_{\mathrm{asc}}$ arise from $I'_{\mathrm{asc}}$ and $I''_{\mathrm{asc}}$ using the lifting procedure we described. □

Corollary 5.4.

The toric algebra K [ p ] /I asc is normal and Cohen-Macaulay. Proof.

Theorem 5.3 gave a Gröbner basis for I asc whose leading monomials are squarefree.This shows that K [ p ] /I asc is normal. Hochster’s Theorem [19, Theorem 1] implies Cohen-Macaulayness. (cid:3) We could also give an alternative proof of Theorem 4.3 using toric ﬁber products. Namely,the toric algebra K [ p ] /I csi can be obtained as an iterated toric ﬁber product of suitablygraded smaller polynomial rings that are attached to the pieces in a decomposition of Q into antichains. The matrices M q introduced after the proof of Theorem 4.3 represent the“glueing quadrics” used for constructing larger toric ideals from smaller ones.We close with some brief remarks on the ascending model for the Boolean lattice Q = 2 [ n ] .In Section 2 we saw that, for n = 3 , the ideal I asc is principal with generator p p p − p p p . This cubic is of type (12). It represents the unique cycle in the hexagon Q , .For n = 4 , the minimal Markov basis of the ascending model consists of quadrics, cubics and quartics. Thus, here we encounter binomials of both types (11) and (12).The Hilbert series of the Cohen-Macaulay ring K [ p ] /I asc for Q = 2 [4] is found to be t + 72 t + 228 t + 291 t + 168 t + 36 t (1 − t ) . The inversion Model

The inversion model is defined only in the case when $Q$ is the distributive lattice associated with a constraint poset $P$ on $[n]$. The maximal chains in $Q$ correspond to linear extensions $\pi \in \mathcal{L}(P)$ of the constraint poset. These are the permutations $\pi \in S_n$ that are compatible with $P$. Fix unknowns $u_{ij}$ and $v_{ij}$ for $1 \leq i < j \leq n$. Algebraically, the inversion model is defined by the toric ideal which is the kernel of the monomial map

$p_\pi \mapsto \prod_{\substack{1 \leq i < j \leq n \\ \pi^{-1}(i) < \pi^{-1}(j)}} u_{ij} \prod_{\substack{1 \leq i < j \leq n \\ \pi^{-1}(i) > \pi^{-1}(j)}} v_{ij}$.

We begin by considering the unconstrained inversion model. By this we mean the case when $P$ is an $n$-element antichain, so there are no constraints at all. In that unconstrained case, we have $Q = 2^{[n]}$ and our state space $\mathrm{M}(Q) = S_n = \mathcal{L}(P)$ consists of all $n!$ permutations.

The Mallows model [25] is a natural specialization of the unconstrained inversion model to a single parameter $q$. It is obtained by setting $u_{ij} := 1$ and $v_{ij} := q$. So, in this model, the probability of observing the permutation $\pi$ is

$P(\pi) = Z^{-1} q^{|\mathrm{inv}(\pi)|}$, where $\mathrm{inv}(\pi) = \{ (i,j) : 1 \leq i < j \leq n,\ \pi^{-1}(i) > \pi^{-1}(j) \}$

is the set of inversions of $\pi$, and $Z$ is a normalizing constant. In contrast, our inversion model permits different parameters for the various inversions occurring in a permutation.

The model polytope for the unconstrained inversion model is a familiar object in combinatorial optimization, where it is known as the linear ordering polytope [15, 18]. It is known that optimizing a general linear function over the linear ordering polytope is an NP-hard problem [18]. This mirrors the fact that the facial structure of this polytope is very complicated and a complete description appears out of reach. As a result, we expect the toric rings associated with the inversion models to be more complicated than those studied in the previous two sections. Our study was limited to finding some computational results.

Theorem 6.1.

For n ≤ the toric ring of the unconstrained inversion model is normaland hence Cohen-Macaulay. For n ≤ it is Gorenstein and its Markov basis consists ofquadrics. For n = 6 it is not Gorenstein and there exists a Markov basis element of degree .Proof. Computations using [16] show that the Markov basis for n = 3 , , consistsof , , quadratic binomials. We do not know whether there is a quadratic Gröbnerbasis for n = 5 , or whether the ring is Koszul. The Hilbert series for n ≤ are n Hilbert Series t + t ) / (1 − t ) t + 72 t + 72 t + 17 t + t ) / (1 − t ) t +2966 t +22958 t +61026 t +61026 t +22958 t +2966 t +109 t + t ) / (1 − t ) All three numerator polynomials are symmetric. Using normaliz [5] one checks that thetoric ring is normal in each case. Hochster’s Theorem [19] implies that it is Cohen-Macaulay.The Gorenstein property now follows from the general result that any Cohen Macaulaydomain whose Hilbert series has a symmetric numerator polynomial is Gorenstein.For n = 6 , the computations are much harder, and they reveal that the above nice prop-erties no longer hold. The software also found that the Hilbert series of this unconstrainedinversion model is the product of / (1 − t ) and the remarkable numerator polynomial t + 117783 t + 5125328 t + 76415229 t +475189840 t + 1372165343 t + 1943081264 t + 1372165343 t + 475189840 t +76416069 t + 5127008 t + 118623 t + 704 t + t . This polynomial is close to symmetric but not symmetric, so the ring is not Gorenstein.In addition to quadrics, a Markov basis for n = 6 must contain the cubic binomial(15) p p p − p p p . Indeed, a computation shows that these are only two cubic monomials in the ﬁber givenby the multiset of inversions { (1 , , (2 , , (2 , , (3 , , (3 , , (3 , , (4 , , (5 , , (5 , } . (cid:3) A complete description of the binomial quadrics in a Markov basis was recently found byKatthän [23]. 
However, the problem of characterizing a full Markov basis is widely open.We do not know whether normality holds for n ≥ , but we suspect not. To addressthis question, we return to the general situation of an underlying constraint poset P . Thestates π of the P -constrained inversion model are elements of the subset L ( P ) ⊂ S n . Thisinclusion corresponds to passing to some coordinate hyperplanes in the ambient space ofthe model polytopes. Therefore, the model polytope for the P -constrained model is a faceof the model polytope for the unconstrained model. Hence, to answer our question aboutnormality for n ≥ , it could suﬃce to show that the toric ring for P is not normal.At present our state of knowledge about the P -constrained inversion models is ratherlimited. We do not yet even have useful formula for the dimension of its model polytope. By contrast, the dimension of the unconstrained model equals (cid:0) n (cid:1) , as this is the dimensionof the linear ordering polytope. This was shown, for example, in [30, Proposition 3.10].We wish to mention a family of constraint posets that is important for applicationsof statistical ranking in data mining, e.g. in recent work of Cheng et al. [8]. For thatapplication one would take P to be any disjoint union of a chain and an antichain. Example . Let n ≥ and P be the poset consisting of the -chain < < and n − incomparable elements. If n = 4 then L ( P ) = { , , , } and the toric ideal I inv is the zero ideal in the polynomial ring in four unknowns. If n = 5 then the number ofstates is and the model polytope has dimension , degree , and the Hilbert series is t + 38 t + 28 t + 3 t (1 − t ) . 
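The small cases of this example can be explored by brute force. The sketch below enumerates the linear extensions of the poset formed by the chain $1 < 2 < 3$ plus incomparable elements, builds the exponent vector of each inversion-model monomial, and computes the rank of the resulting matrix exactly. It assumes that a permutation is written as a word and that a pair $i < j$ is inverted when $i$ appears after $j$ in the word; all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations

def linear_extensions(n, relations):
    """All words on 1..n in which a appears before b for every (a, b) in relations."""
    result = []
    for word in permutations(range(1, n + 1)):
        pos = {item: k for k, item in enumerate(word)}
        if all(pos[a] < pos[b] for a, b in relations):
            result.append(word)
    return result

def exponent_vector(word):
    """Exponents of the inversion-model monomial: for each pair i < j,
    one slot for u_ij followed by one slot for v_ij."""
    pos = {item: k for k, item in enumerate(word)}
    vec = []
    for i, j in combinations(range(1, len(word) + 1), 2):
        inverted = pos[i] > pos[j]  # assumed convention: i comes after j
        vec.extend([0, 1] if inverted else [1, 0])
    return vec

def rank(rows):
    """Rank over the rationals, by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                factor = m[k][c] / m[r][c]
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

chain = [(1, 2), (2, 3)]                 # the 3-chain 1 < 2 < 3
states4 = linear_extensions(4, chain)
states5 = linear_extensions(5, chain)
rank4 = rank([exponent_vector(w) for w in states4])
# Rank minus one is the dimension of the model polytope for n = 5.
rank5 = rank([exponent_vector(w) for w in states5])
```

For $n = 4$ the four exponent vectors are linearly independent, which is another way of seeing that the toric ideal is zero in that case.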
The Markov basis for this P -constrained model consists of quadrics: p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p It can be asked which P -constrained inversion models have a Markov basis of quadricsand, more generally, which degrees appear in a Markov basis. We conﬁrmed the quadraticMarkov basis for all posets P on n ≤ elements, all on n = 5 elements arising by addingone incomparable element to a poset on elements, and all unconstrained models for n ≤ .Interestingly, the notion of inversion model changes if we deﬁne i < j to be an inversionif π ( i ) > π ( j ) . The latter can be seen as a homogeneous Babington-Smith model from [25].The deﬁning monomial map for this alternative inversion model equals p π (cid:55)→ (cid:89) ≤ iπ ( j ) v ij for π ∈ L ( P ) . For the -chain < < with two incomparable elements, the Markov basis now consists of p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p p p − p p , and p p p − p p p , and p p p p − p p p p .So, unlike in Example 6.2, this Markov basis is not quadratic. The Hilbert series equals t + 28 t + 51 t + 66 t + 63 t + 44 t + 21 t + 5 t (1 − t ) . ommutative Algebra of Statistical Ranking 19 Note that, if L ( P ) is closed under taking inversions, then this model coincides with thenormal P -constraint inversion model up to a relabeling. This holds for the unconstrainedinversion model. All examples tested in this alternative model had normal model polytopes.7. Plackett-Luce Model and Bradley-Terry model

The Plackett-Luce model is a non-toric model on the set $\mathcal{L}(P)$ of permutations $\pi \in S_n$ that are consistent with a given constraint poset $P$ on $[n]$. It can be defined by the map

(16) $p_\pi \mapsto \prod_{i=1}^{n-1} \frac{1}{\sum_{j=1}^{i} \theta_{\pi(j)}}$ for $\pi \in \mathcal{L}(P)$.

We denote this model by $\mathrm{PL}_P$ and its homogeneous ideal by $I_{\mathrm{PL}_P}$. Thus $I_{\mathrm{PL}_P}$ is the kernel of the ring map $\mathbb{R}[\, p_\pi : \pi \in \mathcal{L}(P) \,] \to \mathbb{R}(\theta_1, \theta_2, \ldots, \theta_n)$ defined by the formula (16). The formula shows that the Plackett-Luce model is a submodel of the ascending model on $\mathcal{L}(P)$. In fact, the ascending model is the toric closure of the Plackett-Luce model, by which we mean that $\mathrm{As}_P$ is the smallest toric model containing $\mathrm{PL}_P$. The specialization map is

(17) $t_{\pi(\{1,2,\ldots,i\})} \mapsto \left( \theta_{\pi(1)} + \theta_{\pi(2)} + \cdots + \theta_{\pi(i)} \right)^{-1}$.

We fix $K = \mathbb{C}$ and regard the Plackett-Luce model $\mathrm{PL}_P$ as a projective variety in $\mathbb{P}^{|\mathcal{L}(P)|-1}$. The toric closure property means that all binomials in $I_{\mathrm{PL}_P}$ must lie in $I_{\mathrm{asc}}$, and this follows from unique factorization in $\mathbb{R}[\theta_1, \ldots, \theta_n]$, given that the linear forms in (17) are distinct.

In order for $\mathrm{PL}_P$ to be properly defined as a statistical model, its probabilities should sum to $1$. For this we would need to identify the normalizing constant, which is the image of $\sum_{\pi \in \mathcal{L}(P)} p_\pi$ under the map (16). A formula for this quantity can be derived, for many situations of interest, from equations (25) and (26) in Hunter's article [21]. The most general situation where the normalizing constant was determined can be found in [2]. They make use of sophisticated methods from the algebraic and geometric theory of valuations on cones. In our situation, $\sum_{\pi \in S_n} p_\pi$ is mapped to $\frac{1}{\theta_1 \theta_2 \cdots \theta_n}$ under the ring map in (16).

Let us begin by examining the unconstrained case when $P$ is an antichain, $Q = 2^{[n]}$ and $\mathcal{L}(P) = \mathrm{M}(Q) = S_n$. This is the Plackett-Luce model $\mathrm{PL}_n$ familiar from the statistics literature [21, 24, 29].
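Both statements above, the containment in the ascending model and the value of the normalizing sum, can be checked with exact rational arithmetic for $n = 3$. The sketch below is only an illustration with arbitrarily chosen parameter values; `ascending_weight` is a hypothetical helper name. It evaluates the product of reciprocal partial sums up to a chosen index. Extending the product from $i = n-1$ to $i = n$ multiplies every coordinate by the same factor $1/(\theta_1 + \cdots + \theta_n)$, and with that factor included the sum over $S_n$ equals $1/(\theta_1 \cdots \theta_n)$.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def ascending_weight(word, theta, upto):
    """prod_{i=1}^{upto} 1 / (theta[word[0]] + ... + theta[word[i-1]]),
    the specialization (17) applied along the chain of the word."""
    weight = Fraction(1)
    partial = Fraction(0)
    for item in word[:upto]:
        partial += theta[item]
        weight /= partial
    return weight

# Arbitrary positive test values (an assumption, any positive values work).
theta = {1: Fraction(2), 2: Fraction(3), 3: Fraction(7)}
n = len(theta)
words = list(permutations(theta))

# Coordinates as in (16): the product stops at i = n - 1.
p = {w: ascending_weight(w, theta, n - 1) for w in words}

# The cubic binomial of the ambient ascending model vanishes on these coordinates.
cubic_holds = (p[(1, 2, 3)] * p[(2, 3, 1)] * p[(3, 1, 2)]
               == p[(1, 3, 2)] * p[(2, 1, 3)] * p[(3, 2, 1)])

# With the product taken up to i = n, the sum over S_n is 1 / (theta_1 ... theta_n).
full_sum = sum(ascending_weight(w, theta, n) for w in words)
identity_holds = full_sum == 1 / prod(theta.values())
```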
With the correct normalizing constant, its parametrization equals(18) p π (cid:55)→ n (cid:89) i =1 θ π ( i ) (cid:80) ij =1 θ π ( j ) for π ∈ S n . This deﬁnes a polynomial map from the non-negative orthant R n ≥ to the ( n ! − -dimensionalsimplex of probability distributions on the symmetric group S n . We shall regard PL n as acomplex projective variety in the ambient P n ! − . Being the image of a rational map from P n − , the dimension of this variety is ≤ n − . Theorem 7.4 shows that it equals n − . Example n = 3 ) . The Plackett-Luce model PL is a surface of degree embedded in -dimensional projective space P . The parameterization (16) of that surface is equivalent to p (cid:55)→ θ θ ( θ + θ )( θ + θ ) , p (cid:55)→ θ θ ( θ + θ )( θ + θ ) , p (cid:55)→ θ θ ( θ + θ )( θ + θ ) ,p (cid:55)→ θ θ ( θ + θ )( θ + θ ) , p (cid:55)→ θ θ ( θ + θ )( θ + θ ) , p (cid:55)→ θ θ ( θ + θ )( θ + θ ) . The deﬁning ideal I PL of PL is minimally generated by three quadratic polynomials, inaddition to the familiar cubic binomial that speciﬁes the ambient ascending model: I PL = (cid:28) p ( p + p ) − p ( p + p ) , p ( p + p ) − p ( p + p ) ,p ( p + p ) − p ( p + p ) , p p p − p p p (cid:29) . The singular locus of PL consists of the three isolated points e − e , e − e and e − e in P . In particular, there are no singular points with non-negative coordinates,so this statistical model is a smooth surface in the -dimensional probability simplex.From the point of view of algebraic geometry, our parametrization map represents theblow-up of the projective plane P at the following conﬁguration of nine special points:(19) ( 0 : 0 : 1 ) ( 0 : 1 : 0 ) ( 1 : 0 : 0 )(1 : − −

1) (0 : 1 : − −

1) (1 : − − This conﬁguration has three -point lines and four -point lines. The map blows down thethree -point lines, and this creates a rational surface in P with three singular points.From the point of view of commutative algebra, one might ask whether the four genera-tors of the ideal I PL form a Gröbner basis with respect to some term order. A computationreveals that this is not the case. However, we do get a square-free Gröbner basis for thelexicographic term order with p >p >p >p >p >p . The initial ideal equals in lex ( I PL ) = (cid:104) p , p , p (cid:105) ∩ (cid:104) p , p , p (cid:105) ∩ (cid:104) p , p , p (cid:105) ∩(cid:104) p , p , p (cid:105) ∩ (cid:104) p , p , p (cid:105) ∩ (cid:104) p , p , p (cid:105) ∩ (cid:104) p , p , p (cid:105) . This represents a simplicial complex of seven triangles, listed in a shelling order, so I PL isCohen-Macaulay. The Hilbert series of the ring R [ p ] /I PL equals (1 + 3 t + 3 t ) / (1 − t ) . (cid:3) Example n = 4 ) . The Plackett-Luce model PL is a threefold of degree in P . Itis obtained from P by blowing up lines. The homogeneous prime ideal I PL that deﬁnes PL is minimally generated by quadrics and cubics. Its Hilbert series equals t + 105 t + 65 t (1 − t ) . We do not know whether I PL n is generated in degree and for n ≥ . (cid:3) Let us now turn to the general Plackett-Luce model with a given constraint poset P ,so only permutations π in L ( P ) are allowed. The model PL P is obtained from PL n byprojecting onto those coordinates. Algebraically, the prime ideal I P is obtained from I PL n by eliminating all unknowns p π where π is a permutation that is not compatible with P . Example . Let n = 4 and let P be the poset with two covering relations < and < .The corresponding distributive lattice L ( P ) is the product of two chains of length . 
Notethat L ( P ) has six maximal chains, namely, the permutations that respect < and < .The corresponding unknowns are mapped to products of four linear forms as follows: p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) , p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) ,p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) , p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) ,p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) , p (cid:55)→ θ ( θ + θ )( θ + θ )( θ + θ + θ ) . ommutative Algebra of Statistical Ranking 21 These reducible quartics meet in nine lines in P , so the parametrization of PL P blowsthese up. The ideal I P is complete intersection. Its minimal generators are the cubic p p p + p p + p p p − p p p − p p − p p p and the binomial quadric p p − p p that deﬁnes the ascending model on P . (cid:3) The following is our main result in this section. It should be useful for obtaining infor-mation about the ( n − -dimensional variety PL P and its homogeneous prime ideal I P . Theorem 7.4.

The parameterization $\mathbb{P}^{n-1} \to \mathrm{PL}_P \subset \mathbb{P}^{|\mathcal{L}(P)|-1}$ of the Plackett-Luce model on the poset $P$ is given geometrically as the blowing up of $\mathbb{P}^{n-1}$ along an arrangement of linear subspaces of codimension $2$. These subspaces are defined by the equations $\sum_{i \in A} \theta_i = \sum_{j \in B} \theta_j = 0$, where $\{A, B\}$ runs over all incomparable pairs in the distributive lattice on $P$.

Proof. Let $\mathbb{R}[t]$ denote the polynomial ring of parameters in the ascending model (10). Its indeterminates are $t_A$ where $A$ runs over subsets of $[n]$ that are order ideals in $P$. We define $M$ to be the Stanley-Reisner ideal of the distributive lattice of order ideals in $P$. This is the ideal in $\mathbb{R}[t]$ generated by the products $t_A t_B$ where $A$ and $B$ are incomparable, meaning that neither $A \subset B$ nor $B \subset A$ holds. The Alexander dual of $M$ is the monomial ideal

$M^* = \bigcap_{\{A,B\}} \langle t_A, t_B \rangle$,

where the intersection is over all incomparable pairs of order ideals. The generators of $M^*$ correspond to the associated primes of $M$, so they are indexed by compatible permutations $\pi \in \mathcal{L}(P)$. Interpreting $\pi$ as a maximal chain of order ideals, that correspondence is

(20) $p_\pi \mapsto \prod_{A \notin \pi} t_A$ for $\pi \in \mathcal{L}(P)$.

The arrangement of subspaces described in the statement of Theorem 7.4 is the intersection of the variety of $M^*$ with a subspace $\mathbb{P}^{n-1}$ defined by $t_A = \sum_{i \in A} \theta_i$. By substituting this into (20) we see that the blow-up along that subspace arrangement is defined by the map

(21) $p_\pi \mapsto \prod_{A \notin \pi} \Bigl( \sum_{i \in A} \theta_i \Bigr) = \mathrm{const} \cdot \prod_{A \in \pi} \frac{1}{\sum_{i \in A} \theta_i}$ for $\pi \in \mathcal{L}(P)$.

This is precisely the defining parametrization (16) of the Plackett-Luce model $\mathrm{PL}_P$. □

Example 7.5. Let $n = 4$ and $P$ as in Example 7.3. Then the above Stanley-Reisner ideal is M = ⟨ t t , t t , t t , t t , t t , t t , t t , t t , t t ⟩ .
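The counts in this example can be reproduced by enumerating order ideals. The sketch below assumes that the two covering relations of Example 7.3 are $1 < 2$ and $3 < 4$; the labels are lost in this copy of the text, so this choice is an assumption, and any labelling of two disjoint 2-chains gives the same counts. It lists the order ideals, the incomparable pairs (one generator of $M$ each) and the number of maximal chains (one generator of $M^*$ each).

```python
from itertools import combinations

def order_ideals(n, relations):
    """All down-closed subsets of {1..n} for the poset generated by relations (a < b)."""
    ideals = []
    for size in range(n + 1):
        for subset in combinations(range(1, n + 1), size):
            s = frozenset(subset)
            # Down-closed: whenever b is in s, every a below b is in s as well.
            if all(a in s for a, b in relations if b in s):
                ideals.append(s)
    return ideals

def count_maximal_chains(ideals, n):
    """Number of chains {} < ... < [n] that add one element at a time inside the lattice."""
    top = frozenset(range(1, n + 1))
    ideal_set = set(ideals)

    def extend(current):
        if current == top:
            return 1
        return sum(extend(current | {x})
                   for x in top - current
                   if current | {x} in ideal_set)

    return extend(frozenset())

relations = [(1, 2), (3, 4)]   # assumed labels for the two covering relations
ideals = order_ideals(4, relations)

# Incomparable pairs {A, B}: neither A <= B nor B <= A.
incomparable = [(A, B) for A, B in combinations(ideals, 2)
                if not (A <= B or B <= A)]

num_chains = count_maximal_chains(ideals, 4)
```

Under this assumption the lattice has nine elements, nine incomparable pairs and six maximal chains, matching the nine generators of $M$ and the six coordinates $p_\pi$ of the model.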
Its Alexander dual reveals the combinatorial pattern of the map in Example 7.3: M ∗ = (cid:104) t t t t , t t t t , t t t t , t t t t , t t t t , t t t t (cid:105) . The model PL P is the blow-up of P at nine lines, one for each of the generators of M . (cid:3) Each of our unconstrained ranking models was considered as a subvariety of the complexprojective space P n ! − . If K is any k -element subset of [ n ] then we obtain a natural rationalmap P n ! − (cid:57)(cid:57)(cid:75) P k ! − which records the probabilities for each of the k ! orderings of K only.Statistically, this map corresponds to marginalization for the induced orderings on K . Wecan now take the direct product of all of these maps, where K runs over all (cid:0) nk (cid:1) subsets ofcardinality k in [ n ] . The resulting rational map into a product of projective spaces,(22) P n ! − (cid:57)(cid:57)(cid:75) ( P k ! − )( nk ) , is called the complete marginalization map of order k . For example, if n = 3 and k = 2 then we are mapping into a product of three projective lines, with coordinates ( q : q ) , ( q : q ) and ( q : q ) respectively. Here, the complete marginalization is the rationalmap P (cid:57)(cid:57)(cid:75) P × P × P which is given in coordinates as follows: ( q : q ) = ( p + p + p : p + p + p ) , ( q : q ) = ( p + p + p : p + p + p ) , ( q : q ) = ( p + p + p : p + p + p ) . We shall refer to the complete marginalization of order as the pairwise marginalization . Example . The pairwise marginalization of the Plackett-Luce surface PL ⊂ P is thesurface in P × P × P that is deﬁned by the binomial equation q q q = q q q .The composition of the map in Example 7.1 with the map in (22) is a toric rational map P (cid:57)(cid:57)(cid:75) P × P × P that blows up the three coordinate points (1:0:0) , (0:1:0) and (0:0:1) . 
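The pairwise marginalization in this example can be verified numerically. The sketch below evaluates the normalized parametrization (18) for $n = 3$ with exact fractions, sums the probabilities of the words in which $i$ appears before $j$, and compares the result with $\theta_j / (\theta_i + \theta_j)$. Two conventions are assumptions here: permutations are written as words, and $q_{ij}$ collects the words in which $i$ precedes $j$. The parameter values are arbitrary.

```python
from fractions import Fraction
from itertools import permutations

def plackett_luce(theta):
    """Probabilities as in (18):
    p_word = prod_i theta[word_i] / (theta[word_1] + ... + theta[word_i])."""
    probs = {}
    for word in permutations(theta):
        value, partial = Fraction(1), Fraction(0)
        for item in word:
            partial += theta[item]
            value *= theta[item] / partial
        probs[word] = value
    return probs

def pairwise_marginal(probs, i, j):
    """q_ij: total probability of the words in which i appears before j (assumed convention)."""
    return sum(p for word, p in probs.items() if word.index(i) < word.index(j))

theta = {1: Fraction(2), 2: Fraction(3), 3: Fraction(7)}   # arbitrary test values
probs = plackett_luce(theta)
q = {(i, j): pairwise_marginal(probs, i, j)
     for i in theta for j in theta if i != j}

total = sum(probs.values())

# Each marginal has the Bradley-Terry form theta_j / (theta_i + theta_j).
matches_bt = all(q[i, j] == theta[j] / (theta[i] + theta[j]) for i, j in q)

# The binomial equation of the marginalized surface, one side per orientation of the 3-cycle.
circuit_holds = q[1, 2] * q[2, 3] * q[3, 1] == q[2, 1] * q[3, 2] * q[1, 3]
```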
□

It is worthwhile, both algebraically and statistically, to study the various marginalizations of the Csiszár model, the ascending model, the inversion model and the Plackett-Luce model. Of particular interest is the pairwise marginalization of the Plackett-Luce model. This is known in the literature as the Bradley-Terry model [21]. All of these marginalized models make sense relative to a fixed constraint poset $P$. Here, we regard each $k$-set $K$ as a subposet of $P$ and we write the corresponding marginalization map as

(23) $\mathbb{P}^{|\mathcal{L}(P)|-1} \dashrightarrow \mathbb{P}^{|\mathcal{L}(K)|-1}$.

The complete $k$-th marginalization is the image of the direct product of these maps, as $K$ runs over all $k$-sets. For convenience, we shall here remove those $k$-sets $K$ that are totally ordered in $P$ because the corresponding maps in (23) are constant when $|\mathcal{L}(K)| = 1$.

We conclude this article with the following algebraic characterization of the Bradley-Terry model. We write $P^c$ for the bidirected graph on $[n]$ where $(i, j)$ is a directed edge if $i$ and $j$ are incomparable in $P$. Each circuit $i_1, i_2, \ldots, i_r, i_1$ in $P^c$ is encoded as a binomial:

(24) $q_{i_1 i_2} q_{i_2 i_3} \cdots q_{i_{r-1} i_r} q_{i_r i_1} - q_{i_2 i_1} q_{i_3 i_2} \cdots q_{i_r i_{r-1}} q_{i_1 i_r}$.

These binomials define hypersurfaces in $(\mathbb{P}^1)^{\binom{n}{2}}$. For instance, the model in Example 7.6 is the toric hypersurface in $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ thus associated to a $3$-cycle.

The theorem below refers to unimodular Lawrence ideals. This class of toric ideals was introduced and studied by Bayer et al. in [3]. The associated toric varieties live naturally in a product of projective lines $\mathbb{P}^1 \times \cdots \times \mathbb{P}^1$. The case of interest here is that of unimodular Lawrence ideals arising from graphs. For these ideals and their syzygies we refer to [3, §5].

Theorem 7.7.

The Bradley-Terry model with constraints $P$ is toric. It is defined by the unimodular Lawrence ideal whose generators are the circuits (24) in the bidirected graph $P^c$.

From this result we can now determine the commutative algebra invariants of the Bradley-Terry model, such as its Hilbert series in the $\mathbb{Z}^n$-grading and its multidegree.

Proof.

Following [21], the parametrization of the Bradley-Terry model can be written as

(25) $q_{ij} \mapsto \frac{\theta_j}{\theta_i + \theta_j}$ for $i, j$ incomparable in $P$.

Let $\rho_{\{i,j\}}$ be new unknowns indexed by unordered pairs $\{i, j\} \subset [n]$. The unimodular Lawrence ideal associated with the bidirected graph $P^c$ is the kernel of the monomial map

(26) $q_{ij} \mapsto \rho_{\{i,j\}} \cdot \theta_j$ for $i, j$ incomparable in $P$.

The specialization $\rho_{\{i,j\}} = (\theta_i + \theta_j)^{-1}$ shows that the ideal $I_{\mathrm{BT}_P}$ of the Bradley-Terry model contains the unimodular Lawrence ideal generated by the circuits (24). In addition, the ideal $I_{\mathrm{BT}_P}$ contains the linear polynomials $q_{ij} + q_{ji} - 1$. These represent the fact that, in any compatible ranking $\pi$, either item $i$ ranks before item $j$ or vice versa, but not both.

Let $J$ be the ideal generated by the circuits (24) and these linear polynomials. We have seen that $J \subseteq I_{\mathrm{BT}_P}$, and we are claiming that equality holds. This follows by observing that both ideals are prime, and their varieties have the same dimension, namely $n - 1$. Indeed, $I_{\mathrm{BT}_P}$ is prime by definition, and $J$ is prime because adding the linear forms $q_{ij} + q_{ji} - 1$ to the unimodular Lawrence ideal simply amounts to dehomogenizing from $\mathbb{P}^1$ to $\mathbb{A}^1$ in each factor. Geometrically, this operation preserves the dimension of the variety. □

Acknowledgments

We are very grateful to Winfried Bruns and Raymond Hemmecke for their substantial help with the computational results in Theorem 6.1. Using the developers' versions of Normaliz [5] and 4ti2 [16] respectively, they succeeded in computing the Hilbert series of the inversion model for $n = 6$ and in finding the cubic Markov basis element (15). We also thank Eyke Hüllermeier and Seth Sullivant for helpful conversations and the referees for many suggestions that helped us to improve the paper. Bernd Sturmfels was partially supported by the U.S. National Science Foundation (DMS-0757207 and DMS-0968882). Volkmar Welker was partially supported by MSRI Berkeley.

References

[1] A. Barvinok:

A Course in Convexity, Graduate Studies in Mathematics, AMS, Providence, 2002.
[2] A. Boussicault, V. Feray, A. Lascoux and V. Reiner: Linear extension sums as valuations of cones, arXiv:1008.3278.
[3] D. Bayer, S. Popescu and B. Sturmfels: Syzygies of unimodular Lawrence ideals, J. Reine Angew. Math. (2001) 169–186.
[4] N. Beerenwinkel, N. Eriksson and B. Sturmfels: Evolution on distributive lattices, Journal of Theoretical Biology (2006) 409–420.
[5] W. Bruns, B. Ichim and C. Söger: Normaliz – software for affine monoids, vector configurations, lattice polytopes, and rational cones, 2010.
[6] E.R. Canfield and B.D. McKay: The asymptotic volume of the Birkhoff polytope, J. Analytic Comb. (2009) article Experiment. Math. (2000) 91–99.
[8] W. Cheng, K. Dembczynski and E. Hüllermeier: Label ranking based on the Plackett-Luce model, Proc. ICML-2010, International Conference on Machine Learning, Haifa, Israel, June 2010.
[9] V. Csiszár: Markov bases of conditional independence models for permutations, Kybernetika (2009) 249–260.
[10] V. Csiszár: On L-decomposability of random permutations, J. Math. Psychology (2009) 294–297.
[11] P. Diaconis and N. Eriksson: Markov bases for noncommutative Fourier analysis of ranked data, J. of Symbolic Computation (2006) 182–195.
[12] P. Diaconis and B. Sturmfels: Algebraic algorithms for sampling from conditional distributions, Ann. Stat. (1998) 363–397.
[13] M. Drton, B. Sturmfels and S. Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars, Vol 39, Birkhäuser, Basel, 2009.
[14] S.E. Fienberg, S. Petrović and A. Rinaldo: Algebraic statistics for a directed random graph model with reciprocation, Algebraic Methods in Statistics and Probability II, pp. 261–283, Contemporary Math., Amer. Math. Soc., Providence, 2010.
[15] S. Fiorini: { , }-cuts and the linear ordering problem: surfaces that define facets, SIAM J. Discrete Math. (2006), 893–912.
[16] 4ti2 team: 4ti2 – A software package for algebraic, geometric and combinatorial problems in linear spaces, available at .
[17] D. Geiger, C. Meek and B. Sturmfels: On the toric algebra of graphical models, Ann. Statist. (2006), 1463–1492.
[18] M. Grötschel, M. Jünger and G. Reinelt: Facets of the linear ordering polytope, Math. Program. (1985), 43–60.
[19] M. Hochster: Rings of invariants, Cohen-Macaulay rings generated by monomials, and polytopes, Ann. Math. (1972) 318–338.
[20] G. Hommel, F. Bretz and W. Maurer: Powerful short-cuts for multiple testing procedures with special reference to gatekeeping strategies, Statist. Med. (2007) 4063–4073.
[21] D.R. Hunter: MM algorithms for generalized Bradley-Terry models, Ann. Stat. (2004) 384–406.
[22] A. Katsabekis and A. Thoma: Parametrizations of toric varieties over any field, J. Algebra (2007) 751–763.
[23] L. Katthän: Decomposing sets of inversions, arXiv:1111.3419.
[24] R.D. Luce: Individual Choice Behavior, Wiley, New York, 1959.
[25] J.I. Marden: Analyzing and Modeling Rank Data, Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1995.
[26] H. Ohsugi and T. Hibi: Normal polytopes arising from finite graphs, J. Algebra (1998) 409–426.
[27] H. Ohsugi and T. Hibi: Toric ideals generated by quadratic binomials, J. Algebra (1999) 509–527.
[28] L. Pachter and B. Sturmfels: Algebraic Statistics for Computational Biology, Cambridge University Press, Cambridge, 2005.
[29] R.L. Plackett: Random permutations, J. R. Stat. Soc., Ser. B (1968) 517–534.
[30] V. Reiner, F. Saliola and V. Welker: Spectra of symmetrized shuffling operators, arXiv:1102.2460.
[31] B. Sturmfels: Gröbner bases and convex polytopes, Univ. Lect. Ser., AMS, Providence, 1996.
[32] S. Sullivant: Toric fiber products, J. Algebra (2007) 560–577.
[33] R.H. Villarreal: Monomial Algebras, Pure and Appl. Math., Marcel Dekker, New York, 2001.
[34] D. Zeilberger: Proof of a conjecture of Chan, Robbins, and Yuen. In: Orthogonal polynomials: numerical and symbolic algorithms (Leganés, 1998), Electron. Trans. Numer. Anal. (1999).

Department of Mathematics, University of California, Berkeley, CA 94720, USA

E-mail address: [email protected]

Fachbereich Mathematik und Informatik, Philipps-Universität, 35032 Marburg, Germany


Submitted on 8 Jan 2011 (v1), last revised 5 Dec 2011 (this version, v2)
