Maximum likelihood estimation in the β-model
The Annals of Statistics
© Institute of Mathematical Statistics, 2013
By Alessandro Rinaldo, Sonja Petrović and Stephen E. Fienberg
Carnegie Mellon University, Pennsylvania State University and Carnegie Mellon University
We study maximum likelihood estimation for the statistical model for undirected random graphs, known as the β-model, in which the degree sequences are minimal sufficient statistics. We derive necessary and sufficient conditions, based on the polytope of degree sequences, for the existence of the maximum likelihood estimator (MLE) of the model parameters. We characterize in a combinatorial fashion sample points leading to a nonexistent MLE, and nonestimability of the probability parameters under a nonexistent MLE. We formulate conditions that guarantee that the MLE exists with probability tending to one as the number of nodes increases.
Received September 2011; revised December 2012.
Supported in part by Grant FA9550-12-1-0392 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA), NSF Grant DMS-06-31589, and by a grant from the Singapore National Research Foundation (NRF) under the Interactive & Digital Media Programme Office to the Living Analytics Research Centre (LARC).
AMS 2000 subject classifications.
Key words and phrases. β-model, polytope of degree sequences, random graphs, maximum likelihood estimator.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2013, Vol. 41, No. 3, 1085–1110. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Many statistical models for the representation and analysis of network data rely on information contained in the degree sequence, the vector of node degrees of the observed graph. Node degrees not only quantify the overall connectivity of the network, but also reveal other, potentially more refined, features of interest. The study of degree sequences and, in particular, of the degree distributions of real networks is a classic topic in network analysis. It has received extensive treatment in the statistical literature [see, e.g., Holland and Leinhardt (1981), Fienberg and Wasserman (1981a), Fienberg, Meyer and Wasserman (1985)], in the physics literature [see, e.g., Newman, Strogatz and Watts (2001), Albert and Barabási (2002), Newman (2003), Park and Newman (2004), Newman, Barabási and Watts (2006), Foster et al. (2007), Willinger, Alderson and Doyle (2009)] and in the social network literature [see, e.g., Robins et al. (2007), Goodreau (2007), Handcock and Morris (2007) and references therein]. See also the monograph by Goldenberg et al. (2010) and the books by Kolaczyk (2009), Cohen and Havlin (2010) and Newman (2010).

The simplest statistical network model based exclusively on the node degrees is the exponential family of probability distributions for undirected random graphs with the degree sequence as its natural sufficient statistic. This is in fact a simpler, undirected version of the broader class of statistical models for directed networks known as the $p_1$-models, introduced by Holland and Leinhardt (1981). We will refer to this model as the beta model (henceforth the β-model), a name recently coined by Chatterjee, Diaconis and Sly (2011). We refer to Blitzstein and Diaconis (2010) for details and extensive references.

Despite its apparent simplicity and popularity, the β-model, much like most network models, exhibits nonstandard statistical features, since its complexity, measured by the dimension of the parameter space, increases with the size of the graph. Lauritzen (2003, 2008) characterized β-models as the natural models for representing exchangeable binary arrays that are weakly summarized, that is, random arrays whose distribution depends only on the row and column totals. More recently, Chatterjee, Diaconis and Sly (2011) analyzed the asymptotic properties of the β-model, including existence and consistency of the maximum likelihood estimator (MLE) as the dimension of the network increases, and provided a simple algorithm for estimating the natural parameters.
They also characterized the graph limits, or graphons [see Lovász and Szegedy (2006)], corresponding to a sequence of β-models with given degree sequences [for a connection between the theory of graphons and exchangeable arrays see Diaconis and Janson (2008)]. Concurrently, Barvinok and Hartigan (2010) explored the asymptotic behavior of sequences of random graphs with given degree sequences and studied a different mode of stochastic convergence. Among other things, they show the following. As the size of the network increases, and under a "tameness" condition, the number of edges of a uniform graph with a given degree sequence converges in probability to the number of edges of a random graph drawn from a β-model parametrized by the MLE corresponding to that degree sequence. Yan and Xu (2012) and Yan, Xu and Yang (2012) derived asymptotic conditions for uniform consistency and asymptotic normality of the MLE of the β-model, and for asymptotic normality of the likelihood ratio test for homogeneity of the model parameters. Perry and Wolfe (2012) consider a general class of models for network data parametrized by node-specific parameters, of which the β-model is a special case. The authors derive nonasymptotic conditions under which the MLEs of the model parameters exist and can be well approximated by simple estimators.

In an attempt to avoid reliance on asymptotic methods, whose applicability to network models remains largely unclear [see, e.g., Haberman (1981)], several researchers have turned to exact inference for the β-model. Exact inference hinges upon the nontrivial task of sampling from the set of graphs with a given degree sequence. Blitzstein and Diaconis (2010) developed and analyzed a sequential importance sampling algorithm for generating a random graph with a prescribed degree sequence [see also Viger and Latapy (2005) for a different algorithm].
Hara and Takemura (2010) and Ogawa, Hara and Takemura (2013) tackled the same task using more abstract algebraic methods, and Petrović, Rinaldo and Fienberg (2010) studied Markov bases for the more general $p_1$ model.

In this article we study the existence of the MLE for the parameters of the β-model as the network size increases. We work under a more general sampling scheme in which each edge is observed a fixed number of times, instead of just once as in previous works. We view the existence of the MLE as a natural measure of the intrinsic statistical difficulty of the β-model, for two reasons. First, existence of the MLE is a natural minimum requirement for the feasibility of statistical inference in discrete exponential families such as the β-model. Indeed, nonexistence of the MLE is equivalent to nonestimability of the model parameters, as illustrated in Fienberg and Rinaldo (2012). Thus, conditions for existence of the MLE are precisely the conditions under which statistical inference for these models is fully possible. Second, under the asymptotic scenario of growing network sizes, existence of the MLE provides a natural measure of the sample complexity of the β-model. It also indicates the asymptotic scaling of the model parameters for which statistical inference is viable.

Chatterjee, Diaconis and Sly (2011) and Barvinok and Hartigan (2010) also considered the existence of the MLE. Our analysis differs substantially from theirs: it is rooted in the statistical theory of discrete linear exponential families and relies in a fundamental way on the geometric properties of these families [see, in particular, Rinaldo, Fienberg and Zhou (2009), Geyer (2009)]. Our contributions are as follows:

• We provide explicit necessary and sufficient conditions for existence of the MLE for the β-model. These conditions are based on the polytope of degree sequences, a well-studied polytope arising in the study of threshold graphs; see Mahadev and Peled (1995).
In contrast, the conditions of Chatterjee, Diaconis and Sly (2011) are only sufficient. We then show that nonexistence of the MLE is brought on by certain forbidden patterns of extremal network configurations, which we characterize combinatorially. Furthermore, when the MLE does not exist, we can identify exactly which probability parameters are estimable.
• We use the properties of the polytope of degree sequences to formulate geometric conditions, from which we derive finite sample bounds on the probability that the MLE does not exist. Our asymptotic results improve analogous results of Chatterjee, Diaconis and Sly (2011), and our proof is both simpler and more direct. Furthermore, we show that the tameness condition of Barvinok and Hartigan (2010) is stronger than our conditions for existence of the MLE.
• Our analysis is not specific to the β-model. It follows a principled approach, based on polyhedral geometry, to detecting nonexistence of the MLE and identifying nonestimable parameters, and it applies more generally to discrete models. We illustrate this point by analyzing other network models that are variations or generalizations of the β-model: the β-model with random numbers of edges, the Rasch model, the Bradley–Terry model and the $p_1$ model. Due to space limitations, the details of these additional analyses are contained in the supplementary material [Rinaldo, Petrović and Fienberg (2013)].

In the analysis of Barvinok and Hartigan (2010), the maximum entropy matrix associated with a degree sequence is in fact exactly the MLE corresponding to the observed degree sequence. This is a well-known property of linear exponential families; see, for example, Cover and Thomas (1991), Chapter 11.

This article is self-contained. Nonetheless, the results derived here are best understood as applications of the geometric and combinatorial properties of log-linear models under product-multinomial sampling schemes, as detailed in Fienberg and Rinaldo (2012) and its supplementary material. We refer the reader to that work for further details and for practical algorithms.

The article is organized as follows. Section 2 introduces the β-model and establishes the exponential family parametrization that is key to our analysis.
In Section 3 we derive necessary and sufficient conditions for existence of the MLE of the β-model parameters and characterize parameter estimability under a nonexistent MLE. Section 4 discusses these results further, with examples. In Section 5 we provide sufficient conditions on the expected degree sequence guaranteeing that the MLE exists with high probability as the network size increases. Finally, in Section 6 we indicate possible extensions of our work and briefly discuss some computational issues directly related to detecting nonexistence of the MLE and to parameter estimability.

Throughout, we assume some familiarity with basic concepts from polyhedral geometry [see, e.g., Schrijver (1986)] and from the theory of exponential families; see, for example, Barndorff-Nielsen (1978), Brown (1986).
2. The (generalized) β-model. In this section we describe the exponential family parametrization of a simple generalization of the β-model. With slight abuse of notation, we will refer to this generalization as the β-model as well.

We are concerned with modeling the occurrence of edges in a simple undirected random graph with node set $\{1, \ldots, n\}$. The statistical experiment consists of recording, for each pair of nodes $(i, j)$ with $i < j$, the number of edges appearing in $N_{i,j}$ i.i.d. samples. The integers $\{N_{i,j}, i < j\}$ are deterministic and positive (both the nonrandomness and the positivity assumptions can be relaxed). Thus, in our setting each edge in the network may be sampled a different number of times, a realistic feature that makes the model more flexible. For $i < j$, we denote by $x_{i,j}$ the number of times we observe the edge $(i, j)$ and, accordingly, by $x_{j,i}$ the number of times edge $(i, j)$ is missing. Thus, for all $(i, j)$, $x_{i,j} + x_{j,i} = N_{i,j}$. We model the observed edge counts $\{x_{i,j}, i < j\}$ as draws from mutually independent binomial distributions, with $x_{i,j} \sim \mathrm{Bin}(N_{i,j}, p_{i,j})$, where $p_{i,j} \in (0, 1)$ for each $i < j$.

Data arising from such an experiment can be represented as an $n \times n$ contingency table with empty diagonal cells, whose $(i, j)$th cell contains the count $x_{i,j}$, $i \neq j$. For modeling purposes, however, we need only consider the upper-triangular part of this table. Indeed, given $x_{i,j}$, the value of $x_{j,i}$ is determined as $N_{i,j} - x_{i,j}$. We can therefore represent the sample space more parsimoniously as the following subset of $\mathbb{N}^{\binom{n}{2}}$:
$$S_n := \{x = (x_{i,j})_{i<j} : x_{i,j} \in \{0, 1, \ldots, N_{i,j}\}\}.$$
We index the coordinates $\{(i, j) : i < j\}$ of any point in $S_n$ lexicographically.

In the β-model, we parametrize the $\binom{n}{2}$ edge probabilities by points $\beta \in \mathbb{R}^n$ as follows. For each $\beta \in \mathbb{R}^n$, the probability parameters are uniquely determined as
$$p_{i,j} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}} \quad \text{and} \quad p_{j,i} = 1 - p_{i,j} = \frac{1}{1 + e^{\beta_i + \beta_j}} \qquad \forall\, i \neq j \tag{1}$$
or, equivalently, in terms of log-odds,
$$\log \frac{p_{i,j}}{1 - p_{i,j}} = \beta_i + \beta_j \qquad \forall\, i \neq j. \tag{2}$$
The magnitude and sign of $\beta_i$ quantify the propensity of node $i$ to have ties. The degree of node $i$ is expected to be large (small) if $\beta_i$ is positive (negative) and of large magnitude. Thus the β-model is the natural heterogeneous version of the well-known Erdős–Rényi random graph model [Erdős and Rényi (1959)]. For a discussion of this model and its generalizations, see Goldenberg et al. (2010).

For a given choice of $\beta$, the probability of observing the vector of edge counts $x \in S_n$ is
$$p_\beta(x) = \prod_{i<j} \binom{N_{i,j}}{x_{i,j}} p_{i,j}^{x_{i,j}} (1 - p_{i,j})^{N_{i,j} - x_{i,j}} \tag{3}$$
$$= \exp\Big\{ \sum_{i=1}^{n} \beta_i d_i(x) - \sum_{i<j} N_{i,j} \log\big(1 + e^{\beta_i + \beta_j}\big) \Big\} \prod_{i<j} \binom{N_{i,j}}{x_{i,j}}, \tag{4}$$
where
$$d_i(x) = \sum_{j<i} x_{j,i} + \sum_{j>i} x_{i,j}, \qquad i = 1, \ldots, n. \tag{5}$$
Thus the β-model is a linear exponential family with natural parameter $\beta$ and vector of sufficient statistics $d(x) = (d_1(x), \ldots, d_n(x))$.

When $N_{i,j} = 1$ for all $(i, j)$, the support $S_n$ reduces to the set $G_n := \{0, 1\}^{\binom{n}{2}}$, which encodes all undirected simple graphs on $n$ nodes. For any $x \in G_n$, the corresponding graph has an edge between nodes $i$ and $j$, with $i < j$, if and only if $x_{i,j} = 1$.
In this case, the β-model yields a class of distributions for random undirected simple graphs on $n$ nodes. The edges are mutually independent Bernoulli random variables with probabilities of success $\{p_{i,j}, i < j\}$ satisfying (1). Then, by (5), the $i$th minimal sufficient statistic $d_i$ is the degree of node $i$, that is, the number of nodes adjacent to $i$, and the vector $d(x)$ of sufficient statistics is the degree sequence of the observed graph $x$. This is the version of the β-model studied by Chatterjee, Diaconis and Sly (2011).

3. Existence of the MLE for the β-model. We now derive a necessary and sufficient condition for the existence of the MLE of the natural parameter $\beta$ or, equivalently, of the probability parameters $\{p_{i,j}, i < j\}$ defined in (1). For a given $x \in S_n$, we say that the MLE does not exist when
$$\Big\{ \beta^* : p_{\beta^*}(x) = \sup_{\beta \in \mathbb{R}^n} p_\beta(x) \Big\} = \emptyset,$$
where $p_\beta(x)$ is given in (4). For the natural parameters, nonexistence of the MLE means that no finite vector in $\mathbb{R}^n$ attains the supremum of the likelihood function (4). For the probability parameters, nonexistence means that no set of probability values bounded away from 0 and 1 and satisfying the equations (1) attains the supremum of (3). Either way, nonexistence of the MLE implies that only a random subset of the model parameters is estimable; see Fienberg and Rinaldo (2012).

Our analysis of the existence of the MLE and of parameter estimability for the β-model is based on a geometric object that plays a key role throughout the rest of the paper: the polytope of degree sequences. To this end, note that, for each $x \in S_n$, we can obtain the vector of sufficient statistics $d(x)$ for the β-model as $d(x) = Ax$. Here $A$ is the $n \times \binom{n}{2}$ design matrix equal to the node-edge incidence matrix of a complete graph on $n$ nodes. Specifically, we index the rows of $A$ by the node labels $i \in \{1, \ldots, n\}$ and the columns by the set of all pairs $(i, j)$ with $i < j$, ordered lexicographically. The entries of $A$ are ones along the coordinates $(i, (i, j))$ and $(j, (i, j))$ for $i < j$, and zeros otherwise. For instance, when $n = 4$,
$$A = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix},$$
where the columns are indexed lexicographically by the pairs $(1,2)$, $(1,3)$, $(1,4)$, $(2,3)$, $(2,4)$ and $(3,4)$. Note that, for $x \in G_n$, $Ax$ is the associated degree sequence. The polytope of degree sequences $P_n$ is the convex hull of all possible degree sequences, that is,
$$P_n := \mathrm{convhull}(\{Ax, x \in G_n\}).$$
The integral polytope $P_n$ is a well-studied object in graph theory; see, for example, Chapter 3 in Mahadev and Peled (1995). In particular, when $n = 2$, $P_n$ is just the line segment in $\mathbb{R}^2$ connecting the points $(0, 0)$ and $(1, 1)$, while for $n \geq 3$, $\dim(P_n) = n$.

We now fully characterize the existence of the MLE for the β-model in terms of the polytope of degree sequences. For any $x \in S_n$, let
$$\tilde p_{i,j} := \frac{x_{i,j}}{N_{i,j}}, \qquad i < j,$$
and let $\tilde d = \tilde d(x) \in \mathbb{R}^n$ be the vector with coordinates
$$\tilde d_i := \sum_{j<i} \tilde p_{j,i} + \sum_{j>i} \tilde p_{i,j}, \qquad i = 1, \ldots, n. \tag{6}$$
This is a rescaled version of the sufficient statistics (5), normalized by the number of observations. In particular, for the random graph model, $\tilde d = d$.

Theorem 3.1. Let $x \in S_n$ be the observed vector of edge counts. The MLE exists if and only if $\tilde d(x) \in \mathrm{int}(P_n)$.

Theorem 3.1 verifies the conjecture contained in Addendum A of Chatterjee, Diaconis and Sly (2011) for the random graph model: the MLE exists if and only if the degree sequence belongs to the interior of $P_n$. This result follows from the standard properties of exponential families; see Theorem 9.13 in Barndorff-Nielsen (1978) or Theorem 5.5 in Brown (1986).
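The objects introduced so far can be sketched in a few lines of plain Python: the incidence matrix $A$, the parametrization (1), binomial sampling of edge counts, $d(x) = Ax$ and the normalized degrees (6). This is an illustration under stated assumptions, not the authors' software. The helper names (`incidence_matrix`, `edge_probabilities`, `sample_counts`, `normalized_degrees`) are hypothetical, and the sampler assumes a common $N_{i,j} = N$. The last lines count the distinct degree sequences of the 8 graphs on 3 nodes, which is the fact behind the observation on $n = 3$ that follows.

```python
import itertools
import math
import random


def incidence_matrix(n):
    """Node-edge incidence matrix A of the complete graph on nodes 1..n.

    Rows are indexed by nodes, columns by pairs (i, j), i < j, in lexicographic order.
    """
    pairs = list(itertools.combinations(range(1, n + 1), 2))
    A = [[1 if v in pair else 0 for pair in pairs] for v in range(1, n + 1)]
    return A, pairs


def edge_probabilities(beta, pairs):
    """Equation (1): p_{i,j} = exp(b_i + b_j) / (1 + exp(b_i + b_j)); beta is 0-indexed."""
    return [1.0 / (1.0 + math.exp(-(beta[i - 1] + beta[j - 1]))) for i, j in pairs]


def sample_counts(p, N, rng):
    """Independent draws x_{i,j} ~ Bin(N, p_{i,j}), one per pair."""
    return [sum(rng.random() < pij for _ in range(N)) for pij in p]


def matvec(A, v):
    """Plain matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def normalized_degrees(A, x, N):
    """Equation (6) with N_{i,j} = N: d~ = A p~, where p~_{i,j} = x_{i,j} / N."""
    return matvec(A, [xij / N for xij in x])


if __name__ == "__main__":
    A, pairs = incidence_matrix(4)
    beta = [-0.5, 0.0, 0.3, 1.0]
    p = edge_probabilities(beta, pairs)
    x = sample_counts(p, N=3, rng=random.Random(0))
    print("counts:", dict(zip(pairs, x)))
    print("d(x) = A x:", matvec(A, x))
    print("normalized degrees:", normalized_degrees(A, x, 3))

    # n = 3: the 8 graphs on 3 nodes have pairwise distinct degree sequences.
    A3, _ = incidence_matrix(3)
    seqs = {tuple(matvec(A3, g)) for g in itertools.product((0, 1), repeat=3)}
    print("distinct degree sequences on 3 nodes:", len(seqs))
```

Since each column of $A$ has exactly two ones, the entries of $Ax$ always sum to twice the total edge count. This gives a cheap sanity check on any implementation of the design matrix.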
Theorem 3.1 also confirms the observation of Chatterjee, Diaconis and Sly (2011) that the MLE never exists if $n = 3$. Indeed, $P_3$ has exactly 8 vertices, as many as there are graphs on 3 nodes, so no degree sequence can lie in the interior of $P_3$.

We conclude by noting that, by representing the sufficient statistics as a linear mapping $d = Ax$, we can recast the β-model as a log-linear model with design matrix $A^\top$ and product-multinomial sampling scheme, with $\binom{n}{2}$ sampling constraints, one for each edge. This simple yet far-reaching observation allows us, among other things, to design algorithms for detecting nonexistence of the MLE and for identifying estimable parameters under a nonexistent MLE. These algorithms are explained in the supplementary material to this article.

3.1. Parameter estimability under a nonexistent MLE. The geometric nature of Theorem 3.1 has important consequences. First, it allows us to identify the patterns of observed edge counts that cause nonexistence of the MLE, that is, the sample points for which the MLE is undefined. Second, it yields a complete description of the estimability of the edge probability parameters under a nonexistent MLE, which is key to correctly evaluating the degrees of freedom of the model. The next result addresses these two points.

Lemma 3.2. A point $y$ belongs to the interior of some face $F$ of $P_n$ if and only if there exists a set $\mathcal{F} \subset \{(i, j), i < j\}$ such that
$$y = Ap, \tag{7}$$
where $p = \{p_{i,j} : i < j, p_{i,j} \in [0, 1]\} \in \mathbb{R}^{\binom{n}{2}}$ is such that $p_{i,j} \in \{0, 1\}$ if $(i, j) \notin \mathcal{F}$ and $p_{i,j} \in (0, 1)$ if $(i, j) \in \mathcal{F}$. The set $\mathcal{F}$ is uniquely determined by the face $F$ and is the maximal set for which (7) holds.

Following Geiger, Meek and Sturmfels (2006) and Fienberg and Rinaldo (2012), we call any such set $\mathcal{F}$ a facial set of $P_n$, and we call its complement, $\mathcal{F}^c = \{(i, j) : i < j\} \setminus \mathcal{F}$, a co-facial set.
Facial sets form a lattice that is isomorphic to the face lattice of $P_n$ [Fienberg and Rinaldo (2012), Lemma 5]. Thus the faces of $P_n$ are in one-to-one correspondence with the facial sets of $P_n$. Moreover, for any pair of faces $F$ and $F'$ of $P_n$ with associated facial sets $\mathcal{F}$ and $\mathcal{F}'$, we have $F \cap F' = \emptyset$ if and only if $\mathcal{F} \cap \mathcal{F}' = \emptyset$, and $F \subset F'$ if and only if $\mathcal{F} \subset \mathcal{F}'$. In detail, for a point $x \in S_n$, $d(x) = Ax$ belongs to the interior of a face $F$ of $P_n$ if and only if there exists a nonnegative $p$ such that $d(x) = Ap$, where $\mathcal{F} = \{(i, j) : 0 < p_{i,j} < 1\}$ is the facial set corresponding to $F$. By the same token, $y \in \mathrm{int}(P_n)$ if and only if $y = Ap$ for a vector $p$ with coordinates strictly between 0 and 1.

Facial sets have statistical relevance for two reasons. First, nonexistence of the MLE can be described combinatorially in terms of co-facial sets, that is, patterns of edge counts that are equal to either 0 or $N_{i,j}$. In particular, the MLE does not exist if and only if the set $\{(i, j) : i < j, x_{i,j} = 0 \text{ or } N_{i,j}\}$ contains a co-facial set. Second, besides exhausting all possible patterns of forbidden table entries that lead to a nonexistent MLE, facial sets specify which probability parameters are estimable. In fact, inspection of the likelihood function (3) reveals that, for any observable set of counts $\{x_{i,j} : i < j\}$, there always exists a unique maximizer $\hat p = \{\hat p_{i,j}, i < j\}$. By strict concavity, this maximizer is characterized by the first order optimality conditions
$$\tilde d(x) = A\hat p,$$
also known as the moment equations. Existence of the MLE is then equivalent to $0 < \hat p_{i,j} < 1$ for all $i < j$. When the MLE does not exist, that is, when $\tilde d$ is on the boundary of $P_n$, the moment equations still hold, but the entries of the optimizer $\{\hat p_{i,j}, i < j\}$, known as the extended MLE, are no longer all strictly between 0 and 1. Instead, by Lemma 3.2, the extended MLE satisfies $\hat p_{i,j} = \tilde p_{i,j} \in \{0, 1\}$ for all $(i, j) \in \mathcal{F}^c$.
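When $N_{i,j} = N$ for all pairs, the moment equations read $\tilde d_i = \sum_{j \neq i} e^{\beta_i + \beta_j}/(1 + e^{\beta_i + \beta_j})$. Dividing out $e^{\beta_i}$ gives the equivalent fixed-point form $\beta_i = \log \tilde d_i - \log \sum_{j \neq i} (e^{-\beta_j} + e^{\beta_i})^{-1}$. Iterating this map is, to our understanding, the simple algorithm of Chatterjee, Diaconis and Sly (2011) mentioned in the Introduction.

The sketch below assumes constant $N$ and 0-indexed nodes; `fit_beta` and `expected_degrees` are hypothetical names. A return value of `None` after a failed run is only a heuristic warning, not a certificate that $\tilde d$ lies on the boundary of $P_n$. Theorem 3.1 is the exact criterion.

```python
import math


def expected_degrees(beta):
    """Row sums of A p(beta): d_i = sum over j != i of logistic(beta_i + beta_j)."""
    n = len(beta)
    return [sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j]))) for j in range(n) if j != i)
            for i in range(n)]


def fit_beta(d, max_iter=10_000, tol=1e-12):
    """Solve the moment equations d = A p(beta) by fixed-point iteration.

    d holds normalized degrees (equation (6) with a common N). Returns the
    solution, or None if some d_i is 0 or n - 1 (no finite solution exists),
    or if the iterates overflow or do not settle within max_iter steps.
    """
    n = len(d)
    if any(di <= 0 or di >= n - 1 for di in d):
        return None
    beta = [0.0] * n
    try:
        for _ in range(max_iter):
            # Every coordinate is updated from the previous iterate.
            new = [math.log(d[i])
                   - math.log(sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                                  for j in range(n) if j != i))
                   for i in range(n)]
            if max(abs(a - b) for a, b in zip(new, beta)) < tol:
                return new
            beta = new
    except OverflowError:
        return None
    return None


if __name__ == "__main__":
    true_beta = [-0.4, 0.1, 0.5, -0.2, 0.8]
    d = expected_degrees(true_beta)        # A p with p in (0, 1): an interior point of P_5
    print(fit_beta(d))                     # close to true_beta
    print(fit_beta([0.0, 1.0, 1.0, 2.0]))  # node 1 has degree 0, so None
```

Feeding in exact expected degrees makes the true $\beta$ the unique solution, which turns the example into a self-check of the solver.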
Furthermore, it can be shown [see, e.g., Morton (2013)] that $\hat p_{i,j} \in (0, 1)$ for all $(i, j) \in \mathcal{F}$. Therefore, when the MLE does not exist, only the probabilities $\{p_{i,j}, (i, j) \in \mathcal{F}\}$ are estimable by the extended MLE. We refer the reader to Barndorff-Nielsen (1978), Brown (1986), Fienberg and Rinaldo (2012) and references therein for details about the theory of extended exponential families and extended maximum likelihood estimation in log-linear models.

To summarize, co-facial sets encode the patterns of table entries leading to a nonexistent MLE, while facial sets indicate which probability parameters are estimable. A similar, though more involved, interpretation holds for the estimability of the natural parameters; see Fienberg and Rinaldo (2012). Further, for a given sample point $x$, the realized facial set and its cardinality are both random, because they depend on the actual value of the observed sufficient statistics $Ax$. Hence, with a nonexistent MLE, the set of estimable parameters is itself random.

4. The boundary of $P_n$. Theorem 3.1 and Lemma 3.2 show that the boundary of the polytope $P_n$ plays a fundamental role in determining the existence of the MLE for the β-model and in specifying which parameters are estimable. In particular, the larger the number of faces (i.e., facial sets) of $P_n$, the higher the complexity of the β-model, as measured by the number of possible patterns of edge counts for which the MLE does not exist. Therefore, even a basic understanding of the number and types of co-facial patterns will provide valuable insights into the behavior of the β-model. Below we elaborate on the consequences of the results established in Section 3 and present a small selection of examples of co-facial sets associated to the facets of $P_n$.

The discussion and examples of this section reveal a number of subtle issues, but we believe that the key message is two-fold. First, the combinatorial complexity of $P_n$, measured by both the number and the types of co-facial sets, grows very fast with $n$, with the co-facial sets associated to node degrees bounded away from 0 and $n - 1$ … $P_n$ is impractical, it is important to devise algorithms for detecting a nonexistent MLE and identifying the facial sets of estimable parameters. Both issues become more severe in large and sparse networks, where the exploding number of possible nontrivial co-facial sets is expected to make estimation of the model parameters more difficult. In Section 5, we will derive conditions, based on the geometry of $P_n$, that prevent this from happening with large probability for large $n$.

4.1. The combinatorial complexity of $P_n$. Mahadev and Peled (1995) describe the facet-defining inequalities of $P_n$ for all $n \geq 4$ … Let $\mathcal{P}$ be the set of all pairs $(S, T)$ of disjoint nonempty subsets of $\{1, \ldots, n\}$ such that $|S \cup T| \in \{2, \ldots, n - 3, n\}$. For any $(S, T) \in \mathcal{P}$ and $y \in P_n$, let
$$g(S, T, y, n) := |S|(n - 1 - |T|) - \sum_{i \in S} y_i + \sum_{i \in T} y_i. \tag{8}$$

Theorem 4.1 [Theorem 3.3.17 in Mahadev and Peled (1995)]. Let $n \geq 4$ and $y \in P_n$. The facet-defining inequalities of $P_n$ are:
(i) $y_i \geq 0$, for $i = 1, \ldots, n$;
(ii) $y_i \leq n - 1$, for $i = 1, \ldots, n$;
(iii) $g(S, T, y, n) \geq 0$, for all $(S, T) \in \mathcal{P}$.

Theorem 4.1 characterizes $P_n$ exhaustively. Even so, understanding its combinatorial complexity, that is, the collection of all its faces and their inclusion relations, is far from trivial. Stanley (1991) studied the number of faces of the polytope of degree sequences $P_n$ and derived an expression for computing the entries of the f-vector of $P_n$.
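Theorem 4.1 lends itself to two quick computations.

First, the ordered pairs $(S, T)$ of disjoint nonempty sets covering $k$ given nodes number $2^k - 2$. The theorem therefore lists $2n + \sum_k \binom{n}{k}(2^k - 2)$ inequalities. Assuming the index set for $|S \cup T|$ is $\{2, \ldots, n - 3, n\}$ (our reading of the definition of $\mathcal{P}$), this gives 22, 60, 224, 882 and 3322 for $n = 4, \ldots, 8$. These totals agree with the facet counts reported below.

Second, write $R = (S \cup T)^c$ and expand (8) at $y = Ap$ edge by edge, using $|S|(n - 1 - |T|) = |S|(|S| - 1) + |S||R|$. This gives
$$g(S, T, Ap, n) = \sum_{S\text{–}S} 2(1 - p_{i,j}) + \sum_{S\text{–}R} (1 - p_{i,j}) + \sum_{T\text{–}T} 2p_{i,j} + \sum_{T\text{–}R} p_{i,j}.$$
Edges of type $S$–$T$ and $R$–$R$ do not appear in this sum. So $g = 0$ forces the count $N_{i,j}$ on $S$–$S$ and $S$–$R$ pairs and the count 0 on $T$–$T$ and $T$–$R$ pairs, and leaves the remaining entries free. These are exactly the graph conditions (1)–(4) of Lemma 4.2 below.

The sketch checks the identity numerically. The function names are hypothetical.

```python
import itertools
import random
from math import comb


def count_theorem_41_inequalities(n):
    """2n bounds from (i)-(ii) plus one inequality (iii) per (S, T) in the set P.

    ASSUMPTION: |S u T| ranges over {2, ..., n - 3, n}.
    """
    sizes = list(range(2, n - 2)) + [n]
    return 2 * n + sum(comb(n, k) * (2 ** k - 2) for k in sizes)


def g(S, T, y, n):
    """Equation (8); y maps node labels 1..n to (normalized) degrees."""
    return len(S) * (n - 1 - len(T)) - sum(y[i] for i in S) + sum(y[i] for i in T)


def edge_slack(S, T, i, j, pij):
    """Contribution of the pair (i, j) to g(S, T, A p, n); always >= 0 for pij in [0, 1]."""
    s, t = len({i, j} & S), len({i, j} & T)
    if s == 2:
        return 2 * (1 - pij)   # S-S pair
    if t == 2:
        return 2 * pij         # T-T pair
    if s == 1 and t == 0:
        return 1 - pij         # S-R pair
    if t == 1 and s == 0:
        return pij             # T-R pair
    return 0.0                 # S-T or R-R pair: does not enter g


def cofacial_pattern(S, T, n):
    """Entries forced by g = 0: 'N' (always observed), '0' (never observed), None (free)."""
    pattern = {}
    for i, j in itertools.combinations(range(1, n + 1), 2):
        at_zero, at_one = edge_slack(S, T, i, j, 0.0), edge_slack(S, T, i, j, 1.0)
        if at_one == 0 and at_zero > 0:
            pattern[(i, j)] = "N"
        elif at_zero == 0 and at_one > 0:
            pattern[(i, j)] = "0"
        else:
            pattern[(i, j)] = None
    return pattern


if __name__ == "__main__":
    print([count_theorem_41_inequalities(n) for n in range(4, 9)])  # [22, 60, 224, 882, 3322]
    n, rng = 6, random.Random(0)
    p = {e: rng.random() for e in itertools.combinations(range(1, n + 1), 2)}
    y = {v: sum(q for e, q in p.items() if v in e) for v in range(1, n + 1)}  # y = A p
    S, T = {1, 2}, {4, 5, 6}
    print(g(S, T, y, n), sum(edge_slack(S, T, i, j, q) for (i, j), q in p.items()))
    print(cofacial_pattern({3, 4}, {1, 2}, 4))
```

For $S = \{3, 4\}$ and $T = \{1, 2\}$ the forced entries are $(1, 2) \to 0$ and $(3, 4) \to N_{3,4}$. These are the entries that a direction of the form $(-c_k, -c_k, c_k, c_k)$ drives to probabilities 0 and 1.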
The f-vector of an $n$-dimensional polytope is the vector of length $n$ whose $i$th entry is the number of $i$-dimensional faces, $i = 0, \ldots, n - 1$. For example, the f-vector of $P_8$ is the 8-dimensional vector (334,982; 1,726,648; …; 3322). Thus, $P_8$ is an 8-dimensional polytope with 334,982 vertices, 1,726,648 edges and so on, up to 3322 facets. Also, according to Stanley's formula, the numbers of facets of $P_4$, $P_5$, $P_6$ and $P_7$ are 22, 60, 224 and 882, respectively. These numbers agree with those we obtained with the software polymake, using the methods described in the supplementary material to this article; see Gawrilow and Joswig (2000).

Stanley's analysis showed that the combinatorial complexity of $P_n$ is extraordinarily large: both the number of vertices and the number of facets grow at least exponentially in $n$. Consequently, identifying points on the boundary of $P_n$ and the associated facial sets is far from trivial. For instance, directly computing the number of vertices of $P_{\ldots}$ is prohibitively expensive, even with one of the best known algorithms, such as the one implemented in the software minksum; see Weibel (2010). To overcome these problems, we have devised an algorithm for detecting boundary points and the associated facial sets that can handle networks with up to hundreds of nodes. The algorithm is based on a log-linear model reparametrization and is equivalent to what is known in computational geometry as the "Cayley trick." We report on it in the supplementary material. Using the methods described there, we identified a few interesting cases in which the MLE does not exist, most of which have gone unrecognized in the statistical literature. Below we describe some of our computations in order to elucidate the results derived in Section 3.

4.2. Some examples of co-facial sets.
Recall that we can represent the data as an $n \times n$ table of counts with structural zero diagonal elements, whose $(i, j)$th entry is the number of times, out of $N_{i,j}$, that we observed the edge $(i, j)$. In our examples, empty cells correspond to facial sets and may contain arbitrary count values. In contrast, the cells in the co-facial sets contain either a zero value or the maximal value $N_{i,j}$. Lemma 3.2 implies that extreme count values of this kind are precisely what leads to the nonexistence of the MLE. The pattern shown on the left of Table 1 is an instance of a co-facial set, corresponding to a facet of $P_4$. Assume for simplicity that the empty cells contain counts bounded away from 0 and $N_{i,j}$. Then the sufficient statistics $\tilde d$ are also bounded away from 0 and $n - 1$, and so are the row and column sums of the normalized counts $\{x_{i,j}/N_{i,j} : i \neq j\}$. Yet the MLE does not exist.

Table 1
Left: co-facial set leading to a nonexistent MLE. Center: an example of data exhibiting the pattern of counts consistent with the co-facial set on the left when $N_{i,j} = 3$ for all $i \neq j$. Right: table of the extended MLE of the estimated probabilities.

Left:
            1          2          3          4
   1        ×          0
   2     $N_{1,2}$     ×
   3                              ×       $N_{3,4}$
   4                              0          ×

Table 2
Examples of a co-facial set leading to a nonexistent MLE. Left: $\tilde d_1 = 0$. Right: example where the degrees are all bounded away from 0 and 3, yet the MLE does not exist.

Left:
            1          2          3          4
   1        ×          0          0          0
   2     $N_{1,2}$     ×
   3     $N_{1,3}$                ×
   4     $N_{1,4}$                           ×

This is further illustrated in Table 1. The center panel shows an instance of data with $N_{i,j} = 3$ for all $i \neq j$ that satisfies the above pattern, and the right panel shows the probability values maximizing the log-likelihood function. Because the MLE does not exist, the supremum of the log-likelihood under the natural parametrization is attained only in the limit, by any sequence of natural parameters $\{\beta^{(k)}\}$ of the form $\beta^{(k)} = (-c_k, -c_k, c_k, c_k)$ with $c_k \to \infty$ as $k \to \infty$.
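The limiting behavior just described can be checked directly. Move along $\beta + c\,v$ with $v = (-1, -1, 1, 1)$. The cross sums $\beta_i + \beta_j$ with $i \in \{1, 2\}$ and $j \in \{3, 4\}$ do not change, the log-odds of the pair $(1, 2)$ decrease and those of $(3, 4)$ increase. When $x_{1,2} = 0$ and $x_{3,4} = N$, both changes strictly raise the log-likelihood (3), so no finite $\beta$ can be a maximizer.

The counts below are a hypothetical data set with those two extreme entries and $N = 3$; they are not the data of Table 1. `log_likelihood` is an illustrative helper that drops the binomial coefficients.

```python
import itertools
import math


def log_likelihood(beta, x, N):
    """Log of (3) without the binomial coefficients, for a common N.

    x lists the counts x_{i,j}, i < j, in lexicographic order; nodes are 0-indexed.
    Uses log p = -log(1 + e^{-eta}) and log(1 - p) = -log(1 + e^{eta}).
    """
    pairs = itertools.combinations(range(len(beta)), 2)
    total = 0.0
    for (i, j), xij in zip(pairs, x):
        eta = beta[i] + beta[j]
        total -= xij * math.log1p(math.exp(-eta)) + (N - xij) * math.log1p(math.exp(eta))
    return total


if __name__ == "__main__":
    N = 3
    # Hypothetical counts for (1,2), (1,3), (1,4), (2,3), (2,4), (3,4):
    # x_{1,2} = 0 and x_{3,4} = N; the other entries lie strictly between 0 and N.
    x = [0, 1, 2, 2, 1, 3]
    base = [0.2, -0.1, 0.4, 0.0]
    v = [-1.0, -1.0, 1.0, 1.0]
    for c in (0.0, 1.0, 2.0, 4.0, 8.0):
        beta = [b + c * vi for b, vi in zip(base, v)]
        print(c, log_likelihood(beta, x, N))  # strictly increasing in c, always below 0
```

The starting point `base` is arbitrary. The same monotone behavior holds from any $\beta$, which is why the supremum is approached but never attained.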
Consequently, some of the estimated probability values in Table 1 (right) are equal to 0 or 1.

The order of the pattern is crucial. On the left of Table 2 we show another example of a co-facial set. It is easy to detect, because it corresponds to a value of 0 for the normalized sufficient statistic $\tilde d_1$. Indeed, by cases (i) and (ii) of Theorem 4.1, the MLE does not exist if $\tilde d_i = 0$ or $\tilde d_i = n - 1$ for some $i$. On the right, we show a co-facial set that is instead compatible with normalized sufficient statistics bounded away from 0 and $n - 1$. Table 3 lists all the co-facial sets corresponding to the facets of $P_4$, including the cases already shown.

In general, $2n$ facets of $P_n$ are determined by a single $\tilde d_i$ being equal to 0 or $n - 1$. Thus, just by inspecting the row sums, or equivalently the observed sufficient statistics, we can detect only $2n$ co-facial sets, associated with as many facets of $P_n$. However, comparing this number with the entries of the f-vector calculated in Stanley (1991) shows that most facets of $P_n$ do not yield co-facial sets of this form, as our computations confirm. Since the number of facets appears to grow exponentially in $n$, we conclude that most co-facial sets do not appear to arise in this fashion. Thus, at least combinatorially, patterns of data counts leading to the nonexistence of MLEs but with the normalized degrees bounded away from 0 and $n - 1$ …

The random graph case. In the special case $N_{i,j} = 1$ for all $i < j$, which is equivalent to a model for random undirected graphs, points on the boundary of $P_n$ are, by construction, degree sequences and have a direct graph-theoretical interpretation. We say that a subset of the nodes of a given graph is stable if it induces a subgraph with no edges, and a clique if it induces a complete subgraph.
Table 3
All possible co-facial sets for $P_4$ corresponding to the facets of $P_4$ (empty cells indicate arbitrary entry values).

Lemma 4.2 [Lemma 3.3.13 in Mahadev and Peled (1995)]. Let $d$ be a degree sequence of a graph $G$ that lies on the boundary of $P_n$. Then either $d_i = 0$ or $d_i = n - 1$ for some $i$, or there exist nonempty disjoint subsets $S$ and $T$ of $\{1, \ldots, n\}$ such that:
(1) $S$ is a clique of $G$;
(2) $T$ is a stable set of $G$;
(3) every vertex in $S$ is adjacent in $G$ to every vertex in $(S \cup T)^c$;
(4) no vertex of $T$ is adjacent in $G$ to any vertex of $(S \cup T)^c$.

Fig. 1. Examples of random graphs on 4 (left), 5 (center) and 6 (right) nodes whose node degrees are bounded away from 0 and $n - 1$ and for which the MLE is not defined. Lemma 4.2 applies with $|S| = |T| = 2$ (left), with $|S| = 3$ and $|T| = 2$ (center) and with $|S| = |T| = 3$ (right).

Using Lemma 4.2, we can construct virtually any example of a random graph whose degree sequence lies on the boundary of $P_n$. In particular, node degrees bounded away from 0 and $n - 1$ are not a sufficient condition for the existence of the MLE, although violating this condition implies nonexistence of the MLE; see the examples of Figure 1. Nonetheless, Lemma 4.2 is of little or no practical use for detecting boundary points and the associated co-facial sets, because checking for the existence of a pair $(S, T)$ of node subsets satisfying conditions (1) through (4) is algorithmically impractical.
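For small graphs, the conditions of Lemma 4.2 can still be checked by brute force over all $3^n$ assignments of nodes to $S$, $T$ or $(S \cup T)^c$, which makes the exponential cost concrete.

The check is also sufficient. If $(S, T)$ satisfies (1)–(4), then a direct count of the degree sums in (8) gives $g(S, T, d, n) = 0$, while $g \geq 0$ holds on all of $P_n$. So for $n \geq 3$ the degree sequence lies on a supporting hyperplane, and by Theorem 3.1 the MLE does not exist when $N_{i,j} = 1$.

The function name and the example graphs are hypothetical and are not the graphs of Figure 1.

```python
import itertools


def lemma_42_witness(n, edges):
    """Brute-force search for the certificates of Lemma 4.2 in a simple graph.

    Nodes are 1..n and edges is an iterable of pairs. Returns ("degree", i) if node i
    has degree 0 or n - 1, ("ST", S, T) for a pair satisfying (1)-(4), or None.
    Tries all 3**n assignments, so it is only usable for small n.
    """
    nodes = range(1, n + 1)
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in nodes:
        if len(adj[v]) in (0, n - 1):
            return ("degree", v)
    for labels in itertools.product("STR", repeat=n):
        S = {v for v, lab in zip(nodes, labels) if lab == "S"}
        T = {v for v, lab in zip(nodes, labels) if lab == "T"}
        R = {v for v, lab in zip(nodes, labels) if lab == "R"}
        if not S or not T:
            continue
        if (all(b in adj[a] for a, b in itertools.combinations(S, 2))           # (1) clique
                and not any(b in adj[a] for a, b in itertools.combinations(T, 2))  # (2) stable
                and all(R <= adj[s] for s in S)                                  # (3)
                and not any(adj[t] & R for t in T)):                             # (4)
            return ("ST", S, T)
    return None


if __name__ == "__main__":
    # Degrees (1, 1, 2, 2), yet an ("ST", ...) witness exists, e.g. S = {3, 4}, T = {1, 2}.
    print(lemma_42_witness(4, [(3, 4), (1, 3), (2, 4)]))
    # The 5-cycle has degree sequence (2, 2, 2, 2, 2), the center of P_5: no witness.
    print(lemma_42_witness(5, [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]))
```

On 4 nodes, a short case analysis suggests that any boundary graph with all degrees in $\{1, 2\}$ only admits witnesses with $|S| = |T| = 2$. This is consistent with the six random-graph patterns of Table 4 discussed below. The test exercises this exhaustively over all 64 graphs.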
In the supplementary material to this article, we describe alternative procedures that can be used in large networks.

Figure 1 shows three examples of graphs on 4, 5 and 6 nodes for which the MLE of the β-model is undefined even though the node degrees are bounded away from 0 and $n - 1$. For $n = 4$, our computations show that there are 14 distinct co-facial sets associated with the facets of $P_n$. Eight of them correspond to degree sequences containing a 0 or a 3. The remaining six are shown in Table 4, which we computed numerically using the procedure described in the supplementary material. Notice that the three tables on the second row are obtained from the first three tables by switching zeros with ones.

Furthermore, the number of co-facial sets we found is smaller than the number of facets of $P_n$, which is 22, as shown in Table 3. This is a consequence of the fact that the only observed counts in the random graph model are 0's and 1's. It is in fact easy to see in Table 3 that any co-facial set containing three zero counts and three maximal counts $N_{i,j}$ is equivalent, in the random graph case, to a node having degree 0 or 3. However, as soon as $N_{i,j} \geq 2$, the number of possible co-facial sets matches the number of facets of $P_n$. Therefore, the condition $N_{i,j} = 1$ is not inconsequential, as it appears to reduce the number of observable patterns leading to a nonexistent MLE. We do not know the extent of the impact of such a reduction in general.

Table 4. Patterns of zeros and ones yielding random graphs with nonexistent MLE (empty cells indicate that the entry could be a 0 or a 1).

5. Existence of the MLE: Finite sample bounds. In this section we exploit the geometry of the boundary of $P_n$ from Lemma 4.2 to derive sufficient conditions that imply the existence of the MLE with large probability as the size $n$ of the network grows.
These conditions essentially guarantee that the probability of observing any of the super-exponentially many (in $n$) co-facial sets of $P_n$ is polynomially small in $n$. Unlike previous analyses, our result does not require the network to be dense.

We make the simplifying assumption that $N_{i,j} = N$ for all $i$ and $j$, where $N = N(n) \geq 1$. Recall the random vector $\tilde d$, whose coordinates are given in (6), and let $d = \mathbb{E}[\tilde d] \in \mathbb{R}^n$ be its expected value under the β-model. Then
$$d_i = \sum_{j < i} p_{j,i} + \sum_{j > i} p_{i,j}, \qquad i = 1, \ldots, n.$$
We formulate sufficient conditions for the existence of the MLE in terms of the entries of the vector $d$.

Theorem 5.1. Assume that, for all $n \geq \max\bigl\{4, 2\sqrt{c n \log n / N} + 1\bigr\}$, the vector $d$ satisfies the conditions:
(i) $\min_i \min\{d_i, n - 1 - d_i\} \geq 2\sqrt{\dfrac{c\, n \log n}{N}} + C$;
(ii) $g(S, T, d, n) > |S \cup T| \sqrt{\dfrac{c\, n \log n}{N}} + C$ for all $(S, T) \in \mathcal{P}$;
where $c > 1/2$ and $C \in \bigl(0, n - 1 - 2\sqrt{c n \log n / N}\bigr)$. Then, with probability at least $1 - 2/n^{2c-1}$, the MLE exists.

When $N$ is constant, for example, when $N = 1$ as in the random graph case, we can relax the conditions of Theorem 5.1 by requiring condition (ii) to hold only over subsets $S$ and $T$ of cardinality of order $\Omega(\sqrt{n \log n})$. We present this result in greater generality by assuming only that $n \geq N$. We do not, however, expect it to be sharp in general when $N$ grows with $n$.

Corollary 5.2. Let $n \geq \max\{N, 4, 2\sqrt{c n \log n} + 1\}$, $c > 1$ and $C \in \bigl(0, n - 1 - 2\sqrt{c n \log n}\bigr)$. Assume the vector $d$ satisfies the conditions:
(i′) $\min_i \min\{d_i, n - 1 - d_i\} \geq 2\sqrt{c n \log n} + C$;
(ii′) $g(S, T, d, n) > |S \cup T| \sqrt{c n \log n} + C$ for all $(S, T) \in \mathcal{P}_n$;
where
$$\mathcal{P}_n := \bigl\{(S, T) \in \mathcal{P} : \min\{|S|, |T|\} > \sqrt{c n \log n} + C\bigr\},$$
and the set $\mathcal{P}$ was defined before Theorem 4.1. Then the MLE exists with probability at least $1 - 2/n^{2(c-1)}$. If $N = 1$, it is sufficient to have $c > 1/2$, and the MLE exists with probability larger than $1 - 2/n^{2c-1}$.

Discussion and comparison with previous work.
Since $|S \cup T| \leq n$, one could replace assumption (ii) of Theorem 5.1 with the simpler but stronger condition
$$\min_{(S,T) \in \mathcal{P}_n} g(S, T, d, n) > n^{3/2} \sqrt{c \log n} + C.$$
Then, if we assume for simplicity that $N$ is a constant, as in Corollary 5.2, the MLE exists with probability tending to one at a rate that is polynomial in $n$ whenever
$$\min_i \min\{d_i, n - 1 - d_i\} = \Omega\bigl(\sqrt{n \log n}\bigr)$$
and, for all pairs $(S, T) \in \mathcal{P}$,
$$g(S, T, d, n) = \Omega\bigl(n^{3/2} \sqrt{\log n}\bigr).$$

For the case $N = 1$, we can compare Corollary 5.2 with Theorem 3.1 in Chatterjee, Diaconis and Sly (2011). Their result also provides sufficient conditions for the existence of the MLE with probability tending to one polynomially fast in $n$ (for all $n$ large enough). It appears to be stronger than ours, but that is actually not the case, as we now explain. Their conditions require that, for some constants $c_1$, $c_2$ and $c_3$ in $(0, 1)$, $c_1(n - 1) < d_i < c_2(n - 1)$ for all $i$ and
$$|S|(|S| - 1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\} > c_3 n^2 \tag{9}$$
for all sets $S$ such that $|S| > c_2^2\, n$. For any nonempty subsets $S \subset \{1, \ldots, n\}$ and $T \subset \{1, \ldots, n\} \setminus S$,
$$\sum_{i \notin S} \min\{d_i, |S|\} \leq \sum_{i \in T} d_i + |S|\,|(S \cup T)^c|,$$
which implies that
$$|S|(n - 1 - |T|) - \sum_{i \in S} d_i + \sum_{i \in T} d_i \geq |S|(|S| - 1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\},$$
where we have used the equality $n = |S| + |T| + |(S \cup T)^c|$. Thus, if (9) holds for some nonempty $S \subset \{1, \ldots, n\}$, then $d$ satisfies the facet conditions implied by all the pairs $(S, T)$, for any nonempty set $T \subset \{1, \ldots, n\} \setminus S$. As a result, for any subset $S$, condition (9) is stronger than any of the facet conditions of $P_n$ specified by $S$. In addition, we significantly weakened the requirement in Chatterjee, Diaconis and Sly (2011) that $c_1(n - 1) < d_i < c_2(n - 1)$ for all $i$, replacing it with $\min_i \min\{d_i, n - 1 - d_i\} \geq 2\sqrt{c n \log n} + C$.
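The hypotheses of Theorem 5.1, and the comparison with condition (9), can be checked by brute force on toy examples. The Python sketch below is our own illustration and rests on the following assumptions:

- **Facet functional.** It uses $g(S,T,d,n) = |S|(n-1-|T|) - \sum_{i\in S} d_i + \sum_{i\in T} d_i$, the expression appearing in the comparison above.
- **Reading of Theorem 5.1.** The deviation radius is $t = \sqrt{c n \log n / N}$ and condition (i) takes the form $\geq 2t + C$.
- **Condition (ii).** The set $\mathcal{P}$ is not reproduced here, so (ii) is checked over all pairs of disjoint nonempty subsets. This is at least as demanding, provided $\mathcal{P}$ is a subset of these pairs.

The enumeration has roughly $3^n$ terms, so it is usable only for tiny $n$. For such $n$ the probability bound is informative only for large $c$. The final loop checks numerically the inequality between $g$ and the left-hand side of (9), which holds for any real vector $d$.

```python
import math
import random
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple


def g(S, T, d, n) -> float:
    """g(S, T, d, n) = |S|(n - 1 - |T|) - sum_{i in S} d_i + sum_{i in T} d_i."""
    return len(S) * (n - 1 - len(T)) - sum(d[i] for i in S) + sum(d[i] for i in T)


def disjoint_pairs(n: int) -> Iterator[Tuple[List[int], List[int]]]:
    """All pairs of disjoint nonempty subsets S, T of {0, ..., n-1}."""
    for labels in product(range(3), repeat=n):
        S = [v for v in range(n) if labels[v] == 0]
        T = [v for v in range(n) if labels[v] == 1]
        if S and T:
            yield S, T


def cds_lhs(S, d, n) -> float:
    """Left-hand side of (9): |S|(|S|-1) - sum_S d_i + sum_{i not in S} min(d_i, |S|)."""
    s, in_s = len(S), set(S)
    return (s * (s - 1) - sum(d[i] for i in S)
            + sum(min(d[i], s) for i in range(n) if i not in in_s))


def theorem_5_1_hypotheses(d: Sequence[float], N: int, c: float,
                           C: float) -> Dict[str, object]:
    """Brute-force check of the range constraints and of conditions (i), (ii)."""
    n = len(d)
    t = math.sqrt(c * n * math.log(n) / N)   # deviation radius
    return {
        "ranges": c > 0.5 and n >= max(4, 2 * t + 1) and 0 < C < n - 1 - 2 * t,
        "i": min(min(x, n - 1 - x) for x in d) >= 2 * t + C,
        "ii": all(g(S, T, d, n) > len(S + T) * t + C for S, T in disjoint_pairs(n)),
        "prob_lower_bound": 1 - 2 / n ** (2 * c - 1),
    }


print(theorem_5_1_hypotheses([3.5] * 8, N=1000, c=2.0, C=1.0))
# {'ranges': True, 'i': True, 'ii': True, 'prob_lower_bound': 0.99609375}

# g(S, T, d, n) dominates the left-hand side of (9) for arbitrary vectors d.
rng = random.Random(0)
for _ in range(20):
    n = rng.randint(4, 7)
    d = [rng.uniform(0, n - 1) for _ in range(n)]
    assert all(g(S, T, d, n) >= cds_lhs(S, d, n) - 1e-9
               for S, T in disjoint_pairs(n))
```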
As a direct consequence of relaxing the degree condition from $c_1(n-1) < d_i < c_2(n-1)$ to $\min_i \min\{d_i, n-1-d_i\} \geq 2\sqrt{c n \log n} + C$, we only need $|S| > \sqrt{c n \log n} + C$, as opposed to $|S| > c_2^2\, n$. Overall, in our setting, the vector of expected degrees of the sequence of networks is allowed to lie much closer to the boundary of $P_n$. As we explain next, this weakening is significant. The setting of Chatterjee, Diaconis and Sly (2011) only allows us to estimate an increasing number of probability parameters (the edge probabilities) that are uniformly bounded away from 0 and 1. Our assumptions, by contrast, allow these probabilities to become degenerate as the network size grows, and therefore hold even in nondense network settings.

The nondegenerate case. We now briefly discuss the case of sequences of networks for which $N = 1$ and the edge probabilities are uniformly bounded away from 0 and 1, that is,
$$\delta < p_{i,j} < 1 - \delta \quad \forall\, i, j \tag{10}$$
for some $\delta \in (0, 1)$ independent of $n$. In this scenario, the number of probability parameters to be estimated grows with $n$, but their values are guaranteed to be nondegenerate. It follows immediately from the nondegeneracy assumption (10) that $d \in \mathrm{int}(P_n)$ and
$$\delta(n - 1) < d_i < (1 - \delta)(n - 1), \qquad i = 1, \ldots, n. \tag{11}$$
Then, the same arguments we used in the proof of Corollary 5.2 imply that the MLE exists with high probability. We provide a sketch of the proof.

- With high probability, $g(S, T, \tilde d, n) \geq g(S, T, d, n) - |S \cup T|\, O(\sqrt{n \log n})$ for each pair $(S, T) \in \mathcal{P}$.
- Because of (11), it is enough to consider only pairs $(S, T)$ of disjoint subsets of $\{1, \ldots, n\}$ of sizes of order $\Omega(n)$.
- For each such pair, the condition on $d_i$ further yields that $g(S, T, d, n)$ is of order $\Omega(n^2)$, and, by Theorem 8, the MLE exists with high probability.

In fact, the boundedness assumption of Chatterjee, Diaconis and Sly (2011) that $\|\beta\|_\infty < L$, with $L$ independent of $n$, is equivalent to the nondegeneracy assumption (10), as we see from equation (1).
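We assume here that equation (1) is the standard β-model parametrization $p_{i,j} = e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$. A bound $\|\beta\|_\infty < L$ then gives $|\beta_i + \beta_j| < 2L$, and hence (10) with $\delta = 1/(1+e^{2L})$. Condition (11) follows by summing over $j \neq i$. The Python sketch below checks both statements on a small example. The function names and the $\beta$ vector are hypothetical.

```python
import math
from typing import List, Sequence


def edge_prob(b_i: float, b_j: float) -> float:
    """Assumed beta-model edge probability exp(b_i + b_j) / (1 + exp(b_i + b_j))."""
    return 1.0 / (1.0 + math.exp(-(b_i + b_j)))


def expected_degrees(beta: Sequence[float]) -> List[float]:
    """d_i = sum_{j != i} p_ij in the random graph case N = 1."""
    n = len(beta)
    return [sum(edge_prob(beta[i], beta[j]) for j in range(n) if j != i)
            for i in range(n)]


def delta_from_sup_norm(L: float) -> float:
    """If max_i |beta_i| < L, every p_ij lies in (delta, 1 - delta) for this delta."""
    return 1.0 / (1.0 + math.exp(2.0 * L))


beta = [0.9, -0.4, 0.3, -0.8, 0.0, 0.5]   # hypothetical parameters, max |beta_i| < 1
n, delta = len(beta), delta_from_sup_norm(1.0)
probs = [edge_prob(beta[i], beta[j]) for i in range(n) for j in range(i + 1, n)]
print(all(delta < p < 1 - delta for p in probs))   # True: condition (10)
print(all(delta * (n - 1) < x < (1 - delta) * (n - 1)
          for x in expected_degrees(beta)))        # True: condition (11)
```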
Chatterjee, Diaconis and Sly (2011) focus on the nondegenerate case. Our results, by contrast, hold under weaker scaling, as we only require, for instance, that $d_i$ be of order $\Omega(\sqrt{n \log n})$ for all $i$. Relatedly, we note that the tameness condition of Barvinok and Hartigan (2010) is equivalent to $\delta < \hat p_{i,j} < 1 - \delta$ for all $i$ and $j$ and a fixed $\delta \in (0, 1)$, where $\hat p_{i,j}$ is the MLE of $p_{i,j}$. Therefore, the tameness condition is stronger than the existence of the MLE. In fact, using again Theorem 1.3 in Chatterjee, Diaconis and Sly (2011), for all $n$ sufficiently large, the tameness condition is equivalent to the boundedness condition of Chatterjee, Diaconis and Sly (2011).

We conclude this section with two useful remarks. First, Theorem 1.3 in Chatterjee, Diaconis and Sly (2011) demonstrates that, when the MLE exists, $\max_i |\hat\beta_i - \beta_i| = O\bigl(\sqrt{\log n / n}\bigr)$ with probability tending to one polynomially fast in $n$. Combined with our Corollary 5.2, this implies that the MLE is a consistent estimator under a growing network size and with edge probabilities approaching the degenerate values of 0 and 1.

Second, after the submission of this article we learned about the interesting asymptotic results of Yan and Xu (2012) and Yan, Xu and Yang (2012). Based on a modification of the arguments of Chatterjee, Diaconis and Sly (2011), they claim that the MLE of the β-model exists and is uniformly consistent if $L = o(\log n)$ and $L = o(\log\log n)$, respectively, where $L = \max_i |\beta_i|$.

6. Discussion and extensions. We have used polyhedral geometry to analyze the conditions for existence of the MLE of a generalized version of the β-model. We have also used it to derive finite sample bounds for the probability associated with the existence of the MLE. Our results offer a novel and explicit characterization of the patterns of edge counts leading to nonexistent MLEs. The problem of nonexistence occurs in numbers and with a complexity that was not previously known.
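When $\tilde d$ lies in the interior of $P_n$, the MLE can be computed numerically. The Python sketch below is our own illustration. It is not the procedure used by the authors, nor the fixed-point iteration of Chatterjee, Diaconis and Sly (2011).

The sketch runs plain gradient ascent on the random-graph ($N = 1$) log-likelihood $\ell(\beta) = \sum_i \beta_i d_i - \sum_{i<j} \log(1 + e^{\beta_i+\beta_j})$. It assumes the logistic parametrization $p_{i,j} = e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$. The step size $2/(n-1)$ is the reciprocal of the curvature bound $(n-1)/2$ for $\ell$, since $p(1-p) \le 1/4$.

Feeding it an expected degree vector, which is an interior point, recovers the generating $\beta$. For a degree sequence on the boundary of $P_n$ the MLE does not exist and the iterates cannot converge.

```python
import math
from typing import List, Sequence


def edge_prob(b_i: float, b_j: float) -> float:
    """Assumed beta-model edge probability exp(b_i + b_j) / (1 + exp(b_i + b_j))."""
    return 1.0 / (1.0 + math.exp(-(b_i + b_j)))


def expected_degrees(beta: Sequence[float]) -> List[float]:
    """sum_{j != i} p_ij for each node i."""
    n = len(beta)
    return [sum(edge_prob(beta[i], beta[j]) for j in range(n) if j != i)
            for i in range(n)]


def beta_mle(d: Sequence[float], iters: int = 5000) -> List[float]:
    """Gradient ascent on the concave log-likelihood; gradient_i = d_i - sum_j p_ij."""
    n = len(d)
    step = 2.0 / (n - 1)          # 1 / L with L = (n - 1) / 2
    beta = [0.0] * n
    for _ in range(iters):
        fitted = expected_degrees(beta)
        beta = [b + step * (di - fi) for b, di, fi in zip(beta, d, fitted)]
    return beta


true_beta = [0.9, -0.4, 0.3, -0.8, 0.0, 0.5]   # hypothetical parameters
d = expected_degrees(true_beta)                # an interior point of P_6
est = beta_mle(d)
print(max(abs(a - b) for a, b in zip(est, true_beta)) < 1e-8)  # True
```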
Our results allow us to sharpen conditions for existence of the MLE. Our analysis in particular highlights the fact that requiring node degrees different from 0 and $n - 1$ is not sufficient to guarantee the existence of the MLE.

Our generalization of the β-model allows for Poisson and binomial, not simply Bernoulli, distributions for edges. Email databases, and other data involving repeated transactions among pairs of parties, provide the simplest examples of networks where edges can occur multiple times. These are often analyzed as weighted networks. That may not make as much sense as using a Poisson distribution for the random numbers of occurrences.

As our results indicate, the nonexistence of the MLE is equivalent to nonestimability of a subset of the parameters of the model. By no means, however, does it imply that no statistical inference can take place. In fact, when the MLE does not exist, there always exists a "restricted" β-model, specified by the appropriate facial set, for which all parameters are estimable. Thus, for such a smaller model, traditional statistical tasks such as hypothesis testing and assessment of parameter uncertainty are possible. It does become necessary, though, to adjust the number of degrees of freedom for the nonestimable parameters. A complete description of this approach, which is rooted in the theory of extended exponential families, is beyond the scope of the article; see Fienberg and Rinaldo (2012) for details.

We can extend our study of the β-model in a number of ways. In the supplementary material to this article, we consider various generalizations of the β-model setting:
- the β-model with random numbers of edges;
- the Rasch model from item response theory;
- the Bradley–Terry paired comparisons model;
- the $p_1$ network model.
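With binomial edge counts and $N_{i,j} = N$ for all pairs, the concentration behind the finite sample bounds of Section 5 can be checked by simulation. The Python sketch below is our own illustration and rests on these assumptions:

- **Normalized statistic.** As in (15), $\tilde d_i = N^{-1}\sum_{j \neq i} x_{i,j}$ with independent $x_{i,j} \sim \mathrm{Binomial}(N, p_{i,j})$.
- **Arbitrary settings.** The constant edge probability 0.3, the sizes $n = 30$ and $N = 5$, and the choice $c = 1$ are all chosen for illustration only.

The sketch compares the empirical frequency of the event $\max_i |\tilde d_i - d_i| \le \sqrt{c n \log n / N}$ with the Hoeffding lower bound $1 - 2/n^{2c-1}$. Because the bound is conservative, the frequency is typically much higher.

```python
import math
import random
from typing import List, Sequence


def binomial(N: int, p: float, rng: random.Random) -> int:
    """Number of successes in N independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(N))


def normalized_degrees(p: Sequence[Sequence[float]], N: int,
                       rng: random.Random) -> List[float]:
    """d_tilde_i = (1/N) * sum_{j != i} x_ij, with x_ij ~ Binomial(N, p_ij) for i < j."""
    n = len(p)
    total = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            x = binomial(N, p[i][j], rng)
            total[i] += x
            total[j] += x
    return [t / N for t in total]


def in_event_O_n(d_tilde, d, c: float, N: int) -> bool:
    """max_i |d_tilde_i - d_i| <= sqrt(c n log n / N)."""
    n = len(d)
    radius = math.sqrt(c * n * math.log(n) / N)
    return max(abs(a - b) for a, b in zip(d_tilde, d)) <= radius


n, N, c = 30, 5, 1.0
p = [[0.3] * n for _ in range(n)]   # hypothetical constant edge probabilities
d = [0.3 * (n - 1)] * n             # expected normalized degrees
rng = random.Random(42)
sample = normalized_degrees(p, N, rng)
freq = sum(in_event_O_n(normalized_degrees(p, N, rng), d, c, N)
           for _ in range(100)) / 100
print(freq, 1 - 2 / n ** (2 * c - 1))  # empirical frequency vs. lower bound
```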
For most of these models we were able to carry out a fairly explicit analysis based on the underlying geometry. For the full $p_1$ model, however, the complexity of the model polytope appears to make such a direct analysis very difficult. This is reflected in the high complexity of the Markov basis for the $p_1$ model, of which we give a full account in Petrović, Rinaldo and Fienberg (2010). Another interesting extension of our results of Section 5 would be to translate our conditions, which are formulated in terms of expected degree sequences, into conditions on the $p_{i,j}$'s themselves, for instance, by establishing appropriate bounds for $\min_i$ …

7. Proofs.

Proof of Theorem 3.1. Throughout the proof, we will use standard results and terminology from the theory of exponential families, for which standard references are Brown (1986) and Barndorff-Nielsen (1978). The polytope
$$\mathcal{S}_n := \operatorname{convhull}(\{Ax : x \in S_n\})$$
is the convex support for the sufficient statistics of the natural exponential family described in Section 2. Furthermore, we rely on a fundamental result in the theory of exponential families [see, e.g., Theorem 9.13 in Barndorff-Nielsen (1978)]. By this result, the MLE of the natural parameter $\beta \in \mathbb{R}^n$ exists if and only if $d \in \mathrm{int}(\mathcal{S}_n)$. Equivalently, this characterizes existence of the MLE of the set of probabilities $\{p_{i,j}, i < j\} \in \mathbb{R}^{\binom{n}{2}}$ satisfying (1). Thus, it is sufficient to show that $d \in \mathrm{int}(\mathcal{S}_n)$ if and only if $\tilde d \in \mathrm{int}(P_n)$.

Denote with $a_{i,j}$ the column of $A$ corresponding to the ordered pair $(i, j)$, with $i < j$, and set
$$P_{i,j} = \operatorname{convhull}\{0, a_{i,j}\} \subset \mathbb{R}^n. \tag{12}$$
Each $P_{i,j}$ is a line segment between its vertices $0$ and $a_{i,j}$. Then, $P_n$ can be expressed as the zonotope obtained as the Minkowski sum of the line segments $P_{i,j}$,
$$P_n = \sum_{i < j} P_{i,j}.$$
… By Proposition 2.1 in Fukuda (2004),
$$F = F(P_n, c) = \sum_{i < j} F(P_{i,j}, c).$$
…

Proof of Theorem 5.1. Let $\tilde d = (\tilde d_1, \ldots, \tilde d_n)$ be the random vector defined in (6).
We will show that, under the stated assumptions, $\tilde d \in \mathrm{int}(P_n)$ with probability no smaller than $1 - 2/n^{2c-1}$.

Since $N$ is constant, we conveniently re-express the random vector $\tilde d$ as an average of independent and identically distributed graphical degree sequences. In detail, we can write
$$\tilde d = \frac{1}{N} \sum_{k=1}^{N} d^{(k)}, \tag{15}$$
where each $d^{(k)}$ is the degree sequence arising from an independent realization of a random graph with edge probabilities $\{p_{i,j} : i < j\}$, for $k = 1, \ldots, N$. Thus, each $\tilde d_i$ is the sum of $N(n - 1)$ independent random variables taking values in $\{0, 1/N\}$. Then, an application of Hoeffding's inequality and of the union bound yields that the event
$$O_n := \Bigl\{\max_i |\tilde d_i - d_i| \leq \sqrt{\tfrac{c\, n \log n}{N}}\Bigr\} \tag{16}$$
occurs with probability at least $1 - 2/n^{2c-1}$. Throughout the rest of the proof we assume that the event $O_n$ holds.

By assumption (i), for each $i$,
$$0 < C + \sqrt{\tfrac{c n \log n}{N}} \leq d_i - \sqrt{\tfrac{c n \log n}{N}} \leq \tilde d_i \leq d_i + \sqrt{\tfrac{c n \log n}{N}} \leq n - 1 - C - \sqrt{\tfrac{c n \log n}{N}} < n - 1,$$
so that
$$0 < \tilde d_i < n - 1, \qquad i = 1, \ldots, n. \tag{17}$$
Notice that the assumed constraint on the range of $C$ guarantees that the above inequalities are well defined. Next, for each pair $(S, T) \in \mathcal{P}$,
$$\bigl|g(S, T, \tilde d, n) - g(S, T, d, n)\bigr| \leq |S \cup T| \max_i |\tilde d_i - d_i|,$$
which yields
$$g(S, T, \tilde d, n) \geq g(S, T, d, n) - |S \cup T| \sqrt{\tfrac{c n \log n}{N}}.$$
Using assumption (ii), the previous inequality implies that
$$\min_{(S,T) \in \mathcal{P}} g(S, T, \tilde d, n) > C > 0. \tag{18}$$
Thus, we have shown that (17) and (18) hold, provided that the event $O_n$ is true and assuming (i) and (ii). Therefore, by Theorem 4.1, the MLE exists. ∎

Proof of Corollary 5.2. Using the same setting and notation as in Theorem 5.1, we will assume throughout the proof that the event
$$O'_n := \Bigl\{\max_k \max_i |d^{(k)}_i - d_i| \leq \sqrt{c n \log n}\Bigr\}$$
holds true. By Hoeffding's inequality, the union bound and the inequality $\log N \leq \log n$, we have
$$\mathbb{P}\bigl((O'_n)^c\bigr) \leq 2\exp\{-2c \log n + \log n + \log N\} \leq \frac{2}{n^{2(c-1)}}.$$
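The chain of bounds in the last display can be checked numerically. In this Python sketch (our own illustration; the function names are hypothetical), each $d^{(k)}_i$ is a sum of $n-1$ independent Bernoulli variables. With $t^2 = c n \log n$, Hoeffding's inequality gives $2\exp\{-2t^2/(n-1)\} \le 2 n^{-2c}$ for each pair $(k, i)$. A union bound over the $nN$ pairs gives the middle expression, and $\log N \le \log n$ gives $2/n^{2(c-1)}$.

```python
import math


def hoeffding_union(n: int, N: int, c: float) -> float:
    """n*N copies of 2 exp(-2 t^2 / (n - 1)) with t^2 = c n log n."""
    return n * N * 2 * math.exp(-2 * c * n * math.log(n) / (n - 1))


def bound_in_proof(n: int, N: int, c: float) -> float:
    """2 exp{-2c log n + log n + log N}, the middle expression of the display."""
    return 2 * math.exp(-2 * c * math.log(n) + math.log(n) + math.log(N))


def final_bound(n: int, c: float) -> float:
    """2 / n^{2(c - 1)}, valid when N <= n."""
    return 2 / n ** (2 * (c - 1))


# Check the chain on a small grid with N <= n (small tolerance for N == n).
for n in (10, 100, 1000):
    for N in (1, n // 2, n):
        assert hoeffding_union(n, N, 1.5) <= bound_in_proof(n, N, 1.5)
        assert bound_in_proof(n, N, 1.5) <= final_bound(n, 1.5) * (1 + 1e-9)
print(hoeffding_union(100, 10, 1.5), bound_in_proof(100, 10, 1.5), final_bound(100, 1.5))
```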
A simple calculation shows that, when $O'_n$ is satisfied, we also have $\max_i |\tilde d_i - d_i| \leq \sqrt{c n \log n}$. Then, by the same arguments we used in the proof of Theorem 5.1, assumption (i′) yields that
$$0 < \tilde d_i < n - 1, \qquad i = 1, \ldots, n, \tag{19}$$
and, for each pair $(S, T) \in \mathcal{P}$,
$$g(S, T, \tilde d, n) \geq g(S, T, d, n) - |S \cup T| \sqrt{c n \log n}. \tag{20}$$
It is easy to see that, on the event $O'_n$, assumption (i′) also yields
$$\min_k \min_i \min\bigl\{d^{(k)}_i, n - 1 - d^{(k)}_i\bigr\} \geq \sqrt{c n \log n} + C. \tag{21}$$
We now show that, when (19) and (21) are satisfied, the MLE exists if
$$\min_{(S,T) \in \mathcal{P}_n} g(S, T, \tilde d, n) > C > 0. \tag{22}$$
Indeed, suppose that (19) is true and that $\tilde d$ belongs to the boundary of $P_n$. Then, by the integrality of the polytope $P_n$, there exist nonempty and disjoint subsets $S$ and $T$ of $\{1, \ldots, n\}$ satisfying the conditions of Lemma 4.2 for each of the degree sequences $d^{(1)}, \ldots, d^{(N)}$. We bound the two cardinalities in turn.

First, suppose $\min_k \min_i d^{(k)}_i > \sqrt{c n \log n} + C$. Then, necessarily, $|S| > \sqrt{c n \log n} + C$, because $|S|$ is the maximal degree of every node $i \in T$.

Second, each $i \in S$ has degree at least $|S| - 1 + |(S \cup T)^c|$. So if $\max_k \max_i d^{(k)}_i < n - 1 - \sqrt{c n \log n} - C$, the inequality
$$|S| - 1 + |(S \cup T)^c| < n - 1 - \sqrt{c n \log n} - C$$
must hold, implying that $|T| = n - |S| - |(S \cup T)^c| > \sqrt{c n \log n} + C$.

Thus, we have shown the following: if (19) and (21) hold and $\tilde d$ belongs to the boundary of $P_n$, the sets $S$ and $T$ defining the facet of $P_n$ to which $\tilde d$ belongs cannot have cardinality smaller than $\sqrt{c n \log n} + C$. By Theorem 4.1, when (19) and (21) hold, (22) implies that $\tilde d \in \mathrm{int}(P_n)$, so the MLE exists. Finally, equation (20) and assumption (ii′) imply (22), so the proof is complete. ∎

Acknowledgments.
A previous version of this manuscript was completed while the second author was in residence at Institut Mittag-Leffler, for whose hospitality she is grateful.

SUPPLEMENTARY MATERIAL

Supplement to "Maximum lilkelihood estimation in the β-model" (DOI: 10.1214/12-AOS1078SUPP; .pdf). In the supplementary material we extend our analysis to other models for network data: the Rasch model, the β-model with no sampling constraints on the number of observed edges per dyad, the Bradley–Terry model and the $p_1$ model of Holland and Leinhardt (1981). We also provide details on how to determine whether a given degree sequence belongs to the interior of the polytope of degree sequences $P_n$. Finally, we describe how to compute the facial set corresponding to a degree sequence on the boundary of $P_n$.

REFERENCES

Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys.
Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, Chichester. MR0489333
Barvinok, A. and Hartigan, J. A. (2010). The number of graphs and a random graph with a given degree sequence. Available at http://arxiv.org/pdf/1003.0356v2.
Blitzstein, J. and Diaconis, P. (2010). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Math.
Brown, L. (1986). Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics Lecture Notes—Monograph Series. IMS, Hayward, CA.
Chatterjee, S., Diaconis, P. and Sly, A. (2011). Random graphs with a given degree sequence. Ann. Appl. Probab.
Cohen, R. and Havlin, S. (2010). Complex Networks: Structure, Robustness and Function. Cambridge Univ. Press, Cambridge.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York. MR1122806
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7)
Erdős, P. and Rényi, A. (1959). On random graphs. I. Publ. Math. Debrecen
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist.
Fienberg, S. E., Meyer, M. M. and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. J. Amer. Statist. Assoc.
Fienberg, S. E. and Wasserman, S. S. (1981a). Categorical data analysis of single sociometric relations. Sociological Methodology
Fienberg, S. E. and Wasserman, S. S. (1981b). An exponential family of probability distributions for directed graphs: Comment. J. Amer. Statist. Assoc.
Foster, J. G., Foster, D. V., Grassberger, P. and Paczuski, M. (2007). Link and subgraph likelihoods in random undirected networks with fixed and partially fixed degree sequences. Phys. Rev. E (3)
Fukuda, K. (2004). From the zonotope construction to the Minkowski addition of convex polytopes. J. Symbolic Comput.
Gawrilow, E. and Joswig, M. (2000). polymake: A framework for analyzing convex polytopes. In Polytopes—Combinatorics and Computation (Oberwolfach, 1997) (G. Kalai and G. M. Ziegler, eds.). DMV Seminar
Geiger, D., Meek, C. and Sturmfels, B. (2006). On the toric algebra of graphical models. Ann. Statist.
Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electron. J. Stat.
Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning
Goodreau, S. M. (2007). Advances in exponential random graph (p*) models applied to a large social network. Social Networks
Haberman, S. (1981). Discussion of "An exponential family of probability distributions for directed graphs," by P. W. Holland and S. Leinhardt. J. Amer. Statist. Assoc.
Handcock, M. S. and Morris, M. (2007). A simple model for complex networks with arbitrary degree distribution and clustering. In Statistical Network Analysis: Models, Issues and New Directions (E. Airoldi, D. Blei, S. E. Fienberg, A. Goldenberg, E. Xing and A. Zheng, eds.). Lecture Notes in Computer Science
Hara, H. and Takemura, A. (2010). Connecting tables with zero-one entries by a subset of a Markov basis. In Algebraic Methods in Statistics and Probability II (M. Viana and H. Wynn, eds.). Contemporary Mathematics
Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc.
Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer, New York. MR2724362
Lauritzen, S. L. (2003). Rasch models with exchangeable rows and columns. In Bayesian Statistics, 7 (Tenerife, 2002) (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 215–232. Oxford Univ. Press, New York. MR2003175
Lauritzen, S. L. (2008). Exchangeable Rasch matrices. Rend. Mat. Appl. (7)
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. J. Combin. Theory Ser. B
Mahadev, N. V. R. and Peled, U. N. (1995). Threshold Graphs and Related Topics. Annals of Discrete Mathematics. North-Holland, Amsterdam. MR1417258
Meyer, M. M. (1982). Transforming contingency tables. Ann. Statist.
Morton, J. (2013). Relations among conditional probabilities. J. Symbolic Comput.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Rev.
Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford. MR2676073
Newman, M., Barabási, A.-L. and Watts, D. J., eds. (2006). The Structure and Dynamics of Networks. Princeton Univ. Press, Princeton, NJ. MR2352222
Newman, M. E. J., Strogatz, S. H. and Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E (3)
Ogawa, M., Hara, H. and Takemura, A. (2013). Graver basis for an undirected graph and its application to testing the beta model of random graphs. Ann. Inst. Statist. Math.
Park, J. and Newman, M. E. J. (2004). Statistical mechanics of networks. Phys. Rev. E (3)
Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at http://arxiv.org/abs/1201.5871.
Petrović, S., Rinaldo, A. and Fienberg, S. E. (2010). Algebraic statistics for a directed random graph model with reciprocation. In Algebraic Methods in Statistics and Probability II. Contemporary Mathematics
Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electron. J. Stat.
Rinaldo, A., Petrović, S. and Fienberg, S. E. (2013). Supplement to "Maximum lilkelihood estimation in the β-model." DOI:10.1214/12-AOS1078SUPP.
Robins, D., Pattison, P., Kalish, Y. and Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks
Schrijver, A. (1998). Theory of Linear and Integer Programming. Wiley, New York.
Stanley, R. P. (1991). A zonotope associated with graphical degree sequences. In Applied Geometry and Discrete Mathematics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science
Viger, F. and Latapy, M. (2005). Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In Computing and Combinatorics. Lecture Notes in Computer Science
Weibel, C. (2010). Implementation and parallelization of a reverse-search algorithm for Minkowski sums. In Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX 2010). https://sites.google.com/site/christopheweibel/research/minksum.
Willinger, W., Alderson, D. and Doyle, J. C. (2009). Mathematics and the Internet: A source of enormous confusion and great potential. Notices Amer. Math. Soc.
Yan, T. and Xu, J. (2012). A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices. Available at http://arxiv.org/abs/1202.3307.
Yan, T., Xu, J. and Yang, Y. (2012). High dimensional Wilks phenomena in random graph models. Available at http://arxiv.org/abs/1201.0058.

A. Rinaldo
Department of Statistics
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213
USA

S. Petrović
Department of Statistics
Pennsylvania State University
326 Thomas Building
University Park, Pennsylvania 16802
USA