MATRICES WITH PRESCRIBED ROW AND COLUMN SUMS
Alexander Barvinok
October 2010
Abstract.
This is a survey of the recent progress and open questions on the structure of the sets of 0-1 and non-negative integer matrices with prescribed row and column sums. We discuss cardinality estimates, the structure of a random matrix from the set, discrete versions of the Brunn-Minkowski inequality and the statistical dependence between row and column sums.
1. Introduction
Let $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ be positive integer vectors such that
$$r_1 + \dots + r_m = c_1 + \dots + c_n = N. \tag{1.1}$$
We consider the set $A(R,C)$ of all $m \times n$ matrices $D = (d_{ij})$ with 0-1 entries, row sums $R$ and column sums $C$:
$$A(R,C) = \Big\{ D = (d_{ij}):\ \sum_{j=1}^n d_{ij} = r_i \ \text{for}\ i = 1, \dots, m;\ \sum_{i=1}^m d_{ij} = c_j \ \text{for}\ j = 1, \dots, n;\ d_{ij} \in \{0, 1\} \Big\}.$$
We also consider the set $A_+(R,C)$ of non-negative integer $m \times n$ matrices with row sums $R$ and column sums $C$:
$$A_+(R,C) = \Big\{ D = (d_{ij}):\ \sum_{j=1}^n d_{ij} = r_i \ \text{for}\ i = 1, \dots, m;\ \sum_{i=1}^m d_{ij} = c_j \ \text{for}\ j = 1, \dots, n;\ d_{ij} \in \mathbb{Z}_+ \Big\}.$$
Vectors $R$ and $C$ are called margins of matrices from $A(R,C)$ and $A_+(R,C)$. We reserve the notation $N$ for the sum of the coordinates of $R$ and $C$ in (1.1) and write $|R| = |C| = N$.

While the set $A_+(R,C)$ is non-empty as long as the balance condition (1.1) is satisfied, a result of Gale and Ryser (see, for example, Section 6.2 of [BR91]) provides a necessary and sufficient criterion for the set $A(R,C)$ to be non-empty. Let us assume that
$$m \geq c_1 \geq c_2 \geq \dots \geq c_n \geq 1 \quad \text{and} \quad n \geq r_i \geq 1 \quad \text{for}\ i = 1, \dots, m.$$
The set $A(R,C)$ is not empty if and only if (1.1) holds and
$$\sum_{i=1}^m \min\{r_i,\, k\} \ \geq\ \sum_{j=1}^k c_j \quad \text{for}\ k = 1, \dots, n.$$
Assuming that $A(R,C) \ne \emptyset$, we are interested in the following questions:

• What is the cardinality $|A(R,C)|$ of $A(R,C)$ and the cardinality $|A_+(R,C)|$ of $A_+(R,C)$?

• Let us consider $A(R,C)$ and $A_+(R,C)$ as finite probability spaces with the uniform measure. What are a random matrix $D \in A(R,C)$ and a random matrix $D \in A_+(R,C)$ likely to look like?

The paper is organized as follows.

In Section 2 we estimate $|A(R,C)|$ within an $(mn)^{O(m+n)}$ factor and in Section 3 we estimate $|A_+(R,C)|$ within an $N^{O(m+n)}$ factor. In all but very sparse cases this way we obtain asymptotically exact estimates of $\ln |A(R,C)|$ and $\ln |A_+(R,C)|$ respectively. The estimate of Section 2 is based on a representation of $|A(R,C)|$ as the permanent of a certain $mn \times mn$ matrix of 0's and 1's, while the estimate of Section 3 is based on a representation of $|A_+(R,C)|$ as the expectation of the permanent of a certain $N \times N$ random matrix with exponentially distributed entries. In the proofs, the crucial role is played by the van der Waerden inequality for permanents of doubly stochastic matrices. The cardinality estimates are obtained as solutions to simple convex optimization problems and hence are efficiently computable, although they cannot be expressed by a "closed formula" in the margins $(R,C)$. Our method is sufficiently robust as the same approach can be applied to estimate the cardinality of the set of matrices with prescribed margins and with 0's in prescribed positions.

In Sections 4 and 5 we discuss some consequences of the formulas obtained in Sections 2 and 3. In particular, in Section 4, we show that the numbers $|A(R,C)|$ and $|A_+(R,C)|$ are both approximately log-concave as functions of the margins $(R,C)$. We note an open question whether these numbers are genuinely log-concave and give some, admittedly weak, evidence that it may be the case. In Section 5, we discuss statistical dependence between row and column sums. Namely, we consider finite probability spaces of $m \times n$ non-negative integer or 0-1 matrices with the total sum $N$ of entries and two events in those spaces: the event $\mathcal{R}$ consisting of the matrices with row sums $R$ and the event $\mathcal{C}$ consisting of the matrices with column sums $C$. It turns out that 0-1 and non-negative integer matrices exhibit opposite types of behavior.
Assuming that the margins $R$ and $C$ are sufficiently far away from sparse and uniform, we show that for 0-1 matrices the events $\mathcal{R}$ and $\mathcal{C}$ repel each other (the events are negatively correlated) while for non-negative integer matrices they attract each other (the events are positively correlated).

In Section 6, we discuss what random matrices $D \in A(R,C)$ and $D \in A_+(R,C)$ look like. We show that in many respects a random matrix $D \in A(R,C)$ behaves like an $m \times n$ matrix $X$ of independent Bernoulli random variables such that $\mathbf{E}\, X = Z$, where $Z$ is a certain matrix, called the maximum entropy matrix, with row sums $R$, column sums $C$ and entries between 0 and 1. It turns out that $Z$ is the solution to an optimization problem which is convex dual to the optimization problem of Section 2 used to estimate $|A(R,C)|$. On the other hand, a random matrix $D \in A_+(R,C)$ in many respects behaves like an $m \times n$ matrix $X$ of independent geometric random variables such that $\mathbf{E}\, X = Z_+$, where $Z_+$ is a certain matrix, also called the maximum entropy matrix, with row sums $R$, column sums $C$ and non-negative entries. It turns out that $Z_+$ is the solution to an optimization problem which is convex dual to the optimization problem of Section 3 used to estimate $|A_+(R,C)|$. It follows that in various natural metrics matrices $D \in A(R,C)$ concentrate about $Z$ while matrices $D \in A_+(R,C)$ concentrate about $Z_+$. We note some open questions on whether individual entries of random $D \in A(R,C)$ and random $D \in A_+(R,C)$ are asymptotically Bernoulli, respectively geometric, with the expectations read off from $Z$ and $Z_+$.

In Section 7, we discuss asymptotically exact formulas for $|A(R,C)|$ and $|A_+(R,C)|$. Those formulas are established under essentially more restrictive conditions than the cruder estimates of Sections 2 and 3. We assume that the entries of the maximum entropy matrices $Z$ and $Z_+$ are within a constant factor, fixed in advance, of each other. Recall that the matrices $Z$ and $Z_+$ characterize the typical behavior of random matrices $D \in A(R,C)$ and $D \in A_+(R,C)$ respectively. In the case of 0-1 matrices our condition basically means that the margins
$(R,C)$ lie sufficiently deep inside the region defined by the Gale-Ryser inequalities. As the margins approach the boundary, the number $|A(R,C)|$ gets volatile and hence cannot be expressed by an analytic formula like the one described in Section 7. The situation with non-negative integer matrices is less clear. It is plausible that the number $|A_+(R,C)|$ experiences some volatility when some entries of $Z_+$ become abnormally large, but we don't have a proof of that happening.

In Section 8, we mention some possible ramifications, such as enumeration of higher-order tensors and graphs with given degree sequences.

The paper is a survey and although we don't provide complete proofs, we often sketch the main ideas of our approach.
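As an aside on Section 1, the Gale-Ryser criterion is straightforward to test computationally. The sketch below is our own illustration (the function name and the sorting step are our choices, not part of the surveyed results):

```python
def gale_ryser_nonempty(R, C):
    """Decide whether A(R, C) is non-empty: check the balance condition (1.1)
    together with the Gale-Ryser inequalities, applied with the column sums
    arranged in non-increasing order.  (Illustrative sketch.)"""
    if sum(R) != sum(C):
        return False
    C = sorted(C, reverse=True)  # arrange c_1 >= c_2 >= ... >= c_n
    return all(
        sum(min(r, k) for r in R) >= sum(C[:k])
        for k in range(1, len(C) + 1)
    )

# A(R, C) is non-empty for R = C = (2, 2, 2): for instance, the complements
# of the 3x3 permutation matrices.  It is empty for R = (2, 2), C = (3, 1),
# since a column sum of 3 cannot be realized by 0-1 entries in two rows.
assert gale_ryser_nonempty([2, 2, 2], [2, 2, 2])
assert not gale_ryser_nonempty([2, 2], [3, 1])
```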
2. The logarithmic asymptotics for the number of 0-1 matrices
The following result is proven in [Ba10a]. (2.1) Theorem.
Given positive integer vectors $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$, let us define the function
$$F(\mathbf{x}, \mathbf{y}) = \Big( \prod_{i=1}^m x_i^{-r_i} \Big) \Big( \prod_{j=1}^n y_j^{-c_j} \Big) \prod_{i,j} (1 + x_i y_j)$$
for $\mathbf{x} = (x_1, \dots, x_m)$ and $\mathbf{y} = (y_1, \dots, y_n)$, and let
$$\alpha(R,C) = \inf_{\substack{x_1, \dots, x_m > 0 \\ y_1, \dots, y_n > 0}} F(\mathbf{x}, \mathbf{y}).$$
Then the number $|A(R,C)|$ of $m \times n$ zero-one matrices with row sums $R$ and column sums $C$ satisfies
$$\alpha(R,C) \ \geq\ |A(R,C)| \ \geq\ \frac{(mn)!}{(mn)^{mn}} \Big( \prod_{i=1}^m \frac{(n - r_i)^{n - r_i}}{(n - r_i)!} \Big) \Big( \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!} \Big) \alpha(R,C).$$

Using Stirling's formula,
$$\frac{s!}{s^s} = \sqrt{2\pi s}\; e^{-s} \Big( 1 + O\Big( \frac{1}{s} \Big) \Big),$$
we conclude that the upper and lower bounds differ by an $(mn)^{O(m+n)}$ factor. Indeed, the "$e^{-s}$" terms cancel each other out, since
$$e^{-mn} \Big( \prod_{i=1}^m e^{n - r_i} \Big) \prod_{j=1}^n e^{c_j} = 1.$$
Thus, for sufficiently dense 0-1 matrices, where we have $|A(R,C)| = 2^{\Omega(mn)}$, we have an asymptotically exact formula
$$\ln |A(R,C)| \approx \ln \alpha(R,C) \quad \text{as}\quad m, n \longrightarrow +\infty.$$

(2.2) A convex version of the optimization problem. Let us substitute $x_i = e^{s_i}$ for $i = 1, \dots, m$ and $y_j = e^{t_j}$ for $j = 1, \dots, n$ in $F(\mathbf{x}, \mathbf{y})$. Denoting
$$G(\mathbf{s}, \mathbf{t}) = -\sum_{i=1}^m r_i s_i - \sum_{j=1}^n c_j t_j + \sum_{i,j} \ln \big( 1 + e^{s_i + t_j} \big) \quad \text{for}\ \mathbf{s} = (s_1, \dots, s_m)\ \text{and}\ \mathbf{t} = (t_1, \dots, t_n), \tag{2.2.1}$$
we obtain
$$\ln \alpha(R,C) = \inf_{\substack{s_1, \dots, s_m \\ t_1, \dots, t_n}} G(\mathbf{s}, \mathbf{t}).$$
We observe that $G(\mathbf{s}, \mathbf{t})$ is a convex function on $\mathbb{R}^{m+n}$. In particular, one can compute the infimum of $G$ efficiently by using interior point methods, see, for example, [NN94].

(2.3) Sketch of proof of Theorem 2.1. The upper bound for $|A(R,C)|$ is immediate: it follows from the expansion
$$\prod_{i,j} (1 + x_i y_j) = \sum_{R,C} |A(R,C)|\, \mathbf{x}^R \mathbf{y}^C, \quad \text{where}\quad \mathbf{x}^R = x_1^{r_1} \cdots x_m^{r_m} \ \text{and}\ \mathbf{y}^C = y_1^{c_1} \cdots y_n^{c_n}$$
for $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$, and the sum is taken over all pairs of non-negative integer vectors $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ such that $r_1 + \dots + r_m = c_1 + \dots + c_n \leq mn$.

To prove the lower bound, we express $|A(R,C)|$ as the permanent of an $mn \times mn$ matrix. Recall that the permanent of a $k \times k$ matrix $B = (b_{ij})$ is defined by
$$\operatorname{per} B = \sum_{\sigma \in S_k} \prod_{i=1}^k b_{i \sigma(i)},$$
where the sum is taken over the symmetric group $S_k$ of all permutations $\sigma$ of the set $\{1, \dots, k\}$, see, for example, Chapter 11 of [LW01]. One can show, see [Ba10a] for details, that
$$|A(R,C)| = \Big( \prod_{i=1}^m \frac{1}{(n - r_i)!} \Big) \Big( \prod_{j=1}^n \frac{1}{c_j!} \Big) \operatorname{per} B, \tag{2.3.1}$$
where $B$ is the $mn \times mn$ matrix of the following structure:

the rows of $B$ are split into $m + n$ distinct blocks, the $m$ blocks of type I having $n - r_1, \dots, n - r_m$ rows respectively and the $n$ blocks of type II having $c_1, \dots, c_n$ rows respectively;

the columns of $B$ are split into $m$ distinct blocks of $n$ columns each;

for $i = 1, \dots, m$, the entry of $B$ that lies in a row from the $i$-th block of rows of type I and a column from the $i$-th block of columns is equal to 1;

for $i = 1, \dots, m$ and $j = 1, \dots, n$, the entry of $B$ that lies in a row from the $j$-th block of rows of type II and the $j$-th column from the $i$-th block of columns is equal to 1;

all other entries of $B$ are 0.

Suppose that the infimum of the function $G(\mathbf{s}, \mathbf{t})$ defined by (2.2.1) is attained at a particular point $\mathbf{s} = (s_1, \dots, s_m)$ and $\mathbf{t} = (t_1, \dots, t_n)$ (the case when the infimum is not attained is handled by an approximation argument). Let $x_i = \exp\{s_i\}$ for $i = 1, \dots, m$ and $y_j = \exp\{t_j\}$ for $j = 1, \dots, n$. Setting the gradient of $G(\mathbf{s}, \mathbf{t})$ to 0, we obtain
$$\sum_{j=1}^n \frac{x_i y_j}{1 + x_i y_j} = r_i \quad \text{for}\ i = 1, \dots, m \quad \text{and} \quad \sum_{i=1}^m \frac{x_i y_j}{1 + x_i y_j} = c_j \quad \text{for}\ j = 1, \dots, n. \tag{2.3.2}$$
Let us consider the matrix $B'$ obtained from the matrix $B$ as follows:

for $i = 1, \dots, m$, we multiply every row of $B$ in the $i$-th block of type I by $\dfrac{1}{x_i (n - r_i)}$;

for $j = 1, \dots, n$, we multiply every row of $B$ in the $j$-th block of type II by $\dfrac{y_j}{c_j}$;

for $i = 1, \dots, m$ and $j = 1, \dots, n$, we multiply the $j$-th column in the $i$-th block of columns of $B$ by $\dfrac{x_i}{1 + x_i y_j}$.

Then
$$\operatorname{per} B = \Big( \prod_{i=1}^m x_i^{-r_i} (n - r_i)^{n - r_i} \Big) \Big( \prod_{j=1}^n y_j^{-c_j} c_j^{c_j} \Big) \prod_{i,j} (1 + x_i y_j)\ \operatorname{per} B'.$$
On the other hand, equations (2.3.2) imply that the row and column sums of $B'$ are equal to 1, that is, $B'$ is doubly stochastic. Applying the van der Waerden bound for permanents of doubly stochastic matrices, see, for example, Chapter 12 of [LW01], we conclude that
$$\operatorname{per} B' \ \geq\ \frac{(mn)!}{(mn)^{mn}},$$
which, together with (2.3.1), completes the proof. $\square$

One can prove a version of Theorem 2.1 for 0-1 matrices with prescribed row and column sums and prescribed zeros in some positions.
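Numerically, the convex program (2.2.1) is easy to attack directly. The following sketch is our own illustration (the solver choice and all names are ours; interior point methods in the spirit of [NN94] would be the rigorous route). It computes $\ln \alpha(R,C)$ by minimizing $G(\mathbf{s}, \mathbf{t})$ with a quasi-Newton method:

```python
import numpy as np
from scipy.optimize import minimize

def log_alpha(R, C):
    """ln alpha(R, C) as the infimum of the convex function (2.2.1),
    G(s, t) = -sum_i r_i s_i - sum_j c_j t_j + sum_{ij} ln(1 + e^{s_i + t_j}).
    Illustrative sketch only."""
    R, C = np.asarray(R, float), np.asarray(C, float)
    m, n = len(R), len(C)

    def G(v):
        s, t = v[:m], v[m:]
        return -R @ s - C @ t + np.logaddexp(0.0, s[:, None] + t[None, :]).sum()

    def grad(v):
        s, t = v[:m], v[m:]
        p = 1.0 / (1.0 + np.exp(-(s[:, None] + t[None, :])))  # x_i y_j/(1 + x_i y_j)
        return np.concatenate([p.sum(axis=1) - R, p.sum(axis=0) - C])

    return minimize(G, np.zeros(m + n), jac=grad, method="L-BFGS-B").fun

# For R = C = (2, 2, 2): |A(R, C)| = 6 (complements of permutation matrices),
# so ln |A(R, C)| = ln 6 ~ 1.79, while ln alpha(R, C) ~ 5.73 is the upper
# bound of Theorem 2.1; the gap reflects the (mn)^{O(m+n)} factor.
print(log_alpha([2, 2, 2], [2, 2, 2]))
```

Note that the gradient equations `p.sum(axis=1) = R` and `p.sum(axis=0) = C` at the optimum are exactly the critical point equations (2.3.2).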
3. The logarithmic asymptotics for the number of non-negative integer matrices
The following result is proven in [Ba09]. (3.1) Theorem.
Let $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ be positive integer vectors such that $r_1 + \dots + r_m = c_1 + \dots + c_n = N$. Let us define a function
$$F_+(\mathbf{x}, \mathbf{y}) = \Big( \prod_{i=1}^m x_i^{-r_i} \Big) \Big( \prod_{j=1}^n y_j^{-c_j} \Big) \prod_{i,j} \frac{1}{1 - x_i y_j}$$
for $\mathbf{x} = (x_1, \dots, x_m)$ and $\mathbf{y} = (y_1, \dots, y_n)$ such that $0 < x_i y_j < 1$ for all $i$ and $j$. Then $F_+(\mathbf{x}, \mathbf{y})$ attains its minimum
$$\alpha_+(R,C) = \min_{\substack{x_1, \dots, x_m > 0;\ y_1, \dots, y_n > 0 \\ x_i y_j < 1}} F_+(\mathbf{x}, \mathbf{y})$$
on this domain, and the number $|A_+(R,C)|$ of $m \times n$ non-negative integer matrices with row sums $R$ and column sums $C$ satisfies
$$\alpha_+(R,C) \ \geq\ |A_+(R,C)| \ \geq\ N^{-\gamma(m+n)} \alpha_+(R,C)$$
for some absolute constant $\gamma > 0$.

(3.2) A convex version of the optimization problem. As in Section 2.2, substituting $x_i = e^{s_i}$ and $y_j = e^{t_j}$ and denoting
$$G_+(\mathbf{s}, \mathbf{t}) = -\sum_{i=1}^m r_i s_i - \sum_{j=1}^n c_j t_j - \sum_{i,j} \ln \big( 1 - e^{s_i + t_j} \big), \tag{3.2.1}$$
we obtain $\ln \alpha_+(R,C)$ as the infimum of the convex function $G_+(\mathbf{s}, \mathbf{t})$ over the points with $s_i + t_j < 0$ for all $i$ and $j$.
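The convex program (3.2.1) can likewise be solved numerically on small examples. The sketch below is our own illustration, not the method of [Ba09]: the barrier term $-\ln(1 - e^{s_i + t_j})$ blows up at the boundary of the domain $s_i + t_j < 0$, so a feasible start and a method that tolerates infinite trial values suffice for a toy computation.

```python
import numpy as np
from scipy.optimize import minimize

def log_alpha_plus(R, C):
    """ln alpha_+(R, C) as the infimum of the convex function (3.2.1)
    over the domain s_i + t_j < 0 for all i, j.  Illustrative sketch only."""
    R, C = np.asarray(R, float), np.asarray(C, float)
    m, n = len(R), len(C)

    def G_plus(v):
        s, t = v[:m], v[m:]
        u = s[:, None] + t[None, :]
        if np.any(u >= 0):
            return np.inf  # outside the domain x_i y_j = e^{s_i + t_j} < 1
        return -R @ s - C @ t - np.log1p(-np.exp(u)).sum()

    v0 = np.full(m + n, -1.0)  # feasible start: s_i + t_j = -2 < 0
    res = minimize(G_plus, v0, method="Nelder-Mead",
                   options={"xatol": 1e-9, "fatol": 1e-12, "maxfev": 100000})
    return res.fun

# For R = C = (2, 2): |A_+(R, C)| = 3, so ln |A_+(R, C)| = ln 3 ~ 1.10,
# while the minimum computed here is ln alpha_+(R, C) = ln 256 ~ 5.55.
print(log_alpha_plus([2, 2], [2, 2]))
```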
4. Approximate log-concavity

Theorems 2.1 and 3.1 allow us to establish approximate log-concavity of the numbers $|A(R,C)|$ and $|A_+(R,C)|$.

For a non-negative integer vector $B = (b_1, \dots, b_p)$, we denote
$$|B| = \sum_{i=1}^p b_i.$$

(4.1) Theorem. Let $R_1, \dots, R_p$ be positive integer $m$-vectors and let $C_1, \dots, C_p$ be positive integer $n$-vectors such that
$$|R_1| = |C_1|, \ \dots, \ |R_p| = |C_p|.$$
Let $\beta_1, \dots, \beta_p \geq 0$ be real numbers such that $\beta_1 + \dots + \beta_p = 1$ and such that
$$R = \beta_1 R_1 + \dots + \beta_p R_p$$
is a positive integer $m$-vector and
$$C = \beta_1 C_1 + \dots + \beta_p C_p$$
is a positive integer $n$-vector. Let $N = |R| = |C|$. Then for some absolute constant $\gamma > 0$ we have

(1) $(mn)^{\gamma(m+n)}\, |A(R,C)| \ \geq\ \prod_{k=1}^p |A(R_k, C_k)|^{\beta_k}$ and

(2) $N^{\gamma(m+n)}\, |A_+(R,C)| \ \geq\ \prod_{k=1}^p |A_+(R_k, C_k)|^{\beta_k}$.

Proof.
Let us denote the function $F$ of Theorem 2.1 for the pair $(R_k, C_k)$ by $F_k$ and for the pair $(R,C)$ just by $F$. Then
$$F(\mathbf{x}, \mathbf{y}) = \prod_{k=1}^p F_k^{\beta_k}(\mathbf{x}, \mathbf{y}) \tag{4.1.1}$$
and hence
$$\alpha(R,C) \ \geq\ \prod_{k=1}^p \big( \alpha(R_k, C_k) \big)^{\beta_k}.$$
Part (1) now follows by Theorem 2.1. Similarly, we obtain (4.1.1) if we denote the function $F_+$ of Theorem 3.1 for the pair $(R_k, C_k)$ by $F_k$ and for the pair $(R,C)$ just by $F$. Hence
$$\alpha_+(R,C) \ \geq\ \prod_{k=1}^p \big( \alpha_+(R_k, C_k) \big)^{\beta_k}.$$
Part (2) now follows by Theorem 3.1. $\square$

In fact, the proof of Theorem 2.1 yields the sharper inequality
$$\frac{(mn)^{mn}}{(mn)!} \Big( \prod_{i=1}^m \frac{(n - r_i)!}{(n - r_i)^{n - r_i}} \Big) \Big( \prod_{j=1}^n \frac{c_j!}{c_j^{c_j}} \Big) |A(R,C)| \ \geq\ \prod_{k=1}^p |A(R_k, C_k)|^{\beta_k},$$
where $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$. In [Ba07] a more precise estimate
$$\frac{N^N}{N!} \min\Big\{ \prod_{i=1}^m \frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^n \frac{c_j!}{c_j^{c_j}} \Big\} |A_+(R,C)| \ \geq\ \prod_{k=1}^p |A_+(R_k, C_k)|^{\beta_k}$$
is proven under the additional assumption that $|R_k| = |C_k| = N$ for $k = 1, \dots, p$.

Theorem 4.1 raises a natural question whether stronger inequalities hold.

(4.2) Brunn-Minkowski inequalities.

(4.2.1) Question. Is it true that under the conditions of Theorem 4.1 we have
$$|A(R,C)| \ \geq\ \prod_{k=1}^p |A(R_k, C_k)|^{\beta_k}\,?$$

(4.2.2) Question. Is it true that under the conditions of Theorem 4.1 we have
$$|A_+(R,C)| \ \geq\ \prod_{k=1}^p |A_+(R_k, C_k)|^{\beta_k}\,?$$

Should they hold, the inequalities of (4.2.1) and (4.2.2) would be natural examples of discrete Brunn-Minkowski inequalities, see [Ga02] for a survey.

Some known simpler inequalities are consistent with the inequalities of (4.2.1)-(4.2.2). Let $X = (x_1, \dots, x_p)$ and $Y = (y_1, \dots, y_p)$ be non-negative integer vectors such that $x_1 \geq x_2 \geq \dots \geq x_p$ and $y_1 \geq y_2 \geq \dots \geq y_p$. We say that $X$ dominates $Y$ if
$$\sum_{i=1}^k x_i \ \geq\ \sum_{i=1}^k y_i \quad \text{for}\ k = 1, \dots, p-1 \quad \text{and} \quad \sum_{i=1}^p x_i = \sum_{i=1}^p y_i.$$
Equivalently, $X$ dominates $Y$ if $Y$ is a convex combination of vectors obtained from $X$ by permutations of coordinates. One can show that
$$|A(R,C)| \geq |A(R', C')| \quad \text{and} \quad |A_+(R,C)| \geq |A_+(R', C')| \tag{4.2.3}$$
provided $R'$ dominates $R$ and $C'$ dominates $C$, see Chapter 16 of [LW01] and [Ba07]. Inequalities (4.2.3) are consistent with the inequalities of (4.2.1) and (4.2.2).

5. Dependence between row and column sums

The following attractive "independence heuristic" for estimating $|A(R,C)|$ and $|A_+(R,C)|$ was discussed by Good [Go76] and by Good and Crook [GC77].

(5.1) The independence heuristic. Let us consider the set of all $m \times n$ matrices $D = (d_{ij})$ with 0-1 entries and the total sum $N$ of entries as a finite probability space with the uniform measure. Let us consider the event $\mathcal{R}$ consisting of the matrices with the row sums $R = (r_1, \dots, r_m)$ and the event $\mathcal{C}$ consisting of the matrices with the column sums $C = (c_1, \dots, c_n)$. Then
$$\mathbf{Pr}(\mathcal{R}) = \binom{mn}{N}^{-1} \prod_{i=1}^m \binom{n}{r_i} \quad \text{and} \quad \mathbf{Pr}(\mathcal{C}) = \binom{mn}{N}^{-1} \prod_{j=1}^n \binom{m}{c_j}.$$
In addition, $A(R,C) = \mathcal{R} \cap \mathcal{C}$. If we assume that the events $\mathcal{R}$ and $\mathcal{C}$ are independent, we obtain the following independence estimate
$$I(R,C) = \binom{mn}{N}^{-1} \prod_{i=1}^m \binom{n}{r_i} \prod_{j=1}^n \binom{m}{c_j} \tag{5.1.1}$$
for the number $|A(R,C)|$ of 0-1 matrices with row sums $R$ and column sums $C$.

Similarly, let us consider the set of all $m \times n$ matrices $D = (d_{ij})$ with non-negative integer entries and the total sum $N$ of entries as a finite probability space with the uniform measure. Let us consider the event $\mathcal{R}_+$ consisting of the matrices with the row sums
$R = (r_1, \dots, r_m)$ and the event $\mathcal{C}_+$ consisting of the matrices with the column sums $C = (c_1, \dots, c_n)$. Then
$$\mathbf{Pr}(\mathcal{R}_+) = \binom{N + mn - 1}{mn - 1}^{-1} \prod_{i=1}^m \binom{r_i + n - 1}{n - 1} \quad \text{and} \quad \mathbf{Pr}(\mathcal{C}_+) = \binom{N + mn - 1}{mn - 1}^{-1} \prod_{j=1}^n \binom{c_j + m - 1}{m - 1}.$$
We have $A_+(R,C) = \mathcal{R}_+ \cap \mathcal{C}_+$. If we assume that the events $\mathcal{R}_+$ and $\mathcal{C}_+$ are independent, we obtain the independence estimate
$$I_+(R,C) = \binom{N + mn - 1}{mn - 1}^{-1} \prod_{i=1}^m \binom{r_i + n - 1}{n - 1} \prod_{j=1}^n \binom{c_j + m - 1}{m - 1}. \tag{5.1.2}$$

The independence estimates $I(R,C)$ and $I_+(R,C)$ provide reasonable approximations to $|A(R,C)|$ and $|A_+(R,C)|$ respectively in the following two cases: in the case of equal margins, when $r_1 = \dots = r_m = r$ and $c_1 = \dots = c_n = c$, see [C+08] and [C+07], and in the sparse case, when
$$\max_{i = 1, \dots, m} r_i \ll n \quad \text{and} \quad \max_{j = 1, \dots, n} c_j \ll m,$$
see [G+06] and [GM08].

We will see in Section 5.4 that the independence estimates provide the correct logarithmic asymptotics in the case when all row sums are equal or all column sums are equal. However, if both row and column sums are sufficiently far away from being uniform and sparse, the independence estimates, generally speaking, provide poor approximations. Moreover, in the case of 0-1 matrices the independence estimate $I(R,C)$ typically grossly overestimates $|A(R,C)|$, while in the case of non-negative integer matrices the independence estimate $I_+(R,C)$ typically grossly underestimates $|A_+(R,C)|$. In other words, for typical margins $R$ and $C$ the events $\mathcal{R}$ and $\mathcal{C}$ repel each other (the events are negatively correlated) while the events $\mathcal{R}_+$ and $\mathcal{C}_+$ attract each other (the events are positively correlated). To see why this is the case, we write the estimates $\alpha(R,C)$ of Theorem 2.1 and $\alpha_+(R,C)$ of Theorem 3.1 in terms of entropy.

The following result is proven in [Ba10a].

(5.2) Lemma.
Let $P(R,C)$ be the polytope of all $m \times n$ matrices $X = (x_{ij})$ with row sums $R$, column sums $C$ and such that $0 \leq x_{ij} \leq 1$ for all $i$ and $j$. Suppose that the polytope $P(R,C)$ has a non-empty interior, that is, contains a matrix $Y = (y_{ij})$ such that $0 < y_{ij} < 1$ for all $i$ and $j$. Let us define a function $h: P(R,C) \longrightarrow \mathbb{R}$ by
$$h(X) = \sum_{i,j} \Big( x_{ij} \ln \frac{1}{x_{ij}} + (1 - x_{ij}) \ln \frac{1}{1 - x_{ij}} \Big) \quad \text{for}\ X \in P(R,C).$$
Then $h$ is a strictly concave function on $P(R,C)$ and hence attains its maximum on $P(R,C)$ at a unique matrix $Z = (z_{ij})$, which we call the maximum entropy matrix. Moreover,

(1) We have $0 < z_{ij} < 1$ for all $i$ and $j$;

(2) The infimum $\alpha(R,C)$ of Theorem 2.1 is attained at some particular point $(\mathbf{x}, \mathbf{y})$;

(3) We have $\alpha(R,C) = e^{h(Z)}$.

Sketch of Proof. It is straightforward to check that $h$ is strictly concave and that
$$\frac{\partial}{\partial x_{ij}} h(X) = \ln \frac{1 - x_{ij}}{x_{ij}}.$$
In particular, the (right) derivative at $x_{ij} = 0$ is $+\infty$, the (left) derivative at $x_{ij} = 1$ is $-\infty$, and the derivative is finite for $0 < x_{ij} < 1$. Hence the maximum point $Z$ must have all entries strictly between 0 and 1, since otherwise we can increase the value of $h$ by perturbing $Z$ in the direction of a matrix $Y$ from the interior of $P(R,C)$. This proves Part (1).

The Lagrange optimality conditions imply that
$$\ln \frac{1 - z_{ij}}{z_{ij}} = -\lambda_i - \mu_j \quad \text{for all}\ i, j$$
and some numbers $\lambda_1, \dots, \lambda_m$ and $\mu_1, \dots, \mu_n$. Hence
$$z_{ij} = \frac{e^{\lambda_i + \mu_j}}{1 + e^{\lambda_i + \mu_j}} \quad \text{for all}\ i, j. \tag{5.2.1}$$
In particular,
$$\sum_{i=1}^m \frac{e^{\lambda_i + \mu_j}}{1 + e^{\lambda_i + \mu_j}} = c_j \quad \text{for}\ j = 1, \dots, n \quad \text{and} \quad \sum_{j=1}^n \frac{e^{\lambda_i + \mu_j}}{1 + e^{\lambda_i + \mu_j}} = r_i \quad \text{for}\ i = 1, \dots, m. \tag{5.2.2}$$
Equations (5.2.2) imply that the point $\mathbf{s} = (\lambda_1, \dots, \lambda_m)$ and $\mathbf{t} = (\mu_1, \dots, \mu_n)$ is a critical point of the function $G(\mathbf{s}, \mathbf{t})$ defined by (2.2.1) and hence the infimum $\alpha(R,C)$ of $F(\mathbf{x}, \mathbf{y})$ is attained at $x_i = e^{\lambda_i}$ for $i = 1, \dots, m$ and $y_j = e^{\mu_j}$ for $j = 1, \dots, n$. Hence Part (2) follows. Using (5.2.1) it is then straightforward to check that $F(\mathbf{x}, \mathbf{y}) = e^{h(Z)}$ for the minimum point $(\mathbf{x}, \mathbf{y})$. $\square$

We note that $h(x) = x \ln(1/x) + (1-x) \ln(1/(1-x))$ for $0 \leq x \leq 1$ is the entropy of a Bernoulli random variable with expectation $x$, see Section 6.

The following result is proven in [Ba09].

(5.3) Lemma. Let $P_+(R,C)$ be the polytope of all non-negative $m \times n$ matrices $X = (x_{ij})$ with row sums $R$ and column sums $C$. Let us define a function $g: P_+(R,C) \longrightarrow \mathbb{R}$ by
$$g(X) = \sum_{i,j} \big( (x_{ij} + 1) \ln(1 + x_{ij}) - x_{ij} \ln x_{ij} \big) \quad \text{for}\ X \in P_+(R,C).$$
Then $g$ is a strictly concave function on $P_+(R,C)$ and hence attains its maximum on $P_+(R,C)$ at a unique matrix $Z_+ = (z_{ij})$, which we call the maximum entropy matrix. Moreover,

(1) We have $z_{ij} > 0$ for all $i, j$ and

(2) For the minimum $\alpha_+(R,C)$ of Theorem 3.1, we have $\alpha_+(R,C) = e^{g(Z_+)}$.

Sketch of Proof. It is straightforward to check that $g$ is strictly concave and that
$$\frac{\partial}{\partial x_{ij}} g(X) = \ln \frac{1 + x_{ij}}{x_{ij}} \quad \text{for all}\ i, j.$$
In particular, the (right) derivative is $+\infty$ for $x_{ij} = 0$ and finite for every $x_{ij} > 0$. Since $P_+(R,C)$ contains an interior point (for example, the matrix $Y = (y_{ij})$ with $y_{ij} = r_i c_j / N$), arguing as in the proof of Lemma 5.2, we obtain Part (1).

The Lagrange optimality conditions imply that
$$\ln \frac{1 + z_{ij}}{z_{ij}} = \lambda_i + \mu_j \quad \text{for all}\ i, j$$
and some numbers $\lambda_1, \dots, \lambda_m$ and $\mu_1, \dots, \mu_n$. Hence
$$z_{ij} = \frac{e^{-\lambda_i - \mu_j}}{1 - e^{-\lambda_i - \mu_j}} \quad \text{for all}\ i, j. \tag{5.3.1}$$
In particular,
$$\sum_{i=1}^m \frac{e^{-\lambda_i - \mu_j}}{1 - e^{-\lambda_i - \mu_j}} = c_j \quad \text{for}\ j = 1, \dots, n \quad \text{and} \quad \sum_{j=1}^n \frac{e^{-\lambda_i - \mu_j}}{1 - e^{-\lambda_i - \mu_j}} = r_i \quad \text{for}\ i = 1, \dots, m. \tag{5.3.2}$$
Equations (5.3.2) imply that the point
$\mathbf{s} = (-\lambda_1, \dots, -\lambda_m)$, $\mathbf{t} = (-\mu_1, \dots, -\mu_n)$ is a critical point of the function $G_+(\mathbf{s}, \mathbf{t})$ defined by (3.2.1) and hence the minimum $\alpha_+(R,C)$ of $F_+(\mathbf{x}, \mathbf{y})$ is attained at $x_i = e^{-\lambda_i}$ for $i = 1, \dots, m$ and $y_j = e^{-\mu_j}$ for $j = 1, \dots, n$. Using (5.3.1), it is then straightforward to check that $F_+(\mathbf{x}, \mathbf{y}) = e^{g(Z_+)}$ for the minimum point $(\mathbf{x}, \mathbf{y})$. $\square$

We note that $g(x) = (x+1) \ln(x+1) - x \ln x$ for $x \geq 0$ is the entropy of a geometric random variable with expectation $x$, see Section 6.

(5.4) Comparing the estimates. Let
$$H(p_1, \dots, p_k) = \sum_{i=1}^k p_i \ln \frac{1}{p_i}$$
be the entropy function defined on $k$-tuples (probability distributions) $p_1, \dots, p_k$ such that $p_1 + \dots + p_k = 1$ and $p_i \geq 0$ for $i = 1, \dots, k$. Assuming that the polytope $P(R,C)$ of Lemma 5.2 has a non-empty interior, we can write
$$\ln \alpha(R,C) = N H\Big( \frac{z_{ij}}{N};\ i,j \Big) + (mn - N) H\Big( \frac{1 - z_{ij}}{mn - N};\ i,j \Big) - N \ln N - (mn - N) \ln(mn - N),$$
where $Z = (z_{ij})$ is the maximum entropy matrix. On the other hand, for the independence estimate (5.1.1), we have
$$\begin{aligned} \ln I(R,C) = \ & N H\Big( \frac{r_i}{N};\ i \Big) + (mn - N) H\Big( \frac{n - r_i}{mn - N};\ i \Big) + N H\Big( \frac{c_j}{N};\ j \Big) + (mn - N) H\Big( \frac{m - c_j}{mn - N};\ j \Big) \\ & - N \ln N - (mn - N) \ln(mn - N) + O\big( (m + n) \ln(mn) \big). \end{aligned}$$
Using the inequality which relates the entropy of a distribution and the entropy of its margins, see, for example, [Kh57], we obtain
$$H\Big( \frac{z_{ij}}{N};\ i,j \Big) \ \leq\ H\Big( \frac{r_i}{N};\ i \Big) + H\Big( \frac{c_j}{N};\ j \Big) \tag{5.4.1}$$
with equality if and only if
$$z_{ij} = \frac{r_i c_j}{N} \quad \text{for all}\ i, j,$$
and
$$H\Big( \frac{1 - z_{ij}}{mn - N};\ i,j \Big) \ \leq\ H\Big( \frac{n - r_i}{mn - N};\ i \Big) + H\Big( \frac{m - c_j}{mn - N};\ j \Big) \tag{5.4.2}$$
with equality if and only if
$$1 - z_{ij} = \frac{(n - r_i)(m - c_j)}{mn - N} \quad \text{for all}\ i, j.$$
Thus we have equalities in (5.4.1) and (5.4.2) if and only if
$$(r_i m - N)(c_j n - N) = 0 \quad \text{for all}\ i, j,$$
that is, if and only if all row sums are equal or all column sums are equal. In that case, by symmetry we have $Z = Y$ for $y_{ij} = r_i c_j / N$ and hence $I(R,C)$ estimates $|A(R,C)|$ within an $(mn)^{O(m+n)}$ factor. In all other cases, $I(R,C)$ overestimates $|A(R,C)|$ by as much as a $2^{\Omega(mn)}$ factor as long as the differences between the right hand sides and left hand sides of (5.4.1) and (5.4.2), multiplied by $N$ and $(mn - N)$ respectively, overcome the $O\big( (m + n) \ln(mn) \big)$ error term, see also Section 5.5 for a particular family of examples.

We handle non-negative integer matrices slightly differently. For the independence estimate (5.1.2) we obtain
$$\begin{aligned} \ln I_+(R,C) = \ & -(N + mn) H\Big( \frac{r_i + n}{N + mn};\ i \Big) - (N + mn) H\Big( \frac{c_j + m}{N + mn};\ j \Big) \\ & - \sum_{i=1}^m r_i \ln r_i - \sum_{j=1}^n c_j \ln c_j + N \ln N + (N + mn) \ln(N + mn) + O\big( (m + n) \ln N \big). \end{aligned}$$
On the other hand, by Lemma 5.3 we have
$$\ln \alpha_+(R,C) = g(Z_+) \geq g(Y),$$
where $Z_+$ is the maximum entropy matrix and $Y = (y_{ij})$ is the matrix defined by
$$y_{ij} = \frac{r_i c_j}{N} \quad \text{for all}\ i, j.$$
It is then easy to check that
$$g(Y) = -(N + mn) H\Big( \frac{r_i c_j + N}{N(N + mn)};\ i,j \Big) - \sum_{i=1}^m r_i \ln r_i - \sum_{j=1}^n c_j \ln c_j + N \ln N + (N + mn) \ln(N + mn).$$
By the inequality relating the entropy of a distribution and the entropy of its margins [Kh57], we have
$$H\Big( \frac{r_i c_j + N}{N(N + mn)};\ i,j \Big) \ \leq\ H\Big( \frac{r_i + n}{N + mn};\ i \Big) + H\Big( \frac{c_j + m}{N + mn};\ j \Big) \tag{5.4.3}$$
with equality if and only if
$$\frac{r_i c_j + N}{N(N + mn)} = \frac{(r_i + n)(c_j + m)}{(N + mn)^2} \quad \text{for all}\ i, j,$$
which happens if and only if
$$(r_i m - N)(c_j n - N) = 0 \quad \text{for all}\ i, j,$$
so that all row sums are equal or all column sums are equal.
In that case, by symmetry we have $Y = Z_+$ and hence $I_+(R,C)$ estimates $|A_+(R,C)|$ within an $N^{O(m+n)}$ factor. In all other cases, $I_+(R,C)$ underestimates $|A_+(R,C)|$ by as much as a $2^{\Omega(mn)}$ factor as long as the difference between the right hand side and left hand side of (5.4.3), multiplied by $N + mn$, overcomes the $O\big( (m + n) \ln N \big)$ error term, see also Section 5.5 for a particular family of examples.

(5.5) Cloning margins. Let us choose a positive integer $m$-vector $R = (r_1, \dots, r_m)$ and a positive integer $n$-vector $C = (c_1, \dots, c_n)$ such that
$$r_1 + \dots + r_m = c_1 + \dots + c_n = N.$$
For a positive integer $k$, let us define a $km$-vector $R^{(k)}$ and a $kn$-vector $C^{(k)}$ by
$$R^{(k)} = \big( \underbrace{kr_1, \dots, kr_1}_{k\ \text{times}}, \ \dots,\ \underbrace{kr_m, \dots, kr_m}_{k\ \text{times}} \big) \quad \text{and} \quad C^{(k)} = \big( \underbrace{kc_1, \dots, kc_1}_{k\ \text{times}}, \ \dots,\ \underbrace{kc_n, \dots, kc_n}_{k\ \text{times}} \big).$$
We say that the margins $(R^{(k)}, C^{(k)})$ are obtained by cloning from the margins $(R,C)$. It is not hard to show that if $Z$ and $Z_+$ are the maximum entropy matrices associated with the margins $(R,C)$ via Lemma 5.2 and Lemma 5.3 respectively, then the maximum entropy matrices associated with the margins $(R^{(k)}, C^{(k)})$ are the Kronecker products $Z \otimes J_k$ and $Z_+ \otimes J_k$ respectively, where $J_k$ is the $k \times k$ matrix of all 1's. One has
$$\lim_{k \to +\infty} |A(R^{(k)}, C^{(k)})|^{1/k^2} = \alpha(R,C) \quad \text{and} \quad \lim_{k \to +\infty} |A_+(R^{(k)}, C^{(k)})|^{1/k^2} = \alpha_+(R,C).$$
Moreover, if not all coordinates $r_i$ of $R$ are equal and not all coordinates $c_j$ of $C$ are equal, then the independence estimate $I(R^{(k)}, C^{(k)})$, see (5.1.1), overestimates the number of $km \times kn$ matrices with row sums $R^{(k)}$, column sums $C^{(k)}$ and 0-1 entries within a $2^{\Omega(k^2)}$ factor, while the independence estimate $I_+(R^{(k)}, C^{(k)})$, see (5.1.2), underestimates the number of $km \times kn$ non-negative integer matrices within a $2^{\Omega(k^2)}$ factor, see [Ba10a] and [Ba09] for details.

6. Random matrices with prescribed row and column sums

The estimates of Theorems 2.1 and 3.1, however crude, allow us to obtain a description of a random or typical matrix from the sets $A(R,C)$ and $A_+(R,C)$, considered as finite probability spaces with the uniform measures.

Recall that $x$ is a Bernoulli random variable if $\mathbf{Pr}\{x = 0\} = p$ and $\mathbf{Pr}\{x = 1\} = q$ for some $p, q \geq 0$ with $p + q = 1$. Clearly, $\mathbf{E}\, x = q$.

Recall that $P(R,C)$ is the polytope of $m \times n$ matrices with row sums $R$, column sums $C$ and entries between 0 and 1. Let the function $h: P(R,C) \longrightarrow \mathbb{R}$ and the maximum entropy matrix $Z \in P(R,C)$ be defined as in Lemma 5.2.

The following result is proven in [Ba10a], see also [BH10a].

(6.1) Theorem.
Suppose that the polytope $P(R,C)$ has a non-empty interior and let $Z \in P(R,C)$ be the maximum entropy matrix. Let $X = (x_{ij})$ be a random $m \times n$ matrix of independent Bernoulli random variables $x_{ij}$ such that $\mathbf{E}\, X = Z$. Then

(1) The probability mass function of $X$ is constant on the set $A(R,C)$ of 0-1 matrices with row sums $R$ and column sums $C$ and
$$\mathbf{Pr}\{X = D\} = e^{-h(Z)} \quad \text{for all}\quad D \in A(R,C);$$
(2)
We have
$$\mathbf{Pr}\{X \in A(R,C)\} \ \geq\ (mn)^{-\gamma(m+n)},$$
where $\gamma > 0$ is an absolute constant.

Theorem 6.1 implies that in many respects a random matrix $D \in A(R,C)$ behaves as a random matrix $X$ of independent Bernoulli random variables such that $\mathbf{E}\, X = Z$, where $Z$ is the maximum entropy matrix. More precisely, any event that is sufficiently rare for the random matrix $X$ (that is, an event the probability of which is essentially smaller than $(mn)^{-O(m+n)}$) will also be a rare event for a random matrix $D \in A(R,C)$. In particular, we can conclude that a typical matrix $D \in A(R,C)$ is sufficiently close to $Z$ as far as sums of entries over sufficiently large subsets $S$ of indices are concerned.

For an $m \times n$ matrix $B = (b_{ij})$ and a subset
$$S \subset \big\{ (i,j):\ i = 1, \dots, m,\ j = 1, \dots, n \big\},$$
let
$$\sigma_S(B) = \sum_{(i,j) \in S} b_{ij}$$
be the sum of the entries of $B$ indexed by the set $S$. We obtain the following corollary, see [Ba10a] for details.

(6.2) Corollary. Let us fix real numbers $\kappa > 0$ and $0 < \delta < 1$. Then there exists a number $q = q(\kappa, \delta) > 0$ such that the following holds. Let $(R,C)$ be margins such that $n \geq m > q$ and the polytope $P(R,C)$ has a non-empty interior, and let $Z \in P(R,C)$ be the maximum entropy matrix. Let
$$S \subset \big\{ (i,j):\ i = 1, \dots, m;\ j = 1, \dots, n \big\}$$
be a set such that $\sigma_S(Z) \geq \delta mn$ and let $\epsilon = \epsilon(\delta, m)$ be a certain explicit quantity, of order $(\ln m)/\sqrt{m}$ for fixed $\delta$, see [Ba10a]. If $\epsilon \leq 1$ then
$$\mathbf{Pr}\big\{ D \in A(R,C):\ (1 - \epsilon)\sigma_S(Z) \leq \sigma_S(D) \leq (1 + \epsilon)\sigma_S(Z) \big\} \ \geq\ 1 - n^{-\kappa n}.$$

Recall that $x$ is a geometric random variable if
$$\mathbf{Pr}\{x = k\} = p q^k \quad \text{for}\quad k = 0, 1, 2, \dots$$
for some $p, q > 0$ with $p + q = 1$. We have $\mathbf{E}\, x = q/p$.

Recall that $P_+(R,C)$ is the polytope of $m \times n$ non-negative matrices with row sums $R$ and column sums $C$. Let the function $g: P_+(R,C) \longrightarrow \mathbb{R}$ and the maximum entropy matrix $Z_+ \in P_+(R,C)$ be defined as in Lemma 5.3.

The following result is proven in [Ba10b], see also [BH10a].

(6.3) Theorem.
Let $Z_+ \in P_+(R,C)$ be the maximum entropy matrix. Let $X = (x_{ij})$ be a random $m \times n$ matrix of independent geometric random variables $x_{ij}$ such that $\mathbf{E}\, X = Z_+$. Then

(1) The probability mass function of $X$ is constant on the set $A_+(R,C)$ of non-negative integer matrices with row sums $R$ and column sums $C$ and
$$\mathbf{Pr}\{X = D\} = e^{-g(Z_+)} \quad \text{for all}\quad D \in A_+(R,C);$$
(2)
We have
$$\mathbf{Pr}\{X \in A_+(R,C)\} \ \geq\ N^{-\gamma(m+n)},$$
where $\gamma > 0$ is an absolute constant and $N = r_1 + \dots + r_m = c_1 + \dots + c_n$ for $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$.

Theorem 6.3 implies that in many respects a random matrix $D \in A_+(R,C)$ behaves as a matrix $X$ of independent geometric random variables such that $\mathbf{E}\, X = Z_+$, where $Z_+$ is the maximum entropy matrix. More precisely, any event that is sufficiently rare for the random matrix $X$ (that is, an event the probability of which is essentially smaller than $N^{-O(m+n)}$) will also be a rare event for a random matrix $D \in A_+(R,C)$. In particular, we can conclude that a typical matrix $D \in A_+(R,C)$ is sufficiently close to $Z_+$ as far as sums of entries over sufficiently large subsets $S$ of indices are concerned.

Recall that $\sigma_S(B)$ denotes the sum of the entries of a matrix $B$ indexed by a set $S$. We obtain the following corollary, see [Ba10b] for details.

(6.4) Corollary. Let us fix real numbers $\kappa > 0$ and $0 < \delta < 1$. Then there exists a positive integer $q = q(\kappa, \delta)$ such that the following holds. Let $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ be positive integer vectors such that $r_1 + \dots + r_m = c_1 + \dots + c_n = N$,
$$\frac{\delta N}{m} \leq r_i \leq \frac{N}{\delta m} \quad \text{for}\ i = 1, \dots, m, \qquad \frac{\delta N}{n} \leq c_j \leq \frac{N}{\delta n} \quad \text{for}\ j = 1, \dots, n \qquad \text{and} \qquad \frac{N}{mn} \geq \delta.$$
Suppose that $n \geq m > q$ and let $S \subset \big\{ (i,j):\ i = 1, \dots, m,\ j = 1, \dots, n \big\}$ be a set such that $|S| \geq \delta mn$. Let $Z_+ \in P_+(R,C)$ be the maximum entropy matrix and let $\epsilon = \epsilon(\delta, m, n)$ be a certain explicit quantity, of order $(\ln n)/\sqrt{m}$ for fixed $\delta$, see [Ba10b]. If $\epsilon \leq 1$ then
$$\mathbf{Pr}\big\{ D \in A_+(R,C):\ (1 - \epsilon)\sigma_S(Z_+) \leq \sigma_S(D) \leq (1 + \epsilon)\sigma_S(Z_+) \big\} \ \geq\ 1 - n^{-\kappa n}.$$

As is discussed in [BH10a], the ultimate reason why Theorems 6.1 and 6.3 hold true is that

the matrix $X$ of independent Bernoulli random variables such that $\mathbf{E}\, X = Z$ is the random matrix with the maximum possible entropy among all random $m \times n$ matrices with 0-1 entries and the expectation in the affine subspace of the matrices with row sums $R$ and column sums $C$, and

the matrix $X$ of independent geometric random variables such that $\mathbf{E}\, X = Z_+$ is the random matrix with the maximum possible entropy among all random $m \times n$ matrices with non-negative integer entries and the expectation in the affine subspace of the matrices with row sums $R$ and column sums $C$.

Thus Theorems 6.1 and 6.3 can be considered as an illustration of Good's thesis [Go63] that the "null hypothesis" for an unknown probability distribution from a given class should be the hypothesis that the unknown distribution is, in fact, the distribution of the maximum entropy in the given class.

(6.5) Sketch of proof of Theorem 6.1. Let $Z = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.2. Let us choose $D \in A(R,C)$, $D = (d_{ij})$. Using (5.2.1), we get
$$\mathbf{Pr}\{X = D\} = \prod_{i,j} z_{ij}^{d_{ij}} (1 - z_{ij})^{1 - d_{ij}} = \prod_{i,j} \frac{e^{(\lambda_i + \mu_j) d_{ij}}}{1 + e^{\lambda_i + \mu_j}} = \exp\Big\{ \sum_{i=1}^m \lambda_i r_i + \sum_{j=1}^n \mu_j c_j \Big\} \prod_{i,j} \frac{1}{1 + e^{\lambda_i + \mu_j}} = e^{-h(Z)},$$
which proves Part (1).

To prove Part (2), we use Part (1), Theorem 2.1 and Lemma 5.2. We have
$$\mathbf{Pr}\{X \in A(R,C)\} = |A(R,C)|\, e^{-h(Z)} \ \geq\ (mn)^{-\gamma(m+n)} \alpha(R,C)\, e^{-h(Z)} = (mn)^{-\gamma(m+n)}$$
for some absolute constant $\gamma > 0$. $\square$

(6.6) Sketch of proof of Theorem 6.3. Let $Z_+ = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.3. Let us choose $D \in A_+(R,C)$, $D = (d_{ij})$. Using (5.3.1), we get
$$\mathbf{Pr}\{X = D\} = \prod_{i,j} \Big( \frac{1}{1 + z_{ij}} \Big) \Big( \frac{z_{ij}}{1 + z_{ij}} \Big)^{d_{ij}} = \prod_{i,j} \big( 1 - e^{-\lambda_i - \mu_j} \big)\, e^{-(\lambda_i + \mu_j) d_{ij}} = \exp\Big\{ -\sum_{i=1}^m \lambda_i r_i - \sum_{j=1}^n \mu_j c_j \Big\} \prod_{i,j} \big( 1 - e^{-\lambda_i - \mu_j} \big) = e^{-g(Z_+)},$$
which proves Part (1).

To prove Part (2), we use Part (1), Theorem 3.1 and Lemma 5.3. We have
$$\mathbf{Pr}\{X \in A_+(R,C)\} = |A_+(R,C)|\, e^{-g(Z_+)} \ \geq\ N^{-\gamma(m+n)} \alpha_+(R,C)\, e^{-g(Z_+)} = N^{-\gamma(m+n)}$$
for some absolute constant $\gamma > 0$. $\square$

(6.7) Open questions. Theorems 6.1 and 6.3 show that a random matrix $D \in A(R,C)$, respectively $D \in A_+(R,C)$, in many respects behaves like a matrix of independent Bernoulli, respectively geometric, random variables whose expectation is the maximum entropy matrix $Z$, respectively $Z_+$. One can ask whether individual entries $d_{ij}$ of $D$ behave asymptotically as Bernoulli, respectively geometric, random variables with expectations $z_{ij}$ as the size of the matrices grows. In the simplest situation we ask the following.

(6.7.1) Question. Let
$(R,C)$ be margins and let $(R^{(k)}, C^{(k)})$ be margins obtained from $(R,C)$ by cloning as in Section 5.5. Is it true that as $k$ grows, the entry $d_{11}$ of a random matrix $D \in A(R^{(k)}, C^{(k)})$, respectively $D \in A_+(R^{(k)}, C^{(k)})$, converges in distribution to the Bernoulli, respectively geometric, random variable with expectation $z_{11}$, where $Z = (z_{ij})$, respectively $Z_+ = (z_{ij})$, is the maximum entropy matrix of the margins $(R,C)$?

Some entries of the maximum entropy matrix $Z_+$ may turn out to be surprisingly large, even for reasonably looking margins. In [Ba10b], the following example is considered. Suppose that $m = n$ and let $R_n = C_n = (3n, n, \dots, n)$. It turns out that the entry $z_{11}$ of the maximum entropy matrix $Z_+$ grows linearly in $n$, while all other entries remain bounded by a constant. One can ask whether the $d_{11}$ entry of a random matrix $D \in A_+(R_n, C_n)$ is indeed large, as the value of $z_{11}$ suggests.

(6.7.2) Question. Let $(R_n, C_n)$ be margins as above. Is it true that as $n$ grows, one has $\mathbf{E}\, d_{11} = \Omega(n)$ for a random matrix $D \in A_+(R_n, C_n)$?

Curiously, the entry $z_{11}$ becomes bounded by a constant if $3n$ is replaced by $2n$.
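The maximum entropy matrix $Z_+$ in this example is easy to compute from equations (5.3.1)-(5.3.2): writing $z_{ij} = a_i b_j/(1 - a_i b_j)$ with $a_i = e^{-\lambda_i}$ and $b_j = e^{-\mu_j}$, one can alternately solve the row equations in $a$ and the column equations in $b$ by one-dimensional root-finding. The scheme below is our own heuristic illustration (it is not taken from [Ba10b]), but it lets one watch the entry $z_{11}$ grow with $n$:

```python
import numpy as np
from scipy.optimize import brentq

def max_entropy_matrix_plus(R, C, sweeps=300):
    """Z_+ of Lemma 5.3 via z_ij = a_i b_j/(1 - a_i b_j), alternately
    enforcing the row equations (in a) and the column equations (in b).
    Heuristic coordinate-wise sketch, for illustration only."""
    R, C = np.asarray(R, float), np.asarray(C, float)
    a = np.full(len(R), 0.5)
    b = np.full(len(C), 0.5)
    for _ in range(sweeps):
        for i, r in enumerate(R):   # solve sum_j a b_j/(1 - a b_j) = r_i for a
            hi = (1.0 - 1e-12) / b.max()
            a[i] = brentq(lambda x: (x * b / (1.0 - x * b)).sum() - r, 0.0, hi)
        for j, c in enumerate(C):   # solve sum_i a_i b/(1 - a_i b) = c_j for b
            hi = (1.0 - 1e-12) / a.max()
            b[j] = brentq(lambda x: (a * x / (1.0 - a * x)).sum() - c, 0.0, hi)
    ab = np.outer(a, b)
    return ab / (1.0 - ab)

# Margins R_n = C_n = (3n, n, ..., n) of Question 6.7.2: the corner entry
# z_11 grows roughly linearly in n, while the remaining entries stay bounded.
for n in (10, 20, 40):
    margins = [3 * n] + [n] * (n - 1)
    print(n, max_entropy_matrix_plus(margins, margins)[0, 0])
```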
7. Asymptotic formulas for the number of matrices with prescribed row and column sums
In this section, we discuss asymptotically exact estimates for $|A(R,C)|$ and $|A_+(R,C)|$.

(7.1) An asymptotic formula for $|A(R,C)|$. Theorem 6.1 suggests the following way to estimate the number $|A(R,C)|$ of 0-1 matrices with row sums $R$ and column sums $C$. Let us consider the matrix $X$ of independent Bernoulli random variables as in Theorem 6.1 and let $Y$ be the random $(m+n)$-vector obtained by computing the row and column sums of $X$. Then, by Theorem 6.1, we have
$$|A(R,C)| = e^{h(Z)}\, \mathbf{Pr}\{X \in A(R,C)\} = e^{h(Z)}\, \mathbf{Pr}\{Y = (R,C)\}. \tag{7.1.1}$$
Now, the random $(m+n)$-vector $Y$ is obtained as a sum of $mn$ independent random vectors and $\mathbf{E}\, Y = (R,C)$, so it is not unreasonable to assume that $\mathbf{Pr}\{Y = (R,C)\}$ can be estimated via some version of the Local Central Limit Theorem. In [BH10b] we show that this is indeed the case provided one employs the Edgeworth correction factor in the Central Limit Theorem.

We introduce the necessary objects to state the asymptotic formula for the number of 0-1 matrices with row sums $R$ and column sums $C$.

Let $Z = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.2. We assume that $0 < z_{ij} < 1$ for all $i$ and $j$. Let us consider the quadratic form $q: \mathbb{R}^{m+n} \longrightarrow \mathbb{R}$ defined by
$$q(s,t) = \frac{1}{2} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} \big( z_{ij} - z_{ij}^2 \big)(s_i + t_j)^2 \quad \text{for}\ s = (s_1, \dots, s_m)\ \text{and}\ t = (t_1, \dots, t_n).$$
The form $q$ is positive semidefinite with the kernel spanned by the vector
$$u = \big( \underbrace{1, \dots, 1}_{m\ \text{times}};\ \underbrace{-1, \dots, -1}_{n\ \text{times}} \big).$$
Let $H = u^{\perp}$ be the hyperplane in $\mathbb{R}^{m+n}$ defined by the equation
$$s_1 + \dots + s_m = t_1 + \dots + t_n. \tag{7.1.2}$$
Then the restriction $q|_H$ of $q$ onto $H$ is a positive definite quadratic form and we define its determinant $\det q|_H$ as the product of the non-zero eigenvalues of $q$. We consider the Gaussian probability measure on $H$ with the density proportional to $e^{-q}$ and define random variables $\phi, \psi: H \longrightarrow \mathbb{R}$ by
$$\phi(s,t) = \frac{1}{6} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} z_{ij} (1 - z_{ij}) (2 z_{ij} - 1) (s_i + t_j)^3 \quad \text{and} \quad \psi(s,t) = \frac{1}{24} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} z_{ij} (1 - z_{ij}) \big( 6 z_{ij}^2 - 6 z_{ij} + 1 \big) (s_i + t_j)^4$$
for $(s,t) = (s_1, \dots, s_m;\ t_1, \dots, t_n)$. We let $\mu = \mathbf{E}\, \phi^2$ and $\nu = \mathbf{E}\, \psi$.

(7.2) Theorem. Let us fix $0 < \delta < 1/2$ and let $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ be margins such that $m \geq \delta n$ and $n \geq \delta m$. Let $Z = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.2 and suppose that $\delta \leq z_{ij} \leq 1 - \delta$ for all $i$ and $j$. Let the quadratic form $q$ and the values $\mu$ and $\nu$ be as defined in Section 7.1. Then the quantity
$$\frac{e^{h(Z)} \sqrt{m+n}}{(4\pi)^{(m+n-1)/2} \sqrt{\det q|_H}} \exp\Big\{ -\frac{\mu}{2} + \nu \Big\} \tag{7.2.1}$$
approximates the number $|A(R,C)|$ of 0-1 matrices with row sums $R$ and column sums $C$ within a relative error which approaches 0 as $m, n \longrightarrow +\infty$. More precisely, for any $0 < \epsilon \leq 1/2$, the value of (7.2.1) approximates $|A(R,C)|$ within relative error $\epsilon$ provided
$$m, n \ \geq\ \Big( \frac{1}{\epsilon} \Big)^{\gamma(\delta)}$$
for some $\gamma(\delta) > 0$.

Some remarks are in order.

All the ingredients of formula (7.2.1) are efficiently computable, in time polynomial in $m + n$, see [BH10b] for details. If all row sums are equal then we have $z_{ij} = c_j/m$ by symmetry, and if all column sums are equal, we have $z_{ij} = r_i/n$. In particular, if all row sums are equal and all column sums are equal, we obtain the asymptotic formula of [C+08].

Let us consider formula (7.1.1). If, in the spirit of the Local Central Limit Theorem, we approximated $\mathbf{Pr}\{Y = (R,C)\}$ by $\mathbf{Pr}\{Y^* \in (R,C) + \Pi\}$, where $Y^*$ is the $(m+n-1)$-dimensional Gaussian random vector approximating $Y$ and where $\Pi$ is the set of points on the hyperplane $H$ that are closer to $(R,C)$ than to any other integer vector in $H$, we would have obtained the first part
$$\frac{e^{h(Z)} \sqrt{m+n}}{(4\pi)^{(m+n-1)/2} \sqrt{\det q|_H}}$$
of formula (7.2.1). Under the conditions of Theorem 7.2 we have
$$c_1(\delta) \ \leq\ \exp\Big\{ -\frac{\mu}{2} + \nu \Big\} \ \leq\ c_2(\delta)$$
for some constants $c_1(\delta), c_2(\delta) > 0$.

The conditions $\delta \leq z_{ij} \leq 1 - \delta$ are, generally speaking, unavoidable. If the entries $z_{ij}$ of the maximum entropy matrix are uniformly small, then the distribution of the random vector $Y$ of row and column sums of the random Bernoulli matrix $X$ is no longer approximately Gaussian but approximately Poisson, and formula (7.2.1) does not give the correct asymptotics. The sparse case of small row and column sums is investigated in [G+06].

More generally, to have some analytic formula approximating $|A(R,C)|$ we need certain regularity conditions on $(R,C)$, since the number $|A(R,C)|$ becomes volatile when the margins $(R,C)$ approach the boundary of the Gale-Ryser conditions, cf. [JSM92]. By requiring that the entries of the maximum entropy matrix $Z$ are separated from both 0 and 1, we ensure that the margins $(R,C)$ remain sufficiently inside the polyhedron defined by the Gale-Ryser inequalities and the number of 0-1 matrices with row sums $R$ and column sums $C$ changes sufficiently smoothly when $R$ and $C$ change.

(7.3) An asymptotic formula for $|A_+(R,C)|$. As in Theorem 6.3, let $X$ be the matrix of independent geometric random variables such that $\mathbf{E}\, X = Z_+$, where $Z_+$ is the maximum entropy matrix. Let $Y$ be the random $(m+n)$-vector obtained by computing the row and column sums of $X$. Then, by Theorem 6.3, we have
$$|A_+(R,C)| = e^{g(Z_+)}\, \mathbf{Pr}\{X \in A_+(R,C)\} = e^{g(Z_+)}\, \mathbf{Pr}\{Y = (R,C)\}. \tag{7.3.1}$$
In [BH09] we show how to estimate the probability that $Y = (R,C)$ using the Local Central Limit Theorem with the Edgeworth correction.

Let $Z_+ = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.3.
Let us consider the quadratic form $q_+: \mathbb{R}^{m+n} \longrightarrow \mathbb{R}$ defined by
$$q_+(s,t) = \frac{1}{2} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} \big( z_{ij} + z_{ij}^2 \big)(s_i + t_j)^2 \quad \text{for}\ s = (s_1, \dots, s_m)\ \text{and}\ t = (t_1, \dots, t_n).$$
Let $H \subset \mathbb{R}^{m+n}$ be the hyperplane defined by (7.1.2). The restriction $q_+|_H$ of $q_+$ onto $H$ is a positive definite quadratic form and we define its determinant $\det q_+|_H$ as the product of the non-zero eigenvalues of $q_+$. We consider the Gaussian probability measure on $H$ with the density proportional to $e^{-q_+}$ and define random variables $\phi_+, \psi_+: H \longrightarrow \mathbb{R}$ by
$$\phi_+(s,t) = \frac{1}{6} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} z_{ij} (1 + z_{ij}) (2 z_{ij} + 1) (s_i + t_j)^3 \quad \text{and} \quad \psi_+(s,t) = \frac{1}{24} \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} z_{ij} (1 + z_{ij}) \big( 6 z_{ij}^2 + 6 z_{ij} + 1 \big) (s_i + t_j)^4$$
for $(s,t) = (s_1, \dots, s_m;\ t_1, \dots, t_n)$. We let $\mu_+ = \mathbf{E}\, \phi_+^2$ and $\nu_+ = \mathbf{E}\, \psi_+$.

(7.4) Theorem. Let us fix $0 < \delta < 1$ and let $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ be margins such that $m \geq \delta n$ and $n \geq \delta m$. Let $Z_+ = (z_{ij})$ be the maximum entropy matrix as in Lemma 5.3. Suppose that $\delta \tau \leq z_{ij} \leq \tau$ for all $i, j$ and some $\tau \geq \delta$. Let the quadratic form $q_+$ and the values $\mu_+$ and $\nu_+$ be as defined in Section 7.3. Then the quantity
$$\frac{e^{g(Z_+)} \sqrt{m+n}}{(4\pi)^{(m+n-1)/2} \sqrt{\det q_+|_H}} \exp\Big\{ -\frac{\mu_+}{2} + \nu_+ \Big\} \tag{7.4.1}$$
approximates the number $|A_+(R,C)|$ of non-negative integer matrices with row sums $R$ and column sums $C$ within a relative error which approaches 0 as $m, n \longrightarrow +\infty$. More precisely, for any $0 < \epsilon \leq 1/2$, the value of (7.4.1) approximates $|A_+(R,C)|$ within relative error $\epsilon$ provided
$$m, n \ \geq\ \Big( \frac{1}{\epsilon} \Big)^{\gamma(\delta)}$$
for some $\gamma(\delta) > 0$.

All the ingredients of formula (7.4.1) are efficiently computable, in time polynomial in $m + n$, see [BH09] for details. If all row sums are equal then we have $z_{ij} = c_j/m$ by symmetry, and if all column sums are equal, we have $z_{ij} = r_i/n$. In particular, if all row sums are equal and all column sums are equal, we obtain the asymptotic formula of [C+07]. The term
$$\frac{e^{g(Z_+)} \sqrt{m+n}}{(4\pi)^{(m+n-1)/2} \sqrt{\det q_+|_H}}$$
corresponds to the Gaussian approximation for the distribution of the random vector $Y$ in (7.3.1), while $\exp\{-\mu_+/2 + \nu_+\}$ is the Edgeworth correction factor.

While the requirement that the entries of the maximum entropy matrix $Z_+$ are separated from 0 is unavoidable (if the $z_{ij}$ are small, the coordinates of $Y$ are asymptotically Poisson, not Gaussian, see [GM08] for the analysis of the sparse case), it is not clear whether the requirement that all $z_{ij}$ are within a constant factor of each other is indeed needed. It could be that around certain margins $(R,C)$ the number $|A_+(R,C)|$ experiences sudden jumps as the margins change, which precludes the existence of an analytic expression similar to (7.4.1) for $|A_+(R,C)|$. A candidate for such abnormal behavior is supplied by the margins discussed in Section 6.7. Namely, if $m = n$ and $R = C = (\lambda n, n, \dots, n)$ then for $\lambda = 2$ all the entries of the maximum entropy matrix $Z_+$ are $O(1)$, while for $\lambda = 3$ the first entry $z_{11}$ grows linearly in $n$. Hence for some particular $\lambda$ between 2 and 3 a certain "phase transition" occurs: the entry $z_{11}$ jumps from $O(1)$ to $\Omega(n)$. It would be interesting to find out if there is indeed a sharp change in $|A_+(R,C)|$ when $\lambda$ changes from 2 to 3.
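The identity (7.1.1), which both asymptotic formulas start from, can be sanity-checked by direct simulation on small margins. The sketch below is our own illustration (the names and the Monte Carlo shortcut are ours): it compares a brute-force count of $|A(R,C)|$ with the estimate $e^{h(Z)}\, \mathbf{Pr}\{Y = (R,C)\}$, where the probability is estimated empirically rather than through the Gaussian-Edgeworth approximation of Theorem 7.2.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def brute_count(R, C):
    """|A(R, C)| by enumeration of all 0-1 matrices (tiny cases only)."""
    m, n = len(R), len(C)
    count = 0
    for bits in itertools.product((0, 1), repeat=m * n):
        D = np.array(bits).reshape(m, n)
        if (D.sum(1) == R).all() and (D.sum(0) == C).all():
            count += 1
    return count

def estimate_711(Z, R, C, samples=200_000):
    """Monte Carlo version of (7.1.1): e^{h(Z)} Pr{Y = (R, C)}, where Y
    collects the row and column sums of the Bernoulli matrix X, E X = Z.
    Illustrative sketch only."""
    h = -(Z * np.log(Z) + (1 - Z) * np.log(1 - Z)).sum()  # h(Z) of Lemma 5.2
    hits = 0
    for _ in range(samples):
        X = (rng.random(Z.shape) < Z).astype(int)
        if (X.sum(1) == R).all() and (X.sum(0) == C).all():
            hits += 1
    return np.exp(h) * hits / samples

# For R = C = (2, 2, 2) the maximum entropy matrix is Z = (2/3) J by symmetry
# (all row sums and all column sums are equal), and |A(R, C)| = 6.
R = C = np.array([2, 2, 2])
Z = np.full((3, 3), 2 / 3)
print(brute_count(R, C), estimate_711(Z, R, C))  # 6 and approximately 6
```

For larger matrices the empirical probability is replaced by the Gaussian approximation with the Edgeworth correction factor, which is what Theorems 7.2 and 7.4 make precise.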
8. Concluding remarks
The methods of Sections 6 and 7 have been applied to some related problems, such as counting higher-order "tensors" with 0-1 or non-negative integer entries and prescribed sums along coordinate hyperplanes [BH10a] and counting graphs with prescribed degrees of vertices [BH10b], which corresponds to counting symmetric 0-1 matrices with zero trace and prescribed row (column) sums.

In general, the problem can be described as follows: we have a polytope $P \subset \mathbb{R}^d$ defined as the intersection of the non-negative orthant $\mathbb{R}^d_+$ with an affine subspace $\mathcal{A}$ in $\mathbb{R}^d$, and we construct a $d$-vector $X$ of independent Bernoulli (in the 0-1 case) or geometric (in the non-negative integer case) random variables, so that the expectation of $X$ lies in $\mathcal{A}$ and the distribution of $X$ is uniform when restricted onto the set of 0-1 or integer points in $P$. The random vector $X$ is determined by its expectation $\mathbf{E}\, X = z$, and $z$ is found by solving a convex optimization problem on $P$. Since the vector $X$ conditioned on the set of 0-1 or non-negative integer vectors in $P$ is uniform, the number of 0-1 or non-negative integer points in $P$ is expressed in terms of the probability that $X$ lies in $\mathcal{A}$. Assuming that the affine subspace $\mathcal{A}$ is defined by a system $Ax = b$ of linear equations, where $A$ is a $k \times d$ matrix of rank $k < d$, we define a $k$-vector $Y = AX$ of random variables and estimate the probability that $Y = b$ by using a Local Central Limit Theorem type argument. Here we essentially use that $\mathbf{E}\, Y = b$, since the expectation of $X$ lies in $\mathcal{A}$.

Not surprisingly, the argument works the easiest when the codimension $k$ of the affine subspace (and hence the dimension of the vector $Y$) is small. In particular, counting higher-order "tensors" is easier than counting matrices: the need for the Edgeworth correction factor, for example, disappears as the vector $Y$ turns out to be closer in distribution to a Gaussian vector, see [BH10a]. Once a Gaussian or almost Gaussian estimate for the probability $\mathbf{Pr}\{Y = b\}$ is established, one can claim a certain concentration of a random 0-1 or integer point in $P$ around $z = \mathbf{E}\, X$.

References

[Ba07] A. Barvinok, Brunn-Minkowski inequalities for contingency tables and integer flows, Advances in Mathematics (2007), 105-122.
[Ba09] A. Barvinok, Asymptotic estimates for the number of contingency tables, integer flows, and volumes of transportation polytopes, International Mathematics Research Notices (2009), 348-385.

[Ba10a] A. Barvinok, On the number of matrices and a random matrix with prescribed row and column sums and 0-1 entries, Advances in Mathematics (2010), 316-339.

[Ba10b] A. Barvinok, What does a random contingency table look like?, Combinatorics, Probability and Computing (2010), 517-539.

[BH09] A. Barvinok and J.A. Hartigan, An asymptotic formula for the number of non-negative integer matrices with prescribed row and column sums, preprint arXiv:0910.2477 (2009).

[BH10a] A. Barvinok and J.A. Hartigan, Maximum entropy Gaussian approximation for the number of integer points and volumes of polytopes, Advances in Applied Mathematics (2010), 252-289.

[BH10b] A. Barvinok and J.A. Hartigan, The number of graphs and a random graph with a given degree sequence, preprint arXiv:1003.0356 (2010).

[Be74] E. Bender, The asymptotic number of non-negative integer matrices with given row and column sums, Discrete Mathematics (1974), 217-223.

[BR91] R.A. Brualdi and H.J. Ryser, Combinatorial Matrix Theory, Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 1991.

[C+08] E.R. Canfield, C. Greenhill, and B.D. McKay, Asymptotic enumeration of dense 0-1 matrices with specified line sums, Journal of Combinatorial Theory, Series A (2008), 32-66.

[C+07] E.R. Canfield and B.D. McKay, Asymptotic enumeration of contingency tables with constant margins, preprint arXiv:math.CO/0703600 (2007), Combinatorica, to appear.

[Ga02] R.J. Gardner, The Brunn-Minkowski inequality, Bulletin of the American Mathematical Society (N.S.) (2002), 355-405.

[Go63] I.J. Good, Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, Annals of Mathematical Statistics (1963), 911-934.

[Go76] I.J. Good, On the application of symmetric Dirichlet distributions and their mixtures to contingency tables, Annals of Statistics (1976), 1159-1189.

[GC77] I.J. Good and J.F. Crook, The enumeration of arrays and a generalization related to contingency tables, Discrete Mathematics (1977), 23-45.

[G+06] C. Greenhill, B.D. McKay, and X. Wang, Asymptotic enumeration of sparse 0-1 matrices with irregular row and column sums, Journal of Combinatorial Theory, Series A (2006), 291-324.

[GM08] C. Greenhill and B.D. McKay, Asymptotic enumeration of sparse nonnegative integer matrices with specified row and column sums, Advances in Applied Mathematics (2008), 459-481.

[JSM92] M. Jerrum, A. Sinclair and B. McKay, When is a graphical sequence stable?, Random Graphs, Vol. 2 (Poznań, 1989), Wiley-Interscience Publications, Wiley, New York, 1992, pp. 101-115.

[Kh57] A.I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications, New York, 1957.

[LW01] J.H. van Lint and R.M. Wilson, A Course in Combinatorics, second edition, Cambridge University Press, Cambridge, 2001.

[NN94] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, SIAM Studies in Applied Mathematics, 13, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994.
Department of Mathematics, University of Michigan, Ann Arbor, MI 48109-1043, USA
E-mail address: barvinok@umich.edu