Proof of an entropy conjecture of Leighton and Moitra
Hüseyin Acan ∗†, Pat Devlin ∗‡, Jeff Kahn ∗‡

Abstract
We prove the following conjecture of Leighton and Moitra. Let $T$ be a tournament on $[n]$ and $S_n$ the set of permutations of $[n]$. For an arc $uv$ of $T$, let $A_{uv} = \{\sigma \in S_n : \sigma(u) < \sigma(v)\}$.

Theorem. For a fixed $\varepsilon > 0$, if $P$ is a probability distribution on $S_n$ such that $P(A_{uv}) > 1/2 + \varepsilon$ for every arc $uv$ of $T$, then the binary entropy of $P$ is at most $(1 - \vartheta_\varepsilon) \log n!$ for some (fixed) positive $\vartheta_\varepsilon$.

When $T$ is transitive the theorem is due to Leighton and Moitra; for this case we give a short proof with a better $\vartheta_\varepsilon$.

In what follows we use $\log$ for $\log_2$ and $H(\cdot)$ for binary entropy. The purpose of this note is to prove the following natural statement, which was conjectured by Tom Leighton and Ankur Moitra [6] (and told to the third author by Moitra in 2008).

Theorem 1.
Let $T$ be a tournament on $[n]$ and $\sigma$ a random (not necessarily uniform) permutation of $[n]$ satisfying: for each arc $uv$ of $T$,
$$P(\sigma(u) < \sigma(v)) > 1/2 + \varepsilon. \tag{1}$$
Then
$$H(\sigma) \le (1 - \vartheta) \log n!, \tag{2}$$
where $\vartheta > 0$ depends only on $\varepsilon$.

AMS 2010 subject classification: 05C20, 05D40, 94A17, 06A07
Key words and phrases: entropy, permutations, tournaments, regularity
∗ Department of Mathematics, Rutgers University
† Supported by National Science Foundation Fellowship (Award No. 1502650).
‡ Supported by NSF grant DMS1501962.

(We will usually think of permutations as bijections $\sigma : [n] \to [n]$.) The original motivation for Leighton and Moitra came mostly from questions about sorting partially ordered sets; see [6] for more on this.

For the special case of transitive $T$, Theorem 1 was proved in [6] with $\vartheta_\varepsilon$ a constant multiple of a power of $\varepsilon$. Note that for a typical (a.k.a. random) $T$, the conjecture's hypothesis is unachievable, since, as shown long ago by Erdős and Moon [2], no $\sigma$ agrees with $T$ on more than a $(1/2 + o(1))$-fraction of its arcs. In fact, it seems natural to expect that transitive tournaments are the worst instances, being the ones for which the hypothesized agreement is easiest to achieve. From this standpoint, what we do here may be considered somewhat unsatisfactory, as our $\vartheta$'s are quite a bit worse than those in [6]. For transitive $T$ it's easy to see [6, Claim 4.14] that one can't take $\vartheta$ greater than $2\varepsilon$, which seems likely to be close to the truth. We make some progress on this, giving a surprisingly simple proof of the following improvement of [6].

Theorem 2.
For $T$, $P$, $\sigma$ as in Theorem 1 with $T$ transitive,
$$H(\sigma) \le (1 - \varepsilon^2/8)\, n \log n.$$

The proof of Theorem 1 is given in Section 3 following brief preliminaries in Section 2. The underlying idea is similar to that of [6], which in turn was based on the beautiful tournament ranking bound of W. Fernandez de la Vega [1]; see Section 3 (end of "Sketch") for an indication of the relation to [6]. Theorem 2 is proved in Section 4.
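As a small sanity check of the definitions (not part of the argument), the following Python sketch builds a toy distribution on $S_4$ for the transitive tournament, namely a mixture of the identity with the uniform distribution, and evaluates the quantities appearing in (1), (2) and Theorem 2. The mixing weight `p_id`, the helper names, and the 0-indexed encoding of permutations are assumptions made purely for illustration. For such small $n$ the bound of Theorem 2 is far from tight, so this only shows how the hypothesis and the entropy are computed.

```python
import math
from itertools import permutations

def entropy_bits(probs):
    """Binary (base-2) entropy of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def arc_probability(dist, u, v):
    """P(sigma(u) < sigma(v)); dist maps tuples s (with s[u] = sigma(u)) to probabilities."""
    return sum(p for s, p in dist.items() if s[u] < s[v])

n = 4
perms = list(permutations(range(n)))   # all of S_n, 0-indexed
identity = tuple(range(n))
p_id = 0.5                             # hypothetical mixing weight (an assumption)

# Mixture: with probability p_id take the identity, otherwise a uniform permutation.
dist = {s: (1 - p_id) / len(perms) for s in perms}
dist[identity] += p_id

# Arcs of the transitive tournament T = {uv : u < v}.
arcs = [(u, v) for u in range(n) for v in range(u + 1, n)]
min_arc_prob = min(arc_probability(dist, u, v) for u, v in arcs)

# Hypothesis (1) holds for every eps strictly below min_arc_prob - 1/2.
eps_sup = min_arc_prob - 0.5

H = entropy_bits(dist.values())
log_n_factorial = math.log2(math.factorial(n))
theorem2_bound = (1 - eps_sup ** 2 / 8) * n * math.log2(n)

print(f"min arc probability = {min_arc_prob:.4f}")
print(f"H(sigma) = {H:.4f}, log2 n! = {log_n_factorial:.4f}, "
      f"Theorem 2 bound = {theorem2_bound:.4f}")
```

Each arc probability here equals `p_id + (1 - p_id) / 2`, so increasing `p_id` strengthens the agreement with $T$ and lowers the entropy, which is the trade-off that Theorems 1 and 2 quantify.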
Usage
In what follows we assume $n$ is large enough to support our arguments and pretend all large numbers are integers.

As usual $G[X]$ is the subgraph of $G$ induced by $X$; we use $G[X, Y]$ for the bipartite subgraph induced (in the obvious sense) by disjoint $X$ and $Y$. For a digraph $D$, $D[X]$ and $D[X, Y]$ are used analogously. For both graphs and digraphs, we use $|\cdot|$ for number of edges (or arcs).

Also as usual, the density of a pair $(X, Y)$ of disjoint subsets of $V(G)$ is $d(X, Y) = d_G(X, Y) = |G[X, Y]| / (|X||Y|)$, and we extend this to bipartite digraphs $D$ in which
$$\text{at most one of } D \cap (X \times Y),\ D \cap (Y \times X) \text{ is nonempty.} \tag{3}$$
For a digraph $D$, $D^r$ is the digraph gotten from $D$ by reversing its arcs.

Write $S_n$ for the set of permutations of $[n]$. For $\sigma \in S_n$, we use $T_\sigma$ for the corresponding (transitive) tournament on $[n]$ (that is, $uv \in T_\sigma$ iff $\sigma(u) < \sigma(v)$) and for a digraph $D$ (on $[n]$) define
$$\mathrm{fit}(\sigma, D) = |D \cap T_\sigma| - |D^r \cap T_\sigma|$$
(e.g. when $D$ is a tournament, this is a measure of the quality of $\sigma$ as a ranking of $D$).

Regularity
Here we need just Szemerédi's basic notion [7] of a regular pair and a very weak version (Lemma 3) of his Regularity Lemma. As usual a bipartite graph $H$ on disjoint $X \cup Y$ is $\delta$-regular if
$$|d_H(X', Y') - d_H(X, Y)| < \delta$$
whenever $X' \subseteq X$, $Y' \subseteq Y$, $|X'| > \delta|X|$ and $|Y'| > \delta|Y|$, and we extend this in the obvious way to the situation in (3). It is easy to see that if a bigraph $H$ is $\delta$-regular then its bipartite complement is as well; this implies that for a tournament $T$ on $[n]$ and $X$, $Y$ disjoint subsets of $[n]$,
$$T \cap (X \times Y) \text{ is } \delta\text{-regular if and only if } T \cap (Y \times X) \text{ is.} \tag{4}$$

The following statement should perhaps be considered folklore, though similar results were proved by János Komlós, circa 1991 (see [5, Sec. 7.3]).

Lemma 3. For each $\delta > 0$ there is a $\beta > 2^{-\delta^{-O(1)}}$ such that for any bigraph $H$ on $X \cup Y$ with $|X|, |Y| \ge n$, there is a $\delta$-regular pair $(X', Y')$ with $X' \subseteq X$, $Y' \subseteq Y$ and each of $|X'|$, $|Y'|$ at least $\beta n$.

Corollary 4. For each $\delta > 0$, $\beta$ as in Lemma 3 and digraph $G = (V, E)$, there is a partition $L \cup R \cup W$ of $V$ such that $E \cap (L \times R)$ is $\delta$-regular and $\min\{|L|, |R|\} \ge \beta|V|/2$.

Proof. Let $X \cup Y$ be an (arbitrary) equipartition of $V$ and apply Lemma 3 to the undirected graph $H$ underlying the digraph $G \cap (X \times Y)$.

3 Proof of Theorem 1
We now assume that $\sigma$ drawn from the probability distribution $P$ on $S_n$ satisfies (1) and try to show (2) (with $\vartheta$ TBA). We use $E$ for expectation w.r.t. $P$ and $\mu$ for uniform distribution on $S_n$.

Sketch and connection with [6]

We will produce $S_1, \ldots, S_m \subseteq T$ with $S_i \subseteq L_i \times R_i$ for some disjoint $L_i, R_i \subseteq [n]$, satisfying:
(i) with $\|S_i\| := \min\{|L_i|, |R_i|\}$, $\sum \|S_i\| = \Omega(n \log n)$ (where the implied constant depends on $\varepsilon$);
(ii) each $S_i$ is $\delta$-regular (with $\delta = \delta_\varepsilon$ TBA);
(iii) for all $i < j$, either $(L_i \cup R_i) \cap (L_j \cup R_j) = \emptyset$ or $L_j \cup R_j$ is contained in one of $L_i$, $R_i$ (note this implies the $S_i$'s are disjoint).

Let $A_i = \{\mathrm{fit}(\sigma, S_i) > \varepsilon|S_i|\}$ and $Q = \{\sum\{\|S_i\| : A_i \text{ occurs}\} = \Omega(n \log n)\}$. The main points are then:
(a) $P(Q)$ is bounded below by a positive function of $\varepsilon$. (This is just (i) together with a couple applications of Markov's Inequality.)
(b) Regularity of $S_i$ implies $\mu(A_i) \le \exp[-\Omega(\|S_i\|)]$.
(c) Under (iii), for any $I \subseteq [m]$, $\mu(\cap_{i \in I} A_i) < \exp[-\sum_{i \in I} \Omega(\|S_i\|)]$ (a weak version of independence of the $A_i$'s under $\mu$).
And these points easily combine to give (2) (see (6) and (8)).

For the transitive case in [6] most of this argument is unnecessary; in particular, regularity disappears and there is a natural decomposition of $T$ into $S_i$'s: Supposing $T = \{ab : a < b\}$ and (for simplicity) $n = 2^k$, we may take the $S_i$'s to be the sets $L_i \times R_i$ with $(L_i, R_i)$ running over pairs
$$\big([(2s-2)2^{-j}n + 1,\ (2s-1)2^{-j}n],\ [(2s-1)2^{-j}n + 1,\ 2s\,2^{-j}n]\big), \tag{5}$$
with $j \in [k]$ and $s \in [2^{j-1}]$. (As mentioned earlier, this decomposition of the (identity) permutation $(1, \ldots, n)$ also provides the framework for [1].) After some translation, our argument (really, a fairly small subset thereof) then specializes to essentially what's done in [6].

Set δ = . ε and let $\beta$ be half the $\beta$ of Lemma 3 and Corollary 4.
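For the transitive case, the decomposition (5) from the sketch is easy to check mechanically. The Python sketch below (the helper name `dyadic_pairs` and the choice $n = 8$ are assumptions made for illustration) lists the pairs $(L_i, R_i)$ and confirms that every arc $ab$ with $a < b$ lies in exactly one $L_i \times R_i$, and that $\sum_i (|L_i| + |R_i|) = n \log n$.

```python
from collections import Counter

def dyadic_pairs(n):
    """Return the pairs (L, R) of (5) as Python ranges; n must be a power of two, n >= 2."""
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("n must be a power of two, at least 2")
    pairs = []
    for j in range(1, k + 1):
        block = n >> j                       # this is 2^{-j} n
        for s in range(1, 2 ** (j - 1) + 1):
            left = range((2 * s - 2) * block + 1, (2 * s - 1) * block + 1)
            right = range((2 * s - 1) * block + 1, 2 * s * block + 1)
            pairs.append((left, right))
    return pairs

n = 8
pairs = dyadic_pairs(n)

# Count how often each arc (a, b) is covered by some L_i x R_i.
cover = Counter((a, b) for left, right in pairs for a in left for b in right)
all_arcs = [(a, b) for a in range(1, n + 1) for b in range(a + 1, n + 1)]
each_arc_once = all(cover[arc] == 1 for arc in all_arcs) and len(cover) == len(all_arcs)

# Sum of |L_i| + |R_i| over all pairs: each of the log2(n) levels contributes n.
total_size = sum(len(left) + len(right) for left, right in pairs)

print(each_arc_once)   # True
print(total_size)      # 24, i.e. n * log2(n) for n = 8
```

The same total reappears as $\sum (2m_i)$ in the derivation of Theorem 2 from Lemma 10 in Section 4.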
We use the corollary to find a rooted tree $\mathcal{T}$ each of whose internal nodes has degree (number of children) 2 or 3, together with disjoint subsets $S_1, S_2, \ldots, S_m$ of (the arc set of) $T$, corresponding to the internal nodes of $\mathcal{T}$. The nodes of $\mathcal{T}$ will be subsets of $[n]$ (so the size, $|U|$, of a node $U$ is its size as a set).

To construct $\mathcal{T}$, start with root $V_1 = [n]$ and repeat the following for $k = 1, \ldots$ until each unprocessed node has size less than (say) $t := \sqrt{n}$. Let $V_k$ be an unprocessed node of size at least $t$ and apply Corollary 4 to $T[V_k]$ to produce a partition $V_k = L_k \cup R_k \cup W_k$, with $|L_k|, |R_k| > \beta|V_k|$ and $S_k := T \cap (L_k \times R_k)$ $\delta$-regular of density at least 1/2. (Note (4) says we can reverse the roles of $L_k$ and $R_k$ if the density of $T \cap (L_k \times R_k)$ is less than 1/2.) Add $L_k$, $R_k$, $W_k$ to $\mathcal{T}$ as the children of $V_k$ and mark $V_k$ "processed." (Note the $V_k$'s are the internal nodes of $\mathcal{T}$; nodes of size less than $t$ are not processed and are automatically leaves. Note also that there is no restriction on $|W_k|$ and that, for $k > 1$, $V_k$ is equal to one of $L_i$, $R_i$, $W_i$ for some $i < k$.)

Let $m$ be the number of internal nodes of $\mathcal{T}$ (the final tree). Note that the leaves of $\mathcal{T}$ have size at most $t$ and that the $S_i$'s satisfy (ii) and (iii) of the proof sketch; that they also satisfy (i) is shown by the next lemma. Set
$$\Lambda = \sum_{i=1}^{m} |V_i|;$$
this quantity will play a central role in what follows.

Lemma 5. $\Lambda \ge \frac{1}{2} n \log_3 n$.

Proof. This will follow easily from the next general (presumably known) observation, for which we assume $\mathcal{T}$ is a tree satisfying:
• the nodes of $\mathcal{T}$ are subsets of $S$, an $s$-set which is also the root of $\mathcal{T}$;
• the children of each internal node $U$ of $\mathcal{T}$ form a partition of $U$ with at most $b$ blocks;
• the leaves of $\mathcal{T}$ are $U_1, \ldots, U_r$, with $|U_i| = u_i \le t$ (any $t$) and depth $d_i$.

Lemma 6. With the setup above,
$$\sum u_i d_i \ge s \log_b(s/t).$$
(Of course this is exact if $\mathcal{T}$ is the complete $b$-ary tree of depth $d$ and all leaves have size $b^{-d} s$.)

Proof. Recall that the relative entropy between probability distributions $p$ and $q$ on $[r]$ is
$$D(p \| q) = \sum p_i \log(q_i/p_i) \le 0.$$
We use this with $p_i = u_i/s$ and $q_i$ the probability that the ordinary random walk down the tree ends at $U_i$. In particular $q_i \ge b^{-d_i}$, which, with nonpositivity of $D(p \| q)$ and the assumption $u_i \le t$, gives
$$\sum (u_i/s)\, d_i \log b \ \ge\ \sum (u_i/s) \log(1/q_i) \ \ge\ \sum (u_i/s) \log(s/u_i) \ \ge\ \log(s/t).$$
The lemma follows.

This gives Lemma 5 (apply Lemma 6 with $s = n$, $t = \sqrt{n}$ and $b = 3$) since $\sum |V_i| = \sum_U |U|\, d(U)$, with $U$ ranging over leaves of $\mathcal{T}$ (and $d(\cdot)$ again denoting depth).

Lemma 7.
The number $m$ of internal nodes of $\mathcal{T}$ is less than $n$.

Proof. A straightforward induction shows that the number of leaves of a rooted tree is $1 + \sum (b(w) - 1)$, where $w$ ranges over internal nodes and $b$ denotes number of children. The lemma follows since here the number of leaves is at most $n$ (actually at most $3\sqrt{n}$) and each $b(w)$ is at least 2.

Recalling that $A_i = \{\sigma \in S_n : \mathrm{fit}(\sigma, S_i) \ge \varepsilon|S_i|\}$ and that $E$ refers to $P$, we have $E[\mathrm{fit}(\sigma, S_i)] \ge 2\varepsilon|S_i|$ (by (1), each arc of $S_i$ contributes more than $2\varepsilon$ to the expectation), which with
$$E[\mathrm{fit}(\sigma, S_i)] \le P(A_i)|S_i| + (1 - P(A_i))\,\varepsilon|S_i| \le (P(A_i) + \varepsilon)|S_i|$$
gives $P(A_i) \ge \varepsilon$ (essentially Markov's Inequality applied to $|S_i| - \mathrm{fit}(\sigma, S_i)$).

Set $\xi_i = |V_i| \mathbf{1}_{A_i}$ and $\xi = \sum_i \xi_i$, and let $Q$ be the event $\{\xi \ge \varepsilon\Lambda/2\}$. Then $E[\xi_i] = |V_i| P(A_i) \ge \varepsilon|V_i|$, implying $E[\xi] = \sum E[\xi_i] \ge \varepsilon\Lambda$, and (since $\xi_i \le |V_i|$) $\xi \le \Lambda$; so using Markov's Inequality as above gives $P(Q) \ge \varepsilon/2$.

For $\sigma$ chosen from $S_n$ according to $P$, we have
$$H(\sigma) \le 1 + (1 - P(Q)) \log n! + P(Q) \log|Q| = 1 + \log n! + P(Q) \log \mu(Q) \le 1 + \log n! + (\varepsilon/2) \log \mu(Q) \tag{6}$$
(recall $\mu$ is the uniform measure on $S_n$).

Let $\mathcal{J} = \{I \subseteq [m] : \sum_{i \in I} |V_i| \ge \varepsilon\Lambda/2\}$ and, for $I \in \mathcal{J}$, let $A_I = \cap_{i \in I} A_i$. Set
b = ε δβ / 33 (7)
(see (12) for the reason for the choice of $b$). We will show, for each $I \in \mathcal{J}$,
$$\mu(A_I) \le e^{-b\varepsilon\Lambda/2}, \tag{8}$$
which implies
$$\log \mu(Q) = \log \mu(\cup_{I \in \mathcal{J}} A_I) \le \log|\mathcal{J}| - (b\varepsilon\Lambda \log e)/2 \le n - (b\varepsilon\Lambda \log e)/2,$$
the second inequality following from $|\mathcal{J}| \le 2^m$ together with Lemma 7. With c = ε δβ / < ( bε log e ) / 4, this bounds (for large $n$) the r.h.s. of (6) by
$$(1 - \varepsilon c/2) \log n!,$$
which proves Theorem 1 with ϑ = ε δβ / 300 = $\exp[-\varepsilon^{-O(1)}]$.

The rest of our discussion is devoted to the proof of (8). For a digraph $D \subseteq L \times R$ with $L$, $R$ disjoint subsets of $V$, say a pair $(X, Y)$ of disjoint subsets of $[n]$ with $|X| = |L|$, $|Y| = |R|$ is safe for $D$ if
$$\mathrm{fit}(\tau, D) < \varepsilon|L||R|/2 \tag{9}$$
for every bijection $\tau : L \cup R \to X \cup Y$ with $\tau(L) = X$ (where $\mathrm{fit}(\tau, D)$ has the obvious meaning). We also say $\sigma \in S_n$ is safe for $D$ if $(\sigma(L), \sigma(R))$ is. Note that since $S_i$ has density at least 1/2 in $L_i \times R_i$, the $\sigma$'s in $A_i$ are unsafe for $S_i$.

Lemma 8.
Assume the above setup with $|L| + |R| = l$ and $|L| = \gamma l$, and set $\lambda = 2\delta$ and ζ = εδγ (1 − γ ) / . Let $I_1 \cup \cdots \cup I_r$ be the natural partition of $X \cup Y$ into intervals of size $\lambda l$. If $D$ is $\delta$-regular and
$$|X \cap I_j| = (\gamma\lambda \pm \zeta)\, l \quad \forall j \in [r], \tag{10}$$
then $(X, Y)$ is safe for $D$.

(Of course an interval of $Z = \{i_1 < \cdots < i_u\}$ is one of the sets $\{i_s, \ldots, i_{s+t}\}$.)

Proof. For $\tau$ as in the line after (9), let $L_j = L \cap \tau^{-1}(I_j)$ and $R_j = R \cap \tau^{-1}(I_j)$ ($j \in [r]$). Then | fit( τ, D ) | ≤ X ≤ i

Corollary 9. For $D$ and parameters as in Lemma 8, and $\sigma$ uniform from $S_n$,
$$\Pr(\sigma \text{ is unsafe for } D) < 2r \exp[-2\zeta^2 l/\lambda].$$

Proof. Let $(X, Y) = (\sigma(L), \sigma(R))$. Once we've chosen $X \cup Y$ (determining $I_1, \ldots, I_r$), $2\exp[-2\zeta^2 l/\lambda]$ is the usual Hoeffding bound [3, Eq. (2.3)] on the probability that $X$ violates (10) for a given $j$. (The bound may be more familiar when elements of $X \cup Y$ are in $X$ independently, but also applies to the hypergeometric r.v. $|X \cap I_j|$; see e.g. [4, Thm. 2.10 and (2.12)].) The corollary follows by a union bound over $j \in [r]$ together with Lemma 8.

Proof of (8). Let $B_i = \{\sigma \in S_n : \sigma \text{ is unsafe for } S_i\}$ and $B_I = \cap_{i \in I} B_i$. Then $A_i \subseteq B_i$ (as noted above) and (therefore) $A_I \subseteq B_I$. Moreover (and this is perhaps the central point) the $B_i$'s are independent, since $B_i$ depends only on the relative positions of $\sigma(L_i)$ and $\sigma(R_i)$ within $\sigma(V_i)$.

On the other hand, Corollary 9, applied with $D = S_i$ (so $L = L_i$, $R = R_i$, $l = |L_i| + |R_i|$ and $\gamma = |L_i|/l \in (\beta, 1 - \beta)$) gives
$\Pr(B_i) < 2r \exp[-2\zeta^2 l/\lambda]$ < r exp[ − ε δβ l/ < r exp[ − ε δβ | V i | / < $e^{-b|V_i|}$. (12)
(Recall $b$ was defined in (7); since we assume $|V_i|$ is large ($|V_i| > t = \sqrt{n}$), the choice leaves a little room to absorb the $2r$.) And of course (12) and the independence of the $B_i$'s give (8).

4 Proof of Theorem 2

Theorem 2 is an easy consequence of the next observation.
Lemma 10.
Let $Y$ be a random $m$-subset of $[2m]$ satisfying
$$E\,|\{(a, b) : a < b,\ a \in [2m] \setminus Y,\ b \in Y\}| > (1/2 + \varepsilon)\, m^2. \tag{13}$$
Then $H(Y) < (1 - \varepsilon^2/8)\, 2m$.

To get Theorem 2 from this, let $T = \{ab : a < b\}$ and, for simplicity, $n = 2^k$, and decompose $T = \bigcup (L_i \times R_i)$ as in (5). For each $i$, say with $|L_i|\ (= |R_i|) = m_i$, let $Y_i \subseteq [2m_i]$ consist of the indices of positions within $\sigma(L_i \cup R_i)$ occupied by $\sigma(R_i)$; that is, if $\sigma(L_i \cup R_i) = \{j_1 < \cdots < j_{2m_i}\}$, then $Y_i = \{l : j_l \in \sigma(R_i)\}$. Then Lemma 10 (its hypothesis provided by (1)) gives
$$H(Y_i) \le (1 - \varepsilon^2/8)\, 2m_i;$$
so, since $\sigma$ is determined by the $Y_i$'s, we have
$$H(\sigma) \le \sum H(Y_i) \le (1 - \varepsilon^2/8) \sum (2m_i) = (1 - \varepsilon^2/8)\, n \log n.$$

Remark. Note that the $\Omega(\varepsilon^2)$ of Theorem 2 is the best one can do without more fully exploiting (1) (that is, beyond (13) for the $(L_i, R_i, Y_i)$'s, which is all we are using).

Proof of Lemma 10.
For $a \in [2m]$, set $P(a \in Y) = 1/2 + \delta_a$. Then
$$H(Y) \le \sum_a H(1/2 + \delta_a) \le \sum_a (1 - 2\delta_a^2)$$
(where the 2 could actually be $2 \log e$); so it is enough to show
$$\sum \delta_a^2 \ge \varepsilon^2 m/8.$$
For any $m$-subset $Y$ of $[2m]$, we have
$$f(Y) := |\{(a, b) : a < b,\ a \in [2m] \setminus Y,\ b \in Y\}| = \sum_{b \in Y} (b - 1) - \binom{m}{2} = \sum_{b \in Y} b - \binom{m+1}{2}$$
(the first sum counts pairs $(a, b)$ with $a < b$ and $b \in Y$, and $\binom{m}{2}$ is the number of such pairs with $a$ also in $Y$); so we have
$$(1/2 + \varepsilon)\, m^2 < E f(Y) = \sum (1/2 + \delta_b)\, b - \binom{m+1}{2} = \sum \delta_b b + m^2/2,$$
implying $\sum \delta_b b > \varepsilon m^2$. Combining this with $2m \sum_{\delta_b > 0} \delta_b \ge \sum \delta_b b$, we have $\sum_{\delta_b > 0} \delta_b > \varepsilon m/2$ and
$$\sum \delta_b^2 \ge \sum_{\delta_b > 0} \delta_b^2 \ge \frac{1}{2m} (\varepsilon m/2)^2 = \varepsilon^2 m/8.$$

References

[1] W. Fernandez de la Vega, On the maximal cardinality of a consistent set of arcs in a random tournament, J. Comb. Th. Series B (1983), 328-332.
[2] P. Erdős and J. Moon, On sets of consistent arcs in a tournament, Canad. Math. Bull. (1965), 269-271.
[3] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statistical Assoc. (1963), 13-30.
[4] S. Janson, T. Łuczak and A. Ruciński, Random Graphs, Wiley, New York, 2000.
[5] J. Komlós and M. Simonovits, Szemerédi's regularity lemma and its applications in graph theory, Combinatorics, Paul Erdős is eighty, Vol. 2 (Keszthely, 1993), 295-352, Bolyai Soc. Math. Stud. 2, János Bolyai Math. Soc., Budapest, 1996.
[6] T. Leighton and A. Moitra, On Entropy and Extensions of Posets, manuscript 2011. http://people.csail.mit.edu/moitra/docs/poset.pdf.
[7] E. Szemerédi, Regular Partitions of Graphs, pp. 399-401 in