Graphs of Joint Types, Noninteractive Simulation, and Stronger Hypercontractivity
Lei Yu, Venkat Anantharam, and Jun Chen
Abstract
In this paper, we introduce the concept of a type graph, namely a bipartite graph induced by a joint type. We study the maximum edge density of induced bipartite subgraphs of this graph having a number of vertices on each side on an exponential scale. This can be seen as an isoperimetric problem. We provide asymptotically sharp bounds for the exponent of the maximum edge density as the blocklength goes to infinity. We also study the biclique rate region of the type graph, which is defined as the set of $(R_1, R_2)$ such that there exists a biclique of the type graph which has respectively $e^{nR_1}$ and $e^{nR_2}$ vertices on two sides. We provide asymptotically sharp bounds for the biclique rate region as well. We then apply our results and proof ideas to noninteractive simulation problems. We completely characterize the exponents of maximum and minimum joint probabilities when the marginal probabilities vanish exponentially fast with given exponents. These results can be seen as strong small-set expansion theorems. We extend the noninteractive simulation problem by replacing Boolean functions with arbitrary nonnegative functions, and obtain new hypercontractivity inequalities which are stronger than the common hypercontractivity inequalities. Furthermore, as an application of our results, a new outer bound for the zero-error capacity region of the binary adder channel is provided, which improves the previously best known bound, due to Austrin, Kaski, Koivisto, and Nederlof. Our proofs in this paper are based on the method of types, linear algebra, and coupling techniques.

Index Terms
Graphs of joint types, noninteractive simulation, small-set expansion, isoperimetric inequalities, hypercontractivity, zero-error capacity of the binary adder channel
I. INTRODUCTION
Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite sets. Let $T_X$ be an $n$-type on $\mathcal{X}$, i.e., an empirical distribution of sequences from $\mathcal{X}^n$, with $n$ considered to be part of the definition of $T_X$. Let $\mathcal{T}_{T_X}$ be the type class with respect to $T_X$, i.e., the set of sequences of length $n$ having the type $T_X$. Similarly, let $T_{XY}$ be a joint $n$-type on $\mathcal{X} \times \mathcal{Y}$ and $\mathcal{T}_{T_{XY}}$ the joint type class with respect to $T_{XY}$. Obviously, $\mathcal{T}_{T_{XY}} \subseteq \mathcal{T}_{T_X} \times \mathcal{T}_{T_Y}$, where $T_X, T_Y$ are the marginal types corresponding to the joint type $T_{XY}$. In this paper, we consider the undirected bipartite graph $G_{T_{XY}}$ whose vertex set is $\mathcal{T}_{T_X} \cup \mathcal{T}_{T_Y}$ and whose edge set can be identified with $\mathcal{T}_{T_{XY}}$, defined as follows. Consider $\mathbf{x} \in \mathcal{T}_{T_X}$ and $\mathbf{y} \in \mathcal{T}_{T_Y}$ as vertices of $G_{T_{XY}}$. Two vertices $\mathbf{x}, \mathbf{y}$ are joined by an edge if and only if $(\mathbf{x}, \mathbf{y}) \in \mathcal{T}_{T_{XY}}$. We term $G_{T_{XY}}$ the graph of $T_{XY}$ or, more succinctly, a type graph. For brevity, when there is no ambiguity, we use the abbreviated notation $G$ for $G_{T_{XY}}$. For subsets $A \subseteq \mathcal{T}_{T_X}$, $B \subseteq \mathcal{T}_{T_Y}$, we obtain an induced subgraph $G[A, B]$ of $G$, whose vertex set is the union of the two parts $A$ and $B$, and where $\mathbf{x}, \mathbf{y}$ are joined by an edge if and only if they are joined by an edge in $G$. For the induced subgraph $G[A, B]$, the (edge) density $\rho(G[A, B])$ is defined as
$$\rho(G[A, B]) := \frac{\#\text{ of edges in } G[A, B]}{|A|\,|B|}.$$
Obviously, $\rho(G[A, B]) = \frac{|(A \times B) \cap \mathcal{T}_{T_{XY}}|}{|A|\,|B|}$. It is interesting to observe that
$$\rho(G) = \frac{\big|\mathcal{T}_{T^{(n)}_{XY}}\big|}{\big|\mathcal{T}_{T^{(n)}_X}\big|\,\big|\mathcal{T}_{T^{(n)}_Y}\big|} \doteq e^{-n I_{T^{(n)}}(X;Y)}$$
for any sequence of joint types $\{T^{(n)}_{XY}\}$.¹ Moreover, if we only fix $T_X$, $T_Y$, $A$, and $B$, then $T_{XY} \in \mathcal{C}_n(T_X, T_Y) \mapsto \rho(G_{T_{XY}}[A, B])$ forms a probability mass function, i.e., $\rho(G_{T_{XY}}[A, B]) \ge 0$ and $\sum_{T_{XY} \in \mathcal{C}_n(T_X, T_Y)} \rho(G_{T_{XY}}[A, B]) = 1$, where $\mathcal{C}_n(T_X, T_Y)$ denotes the set of joint types $T_{XY}$ with marginals $T_X, T_Y$. We term this distribution a type distribution, which, roughly speaking, can be considered as a generalization from binary alphabets to arbitrary finite alphabets of the classic distance distribution in coding theory; please refer to [1] for the distance distribution of a single code, and [2] for the distance distribution between two codes.

Given $1 \le M_1 \le |\mathcal{T}_{T_X}|$, $1 \le M_2 \le |\mathcal{T}_{T_Y}|$, define the maximal density of subgraphs with size $(M_1, M_2)$ as
$$\Gamma_n(M_1, M_2) := \max_{A \subseteq \mathcal{T}_{T_X},\, B \subseteq \mathcal{T}_{T_Y} :\, |A| = M_1,\, |B| = M_2} \rho(G[A, B]). \qquad (1)$$
Recall that $T_{X|Y}$ and $T_{Y|X}$ denote the conditional types corresponding to the joint type $T_{XY}$. For a sequence $\mathbf{x} \in \mathcal{T}_{T_X}$, let $\mathcal{T}_{T_{Y|X}}(\mathbf{x})$ denote the corresponding conditional type class. Since $N_1 := \big|\mathcal{T}_{T_{Y|X}}(\mathbf{x})\big|$ is independent of $\mathbf{x} \in \mathcal{T}_{T_X}$, the degrees of the vertices $\mathbf{x} \in \mathcal{T}_{T_X}$ are all equal to the constant $N_1$. Similarly, the degrees of the vertices $\mathbf{y} \in \mathcal{T}_{T_Y}$ are all equal to the constant $N_2 := \big|\mathcal{T}_{T_{X|Y}}(\mathbf{y})\big|$.

L. Yu is with the School of Statistics and Data Science, Nankai University, Tianjin 300071, China (e-mail: [email protected]). V. Anantharam is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA (e-mail: [email protected]). J. Chen is with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada (e-mail: [email protected]). Research of the first two authors was supported by the NSF grants CNS-1527846, CCF-1618145, CCF-1901004, CIF-2007965, the NSF Science & Technology Center grant CCF-0939370 (Science of Information), and the William and Flora Hewlett Foundation supported Center for Long Term Cybersecurity at Berkeley.
¹Throughout this paper, we write $a_n \doteq b_n$ to denote $a_n = b_n e^{n o(1)}$.
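The definitions above can be exercised numerically. The following sketch (our own toy example, not from the paper) builds the type graph for a joint $4$-type on binary alphabets, computes the overall density $\rho(G)$, compares it with $e^{-n I_T(X;Y)}$ (equality holds only up to a sub-exponential factor), and brute-forces the maximal density $\Gamma_n(M_1, M_2)$ of (1):

```python
from itertools import product, combinations
from collections import Counter
from math import log, exp

# Toy joint 4-type given by pair counts n(x, y) summing to n = 4.
n = 4
T = Counter({(0, 0): 2, (0, 1): 1, (1, 0): 1})   # joint type T_XY (as counts)
TX = Counter({0: 3, 1: 1})                        # marginal type of X (as counts)
TY = Counter({0: 3, 1: 1})                        # marginal type of Y (as counts)

seqs = list(product([0, 1], repeat=n))
TX_class = [x for x in seqs if Counter(x) == TX]  # type class of T_X
TY_class = [y for y in seqs if Counter(y) == TY]  # type class of T_Y
# Edge set of the type graph: pairs whose joint type equals T_XY.
edges = {(x, y) for x in TX_class for y in TY_class if Counter(zip(x, y)) == T}

rho_G = len(edges) / (len(TX_class) * len(TY_class))

def H(counts):
    """Entropy (in nats) of the distribution given by nonnegative counts."""
    tot = sum(counts)
    return -sum(c / tot * log(c / tot) for c in counts if c > 0)

I_T = H(TX.values()) + H(TY.values()) - H(T.values())
print(rho_G, exp(-n * I_T))   # same exponential order, sub-exponential gap

def Gamma(M1, M2):
    """Brute-force maximal density over subgraphs with |A| = M1, |B| = M2."""
    return max(sum((x, y) in edges for x in A for y in B) / (M1 * M2)
               for A in combinations(TX_class, M1)
               for B in combinations(TY_class, M2))

print(Gamma(2, 2))   # = 1: even this tiny type graph contains a 2x2 biclique
```

Here the optimal $2 \times 2$ subgraph already attains density $1$, i.e., it is a biclique in the sense used later for the biclique rate region.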
Hence we have $|B| \rho(G[A, B]) + |B^c| \rho(G[A, B^c]) = N_1$, where $B^c := \mathcal{T}_{T_Y} \setminus B$. Thus, over $A, B$ with fixed sizes, maximizing $\rho(G[A, B])$ is equivalent to minimizing $\rho(G[A, B^c])$. In other words, determining the maximal density is in fact an edge-isoperimetric problem. Furthermore, obviously, $\Gamma_n(M_1, M_2)$ is nonincreasing in one parameter given the other parameter.

Let
$$\mathcal{R}^{(n)}_X := \left\{ \frac{1}{n} \log M : M \in [|\mathcal{T}_{T_X}|] \right\} \quad \text{and} \quad \mathcal{R}^{(n)}_Y := \left\{ \frac{1}{n} \log M : M \in [|\mathcal{T}_{T_Y}|] \right\}. \qquad (2)$$
Given a joint $n$-type $T_{XY}$, define the exponent of maximal density for a pair $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ as
$$E_n(R_1, R_2) := -\frac{1}{n} \log \Gamma_n\left(e^{nR_1}, e^{nR_2}\right). \qquad (3)$$
If the edge density of a subgraph in a bipartite graph $G$ is equal to $1$, then this subgraph is called a biclique of $G$. Along these lines, we define the biclique rate region of $T_{XY}$ as
$$\mathcal{R}_n(T_{XY}) := \left\{ (R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y : \Gamma_n\left(e^{nR_1}, e^{nR_2}\right) = 1 \right\}.$$
Observe that any $n$-type $T_{XY}$ can also be viewed as a $kn$-type for $k \ge 1$. With an abuse of notation, we continue to use $T_{XY}$ to denote the corresponding $kn$-type. With this in mind, for an $n$-type $T_{XY}$ define the asymptotic exponent of maximal density for a pair $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ as
$$E(R_1, R_2) := \lim_{k \to \infty} -\frac{1}{kn} \log \Gamma_{kn}\left(e^{knR_1}, e^{knR_2}\right), \qquad (4)$$
and the asymptotic biclique rate region as
$$\mathcal{R}(T_{XY}) := \mathrm{closure}\left( \bigcup_{k \ge 1} \mathcal{R}_{kn}(T_{XY}) \right). \qquad (5)$$
Han and Kobayashi [3] introduced a concept similar to the asymptotic biclique rate region defined here.
However, roughly speaking, their definition is an approximate version of our definition, in the sense that in their definition, for a distribution $P_{XY}$ (not necessarily a type), type classes are replaced with the typical sets with respect to $P_{XY}$, and the constraint $\Gamma_n\left(e^{nR_1}, e^{nR_2}\right) = 1$ is replaced with $\Gamma_n\left(e^{nR^{(n)}_1}, e^{nR^{(n)}_2}\right) \to 1$ as $n \to \infty$ for a sequence of types $T^{(n)}_{XY}$ converging to $P_{XY}$ and a sequence of pairs $\left(R^{(n)}_1, R^{(n)}_2\right)$ converging to $(R_1, R_2)$.

In this paper we are interested in characterizing the asymptotic behavior of $E_n$ and $\mathcal{R}_n(T_{XY})$ as $n \to \infty$. We use the notation $[m : n] := \{m, m+1, \ldots, n\}$ and $[n] := [1 : n]$. By definition, it is easy to see that $\mathcal{R}_{kn}(T_{XY})$ is nondecreasing in $k$. Hence $\mathcal{R}(T_{XY}) = \mathrm{closure}\left(\lim_{k \to \infty} \mathcal{R}_{kn}(T_{XY})\right)$ and, moreover, $\mathcal{R}(T_{XY})$ is only dependent on $T_{XY}$ and independent of the value of $n$ we attribute to $T_{XY}$.

A. Motivations
Our motivations for studying the type graph have the following three aspects.
1) The method of types is a classic and powerful tool in information theory. In this method, the basic unit is the (joint) type or (joint) type class. To the authors' knowledge, it is not well understood how the sequence pairs are distributed in a joint type class. Our study of the type graph deepens the understanding of the distribution (or structure) of sequence pairs in a joint type class.
2) Observe that the type graph can be constructed by permuting two sequences $\mathbf{x}, \mathbf{y}$ respectively. Thus, unlike other well-studied large graphs, the type graph is deterministic rather than stochastic. There are relatively few works focusing on deterministic large graphs. Hence, as a purely combinatorial problem, studying the type graph is of independent interest.
3) The maximal and minimal density problems for type graphs are closely related to noninteractive simulation problems (or isoperimetric problems) and hypercontractivity inequalities. Our results and proof ideas can be applied to prove bounds in noninteractive simulation problems and to strengthen hypercontractivity inequalities.

B. Main Contributions
Our contributions in this paper mainly consist of three parts.
1) We first completely characterize the asymptotics of the exponent of maximal density and the biclique rate region for any joint type defined on finite alphabets. We observe that, in general, the asymptotic biclique rate region defined by us is a subset (in general, a strict subset) of the approximate one defined by Han and Kobayashi [3]. In fact, their definition for a distribution $P_{XY}$ is equal to the asymptotic rate region of a sequence of $n$-types $\{T^{(n)}_{XY}\}$ approaching $P_{XY}$, which satisfy the condition $E_n\left(R^{(n)}_1, R^{(n)}_2\right) \to 0$ as $n \to \infty$. Interestingly, our proof for the biclique rate region combines information-theoretic methods and linear algebra, which seems not common in information theory.
2) We then apply our results on type graphs to two noninteractive simulation problems: one with sources uniformly distributed over a joint $n$-type and the other with memoryless sources. We study the regime in which the marginal probabilities are exponentially small. Note that this regime for memoryless sources was first studied by Ordentlich, Polyanskiy, and Shayevitz [4]. However, they only focused on doubly symmetric binary sources (DSBSes), and only solved limiting cases. In this paper we completely solve the noninteractive simulation problems for uniform sources or memoryless sources defined on any finite alphabets. Our proofs for this part are based on the coupling technique invented by the first author and Tan [5], which was originally used to prove a converse result for the common information problem. Interestingly, the converse derived by this technique in [5] seems not tight in general (although it is indeed tight for DSBSes); however, the converse derived here for the noninteractive simulation problem is tight in general.
3) Finally, we relax Boolean functions in the noninteractive simulation problems to arbitrary nonnegative functions, but still restrict their supports to be exponentially small.
We obtain several stronger (forward and reverse) hypercontractivity inequalities, which, in asymptotic cases, reduce to the common hypercontractivity inequalities when the exponents of the sizes of the supports are zero. Similar forward hypercontractivity inequalities were previously derived by Polyanskiy, Samorodnitsky, and Kirshner [6], [7]. The hypercontractivity in [6] was derived by a nonlinear log-Sobolev inequality, and the one in [7] by Fourier analysis. In comparison, our proofs are purely information-theoretic.

C. Notation
Throughout this paper, for two sequences of reals, we use $a_n \doteq b_n$ to denote $a_n = b_n e^{o(n)}$. We use $\mathcal{C}(Q_X, Q_Y)$ to denote the set of couplings $Q_{XY}$ with marginals $Q_X, Q_Y$, and $\mathcal{C}\left(Q_{X|UW}, Q_{Y|VW}\right)$ to denote the set of conditional couplings $Q_{XY|UVW}$ with conditional marginals $Q_{X|UW}, Q_{Y|VW}$. Note that, for any $Q_{XY|UVW} \in \mathcal{C}\left(Q_{X|UW}, Q_{Y|VW}\right)$, its marginals satisfy $Q_{X|UVW} = Q_{X|UW}$, $Q_{Y|UVW} = Q_{Y|VW}$, i.e., under the conditional distribution $Q_{XY|UVW}$, we have $X \leftrightarrow (U, W) \leftrightarrow V$ and $Y \leftrightarrow (V, W) \leftrightarrow U$, where the notation $A \leftrightarrow C \leftrightarrow B$ for a random triple $(A, B, C)$ denotes that $A$ and $B$ are conditionally independent given $C$. For a sequence $\mathbf{x}$, we use $T_{\mathbf{x}}$ to denote the type of $\mathbf{x}$. For an $m \times n$ matrix $\mathbf{B} = (b_{i,j})$ and two subsets $\mathcal{H} \subseteq [m]$, $\mathcal{L} \subseteq [n]$, we use $\mathbf{B}_{\mathcal{H}, \mathcal{L}}$ to denote $(b_{i,j})_{i \in \mathcal{H}, j \in \mathcal{L}}$, i.e., the submatrix of $\mathbf{B}$ consisting of the elements with indices in $\mathcal{H} \times \mathcal{L}$. For an $n$-length vector or sequence $\mathbf{x}$ and a subset $\mathcal{J} \subseteq [n]$, $\mathbf{x}_{\mathcal{J}} := (x_j)_{j \in \mathcal{J}}$ is defined similarly. We will also use the notations $H_Q(X)$ or $H(Q_X)$ to denote the entropy of $X \sim Q_X$. If the distribution is denoted by $P_X$, we sometimes write the entropy as $H(X)$ for brevity. We use $\mathrm{supp}(P_X)$ to denote the support of $P_X$.

II. TYPE GRAPHS
In this section, we completely characterize the asymptotic exponent of maximal density and the asymptotic biclique rate region.
Theorem 1.
Given a joint $n$-type $T_{XY}$ with $n \ge (|\mathcal{X}||\mathcal{Y}| + 2)|\mathcal{X}||\mathcal{Y}|$, for $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$, we have
$$E^*(R_1, R_2) \le E_n(R_1, R_2) \le E^*(R_1, R_2) + \varepsilon_n, \qquad (6)$$
where $\varepsilon_n := \frac{(|\mathcal{X}||\mathcal{Y}| + 2)|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{(n+1)n}{|\mathcal{X}||\mathcal{Y}|}$, $E^*(R_1, R_2) := R_1 + R_2 - F(R_1, R_2)$, and
$$F(R_1, R_2) := \max_{P_{XYW} :\, P_{XY} = T_{XY},\, H(X|W) \le R_1,\, H(Y|W) \le R_2} H(XY|W), \qquad (7)$$
which we think of as being defined for all nonnegative pairs $(R_1, R_2)$. In particular, for any $n \ge 1$ and any joint $n$-type $T_{XY}$, we have
$$E(R_1, R_2) = E^*(R_1, R_2). \qquad (8)$$
Without loss of optimality, the alphabet size of $W$ in the definition of $F$ can be assumed to be no larger than $|\mathcal{X}||\mathcal{Y}| + 2$.

Remark 1. Obviously, $E^*$ can be also expressed as $E^*(R_1, R_2) := R_1 + R_2 - H_T(XY) + G(R_1, R_2)$, with
$$G(R_1, R_2) := \min_{P_{XYW} :\, P_{XY} = T_{XY},\, H(X|W) \le R_1,\, H(Y|W) \le R_2} I(XY; W) \qquad (9)$$
corresponding to the minimum common rate given marginal rates $(R_1, R_2)$ in the Gray-Wyner source coding network [8, Theorem 14.3].

Remark 2. The statement in (8) holds for all $n \ge 1$ because every $n$-type is also a $kn$-type for all $k \ge 1$.

Remark 3. We call the inequality on the right hand side of (6) the achievability part and the inequality on the left hand side the converse part.

Before proving Theorem 1, we first list several properties of $F$ in the following lemma. The proof is provided in Appendix A.

Lemma 1.
For any joint $n$-type $T_{XY}$ and $R_1, R_2 \ge 0$, the following properties of $F$ hold.
1) Given $R_2$, $F(R_1, R_2)$ is nondecreasing in $R_1$ and, given $R_1$, $F(R_1, R_2)$ is nondecreasing in $R_2$.
2) $F(R_1, R_2) \le \min\{H_T(XY),\, R_1 + R_2,\, R_1 + H_T(Y|X),\, R_2 + H_T(X|Y)\}$.
3) $F(0, R_2) = \min\{R_2, H_T(Y|X)\}$ and, similarly, $F(R_1, 0) = \min\{R_1, H_T(X|Y)\}$.
4) $F(R_1, R_2)$ is concave in $(R_1, R_2)$ on $\{(R_1, R_2) : R_1 \ge 0, R_2 \ge 0\}$.
5) For $\delta_1, \delta_2 \ge 0$, we have $0 \le F(R_1 + \delta_1, R_2 + \delta_2) - F(R_1, R_2) \le \delta_1 + \delta_2$ for all $R_1 \ge 0$, $R_2 \ge 0$.

Proof of Theorem 1: Achievability part: For a joint $n$-type $P_{XYW}$ such that $P_{XY} = T_{XY}$, $H(X|W) \le R_1$, $H(Y|W) \le R_2$ and for a fixed sequence $\mathbf{w}$ with type $P_W$, we choose $A$ as the union of $\mathcal{T}_{P_{X|W}}(\mathbf{w})$ and a number $e^{nR_1} - |\mathcal{T}_{P_{X|W}}(\mathbf{w})|$ of arbitrary sequences outside $\mathcal{T}_{P_{X|W}}(\mathbf{w})$, and choose $B$ in a similar way, but with $\mathcal{T}_{P_{X|W}}(\mathbf{w})$ replaced by $\mathcal{T}_{P_{Y|W}}(\mathbf{w})$. Then $|A| = e^{nR_1}$ and $|B| = e^{nR_2}$. Observe that
$$|(A \times B) \cap \mathcal{T}_{T_{XY}}| \ge \big|\mathcal{T}_{P_{XY|W}}(\mathbf{w})\big| \ge e^{n\left(H(XY|W) - \frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)}{n}\right)},$$
where the second inequality follows from [9, Lemma 2.5]. Thus we have
$$\rho(G[A, B]) \ge \frac{|(A \times B) \cap \mathcal{T}_{T_{XY}}|}{e^{nR_1} e^{nR_2}} \ge e^{-n\left(R_1 + R_2 - H(XY|W) + \frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)}{n}\right)}. \qquad (10)$$
Optimizing the exponent in (10) over all joint $n$-types $P_{XYW}$ such that $P_{XY} = T_{XY}$, $H(X|W) \le R_1$, $H(Y|W) \le R_2$ yields the upper bound
$$E_n(R_1, R_2) \le R_1 + R_2 - F_n(R_1, R_2) + \frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)}{n}, \qquad (11)$$
where $F_n$ is defined similarly as $F$ in (7) but with the $P_{XYW}$ in (7) restricted to be a joint $n$-type.

We claim that
$$F_n(R_1, R_2) \ge F(R_1, R_2) + \frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}. \qquad (12)$$
Substituting this into the upper bound in (11) and combining with the claim $|\mathcal{W}| \le |\mathcal{X}||\mathcal{Y}| + 2$ yields the desired upper bound. Hence the rest is to prove these claims.

For a joint $n$-type $T_{XY}$ and a distribution $P_{XYW}$ with $P_{XY} = T_{XY}$, one can find an $n$-type $Q_{XYW}$ with $Q_{XY} = T_{XY}$ such that $\|P_{XYW} - Q_{XYW}\| \le \frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n}$, where $\|\cdot\|$ denotes the TV distance [10, Lemma 3]. Combining it with [9, Lemma 2.7], i.e., that if $\|P_X - Q_X\| \le \Theta \le \frac{1}{4}$ then we have $|H_P(X) - H_Q(X)| \le -2\Theta \log \frac{2\Theta}{|\mathcal{X}|}$, we have for $\frac{|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \le \frac{1}{4}$ that
$$|H_P(X|W) - H_Q(X|W)| \le |H_P(X, W) - H_Q(X, W)| + |H_P(W) - H_Q(W)| \le -\frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}, \qquad (13)$$
and similarly,
$$|H_P(Y|W) - H_Q(Y|W)| \le -\frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}, \qquad (14)$$
$$|H_P(XY|W) - H_Q(XY|W)| \le -\frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}. \qquad (15)$$
Combining (13)-(15) yields that
$$F_n(R_1, R_2) \ge F\left(R_1 + \frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n},\, R_2 + \frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}\right) + \frac{2|\mathcal{W}||\mathcal{X}||\mathcal{Y}|}{n} \log \frac{|\mathcal{X}||\mathcal{Y}|}{n}.$$
Applying Statement 5) of Lemma 1, we obtain (12). The claim that we can restrict attention to the case $|\mathcal{W}| \le |\mathcal{X}||\mathcal{Y}| + 2$ in the definitions of $F_n(R_1, R_2)$ and $F(R_1, R_2)$ comes from the support lemma in [8].

Converse part: Let $C := (A \times B) \cap \mathcal{T}_{T_{XY}}$ for some optimal $(A, B)$ attaining $\Gamma_n\left(e^{nR_1}, e^{nR_2}\right)$. Let $(\mathbf{X}, \mathbf{Y}) \sim \mathrm{Unif}(C)$. Then
$$\Gamma_n\left(e^{nR_1}, e^{nR_2}\right) = \frac{|C|}{|A|\,|B|} = \frac{e^{H(\mathbf{X}, \mathbf{Y})}}{e^{nR_1} e^{nR_2}}, \quad \frac{1}{n} H(\mathbf{X}) \le R_1, \quad \frac{1}{n} H(\mathbf{Y}) \le R_2.$$
Therefore,
$$E_n(R_1, R_2) = R_1 + R_2 - \frac{1}{n} H(\mathbf{X}, \mathbf{Y}) = R_1 + R_2 - \frac{1}{n} \sum_{i=1}^n H\left(X_i Y_i \,\middle|\, X^{i-1} Y^{i-1}\right) = R_1 + R_2 - H\left(X_J Y_J \,\middle|\, X^{J-1} Y^{J-1} J\right),$$
where $J \sim \mathrm{Unif}[n]$ is a random time index independent of $(X^n, Y^n)$. On the other hand,
$$H\left(X_J \,\middle|\, X^{J-1} Y^{J-1} J\right) \le H\left(X_J \,\middle|\, X^{J-1} J\right) = \frac{1}{n} H(\mathbf{X}) \le R_1, \qquad H\left(Y_J \,\middle|\, X^{J-1} Y^{J-1} J\right) \le R_2.$$
Using the notation $X := X_J$, $Y := Y_J$, $W := X^{J-1} Y^{J-1} J$, we obtain $(X, Y) \sim T_{XY}$, and
$$E_n(R_1, R_2) \ge \inf_{P_{XYW} :\, P_{XY} = T_{XY},\, H(X|W) \le R_1,\, H(Y|W) \le R_2} R_1 + R_2 - H(XY|W) = E^*(R_1, R_2).$$
By the support lemma in [8], the alphabet size of $W$ can be upper bounded by $|\mathcal{X}||\mathcal{Y}| + 2$.

Theorem 1 is an edge-isoperimetric result for the bipartite graph induced by a joint $n$-type $T_{XY}$.
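Theorem 1 can be sanity-checked numerically at the full-rate corner. There, $R_1 = H_T(X)$ and $R_2 = H_T(Y)$, so a constant $W$ is feasible in (7); by property 2) of Lemma 1 this gives $F = H_T(XY)$ and hence $E^* = I_T(X;Y)$, while taking $A, B$ to be the full type classes gives $\Gamma_n = \rho(G)$. The sketch below (our own toy computation, not from the paper) evaluates the resulting exponent in closed form via multinomial coefficients and watches it approach $I_T(X;Y)$ as the blocklength grows:

```python
from math import factorial, log

def multinom(n, counts):
    """Multinomial coefficient n! / prod(c!)."""
    r = factorial(n)
    for c in counts:
        r //= factorial(c)
    return r

def H(counts):
    """Entropy (in nats) of the distribution given by nonnegative counts."""
    tot = sum(counts)
    return -sum(c / tot * log(c / tot) for c in counts if c > 0)

# Toy joint 4-type: counts (2, 1, 1) on cells (0,0), (0,1), (1,0); ours, not the paper's.
base_joint = [2, 1, 1]
base_marg = [3, 1]      # marginal counts of X (and, by symmetry, of Y)
n0 = 4

I_T = 2 * H(base_marg) - H(base_joint)   # mutual information of the type

exps = []
for k in (1, 2, 4, 8, 16):               # view the 4-type as a 4k-type
    n = n0 * k
    rho = multinom(n, [k * c for c in base_joint]) \
        / multinom(n, [k * c for c in base_marg]) ** 2
    exps.append(-log(rho) / n)           # E_n at the full-rate corner
    print(n, exps[-1])
print("I_T(X;Y) =", I_T)
```

The printed exponents climb toward $I_T(X;Y)$, with the residual gap of order $\frac{\log n}{n}$, matching the $\varepsilon_n$ slack in (6).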
For the case in which $\mathcal{X} = \mathcal{Y}$ and $T_X = T_Y$, the bipartite graph of $T_{XY}$ can be considered as a directed graph (allowing self-loops if $X = Y$ under $T_{XY}$) in which the vertices consist of $\mathbf{x} \in \mathcal{T}_{T_X}$ and there is a directed edge from $\mathbf{x}$ to $\mathbf{y}$ if and only if $(\mathbf{x}, \mathbf{y}) \in \mathcal{T}_{T_{XY}}$. Hence, for this case, Theorem 1 can be also considered as an edge-isoperimetric result for a directed graph induced by $T_{XY}$. Specifically, for a subset $A \subseteq \mathcal{T}_{T_X}$, let $G[A]$ be the induced subgraph of the directed graph of $T_{XY}$. The (edge) density $\rho(G[A])$ is defined as
$$\rho(G[A]) := \frac{\#\text{ of directed edges in } G[A]}{|A|^2} = \frac{|(A \times A) \cap \mathcal{T}_{T_{XY}}|}{|A|^2}.$$
Given $1 \le M \le |\mathcal{T}_{T_X}|$, define the maximal density of subgraphs with size $M$ as
$$\Gamma_n(M) := \max_{A \subseteq \mathcal{T}_{T_X} :\, |A| = M} \rho(G[A]).$$
Given a joint $n$-type $T_{XY}$, for $R \in \mathcal{R}^{(n)}_X$ as defined in (2), define the exponent of maximal density as
$$E_n(R) := -\frac{1}{n} \log \Gamma_n\left(e^{nR}\right). \qquad (16)$$
For any subsets $A, B$ of $\mathcal{X}^n$, we have $|A|\,|B|\, \rho(G[A, B]) \le |A \cup B|^2 \rho(G[A \cup B])$. On the other hand, $\Gamma_n(M) \le \Gamma_n(M, M)$. Hence
$$\frac{1}{4} \Gamma_n\left(\frac{M}{2}, \frac{M}{2}\right) \le \Gamma_n(M) \le \Gamma_n(M, M).$$
Combining the inequalities above with Theorem 1 yields the following result.
Corollary 1.
For any $n \ge (|\mathcal{X}|^2 + 2)|\mathcal{X}|^2$, $T_{XY}$, and $R \in \mathcal{R}^{(n)}_X$, we have
$$|E_n(R) - E^*(R, R)| = O\left(\frac{\log n}{n}\right), \qquad (17)$$
where the asymptotic constant in the $O\left(\frac{\log n}{n}\right)$ term on the right hand side depends only on $|\mathcal{X}|$, and $E^*(R_1, R_2)$ is defined in Theorem 1.

For the case of $\mathcal{X} = \mathcal{Y}$ and $T_X = T_Y$, the bipartite graph of $T_{XY}$ can be also considered as an undirected non-bipartite graph (allowing self-loops if $X = Y$ under $T_{XY}$) in which the vertices consist of $\mathbf{x} \in \mathcal{T}_{T_X}$ and $(\mathbf{x}, \mathbf{y})$ is an edge if and only if $(\mathbf{x}, \mathbf{y})$ or $(\mathbf{y}, \mathbf{x}) \in \mathcal{T}_{T_{XY}}$. By a similar argument to the above, Corollary 1 still holds for this case, which hence can be considered as a generalization of [7, Theorem 1.6] from binary alphabets to arbitrary finite alphabets. We next consider the biclique rate region.

We use the same notation as the one in (1) for the bipartite graph case, but here the edge density has only one parameter. The difference between these two maximal densities is that in (1) the maximization is taken over a pair of sets $(A, B)$, but here only over one set (equivalently, under the restriction $A = B$).

Theorem 2.
For any $n \ge (16|\mathcal{X}||\mathcal{Y}|)^2$ and any $T_{XY}$,
$$\left(\mathcal{R}^*(T_{XY}) - [0, \varepsilon_{1,n}] \times [0, \varepsilon_{2,n}]\right) \cap \left(\mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y\right) \subseteq \mathcal{R}_n(T_{XY}) \subseteq \mathcal{R}^*(T_{XY}) \cap \left(\mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y\right), \qquad (18)$$
where "$-$" is the Minkowski difference (i.e., for $A, B \subseteq \mathbb{R}^m$, $A - B := \bigcap_{\mathbf{b} \in B} (A - \mathbf{b})$), $\varepsilon_{1,n} := \frac{|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{n(n+1)}{16|\mathcal{X}|}$, $\varepsilon_{2,n} := \frac{|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{n(n+1)}{16|\mathcal{Y}|}$, and
$$\mathcal{R}^*(T_{XY}) := \bigcup_{\substack{0 \le \alpha \le 1,\, P_{XY}, Q_{XY} : \\ \alpha P_{XY} + (1-\alpha) Q_{XY} = T_{XY}}} \left\{(R_1, R_2) : R_1 \le \alpha H_P(X|Y),\, R_2 \le (1-\alpha) H_Q(Y|X)\right\}. \qquad (19)$$
In particular,
$$\mathcal{R}(T_{XY}) = \mathcal{R}^*(T_{XY}), \qquad (20)$$
where $\mathcal{R}(T_{XY})$ is the asymptotic biclique rate region, defined in (5).

Remark 4. Given $T_{XY}$, $\mathcal{R}^*(T_{XY})$ is a closed convex set. To see this, using the continuity of $H_P(X|Y)$ in $P_{XY}$, it is straightforward to establish that $\mathcal{R}^*(T_{XY})$ is closed. Convexity follows by the following argument. For any $(R_1, R_2), (\widehat{R}_1, \widehat{R}_2) \in \mathcal{R}^*(T_{XY})$, there exist $(\alpha, P_{XY}, Q_{XY})$ and $(\widehat{\alpha}, \widehat{P}_{XY}, \widehat{Q}_{XY})$ such that
$$\alpha P_{XY} + (1 - \alpha) Q_{XY} = T_{XY}, \qquad \widehat{\alpha} \widehat{P}_{XY} + (1 - \widehat{\alpha}) \widehat{Q}_{XY} = T_{XY},$$
$$R_1 \le \alpha H_P(X|Y), \quad R_2 \le (1 - \alpha) H_Q(Y|X), \quad \widehat{R}_1 \le \widehat{\alpha} H_{\widehat{P}}(X|Y), \quad \widehat{R}_2 \le (1 - \widehat{\alpha}) H_{\widehat{Q}}(Y|X).$$
Then for any $\lambda \in [0, 1]$,
$$\lambda R_1 + (1 - \lambda) \widehat{R}_1 \le \lambda \alpha H_P(X|Y) + (1 - \lambda) \widehat{\alpha} H_{\widehat{P}}(X|Y) \le \beta H_{P^{(\theta)}}(X|Y), \qquad (21)$$
where $\beta = \lambda \alpha + (1 - \lambda) \widehat{\alpha}$, and $P^{(\theta)}_{XY} := \theta P_{XY} + (1 - \theta) \widehat{P}_{XY}$ with $\theta = \frac{\lambda \alpha}{\lambda \alpha + (1 - \lambda) \widehat{\alpha}}$ if $\beta > 0$; $P^{(\theta)}_{XY}$ is chosen as an arbitrary distribution if $\beta = 0$. Here (21) follows since $H_P(X|Y)$ is concave in $P_{XY}$.
By symmetry, $\lambda R_2 + (1 - \lambda) \widehat{R}_2 \le (1 - \beta) H_{Q^{(\widehat{\theta})}}(Y|X)$, where $Q^{(\widehat{\theta})}_{XY} := \widehat{\theta} Q_{XY} + \left(1 - \widehat{\theta}\right) \widehat{Q}_{XY}$ with $\widehat{\theta} = \frac{\lambda (1 - \alpha)}{\lambda (1 - \alpha) + (1 - \lambda)(1 - \widehat{\alpha})}$ if $\beta < 1$; $Q^{(\widehat{\theta})}_{XY}$ is chosen as an arbitrary distribution if $\beta = 1$. Since $\beta P^{(\theta)}_{XY} + (1 - \beta) Q^{(\widehat{\theta})}_{XY} = T_{XY}$, it follows that $\lambda (R_1, R_2) + (1 - \lambda) (\widehat{R}_1, \widehat{R}_2) \in \mathcal{R}^*(T_{XY})$, i.e., $\mathcal{R}^*(T_{XY})$ is convex.

Remark 5. Theorem 2 can be easily generalized to the $k$-variables case with $k \ge 3$. For this case, let $T_{X_1, \ldots, X_k}$ be a joint $n$-type. Then the graph $G$ induced by $T_{X_1, \ldots, X_k}$ is in fact a $k$-partite hypergraph. The (edge) density of the subgraph of $G$ with vertex sets $(A_1, \ldots, A_k)$ is defined as
$$\rho(G[A_1, \ldots, A_k]) := \frac{\left|\left(\prod_{i=1}^k A_i\right) \cap \mathcal{T}_{T_{X_1, \ldots, X_k}}\right|}{\prod_{i=1}^k |A_i|}.$$
It is interesting to observe that $\rho(G) \doteq e^{-n I_{T^{(n)}}(X_1; \ldots; X_k)}$ for a sequence of joint types $\{T^{(n)}_{X_1, \ldots, X_k}\}$, where $I_{T^{(n)}}(X_1; \ldots; X_k) := \sum_{i=1}^k H_{T^{(n)}}(X_i) - H_{T^{(n)}}(X_1, \ldots, X_k)$. Given a joint $n$-type $T_{X_1, \ldots, X_k}$, we define the $k$-clique rate region as
$$\mathcal{R}_n(T_{X_1, \ldots, X_k}) := \left\{\left(\frac{1}{n} \log |A_1|, \ldots, \frac{1}{n} \log |A_k|\right) : \rho(G[A_1, \ldots, A_k]) = 1\right\}.$$
Following similar steps to our proof of Theorem 2, it is easy to show that for this case we have
$$\left(\mathcal{R}^*(T_{X_1, \ldots, X_k}) - \left[0, O\left(\frac{\log n}{n}\right)\right]^k\right) \cap \left(\prod_{i=1}^k \mathcal{R}^{(n)}_{X_i}\right) \subseteq \mathcal{R}_n(T_{X_1, \ldots, X_k}) \subseteq \mathcal{R}^*(T_{X_1, \ldots, X_k}) \cap \left(\prod_{i=1}^k \mathcal{R}^{(n)}_{X_i}\right),$$
where
$$\mathcal{R}^*(T_{X_1, \ldots, X_k}) := \bigcup_{\substack{\alpha_i \ge 0,\, P^{(i)}_{X_1, \ldots, X_k},\, i \in [k] : \\ \sum_{i=1}^k \alpha_i = 1,\; \sum_{i=1}^k \alpha_i P^{(i)}_{X_1, \ldots, X_k} = T_{X_1, \ldots, X_k}}} \left\{(R_1, \ldots, R_k) : R_i \le \alpha_i H_{P^{(i)}}\left(X_i \,\middle|\, X_{[k] \setminus i}\right)\right\}, \qquad (22)$$
with $X_{[k] \setminus i} := (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k)$.

Remark 6. We call the inclusion on the left in (18) the achievability part and the inclusion on the right in (18) the converse part.
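The observation $\rho(G) \doteq e^{-n I_{T^{(n)}}(X_1; \ldots; X_k)}$ in Remark 5 can likewise be checked numerically. The sketch below (a toy three-coordinate type of our own choosing, not from the paper) evaluates the hypergraph density in closed form via multinomial coefficients and watches its exponent approach the multiinformation as the blocklength grows:

```python
from math import factorial, log

def multinom(n, counts):
    """Multinomial coefficient n! / prod(c!)."""
    r = factorial(n)
    for c in counts:
        r //= factorial(c)
    return r

def H(counts):
    """Entropy (in nats) of the distribution given by nonnegative counts."""
    tot = sum(counts)
    return -sum(c / tot * log(c / tot) for c in counts if c > 0)

# Toy joint 4-type on three binary coordinates (our own example).
joint = [2, 1, 1]                 # counts of cells (0,0,0), (0,1,1), (1,0,1)
marg = [[3, 1], [3, 1], [2, 2]]   # per-coordinate marginal counts
n0 = 4

multi_info = sum(H(m) for m in marg) - H(joint)

exps = []
for k in (1, 4, 16):              # view the 4-type as a 4k-type
    n = n0 * k
    rho = multinom(n, [k * c for c in joint])
    for m in marg:
        rho /= multinom(n, [k * c for c in m])
    exps.append(-log(rho) / n)
    print(n, exps[-1])            # exponent of the hypergraph density rho(G)
print("I_T(X1; X2; X3) =", multi_info)
```

As in the bipartite case, the gap to the multiinformation decays like $\frac{\log n}{n}$, i.e., only a sub-exponential factor separates $\rho(G)$ from $e^{-n I_T}$.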
Proof:
We now prove Theorem 2. Obviously, (20) follows from (18). It suffices to prove (18).

Achievability part: Let $d$ be an integer such that $1 \le d \le n - 1$. Let $(P_{XY}, Q_{XY})$ be a pair of $d$-joint type and $(n-d)$-joint type on $\mathcal{X} \times \mathcal{Y}$ such that $\frac{d}{n} P_{XY} + \left(1 - \frac{d}{n}\right) Q_{XY} = T_{XY}$. For a fixed $d$-length sequence $\mathbf{y}_1$ with type $P_Y$ and a fixed $(n-d)$-length sequence $\mathbf{x}_2$ with type $Q_X$, we choose $A = \mathcal{T}_{P_{X|Y}}(\mathbf{y}_1) \times \{\mathbf{x}_2\}$ and $B = \{\mathbf{y}_1\} \times \mathcal{T}_{Q_{Y|X}}(\mathbf{x}_2)$. Then, from [9, Lemma 2.5], we have $|A| \ge e^{d\left(H_P(X|Y) - \frac{|\mathcal{X}||\mathcal{Y}| \log(d+1)}{d}\right)}$ and similarly $|B| \ge e^{(n-d)\left(H_Q(Y|X) - \frac{|\mathcal{X}||\mathcal{Y}| \log(n-d+1)}{n-d}\right)}$. On the other hand, for this choice, $A \times B \subseteq \mathcal{T}_{T_{XY}}$. Hence any rate pair $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ with
$$R_1 \le \frac{d}{n}\left(H_P(X|Y) - \frac{|\mathcal{X}||\mathcal{Y}| \log(d+1)}{d}\right), \qquad R_2 \le \left(1 - \frac{d}{n}\right)\left(H_Q(Y|X) - \frac{|\mathcal{X}||\mathcal{Y}| \log(n-d+1)}{n-d}\right)$$
is achievable, which in turn implies that a pair of smaller rates $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ with
$$R_1 \le \frac{d}{n} H_P(X|Y) - \frac{|\mathcal{X}||\mathcal{Y}| \log(n+1)}{n}, \qquad (23)$$
$$R_2 \le \left(1 - \frac{d}{n}\right) H_Q(Y|X) - \frac{|\mathcal{X}||\mathcal{Y}| \log(n+1)}{n} \qquad (24)$$
is achievable.

We next remove the constraint that $(P_{XY}, Q_{XY})$ are joint types. For $0 \le \alpha \le 1$, let $(\widehat{P}_{XY}, \widehat{Q}_{XY})$ be a pair of distributions such that $\alpha \widehat{P}_{XY} + (1 - \alpha) \widehat{Q}_{XY} = T_{XY}$. Define $d := \left\|\left\lfloor n \alpha \widehat{P}_{XY} \right\rfloor\right\|_1$. Obviously, $n\alpha - |\mathcal{X}||\mathcal{Y}| \le d \le n\alpha$. We first consider the case
$$4|\mathcal{X}||\mathcal{Y}| \le d \le n - 4|\mathcal{X}||\mathcal{Y}|. \qquad (25)$$
Define $P_{XY} := \frac{\lfloor n\alpha \widehat{P}_{XY} \rfloor}{d}$. Then, $P_{XY}$ is a joint $d$-type and $\left\|P_{XY} - \widehat{P}_{XY}\right\| \le \frac{|\mathcal{X}||\mathcal{Y}|}{d} \le \frac{1}{4}$. Define $Q_{XY} := \frac{n T_{XY} - d P_{XY}}{n - d}$, which is obviously an $(n-d)$-joint type and satisfies $\left\|Q_{XY} - \widehat{Q}_{XY}\right\| \le \frac{|\mathcal{X}||\mathcal{Y}|}{n - d} \le \frac{1}{4}$.
By [9, Lemma 2.7], and combining with the equality $H(X|Y) = H(X, Y) - H(Y)$, we have
$$H_P(X|Y) \ge H_{\widehat{P}}(X|Y) + \frac{2|\mathcal{X}||\mathcal{Y}|}{d} \log \frac{|\mathcal{X}||\mathcal{Y}|}{4|\mathcal{X}| d}, \qquad H_Q(Y|X) \ge H_{\widehat{Q}}(Y|X) + \frac{2|\mathcal{X}||\mathcal{Y}|}{n - d} \log \frac{|\mathcal{X}||\mathcal{Y}|}{4|\mathcal{Y}| (n - d)}.$$
These inequalities, together with (23) and (24), imply that
$$\text{RHS of (23)} \ge \frac{d}{n} H_{\widehat{P}}(X|Y) - \frac{2|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{n}{|\mathcal{X}|} - \frac{|\mathcal{X}||\mathcal{Y}| \log(n+1)}{n} \ge \alpha H_{\widehat{P}}(X|Y) - \frac{|\mathcal{X}||\mathcal{Y}|}{n} \log |\mathcal{X}| - \frac{2|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{n}{|\mathcal{X}|} - \frac{|\mathcal{X}||\mathcal{Y}| \log(n+1)}{n} = \alpha H_{\widehat{P}}(X|Y) - \varepsilon_{1,n}, \qquad (26)$$
$$\text{RHS of (24)} \ge \left(1 - \frac{d}{n}\right) H_{\widehat{Q}}(Y|X) - \frac{2|\mathcal{X}||\mathcal{Y}|}{n} \log \frac{n}{|\mathcal{Y}|} - \frac{|\mathcal{X}||\mathcal{Y}| \log(n+1)}{n} \ge (1 - \alpha) H_{\widehat{Q}}(Y|X) - \varepsilon_{2,n}. \qquad (27)$$
Therefore, any rate pair $(R_1, R_2) \in \mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ with $R_1 \le \alpha H_{\widehat{P}}(X|Y) - \varepsilon_{1,n}$ and $R_2 \le (1 - \alpha) H_{\widehat{Q}}(Y|X) - \varepsilon_{2,n}$, for any $0 \le \alpha \le 1$ and $(\widehat{P}_{XY}, \widehat{Q}_{XY})$ a pair of distributions such that $\alpha \widehat{P}_{XY} + (1 - \alpha) \widehat{Q}_{XY} = T_{XY}$, is achievable as long as the condition in (25) holds.

We next consider the case $0 \le d < 4|\mathcal{X}||\mathcal{Y}|$. For this case, we have
$$\alpha H_{\widehat{P}}(X|Y) \le \frac{d + |\mathcal{X}||\mathcal{Y}|}{n} \log |\mathcal{X}| \le \frac{5|\mathcal{X}||\mathcal{Y}|}{n} \log |\mathcal{X}| \le \varepsilon_{1,n}. \qquad (28)$$
Hence $\left\{(R_1, R_2) : R_1 \le \alpha H_{\widehat{P}}(X|Y),\, R_2 \le (1 - \alpha) H_{\widehat{Q}}(Y|X)\right\} - [0, \varepsilon_{1,n}] \times [0, \varepsilon_{2,n}]$ is empty, and so its intersection with $\mathcal{R}^{(n)}_X \times \mathcal{R}^{(n)}_Y$ is also empty. Therefore, there is nothing to prove in this case. The case when $n - 4|\mathcal{X}||\mathcal{Y}| < d \le n$ can be handled similarly. This completes the proof for the achievability part.

Converse part: We next prove the converse part by combining information-theoretic methods and linear algebra. Observe that the biclique rate region only depends on the probability values of $T_{XY}$, rather than the alphabets $\mathcal{X}, \mathcal{Y}$.
With this in mind, we observe that we can identify $\mathcal{X}$ and $\mathcal{Y}$ with subsets of $\mathbb{R}$ by one-to-one mappings such that, for any probability distribution $P_{XY}$, if $(X, Y) \in \mathcal{X} \times \mathcal{Y}$ satisfies $(X, Y) \sim P_{XY}$ we can talk about the expectations $\mathbb{E}_P[X], \mathbb{E}_P[Y]$, the covariance $\mathrm{Cov}_P(X, Y)$, and the correlation $\mathbb{E}_P[XY]$. Translating the choices of $\mathcal{X}$ and/or $\mathcal{Y}$ (as subsets of $\mathbb{R}$) does not change $\mathrm{Cov}_P(X, Y)$, so we can ensure that we make these choices in such a way that $\mathbb{E}_P[XY] = \mathrm{Cov}_P(X, Y) + \mathbb{E}_P[X]\, \mathbb{E}_P[Y] = 0$.

Let us now choose $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}$ in this way, such that for the given joint $n$-type $T_{XY}$ we have $\mathbb{E}_T[XY] = 0$. Then, for $A \times B \subseteq \mathcal{T}_{T_{XY}}$, we will have $\langle \mathbf{x}, \mathbf{y} \rangle = 0$ for any $(\mathbf{x}, \mathbf{y}) \in A \times B$, where $\mathbf{x}, \mathbf{y}$ are now viewed as vectors in $\mathbb{R}^n$. Let $\mathsf{A}$ denote the linear space spanned by all the vectors in $A$, and let $\mathsf{B}$ denote the linear space spanned by all the vectors in $B$. Hence $\mathsf{B} \subseteq \mathsf{A}^\perp$, where $\mathsf{A}^\perp$ denotes the orthogonal complement of a subspace $\mathsf{A}$. As an important property of the orthogonal complement, $\dim(\mathsf{A}) + \dim(\mathsf{A}^\perp) = n$. Hence $\dim(\mathsf{A}) + \dim(\mathsf{B}) \le n$. We next establish the following lemma, whose proof is provided in Appendix B.
Lemma 2 (Probabilistic exchange lemma). Let $k \ge 2$ be an integer. Let $V_i$, $1 \le i \le k$ be mutually orthogonal linear subspaces of $\mathbb{R}^n$ with dimensions, denoted as $n_i$, $1 \le i \le k$, satisfying $\sum_{i=1}^k n_i = n$. Let $\mathbf{X}^{(i)}$, $1 \le i \le k$ be $k$ (possibly dependent) random vectors with $\mathbf{X}^{(i)}$ defined on $V_i$. We write the coordinates of $\mathbf{X}^{(i)}$ as $X^{(i)}_1, \ldots, X^{(i)}_n$. Then there always exists a partition $\{\mathcal{J}_i, 1 \le i \le k\}$ of $[n]$ such that $|\mathcal{J}_i| = n_i$ and $\mathbf{X}^{(i)} = f_i\left(\mathbf{X}^{(i)}_{\mathcal{J}_i}\right)$, $1 \le i \le k$, for some deterministic linear functions $f_i : \mathbb{R}^{n_i} \to \mathbb{R}^n$, where $\mathbf{X}^{(i)}_{\mathcal{J}_i} := (X^{(i)}_j)_{j \in \mathcal{J}_i}$.

Remark 7. From our proof, it is easy to observe that the condition "mutually orthogonal linear subspaces of $\mathbb{R}^n$" can be replaced by "mutually (linearly) independent linear subspaces of $\mathbb{R}^n$" (i.e., such that the dimension of the span of any subset of the subspaces equals the sum of the dimensions of the subspaces in that subset), or more generally, "affine subspaces that are translates of mutually independent linear subspaces of $\mathbb{R}^n$".

In other words, under the assumption in this lemma there always exists a permutation $\sigma$ of $[n]$ such that $\mathbf{Y}^{(i)} = f_i\left(\mathbf{Y}^{(i)}_{[\sum_{j=1}^{i-1} n_j + 1 : \sum_{j=1}^{i} n_j]}\right)$, $1 \le i \le k$, for some deterministic functions $f_i : \mathbb{R}^{n_i} \to \mathbb{R}^n$, where $\mathbf{Y}^{(i)}$ is the random vector obtained by permuting the components of $\mathbf{X}^{(i)}$ using $\sigma$.

Let $d$ denote $\dim(\mathsf{A})$, so we have $\dim(\mathsf{A}^\perp) = n - d$. Let $\mathbf{X} \sim \mathrm{Unif}(A)$, $\mathbf{Y} \sim \mathrm{Unif}(B)$ be two independent random vectors, i.e., $(\mathbf{X}, \mathbf{Y}) \sim P_{\mathbf{X}, \mathbf{Y}} := \mathrm{Unif}(A)\, \mathrm{Unif}(B)$. Now we choose $k = 2$, $V_1 = \mathsf{A}$, $V_2 = \mathsf{A}^\perp$, $\mathbf{X}^{(1)} = \mathbf{X}$, $\mathbf{X}^{(2)} = \mathbf{Y}$ in Lemma 2. Then there exists a partition $\{\mathcal{J}, \mathcal{J}^c\}$ of $[n]$ such that $|\mathcal{J}| = d$ and $\mathbf{X} = f_1(\mathbf{X}_{\mathcal{J}})$, $\mathbf{Y} = f_2(\mathbf{Y}_{\mathcal{J}^c})$ for some deterministic functions $f_1 : \mathbb{R}^d \to \mathbb{R}^n$, $f_2 : \mathbb{R}^{n-d} \to \mathbb{R}^n$.
By this property, on the one hand we have
\[
R_1 = \frac{1}{n}H(\mathbf{X}) = \frac{1}{n}H(\mathbf{X}|\mathbf{Y}) = \frac{1}{n}H(\mathbf{X}_{\mathcal{J}}|\mathbf{Y}) \le \frac{1}{n}H(\mathbf{X}_{\mathcal{J}}|\mathbf{Y}_{\mathcal{J}}) \le \frac{1}{n}\sum_{j\in\mathcal{J}}H(X_j|Y_j) = \frac{d}{n}H(X_J|Y_J,J) \le \frac{d}{n}H(X_J|Y_J) = \frac{d}{n}H\big(\widetilde{X}\big|\widetilde{Y}\big),
\]
where $J\sim\mathrm{Unif}(\mathcal{J})$, $\widetilde{X}:=X_J$, $\widetilde{Y}:=Y_J$, with $J$ being independent of $(\mathbf{X},\mathbf{Y})$. Similarly,
\[
R_2 = \frac{1}{n}H(\mathbf{Y}) = \frac{1}{n}H(\mathbf{Y}|\mathbf{X}) = \frac{1}{n}H(\mathbf{Y}_{\mathcal{J}^c}|\mathbf{X}) \le \frac{1}{n}H(\mathbf{Y}_{\mathcal{J}^c}|\mathbf{X}_{\mathcal{J}^c}) \le \frac{1}{n}\sum_{j\in\mathcal{J}^c}H(Y_j|X_j) = \Big(1-\frac{d}{n}\Big)H\big(Y_{\widehat{J}}\big|X_{\widehat{J}},\widehat{J}\big) \le \Big(1-\frac{d}{n}\Big)H\big(Y_{\widehat{J}}\big|X_{\widehat{J}}\big) = \Big(1-\frac{d}{n}\Big)H\big(\widehat{Y}\big|\widehat{X}\big),
\]
where $\widehat{J}\sim\mathrm{Unif}(\mathcal{J}^c)$, $\widehat{X}:=X_{\widehat{J}}$, $\widehat{Y}:=Y_{\widehat{J}}$, with $\widehat{J}$ being independent of $(\mathbf{X},\mathbf{Y},J)$. On the other hand,
\[
\frac{d}{n}P_{\widetilde{X}\widetilde{Y}} + \Big(1-\frac{d}{n}\Big)P_{\widehat{X}\widehat{Y}} = \frac{1}{n}\sum_{j\in\mathcal{J}}P_{X_jY_j} + \frac{1}{n}\sum_{j\in\mathcal{J}^c}P_{X_jY_j} = \frac{1}{n}\sum_{j=1}^{n}P_{X_jY_j} = \mathbb{E}_{(\mathbf{X},\mathbf{Y})}[T_{\mathbf{X}\mathbf{Y}}] = T_{XY}.
\]
This completes the proof of the converse part.

We next study when the asymptotic biclique rate region is a triangle region. We obtain the following necessary and sufficient condition. The proof is provided in Appendix C.
Proposition 1.
Let $T_{XY}$ be a joint $n$-type such that $H_T(X|Y), H_T(Y|X)>0$. Then the asymptotic biclique rate region $\mathcal{R}(T_{XY})$ is a triangle region, i.e.,
\[
\mathcal{R}(T_{XY}) = \mathcal{R}_{\triangle}(T_{XY}) := \bigcup_{0\le\alpha\le 1}\{(R_1,R_2): R_1\le\alpha H_T(X|Y),\ R_2\le(1-\alpha)H_T(Y|X)\},
\]
if and only if $T_{XY}$ satisfies
\[
\frac{\log T_{X|Y}(x|y)}{H_T(X|Y)} = \frac{\log T_{Y|X}(y|x)}{H_T(Y|X)} \quad\text{for all } x,y.
\]

The condition in Proposition 1 is satisfied by the joint $n$-types $T_{XY}$ which have marginals $T_X=\mathrm{Unif}(\mathcal{X})$, $T_Y=\mathrm{Unif}(\mathcal{Y})$ and satisfy at least one of the following two conditions: 1) $|\mathcal{X}|=|\mathcal{Y}|$; 2) $X,Y$ are independent under the distribution $T_{XY}$. Hence, Proposition 1 implies that the asymptotic biclique rate region is a triangle region if the joint $n$-type $T_{XY}$ is a DSBS.

Han and Kobayashi [3] proved that their approximate version of the asymptotic biclique rate region is equal to
\[
\mathcal{R}^{**}(T_{XY}) := \bigcup_{P_{XYW}:\ P_{XY}=T_{XY},\ X\leftrightarrow W\leftrightarrow Y}\{(R_1,R_2): R_1\le H(X|W),\ R_2\le H(Y|W)\}.
\]
By Theorem 1, $\mathcal{R}^{**}(T_{XY})$ also coincides with the region $\{(R_1,R_2): E^*(R_1,R_2)=0\}$, and hence $\mathcal{R}^{*}(T_{XY})\subseteq\mathcal{R}^{**}(T_{XY})$. This implies that the asymptotic biclique rate region defined in this paper is a subset of Han and Kobayashi's approximate version (in fact, it is a strict subset if the joint $n$-type $T_{XY}$ is a DSBS or $\mathrm{Unif}(\mathcal{X}\times\mathcal{Y})$).

The difference between the exact and approximate versions of the asymptotic biclique rate region is caused by the "type overflow" effect, which was crystallized by the first author and Tan in [5]. Let $(R_1,R_2)$ be a pair such that $E^*(R_1,R_2)=0$, and let $(A,B)$ be an optimal pair of subsets attaining $E^*(R_1,R_2)$. All the sequences in $A$ have type $T_X$, and all the sequences in $B$ have type $T_Y$. However, in general, the joint types of $(\mathbf{x},\mathbf{y})\in A\times B$ might "overflow" from the target joint type $T_{XY}$. The number of non-overflowed sequence pairs (i.e., $|(A\times B)\cap\mathcal{T}_{T_{XY}}|$) has exponent $R_1+R_2$, since $E^*(R_1,R_2)=0$. This means that not too many sequence pairs have overflowed. However, if type overflow is forbidden, then we must reduce the rates of $A$ and $B$ to satisfy this requirement. This leads to the exact version of the asymptotic biclique rate region being strictly smaller than the approximate version. In other words, the exact asymptotic biclique rate region is more sensitive to the type overflow effect than the approximate version. A similar conclusion was previously drawn by the first author and Tan in [5] for the common information problem. Technically speaking, the type overflow effect corresponds to the fact that optimization over couplings is involved in our expressions. Intuitively, it is caused by the Markov chain constraints in the problem. We believe that the type overflow effect usually accompanies problems involving Markov chains.

III. NONINTERACTIVE SIMULATION
In this section, we apply our results from the previous section to two noninteractive simulation problems, one with sources uniformly distributed over a joint type class and the other with memoryless sources.

A. Noninteractive Simulation with Sources
$\mathrm{Unif}(\mathcal{T}_{T_{XY}})$

In this subsection, we assume $(\mathbf{X},\mathbf{Y})\sim P_{\mathbf{X},\mathbf{Y}}:=\mathrm{Unif}(\mathcal{T}_{T_{XY}})$. Assume that $(U_n,V_n)$ are random variables such that $U_n\leftrightarrow\mathbf{X}\leftrightarrow\mathbf{Y}\leftrightarrow V_n$ forms a Markov chain. What are the possible joint distributions of $(U_n,V_n)$? This problem is termed the noninteractive simulation problem. In this paper, we restrict $U_n,V_n$ to be Boolean random variables taking values in $\{0,1\}$. For this case, the joint distribution of $(U_n,V_n)$ is entirely determined by the triple of scalars $(\mathbb{P}(U_n=1), \mathbb{P}(V_n=1), \mathbb{P}(U_n=V_n=1))$.

Define $\mathcal{E}^{(n)}_X := \{\frac{1}{n}\log|\mathcal{T}_{T_X}|-R_1 : R_1\in\mathcal{R}^{(n)}_X\}$ and $\mathcal{E}^{(n)}_Y := \{\frac{1}{n}\log|\mathcal{T}_{T_Y}|-R_2 : R_2\in\mathcal{R}^{(n)}_Y\}$, where $\mathcal{R}^{(n)}_X$ and $\mathcal{R}^{(n)}_Y$ are defined in (2). Given a joint $n$-type $T_{XY}$, for $(E_1,E_2)\in\mathcal{E}^{(n)}_X\times\mathcal{E}^{(n)}_Y$, define
\[
\overline{\Upsilon}_n(E_1,E_2) := -\frac{1}{n}\log\max_{\substack{A\subseteq\mathcal{T}_{T_X},\ B\subseteq\mathcal{T}_{T_Y}:\\ P_{\mathbf{X}}(A)=e^{-nE_1},\ P_{\mathbf{Y}}(B)=e^{-nE_2}}} P_{\mathbf{X},\mathbf{Y}}(A\times B), \tag{29}
\]
\[
\underline{\Upsilon}_n(E_1,E_2) := -\frac{1}{n}\log\min_{\substack{A\subseteq\mathcal{T}_{T_X},\ B\subseteq\mathcal{T}_{T_Y}:\\ P_{\mathbf{X}}(A)=e^{-nE_1},\ P_{\mathbf{Y}}(B)=e^{-nE_2}}} P_{\mathbf{X},\mathbf{Y}}(A\times B). \tag{30}
\]
We determine the asymptotic behavior of $\overline{\Upsilon}_n$ in the following theorem. However, the asymptotic behavior of $\underline{\Upsilon}_n$ is currently unclear; see the discussion in Section V.

Theorem 3.
For any $T_{XY}$ and $(E_1,E_2)\in\mathcal{E}^{(n)}_X\times\mathcal{E}^{(n)}_Y$, we have
\[
\big|\overline{\Upsilon}_n(E_1,E_2)-\overline{\Upsilon}^{*}(E_1,E_2)\big| = O\Big(\frac{\log n}{n}\Big), \tag{31}
\]
where the asymptotic constant in the $O(\frac{\log n}{n})$ bound depends only on $|\mathcal{X}|,|\mathcal{Y}|$, and where
\[
\overline{\Upsilon}^{*}(E_1,E_2) := \min_{\substack{P_{XYW}:\ P_{XY}=T_{XY},\\ I(X;W)\ge E_1,\ I(Y;W)\ge E_2}} I(XY;W).
\]

Proof:
Observe that
\[
P_{\mathbf{X}}(A) = |A|/|\mathcal{T}_{T_X}|, \tag{32}
\]
\[
P_{\mathbf{Y}}(B) = |B|/|\mathcal{T}_{T_Y}|, \tag{33}
\]
\[
P_{\mathbf{X},\mathbf{Y}}(A\times B) = \rho(G[A,B])\,|A||B|/|\mathcal{T}_{T_{XY}}|. \tag{34}
\]
Let
\[
R_1 := \frac{1}{n}\log|\mathcal{T}_{T_X}|-E_1, \qquad R_2 := \frac{1}{n}\log|\mathcal{T}_{T_Y}|-E_2. \tag{35}
\]
Then, by (32)-(34), we obtain
\[
\overline{\Upsilon}_n(E_1,E_2) = E_n(R_1,R_2)+\frac{1}{n}\log|\mathcal{T}_{T_{XY}}|-R_1-R_2 = E^{*}(R_1,R_2)+H(T_{XY})-R_1-R_2+O\Big(\frac{\log n}{n}\Big) = G(R_1,R_2)+O\Big(\frac{\log n}{n}\Big), \tag{36}
\]
where $G$ is defined in (9) and the asymptotic constant in the $O(\frac{\log n}{n})$ bound depends only on $|\mathcal{X}|,|\mathcal{Y}|$. Substituting (35) into (36) yields
\[
\overline{\Upsilon}_n(E_1,E_2) = G\Big(\frac{1}{n}\log|\mathcal{T}_{T_X}|-E_1,\ \frac{1}{n}\log|\mathcal{T}_{T_Y}|-E_2\Big)+O\Big(\frac{\log n}{n}\Big) = \overline{\Upsilon}^{*}(E_1,E_2)+O\Big(\frac{\log n}{n}\Big),
\]
where the asymptotic constant in the $O(\frac{\log n}{n})$ bound again depends only on $|\mathcal{X}|,|\mathcal{Y}|$, and where the last equality follows by part 5) of Lemma 1.

B. Noninteractive Simulation with Sources $P^n_{XY}$

In this subsection, we study the noninteractive simulation problem with $(\mathbf{X},\mathbf{Y})\sim P^n_{XY}$, where $P_{XY}$ is a joint distribution defined on $\mathcal{X}\times\mathcal{Y}$. We still assume that $\mathcal{X},\mathcal{Y}$ are finite and that $P_X(x)>0$, $P_Y(y)>0$ for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$, where $P_X,P_Y$ denote the marginal distributions of $P_{XY}$. Ordentlich, Polyanskiy, and Shayevitz [4] focused on binary symmetric distributions $P_{X,Y}$, and studied the exponent of $\mathbb{P}(U_n=V_n=1)$ given that $\mathbb{P}(U_n=1), \mathbb{P}(V_n=1)$ vanish exponentially fast with exponents $E_1,E_2$, respectively. Let
\[
E_{1,\max} := -\log\big(\min_x P_X(x)\big), \qquad E_{2,\max} := -\log\big(\min_y P_Y(y)\big).
\]
In this subsection, we consider an arbitrary distribution $P_{X,Y}$ satisfying $P_X(x)>0$, $P_Y(y)>0$ for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$ and, for $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$, we aim at characterizing the exponents
\[
\overline{\Theta}(E_1,E_2) := \lim_{n\to\infty}\overline{\Theta}_n(E_1,E_2) \quad\text{and}\quad \underline{\Theta}(E_1,E_2) := \lim_{n\to\infty}\underline{\Theta}_n(E_1,E_2), \tag{37}
\]
where
\[
\overline{\Theta}_n(E_1,E_2) := -\frac{1}{n}\log\max_{\substack{A\subseteq\mathcal{X}^n,\ B\subseteq\mathcal{Y}^n:\\ P^n_X(A)\le e^{-nE_1},\ P^n_Y(B)\le e^{-nE_2}}} P^n_{XY}(A\times B), \tag{38}
\]
\[
\underline{\Theta}_n(E_1,E_2) := -\frac{1}{n}\log\min_{\substack{A\subseteq\mathcal{X}^n,\ B\subseteq\mathcal{Y}^n:\\ P^n_X(A)\ge e^{-nE_1},\ P^n_Y(B)\ge e^{-nE_2}}} P^n_{XY}(A\times B). \tag{39}
\]
(By time-sharing arguments, given $(E_1,E_2)$, $\{n\overline{\Theta}_n(E_1,E_2)\}_{n\ge1}$ is subadditive. Hence, by Fekete's subadditive lemma, the first limit in (37) exists and equals $\inf_{n\ge1}\overline{\Theta}_n(E_1,E_2)$. Similar observations serve to define the second limit in (37).) We prove the following two theorems, whose proofs are provided in Appendices D and E, respectively.

Theorem 4 (Strong Forward Small-Set Expansion Theorem). For $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$,
\[
\overline{\Theta}(E_1,E_2) = \overline{\Theta}^{*}(E_1,E_2) := \min_{\substack{Q_{XYW}:\ D(Q_{X|W}\|P_X|Q_W)\ge E_1,\\ D(Q_{Y|W}\|P_Y|Q_W)\ge E_2}} D\big(Q_{XY|W}\big\|P_{XY}\big|Q_W\big). \tag{40}
\]
Moreover, $\overline{\Theta}_n(E_1,E_2)\ge\overline{\Theta}^{*}(E_1,E_2)$ for any $n\ge1$ and $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$. Without loss of optimality, the alphabet size of $W$ can be assumed to be no larger than 3.

Theorem 5 (Strong Reverse Small-Set Expansion Theorem). For $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$,
\[
\underline{\Theta}(E_1,E_2) = \begin{cases}\underline{\Theta}^{*}(E_1,E_2), & E_1,E_2>0\\ E_1, & E_2=0\\ E_2, & E_1=0,\end{cases} \tag{41}
\]
where for $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$,
\[
\underline{\Theta}^{*}(E_1,E_2) := \max_{\substack{Q_W,Q_{X|W},Q_{Y|W}:\ D(Q_{X|W}\|P_X|Q_W)\le E_1,\\ D(Q_{Y|W}\|P_Y|Q_W)\le E_2}}\ \min_{Q_{XY|W}\in C(Q_{X|W},Q_{Y|W})} D\big(Q_{XY|W}\big\|P_{XY}\big|Q_W\big).
\]
Moreover, $\underline{\Theta}_n(E_1,E_2)\le$ the RHS of (41) for any $n\ge1$ and $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$. Without loss of optimality, the alphabet size of $W$ can be assumed to be no larger than 3.
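For intuition, the quantity $\overline{\Theta}_n$ in (38) can be computed exactly by brute force at tiny blocklengths. The sketch below (our own toy instance and helper names, not from the paper) does this for a DSBS with $\rho=0.5$ at $n=2$ and $e^{-nE_1}=e^{-nE_2}=1/4$; the maximum is attained by a pair of identical singletons:

```python
from itertools import chain, combinations, product
import math

rho = 0.5
n = 2
# DSBS: P(x,y) = (1+rho)/4 if x == y else (1-rho)/4; both marginals are uniform
P1 = {(x, y): (1 + rho) / 4 if x == y else (1 - rho) / 4
      for x in (0, 1) for y in (0, 1)}

seqs = list(product((0, 1), repeat=n))

def pxy(a, b):  # product probability P_XY^n(a, b)
    r = 1.0
    for xi, yi in zip(a, b):
        r *= P1[(xi, yi)]
    return r

def nonempty_subsets(items):
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))

cap = 0.25  # marginal constraint e^{-nE} = 1/4; P_X^n puts mass 2^-n on each sequence
best = 0.0
for A in nonempty_subsets(seqs):
    if len(A) * 0.5 ** n > cap + 1e-12:
        continue
    for B in nonempty_subsets(seqs):
        if len(B) * 0.5 ** n > cap + 1e-12:
            continue
        best = max(best, sum(pxy(a, b) for a in A for b in B))

theta_n = -math.log(best) / n  # Theta_n-bar at (E1, E2) = ((log 4)/n, (log 4)/n)
```

Here `best` equals $((1+\rho)/4)^2 = 0.140625$, i.e., the optimal sets are concentric "Hamming spheres" of radius zero, consistent with the DSBS discussion in Subsection III-B1.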
Define the minimum relative entropy over couplings of $(Q_X,Q_Y)$ with respect to $P_{XY}$ as
\[
D(Q_X,Q_Y\|P_{XY}) := \min_{Q_{XY}\in C(Q_X,Q_Y)} D(Q_{XY}\|P_{XY}).
\]
For $s\in[0,E_{1,\max}]$, $t\in[0,E_{2,\max}]$, define
\[
\varphi(s,t) := \min_{Q_{XY}:\ D(Q_X\|P_X)=s,\ D(Q_Y\|P_Y)=t} D(Q_{XY}\|P_{XY}) \tag{42}
\]
\[
= \min_{Q_X,Q_Y:\ D(Q_X\|P_X)=s,\ D(Q_Y\|P_Y)=t} D(Q_X,Q_Y\|P_{XY}), \tag{43}
\]
and
\[
\psi(s,t) := \max_{Q_X,Q_Y:\ D(Q_X\|P_X)=s,\ D(Q_Y\|P_Y)=t} D(Q_X,Q_Y\|P_{XY}). \tag{44}
\]
Define $\breve{\varphi}(s,t)$ as the lower convex envelope of $\varphi(s,t)$, and $\widehat{\psi}(s,t)$ as the upper concave envelope of $\psi(s,t)$. Then
\[
\overline{\Theta}^{*}(E_1,E_2) = \min_{s\ge E_1,\ t\ge E_2}\breve{\varphi}(s,t), \tag{45}
\]
\[
\underline{\Theta}^{*}(E_1,E_2) = \max_{s\le E_1,\ t\le E_2}\widehat{\psi}(s,t). \tag{46}
\]
Hence $\overline{\Theta}^{*}(E_1,E_2)$ is convex in $(E_1,E_2)$, and $\underline{\Theta}^{*}(E_1,E_2)$ is concave in $(E_1,E_2)$. Theorems 4 and 5 imply that both $\overline{\Theta}(E_1,E_2)$ and $\underline{\Theta}(E_1,E_2)$ are achieved by a sequence of time-sharings of at most three type codes. Here a type code is a code of the form $(A,B):=(\mathcal{T}_{T_X},\mathcal{T}_{T_Y})$ for a pair of types $(T_X,T_Y)$.

We next compare our strong versions of the forward and reverse small-set expansion theorems with the existing forward and reverse small-set expansion theorems in [11], [12]. Since the existing small-set expansion theorems are consequences of hypercontractivity inequalities, we first introduce hypercontractivity [13], [14].
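For binary alphabets, the coupling minimization defining $D(Q_X,Q_Y\|P_{XY})$ is a one-parameter problem, since every coupling of binary marginals is determined by $Q(0,0)$. The following is a minimal numerical sketch (our own helper names; a grid search, not a definitive solver):

```python
import math

def kl(q, p):  # relative entropy in nats between finite distributions
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

rho = 0.5
# DSBS P_XY listed on pairs (00, 01, 10, 11); both marginals are uniform
P = [(1 + rho) / 4, (1 - rho) / 4, (1 - rho) / 4, (1 + rho) / 4]
PX = PY = [0.5, 0.5]

def min_coupling_kl(qx0, qy0, grid=20000):
    # couplings of (Q_X, Q_Y) form the segment p = Q(0,0),
    # with p in [max(0, qx0 + qy0 - 1), min(qx0, qy0)]
    lo, hi = max(0.0, qx0 + qy0 - 1.0), min(qx0, qy0)
    best = float("inf")
    for i in range(grid + 1):
        p = lo + (hi - lo) * i / grid
        Q = [p, qx0 - p, qy0 - p, 1.0 - qx0 - qy0 + p]
        best = min(best, kl(Q, P))
    return best

qx0, qy0 = 0.7, 0.8  # Q_X(0) and Q_Y(0)
d = min_coupling_kl(qx0, qy0)
```

By the data-processing inequality, $D(Q_X,Q_Y\|P_{XY})\ge\max\{D(Q_X\|P_X),D(Q_Y\|P_Y)\}$, and it is at most the divergence of the independent coupling $Q_X\times Q_Y$; both checks pass for this instance.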
Define the forward hypercontractivity region as
\begin{align}
\mathcal{R}_+(P_{XY}) :=&\ \big\{(p,q)\in[1,\infty)^2 : \mathbb{E}[f(X)g(Y)]\le\|f(X)\|_p\|g(Y)\|_q,\ \forall f,g\ge 0\big\} \tag{47}\\
=&\ \Big\{(p,q)\in[1,\infty)^2 : D(Q_X,Q_Y\|P_{XY})\ge\tfrac{1}{p}D(Q_X\|P_X)+\tfrac{1}{q}D(Q_Y\|P_Y),\ \forall Q_X\ll P_X,\ Q_Y\ll P_Y\Big\} \tag{48}\\
=&\ \Big\{(p,q)\in[1,\infty)^2 : \varphi(E_1,E_2)\ge\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\}\\
=&\ \Big\{(p,q)\in[1,\infty)^2 : \overline{\Theta}^{*}(E_1,E_2)\ge\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\} \tag{49}\\
=&\ \Big\{(p,q)\in[1,\infty)^2 : \lim_{t\downarrow 0}\tfrac{1}{t}\overline{\Theta}^{*}(tE_1,tE_2)\ge\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\} \tag{50}\\
=&\ \big\{(p,q)\in[1,\infty)^2 : P^n_{XY}(A\times B)\le P^n_X(A)^{1/p}P^n_Y(B)^{1/q},\ \forall A\subseteq\mathcal{X}^n,\ B\subseteq\mathcal{Y}^n,\ \forall n\ge 1\big\}, \tag{51}
\end{align}
and the reverse hypercontractivity region as
\begin{align}
\mathcal{R}_-(P_{XY}) :=&\ \big\{(p,q)\in(0,1]^2 : \mathbb{E}[f(X)g(Y)]\ge\|f(X)\|_p\|g(Y)\|_q,\ \forall f,g\ge 0\big\} \tag{52}\\
=&\ \Big\{(p,q)\in(0,1]^2 : D(Q_X,Q_Y\|P_{XY})\le\tfrac{1}{p}D(Q_X\|P_X)+\tfrac{1}{q}D(Q_Y\|P_Y),\ \forall Q_X\ll P_X,\ Q_Y\ll P_Y\Big\} \tag{53}\\
=&\ \Big\{(p,q)\in(0,1]^2 : \psi(E_1,E_2)\le\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\}\\
=&\ \Big\{(p,q)\in(0,1]^2 : \underline{\Theta}^{*}(E_1,E_2)\le\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\} \tag{54}\\
=&\ \Big\{(p,q)\in(0,1]^2 : \lim_{t\downarrow 0}\tfrac{1}{t}\underline{\Theta}^{*}(tE_1,tE_2)\le\tfrac{E_1}{p}+\tfrac{E_2}{q},\ \forall E_1\in[0,E_{1,\max}],\ E_2\in[0,E_{2,\max}]\Big\} \tag{55}\\
=&\ \big\{(p,q)\in(0,1]^2 : P^n_{XY}(A\times B)\ge P^n_X(A)^{1/p}P^n_Y(B)^{1/q},\ \forall A\subseteq\mathcal{X}^n,\ B\subseteq\mathcal{Y}^n,\ \forall n\ge 1\big\}. \tag{56}
\end{align}
If $p':=p/(p-1)$ denotes the Hölder conjugate of $p$, then the set of $1\le q\le p'$ such that $(p,q)\in\mathcal{R}_+(P_{XY})$ is called the forward hypercontractivity ribbon [15], [16], [17], and the set of $1\ge q\ge p'$ such that $(p,q)\in\mathcal{R}_-(P_{XY})$ is called the reverse hypercontractivity ribbon [18].
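For the DSBS, the classical two-function hypercontractivity condition is $(p-1)(q-1)\ge\rho^2$ (a standard fact, not derived in this excerpt). Characterization (51) then predicts that the set inequality holds at every finite blocklength for such $(p,q)$. The following brute-force check (our own toy script) verifies this at $n=1,2$ on the boundary $(p-1)(q-1)=\rho^2$:

```python
from itertools import product

rho = 0.5
P1 = {(x, y): (1 + rho) / 4 if x == y else (1 - rho) / 4
      for x in (0, 1) for y in (0, 1)}

def holds_at_blocklength(n, p, q):
    # exhaustively test P_XY^n(A x B) <= P_X^n(A)^(1/p) * P_Y^n(B)^(1/q)
    pts = list(product((0, 1), repeat=n))
    def pxy(a, b):
        r = 1.0
        for xi, yi in zip(a, b):
            r *= P1[(xi, yi)]
        return r
    for ma in range(1, 2 ** len(pts)):          # nonempty A as a bitmask
        A = [pts[i] for i in range(len(pts)) if ma >> i & 1]
        PA = len(A) * 0.5 ** n                   # P_X^n is uniform
        for mb in range(1, 2 ** len(pts)):
            B = [pts[i] for i in range(len(pts)) if mb >> i & 1]
            PB = len(B) * 0.5 ** n
            lhs = sum(pxy(a, b) for a in A for b in B)
            if lhs > PA ** (1 / p) * PB ** (1 / q) + 1e-12:
                return False
    return True
```

With $p=q=1+\rho=1.5$ (so that $(p-1)(q-1)=\rho^2$) the check passes at $n=1,2$, while with, e.g., $p=q=1.2$ (outside the region) the $n=1$ check already fails on $A=B=\{0\}$.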
The equivalent characterizations of $\mathcal{R}_+(P_{XY})$ and $\mathcal{R}_-(P_{XY})$ in (48) and (53) were proven in [16], [18]. Equations (49) and (54) follow by (45) and (46), and they imply that $(p,q)\in\mathcal{R}_+(P_{XY})$ if and only if the plane $\frac{E_1}{p}+\frac{E_2}{q}$ lies below the surface $\overline{\Theta}^{*}(E_1,E_2)$ over the region $[0,E_{1,\max}]\times[0,E_{2,\max}]$, and $(p,q)\in\mathcal{R}_-(P_{XY})$ if and only if the plane $\frac{E_1}{p}+\frac{E_2}{q}$ lies above the surface $\underline{\Theta}^{*}(E_1,E_2)$ over the same region. Equations (50) and (55) follow since $\overline{\Theta}^{*}(E_1,E_2)$ is convex in $(E_1,E_2)$ and $\underline{\Theta}^{*}(E_1,E_2)$ is concave in $(E_1,E_2)$. Equations (51) and (56) imply that $\mathcal{R}_+(P_{XY})$ and $\mathcal{R}_-(P_{XY})$ can be characterized by the joint and marginal probabilities in noninteractive simulation. Hence the forward and reverse hypercontractivity regions are closely related to noninteractive simulation problems. The forward and reverse small-set expansion theorems in [11], [12], stated next, follow by the tensorization property of $\mathcal{R}_+(P_{XY})$ and $\mathcal{R}_-(P_{XY})$, and by setting $f,g$ to Boolean functions in the $n$-letter versions of (47) and (52).

Proposition 2 (Forward and Reverse Small-Set Expansion Theorems [11], [12]). For any $n\ge1$ and $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$,
\[
\overline{\Theta}_n(E_1,E_2) \ge \max_{(p,q)\in\mathcal{R}_+(P_{XY})}\frac{E_1}{p}+\frac{E_2}{q}, \tag{57}
\]
\[
\underline{\Theta}_n(E_1,E_2) \le \min_{(p,q)\in\mathcal{R}_-(P_{XY})}\frac{E_1}{p}+\frac{E_2}{q}. \tag{58}
\]
In particular, for the DSBS with correlation coefficient $\rho>0$,
\[
\overline{\Theta}_n(E_1,E_2) \ge \begin{cases}\dfrac{E_1+E_2-2\rho\sqrt{E_1E_2}}{1-\rho^2}, & \rho^2E_2\le E_1\le\dfrac{E_2}{\rho^2},\\[1ex] E_2, & E_1<\rho^2E_2,\\[1ex] E_1, & E_1>\dfrac{E_2}{\rho^2},\end{cases} \tag{59}
\]
\[
\underline{\Theta}_n(E_1,E_2) \le \frac{E_1+E_2+2\rho\sqrt{E_1E_2}}{1-\rho^2}. \tag{60}
\]

Proposition 3.
For $E_1\in[0,E_{1,\max}]$, $E_2\in[0,E_{2,\max}]$, the RHS of (40) $\ge$ the RHS of (57), and the RHS of (41) $\le$ the RHS of (58). Moreover,
\[
\lim_{t\downarrow 0}\frac{1}{t}\overline{\Theta}^{*}(tE_1,tE_2) = \text{RHS of (57)}, \tag{61}
\]
\[
\lim_{t\downarrow 0}\frac{1}{t}\underline{\Theta}^{*}(tE_1,tE_2) = \text{RHS of (58)}. \tag{62}
\]

Remark. This proposition implies that, in general, our strong forward and reverse small-set expansion theorems are sharper than O'Donnell's forward small-set expansion theorem and Mossel, O'Donnell, Regev, Steif, and Sudakov's reverse small-set expansion theorem. Moreover, in the limiting case $n\to\infty$, our theorems reduce to theirs for a sequence of pairs $(E^{(n)}_1,E^{(n)}_2)$ such that $\lim_{n\to\infty}E^{(n)}_1=\lim_{n\to\infty}E^{(n)}_2=0$, e.g., $e^{-nE^{(n)}_1}=a$, $e^{-nE^{(n)}_2}=b$ for some fixed $0<a,b<1$.

Proof:
By (49) and (54),
\[
\overline{\Theta}^{*}(E_1,E_2) \ge \max_{(p,q)\in\mathcal{R}_+(P_{XY})}\frac{E_1}{p}+\frac{E_2}{q},
\qquad
\underline{\Theta}^{*}(E_1,E_2) \le \min_{(p,q)\in\mathcal{R}_-(P_{XY})}\frac{E_1}{p}+\frac{E_2}{q}.
\]
Furthermore, (61) and (62) follow by (50) and (55).
1) Example: DSBS:
Consider a DSBS $P_{XY}$ with correlation $\rho$, i.e.,
\[
P_{XY} = \begin{array}{c|cc} X\backslash Y & 0 & 1\\ \hline 0 & \frac{1+\rho}{4} & \frac{1-\rho}{4}\\ 1 & \frac{1-\rho}{4} & \frac{1+\rho}{4} \end{array}\,.
\]
Define $k=\big(\frac{1+\rho}{1-\rho}\big)^2$. Define
\[
D_{\alpha,\beta}(p) := D\Big(\big(p,\ \alpha-p,\ \beta-p,\ 1+p-\alpha-\beta\big)\ \Big\|\ \Big(\frac{1+\rho}{4},\frac{1-\rho}{4},\frac{1-\rho}{4},\frac{1+\rho}{4}\Big)\Big),
\]
\[
D(\alpha,\beta) := \min_{\max\{0,\,\alpha+\beta-1\}\le p\le\min\{\alpha,\beta\}} D_{\alpha,\beta}(p) = D_{\alpha,\beta}(p^*),
\]
where
\[
p^* = \frac{(k-1)(\alpha+\beta)+1-\sqrt{\big((k-1)(\alpha+\beta)+1\big)^2-4k(k-1)\alpha\beta}}{2(k-1)}.
\]
For the DSBS,
\[
\varphi(s,t) = D\big(H^{-1}(1-s),\ H^{-1}(1-t)\big), \qquad \psi(s,t) = D\big(H^{-1}(1-s),\ 1-H^{-1}(1-t)\big),
\]
where $H^{-1}$ is the inverse of the restriction of the binary entropy function $H: t\mapsto -t\log t-(1-t)\log(1-t)$ to the set $[0,\frac{1}{2}]$. Then $\overline{\Theta}^{*}(E_1,E_2)$ and $\underline{\Theta}^{*}(E_1,E_2)$ are determined by $\varphi(s,t)$ and $\psi(s,t)$ via (45) and (46). Moreover, by Theorems 4 and 5, $\overline{\Theta}(E_1,E_2)$ is attained by a sequence involving the time-sharing of at most three pairs of concentric Hamming spheres, and $\underline{\Theta}(E_1,E_2)$ is attained by a sequence involving the time-sharing of at most three pairs of anti-concentric Hamming spheres.

Proposition 4. 1) $\overline{\Theta}(E_1,E_2)$ is achieved by a sequence of pairs of concentric Hamming spheres if $\min_{s\ge E_1,\ t\ge E_2}\varphi(s,t)$ is convex in $(E_1,E_2)$. 2) $\underline{\Theta}(E_1,E_2)$ is achieved by a sequence of pairs of anti-concentric Hamming spheres if $\psi(s,t)$ is concave in $(s,t)$.

Remark. Ordentlich, Polyanskiy, and Shayevitz [4] conjectured that
$\overline{\Theta}(E_1,E_2)$ is achieved by a sequence of pairs of concentric Hamming spheres, and $\underline{\Theta}(E_1,E_2)$ is achieved by a sequence of pairs of anti-concentric Hamming spheres. Hence their conjecture is true under the assumptions in Proposition 4. The functions $\varphi(s,t)$ and $\psi(s,t)$ for $\rho=0.$ are plotted in Fig. 1. Numerical results show that the assumptions in Proposition 4 hold in this case. Furthermore, interestingly, as mentioned in [4], the minimization part of the conjecture of Ordentlich, Polyanskiy, and Shayevitz implies a sharper outer bound for the zero-error capacity region of the binary adder channel. This will be explained in Subsection III-B2.

The study of the noninteractive simulation problem dates back to Gács and Körner's and Witsenhausen's seminal papers [19], [20]. By utilizing the tensorization property of maximal correlation, Witsenhausen proved the sharp bounds $\frac{1-\rho}{4}\le P^n_{XY}(A\times B)\le\frac{1+\rho}{4}$ for the case $P^n_X(A)=P^n_Y(B)=\frac{1}{2}$, where the upper and lower bounds are respectively attained by symmetric $(n-1)$-subcubes (e.g., $A=B=\{\mathbf{x}: x_1=1\}$) and anti-symmetric $(n-1)$-subcubes (e.g., $A=-B=\{\mathbf{x}: x_1=1\}$). Recently, by combining Fourier analysis with a coding-theoretic result (i.e., Fu-Wei-Yeung's linear programming bound on the average distance [21]), the first author and Tan [2] derived upper and lower bounds on $P^n_{XY}(A\times B)$ for the case $P^n_X(A)=P^n_Y(B)=\frac{1}{4}$, where the upper bound $\big(\frac{1+\rho}{4}\big)^2$ is attained by symmetric $(n-2)$-subcubes (e.g., $A=B=\{\mathbf{x}: x_1=x_2=1\}$). Next, the first author and Tan [22] strengthened the bounds in [2] by improving Fu-Wei-Yeung's linear programming bound on the average distance. Numerical results show that the upper bound in [22] is strictly tighter than existing bounds for the case $P^n_X(A)=P^n_Y(B)=\frac{1}{4}$.
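The closed form for $p^*$ in the DSBS example above can be checked numerically. The sketch below (helper names are ours) compares the closed-form minimizer against a grid search and verifies the stationarity condition $p(1+p-\alpha-\beta)=k(\alpha-p)(\beta-p)$ obtained by setting the derivative of $D_{\alpha,\beta}$ to zero:

```python
import math

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

rho = 0.5
k = ((1 + rho) / (1 - rho)) ** 2   # k = 9 here
P = [(1 + rho) / 4, (1 - rho) / 4, (1 - rho) / 4, (1 + rho) / 4]

def D_ab(p, a, b):
    return kl([p, a - p, b - p, 1 + p - a - b], P)

def p_star(a, b):
    # the closed-form minimizer of D_ab over the admissible interval
    c = (k - 1) * (a + b) + 1
    return (c - math.sqrt(c * c - 4 * k * (k - 1) * a * b)) / (2 * (k - 1))

a, b = 0.3, 0.4
ps = p_star(a, b)                        # 0.225 for this instance
lo, hi = max(0.0, a + b - 1), min(a, b)
brute = min(D_ab(lo + (hi - lo) * i / 100000, a, b) for i in range(1, 100000))
```

Here $p^*=0.225$ lies in the admissible interval $[\max\{0,\alpha+\beta-1\},\min\{\alpha,\beta\}]$, satisfies the stationarity condition with $k=9$, and matches the grid minimum up to the grid resolution.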
Kahn, Kalai, and Linial [23] first applied the single-function version of (forward) hypercontractivity inequalities to obtain bounds for the noninteractive simulation problem, by substituting Boolean functions for the nonnegative functions in the hypercontractivity inequalities. Mossel and O'Donnell et al. [11], [12] applied the two-function version of hypercontractivity inequalities to obtain bounds in a similar way. Kamath and the second author [17] utilized hypercontractivity inequalities in a slightly different way, specifically by substituting two-valued functions (not restricted to be $\{0,1\}$-valued) for the nonnegative functions. In particular, Kamath-Anantharam's bounds [17] outperform Witsenhausen's maximal correlation bounds for any values of $P^n_X(A), P^n_Y(B)$, and also outperform other existing bounds for the cases $P^n_X(A)=a$, $P^n_Y(B)=b$ with $a,b$ fixed but close to zero, or the cases $P^n_X(A_n)=a_n$, $P^n_Y(B_n)=b_n$ with $a_n,b_n$ vanishing subexponentially fast.

Figure 1. Illustration of $\varphi(s,t)$, $\psi(s,t)$, $\overline{\Theta}^{*}(E_1,E_2)$, and $\underline{\Theta}^{*}(E_1,E_2)$ for $\rho=0.$ Note that $\overline{\Theta}^{*}(E_1,E_2)$ and $\underline{\Theta}^{*}(E_1,E_2)$ are expressed in terms of $\varphi(s,t)$ and $\psi(s,t)$ in (45) and (46).

Furthermore, Ordentlich, Polyanskiy, and Shayevitz [4] studied the regime in which $P^n_X(A_n), P^n_Y(B_n)$ vanish exponentially fast, and they solved the limiting cases $\rho\to 0,1$. The symmetric case $P^n_X(A_n)=P^n_Y(B_n)$ in this exponential regime was solved by Kirshner and Samorodnitsky [7]. The bounds in [4] and [7] were derived by various versions of stronger hypercontractivity inequalities. In this paper, we completely solve the noninteractive simulation problem for arbitrary exponentially vanishing $P^n_X(A_n), P^n_Y(B_n)$, by using information-theoretic methods.
Furthermore, the noninteractive simulation problem with Markov-chain noise models, as well as multi-terminal versions of the noninteractive simulation problem, have also been studied in the literature; see, e.g., [11], [24].
2) Applications to Zero-Error Coding:
The zero-error capacity region $\mathcal{C}$ of the two-user binary adder channel (BAC) (or the rate region of uniquely decodable code pairs) is defined as the set of $(R_1,R_2)$ for which there is a sequence of pairs $A_n,B_n\subseteq\{0,1\}^n$ with $|A_n|=2^{n(R_1+o(1))}$, $|B_n|=2^{n(R_2+o(1))}$ such that $|A_n+B_n|=|A_n|\cdot|B_n|$ for every $n$, where $A_n+B_n$ denotes the sumset $\{\mathbf{a}+\mathbf{b}: \mathbf{a}\in A_n, \mathbf{b}\in B_n\}$, and $\mathbf{a}+\mathbf{b}$ denotes addition over $\mathbb{Z}^n$. The reverse small-set expansion inequality given in (60) for the DSBS was used by Austrin, Kaski, Koivisto, and Nederlof [25] to prove the previously best known bound on $R_2$ when $R_1=1$. As mentioned by Ordentlich, Polyanskiy, and Shayevitz [4], repeating the arguments in [25] with improved bounds on $\underline{\Theta}(E_1,E_2)$ will yield tighter bounds on $R_2$ when $R_1=1$. Replacing the reverse small-set expansion inequality in the proof given in [25] with our complete characterization of $\underline{\Theta}(E_1,E_2)$ in Theorem 5, we obtain the following result.

Theorem 6. If $(1-\epsilon,R_2)\in\mathcal{C}$, then for any $\rho\in(0,1)$, there exists some $\lambda\in 1\pm\sqrt{2\ln(2)\epsilon}$ such that
\[
\underline{\Theta}^{*}\big(\epsilon,\ \lambda+\epsilon-R_2\big) \ \ge\ \lambda\big(2-\log(3-\rho)\big)-(\lambda-1)-\epsilon-\sqrt{2\ln(2)\epsilon},
\]
where $\underline{\Theta}^{*}$ is defined for the DSBS with correlation $\rho$. In particular, if $\epsilon=0$, we obtain for any $\rho\in(0,1)$,
\[
\underline{\Theta}^{*}\big(0,\ 1-R_2\big) \ \ge\ 2-\log(3-\rho). \tag{63}
\]
Numerical results show that if $R_1=1$ (i.e., $\epsilon=0$), then by choosing the (numerically almost optimal) value of $\rho$, (63) implies an upper bound on $R_2$ which improves the previously best known bound established in [25].

IV. HYPERCONTRACTIVITY INEQUALITIES
In this section, we relax the Boolean functions in noninteractive simulation problems to arbitrary nonnegative functions, but still restrict their supports to be exponentially small. Let $(\mathbf{X},\mathbf{Y})\sim P^n_{XY}$, where $P_{XY}$ is a joint distribution defined on $\mathcal{X}\times\mathcal{Y}$. We still assume that $\mathcal{X},\mathcal{Y}$ are finite. We next derive the following stronger (forward and reverse) hypercontractivity inequalities (more precisely, stronger versions of Hölder's inequalities, hypercontractivity inequalities, and their reverses) by using Theorems 4 and 5. Our hypercontractivity inequalities reduce to the usual ones when $\alpha=\beta=0$. For $\alpha\ge 0$, $\beta\ge 0$, define
\[
\overline{\Lambda}_{p,q}(\alpha,\beta) := \overline{\Theta}^{*}(\alpha,\beta)-\frac{\alpha}{p}-\frac{\beta}{q} \quad\text{for } p,q\in[1,\infty),
\]
\[
\underline{\Lambda}_{p,q}(\alpha,\beta) := \frac{\alpha}{p}+\frac{\beta}{q}-\underline{\Theta}^{*}(\alpha,\beta) \quad\text{for } p,q\in(0,1].
\]

Theorem 7.
Let $\alpha\in[0,E_{1,\max}]$, $\beta\in[0,E_{2,\max}]$. Let $f,g$ be nonnegative functions such that $P^n_X(\mathrm{supp}(f))\le e^{-n\alpha}$, $P^n_Y(\mathrm{supp}(g))\le e^{-n\beta}$. Then
\[
\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] \le e^{-n\overline{\Lambda}_{p,q}(\alpha,\beta)}\|f(\mathbf{X})\|_p\|g(\mathbf{Y})\|_q, \quad\forall (p,q)\in\mathcal{R}_+(P_{XY}), \tag{64}
\]
\[
\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] \ge e^{n\underline{\Lambda}_{p,q}(\alpha,\beta)}\|f(\mathbf{X})\|_p\|g(\mathbf{Y})\|_q, \quad\forall (p,q)\in\mathcal{R}_-(P_{XY}), \tag{65}
\]
where $\mathcal{R}_+(P_{XY})$ and $\mathcal{R}_-(P_{XY})$ are respectively defined in (47) and (52).

Remark. By definition, $\overline{\Lambda}_{p,q}(\alpha,\beta)\ge 0$ for $(p,q)\in\mathcal{R}_+(P_{XY})$, and $\underline{\Lambda}_{p,q}(\alpha,\beta)\ge 0$ for $(p,q)\in\mathcal{R}_-(P_{XY})$.

Remark. Given $\alpha,\beta\ge 0$, the inequalities (64) and (65) are sharp, in the sense that the exponents on the two sides of (64) (resp. (65)) are asymptotically equal as $n\to\infty$, for a sequence of Boolean functions $f_n=1_{A_n}$, $g_n=1_{B_n}$ with $(A_n,B_n)$ asymptotically attaining $\overline{\Theta}(\alpha,\beta)$ (resp. $\underline{\Theta}(\alpha,\beta)$).

Proof:
Our proof combines Theorems 4 and 5 with ideas from [7]. By the convexity of $\overline{\Theta}^{*}$, the concavity of $\underline{\Theta}^{*}$, and the definitions of $\mathcal{R}_+(P_{XY})$ and $\mathcal{R}_-(P_{XY})$, we have that for $(p,q)\in\mathcal{R}_+(P_{XY})$,
\[
\overline{\Lambda}_{p,q}(\alpha,\beta) = \inf_{\alpha'\ge\alpha,\ \beta'\ge\beta}\overline{\Theta}^{*}(\alpha',\beta')-\frac{\alpha'}{p}-\frac{\beta'}{q},
\]
and for $(p,q)\in\mathcal{R}_-(P_{XY})$,
\[
\underline{\Lambda}_{p,q}(\alpha,\beta) = \inf_{\alpha'\ge\alpha,\ \beta'\ge\beta}\frac{\alpha'}{p}+\frac{\beta'}{q}-\underline{\Theta}^{*}(\alpha',\beta').
\]
We may assume, by homogeneity, that $\|f\|_1=\|g\|_1=1$. This means that $f\le 1/P^n_{X,\min}$, $g\le 1/P^n_{Y,\min}$, and moreover, $\frac{1}{n}\log\|f\|_p$ and $\frac{1}{n}\log\|g\|_q$ are uniformly bounded for all $n\ge 1$. By the forward and reverse Hölder inequalities, $\frac{1}{n}\log\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})]$ is also uniformly bounded for all $n\ge 1$. This implies that for sufficiently large $a$, the points at which $f$ or $g$ is smaller than $e^{-na}$ contribute little to $\|f\|_p$, $\|g\|_q$, and $\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})]$, in the sense that if we set $f,g$ to be zero at these points, then $\frac{1}{n}\log\|f\|_p$, $\frac{1}{n}\log\|g\|_q$, and $\frac{1}{n}\log\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})]$ only change by amounts of order $o_n(1)$, where $o_n(1)$ denotes a term vanishing as $n\to\infty$. All the remaining points of $\mathcal{X}^n$ can be partitioned into $r=r(a,b)$ level sets $A_1,\dots,A_r$ such that $f$ varies by a factor of at most $e^{nb}$ within each level set, where $b>0$. Similarly, all the remaining points of $\mathcal{Y}^n$ can be partitioned into $s=s(a,b)$ level sets $B_1,\dots,B_s$. Let $\alpha_i=-\frac{1}{n}\log P^n_X(A_i)$, $\beta_j=-\frac{1}{n}\log P^n_Y(B_j)$, and let $\mu_i=\frac{1}{n}\log(u_i)$, $\nu_j=\frac{1}{n}\log(v_j)$, where $u_i,v_j$ are respectively the median value of $f$ on $A_i$ and the median value of $g$ on $B_j$. Obviously, $f=u_ie^{no_b(1)}$ on the set $A_i$, and $g=v_je^{no_b(1)}$ on the set $B_j$, where $o_b(1)$ denotes a term vanishing as $b\to 0$. Moreover, $\alpha_i\ge\alpha$, $\beta_j\ge\beta$, $\forall i,j$.
Then,
\[
\frac{1}{n}\log\|f\|_p = \frac{1}{np}\log\Big(\sum_{i=1}^{r}P^n_X(A_i)u_i^p\Big)+o_b(1)+o_n(1) = N_X(p)+o_b(1)+o_{n|b}(1),
\]
where $N_X(p):=\max_{1\le i\le r}\big\{-\frac{\alpha_i}{p}+\mu_i\big\}$ and $o_{n|b}(1)$ is a term that vanishes as $n\to\infty$ for any given $b$. Similarly,
\[
\frac{1}{n}\log\|g\|_q = N_Y(q)+o_b(1)+o_{n|b}(1),
\]
where $N_Y(q):=\max_{1\le j\le s}\big\{-\frac{\beta_j}{q}+\nu_j\big\}$. Utilizing these equations, we obtain
\begin{align}
\frac{1}{n}\log\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] &= \frac{1}{n}\log\sum_{i=1}^{r}\sum_{j=1}^{s}P^n_{XY}(A_i\times B_j)u_iv_j+o_b(1)+o_n(1)\\
&\le \max_{1\le i\le r,\,1\le j\le s}\{-\overline{\Theta}^{*}(\alpha_i,\beta_j)+\mu_i+\nu_j\}+o_b(1)+o_{n|b}(1)\\
&\le \max_{1\le i\le r,\,1\le j\le s}\Big\{-\overline{\Theta}^{*}(\alpha_i,\beta_j)+\frac{\alpha_i}{p}+\frac{\beta_j}{q}\Big\}+N_X(p)+N_Y(q)+o_b(1)+o_{n|b}(1)\\
&\le -\overline{\Lambda}_{p,q}(\alpha,\beta)+N_X(p)+N_Y(q)+o_b(1)+o_{n|b}(1), \tag{66}
\end{align}
where the last inequality in (66) follows from the expression for $\overline{\Lambda}_{p,q}(\alpha,\beta)$ given above, which relies on the convexity of $\overline{\Theta}^{*}$. Similarly,
\begin{align}
\frac{1}{n}\log\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] &= \frac{1}{n}\log\sum_{i=1}^{r}\sum_{j=1}^{s}P^n_{XY}(A_i\times B_j)u_iv_j+o_b(1)+o_n(1)\\
&\ge \max_{1\le i\le r,\,1\le j\le s}\{-\underline{\Theta}^{*}(\alpha_i,\beta_j)+\mu_i+\nu_j\}+o_b(1)+o_{n|b}(1)\\
&\ge -\underline{\Theta}^{*}(\alpha_{i^*},\beta_{j^*})+\mu_{i^*}+\nu_{j^*}+o_b(1)+o_{n|b}(1)\\
&= -\underline{\Theta}^{*}(\alpha_{i^*},\beta_{j^*})+\frac{\alpha_{i^*}}{p}+\frac{\beta_{j^*}}{q}+N_X(p)+N_Y(q)+o_b(1)+o_{n|b}(1)\\
&\ge \underline{\Lambda}_{p,q}(\alpha,\beta)+N_X(p)+N_Y(q)+o_b(1)+o_{n|b}(1),
\end{align}
where $i^*$ is the optimal $i$ attaining $N_X(p)$ and $j^*$ is the optimal $j$ attaining $N_Y(q)$. These imply the inequalities (64) and (65), up to a factor of $e^{n(o_b(1)+o_{n|b}(1))}$. To remove this factor, we apply these results to $k$ copies of $(f,g)$ and let $k\to\infty$; we then obtain (64) and (65) up to a factor of $e^{no_b(1)}$.
Lastly, letting $b\to 0$, we obtain (64) and (65).

For $\alpha\in[0,E_{1,\max}]$, $\beta\in[0,E_{2,\max}]$, define
\[
\mathcal{R}^{+}_{\alpha,\beta}(P_{XY}) := \Big\{(p,q)\in[1,\infty)^2 : \overline{\Theta}^{*}(E_1,E_2)\ge\frac{E_1}{p}+\frac{E_2}{q},\ \forall E_1\in[\alpha,E_{1,\max}],\ E_2\in[\beta,E_{2,\max}]\Big\}, \tag{67}
\]
\[
\mathcal{R}^{-}_{\alpha,\beta}(P_{XY}) := \Big\{(p,q)\in(0,1]^2 : \underline{\Theta}^{*}(E_1,E_2)\le\frac{E_1}{p}+\frac{E_2}{q},\ \forall E_1\in[\alpha,E_{1,\max}],\ E_2\in[\beta,E_{2,\max}]\Big\}. \tag{68}
\]
Following steps similar to those in the proof of Theorem 7, it is easy to obtain the following new version of hypercontractivity.

Theorem 8.
Under the assumptions in Theorem 7, it holds that
\[
\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] \le \|f(\mathbf{X})\|_p\|g(\mathbf{Y})\|_q, \quad\forall (p,q)\in\mathcal{R}^{+}_{\alpha,\beta}(P_{XY}), \tag{69}
\]
\[
\mathbb{E}[f(\mathbf{X})g(\mathbf{Y})] \ge \|f(\mathbf{X})\|_p\|g(\mathbf{Y})\|_q, \quad\forall (p,q)\in\mathcal{R}^{-}_{\alpha,\beta}(P_{XY}). \tag{70}
\]

Remark. Given $\alpha,\beta$, the inequality (69) is sharp, in the sense that for any $(p,q)\in[1,\infty)^2\setminus\mathcal{R}^{+}_{\alpha,\beta}(P_{XY})$, there exists a pair $(f,g)$ that satisfies the assumptions in Theorem 7 but violates (69). The inequality (70) is sharp in a similar sense.

Note that the hypercontractivity inequalities in Theorem 7 differ from the common ones in the factors $e^{-n\overline{\Lambda}_{p,q}(\alpha,\beta)}$ and $e^{n\underline{\Lambda}_{p,q}(\alpha,\beta)}$, while the ones in Theorem 8 differ from the common ones in the region of parameters $p,q$. Strengthening forward hypercontractivity was previously studied in [6], [7]. Polyanskiy and Samorodnitsky [6] strengthened the hypercontractivity inequalities in a sense similar to Theorem 7, while Kirshner and Samorodnitsky [7] strengthened them in a sense similar to Theorem 8. However, both works [6], [7] focused on strengthening the single-function version of forward hypercontractivity. Moreover, the hypercontractivity inequalities in [6] are only sharp in extreme cases, and only DSBSes were considered in [7].

V. CONCLUDING REMARKS
The maximal density and the biclique rate region have been studied in this paper. One may also be interested in their counterparts: the minimal density and the independent-set rate region. Given $1\le M_1\le|\mathcal{T}_{T_X}|$, $1\le M_2\le|\mathcal{T}_{T_Y}|$, define the minimal density of subgraphs with size $(M_1,M_2)$ as
\[
\Gamma_n(M_1,M_2) := \min_{\substack{A\subseteq\mathcal{T}_{T_X},\ B\subseteq\mathcal{T}_{T_Y}:\\ |A|=M_1,\ |B|=M_2}}\rho(G[A,B]).
\]
If the edge density of a subgraph of $G$ is equal to $0$, then the set of vertices of this subgraph is called an independent set of $G$. Given $n$ and $T_{XY}$, similarly to the biclique rate region, we define the independent-set rate region as
\[
\mathcal{R}_n(T_{XY}) := \big\{(R_1,R_2)\in\mathcal{R}^{(n)}_X\times\mathcal{R}^{(n)}_Y : \Gamma_n\big(e^{nR_1},e^{nR_2}\big)=0\big\}.
\]
Then one can easily obtain the following inner bound and outer bound on $\mathcal{R}_n(T_{XY})$.

Proposition 5.
For any $n$ and $T_{XY}$,
\[
\Big(\mathcal{R}^{(i)}(T_{XY})-[0,\varepsilon_{1,n}]\times[0,\varepsilon_{2,n}]\Big)\cap\Big(\mathcal{R}^{(n)}_X\times\mathcal{R}^{(n)}_Y\Big) \subseteq \mathcal{R}_n(T_{XY}) \subseteq \mathcal{R}^{(o)}(T_{XY})\cap\Big(\mathcal{R}^{(n)}_X\times\mathcal{R}^{(n)}_Y\Big)
\]
for some positive sequences $\{\varepsilon_{1,n}\}$ and $\{\varepsilon_{2,n}\}$ which both vanish as $n\to\infty$, where
\[
\mathcal{R}^{(o)}(T_{XY}) := \{(R_1,R_2): R_1\le H(X),\ R_2\le H(Y)\}, \tag{71}
\]
\[
\mathcal{R}^{(i)}(T_{XY}) := \bigcup_{\substack{P_W,P_{X|W},P_{Y|W}:\ P_WP_{X|W},\ P_WP_{Y|W}\text{ are }n\text{-types},\\ P_X=T_X,\ P_Y=T_Y,\ Q_{XY}\ne T_{XY},\ \forall Q_{XY|W}\in C(P_{X|W},P_{Y|W})}}\{(R_1,R_2): R_1\le H(X|W),\ R_2\le H(Y|W)\}. \tag{72}
\]
The inner bound above can be proven by using the codes used in proving the achievability part of Theorem 1. The outer bound above is trivial. Determining the asymptotics of $\mathcal{R}_n(T_{XY})$ is of interest to the present authors; however, we currently do not know how to tackle it. In addition, if $\mathcal{R}_n(T_{XY})$ is not asymptotically equal to $\mathcal{R}^{(o)}(T_{XY})$, then determining the exponent of the minimal density is also interesting.

APPENDIX A
PROOF OF LEMMA 1

Statement 3): By symmetry, it suffices to consider only the case $R_1=0$. By Statement 2), $F(0,R_2)\le\min\{R_2,H_T(Y|X)\}$. On the other hand, if $R_2\ge H_T(Y|X)$, then we choose $W=X$, which leads to $H(X|W)=0$ and $H(Y|W)=H(XY|W)=H_T(Y|X)$. Hence we have $F(0,R_2)=H_T(Y|X)$ for $R_2\ge H_T(Y|X)$. If $R_2\le H_T(Y|X)$, then one can find a random variable $U$ such that $H(Y|XU)=R_2$. For example, we choose $U=(V,J)$ with $V$ defined on $\mathcal{X}\cup\mathcal{Y}$ and $J$ defined on $\{0,1\}$ such that $V=X$ if $J=0$ and $V=Y$ if $J=1$, where $J$ is independent of $(X,Y)$ with $\mathbb{P}(J=0)=\alpha$ for $\alpha:=R_2/H_T(Y|X)$. Set $W=(X,U)$. We have $H(X|W)=0$ and $H(Y|W)=H(XY|W)=H(XY|W,J)=R_2$. Hence we have $F(0,R_2)=R_2$ for $R_2\le H_T(Y|X)$.

Statement 4): Let $P_{XYW_0}$ attain $F(R_1,R_2)$, and $P_{XYW_1}$ attain $F\big(\widehat{R}_1,\widehat{R}_2\big)$.
For $0 < \alpha < 1$, define $J \sim \mathrm{Bern}(\alpha)$ independent of $(X, Y, W_0, W_1)$, and let $W := W_J$, taking values in $\mathcal{W}_0 \cup \mathcal{W}_1$, where $\mathcal{W}_j$ denotes the alphabet of $W_j$ for $j = 0, 1$. Note that $J$ is a deterministic function of $W$. Then the induced distribution $P_{XYW}$ satisfies
$$H(XY|W) = \alpha H(XY|W_0) + (1 - \alpha) H(XY|W_1),$$
$$H(X|W) = \alpha H(X|W_0) + (1 - \alpha) H(X|W_1),$$
$$H(Y|W) = \alpha H(Y|W_0) + (1 - \alpha) H(Y|W_1).$$
Therefore,
$$F\big(\alpha R_1 + (1-\alpha)\widehat{R}_1,\ \alpha R_2 + (1-\alpha)\widehat{R}_2\big) \ge \alpha F(R_1, R_2) + (1 - \alpha) F\big(\widehat{R}_1, \widehat{R}_2\big).$$

Statement 5): If $\delta_1 = \delta_2 = 0$, there is nothing to prove. If $\delta_2 > \delta_1 = 0$, then, for $t \ge 0$, $G(t) := F(R_1, t)$ is nondecreasing and concave, by Statements 1) and 4). Hence, for fixed $\delta_2$, $\frac{G(t + \delta_2) - G(t)}{\delta_2}$ is nonincreasing in $t$. Combining this with Statements 2) and 3) yields
$$\frac{G(t + \delta_2) - G(t)}{\delta_2} \le \frac{G(\delta_2) - G(0)}{\delta_2} \le \frac{\delta_2 + \min\{R_1, H_T(X|Y)\} - \min\{R_1, H_T(X|Y)\}}{\delta_2} = 1.$$
Setting $t = R_2$, we obtain $F(R_1, R_2 + \delta_2) - F(R_1, R_2) \le \delta_2$, as desired. By symmetry, the claim also holds in the case $\delta_1 > \delta_2 = 0$.

Now we consider the case $\delta_1, \delta_2 > 0$. Without loss of generality, we assume $R_1/\delta_1 \ge R_2/\delta_2$. For $t \ge -R_2/\delta_2$, define $G(t) := F(R_1 + \delta_1 t, R_2 + \delta_2 t)$. By Statements 1) and 4), $G$ is nondecreasing and concave. Hence $G(t+1) - G(t)$ is nonincreasing in $t$. Combining this with Statements 2) and 3) yields that for $t \ge -R_2/\delta_2$,
$$G(t+1) - G(t) \le G\Big({-\tfrac{R_2}{\delta_2}} + 1\Big) - G\Big({-\tfrac{R_2}{\delta_2}}\Big) = F\Big(R_1 - \tfrac{\delta_1 R_2}{\delta_2} + \delta_1,\ \delta_2\Big) - F\Big(R_1 - \tfrac{\delta_1 R_2}{\delta_2},\ 0\Big)$$
$$\le \min\Big\{R_1 - \tfrac{\delta_1 R_2}{\delta_2} + \delta_1,\ H_T(X|Y)\Big\} + \delta_2 - \min\Big\{R_1 - \tfrac{\delta_1 R_2}{\delta_2},\ H_T(X|Y)\Big\} \le \delta_1 + \delta_2.$$
Setting $t = 0$, we obtain $F(R_1 + \delta_1, R_2 + \delta_2) - F(R_1, R_2) \le \delta_1 + \delta_2$, as desired.

APPENDIX B
PROOF OF LEMMA

We prove the case $k = 2$; the cases $k \ge 3$ follow similarly.
For the purposes of this argument we treat vectors in $\mathbb{R}^n$ as row vectors. For the pair of orthogonal subspaces $(V, V^\perp)$ with dimensions respectively $n_1$ and $n - n_1$, let $\{u_j : 1 \le j \le n_1\}$ be an orthogonal basis of $V$, and $\{u_j : n_1 + 1 \le j \le n\}$ be an orthogonal basis of $V^\perp$. Then $\{u_j : 1 \le j \le n\}$ forms an orthogonal basis of $\mathbb{R}^n$. Denote by $U$ the $n \times n$ matrix whose $j$-th row is $u_j$. Then $U$ is orthogonal. We now express $x \in V$ and $y \in V^\perp$ in this new orthogonal basis:
$$x = \widehat{x} U, \qquad y = \widehat{y} U,$$
where $\widehat{x} := x U^\top$, $\widehat{y} := y U^\top$, and $U^\top$ is the transpose of $U$. Since for any $x \in V$ we have $\langle x, u_j \rangle = 0$ for all $n_1 + 1 \le j \le n$, we obtain that $\widehat{x}_j = 0$ for all $n_1 + 1 \le j \le n$. Similarly, $\widehat{y}_j = 0$ for all $1 \le j \le n_1$. Hence we can rewrite $\widehat{x} = (\widehat{x}_1, \mathbf{0})$ and $\widehat{y} = (\mathbf{0}, \widehat{y}_2)$. We write $U$ in block form:
$$U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix},$$
where $U_1$, $U_2$ are respectively of size $n_1 \times n$ and $(n - n_1) \times n$. Then
$$x = \widehat{x}_1 U_1, \qquad y = \widehat{y}_2 U_2. \tag{73}$$
We now need the following algebraic version of the exchange lemma.

Lemma 3 (Algebraic Exchange Lemma). [26, Theorem 3.2] Let $k \ge 2$ be an integer. Let $B$ be an $n \times n$ nonsingular matrix, and let $\{\mathcal{H}_l, 1 \le l \le k\}$ be a partition of $[n]$. Then there always exists another partition $\{\mathcal{L}_l, 1 \le l \le k\}$ of $[n]$ with $|\mathcal{L}_l| = |\mathcal{H}_l|$ such that all the sub-matrices $B_{\mathcal{H}_l, \mathcal{L}_l}$, $1 \le l \le k$, are nonsingular.

The proof of this lemma follows easily from repeated use of the Laplace expansion for determinants, given as follows. Let $B = (b_{i,j})$ be an $n \times n$ matrix and $\mathcal{H}$ a subset of $[n]$.
Then the determinant of $B$ can be expanded as
$$\det(B) = \sum_{\mathcal{L} \subseteq [n]:\, |\mathcal{L}| = |\mathcal{H}|} \varepsilon_{\mathcal{H}, \mathcal{L}} \det(B_{\mathcal{H}, \mathcal{L}}) \det(B_{\mathcal{H}^c, \mathcal{L}^c}),$$
where $\varepsilon_{\mathcal{H}, \mathcal{L}}$ is the sign of the permutation determined by $\mathcal{H}$ and $\mathcal{L}$, equal to $(-1)^{\sum_{h \in \mathcal{H}} h + \sum_{\ell \in \mathcal{L}} \ell}$.

Substituting $B \leftarrow U$, $\mathcal{H}_1 \leftarrow [n_1]$, $\mathcal{H}_2 \leftarrow [n_1 + 1 : n]$ in Lemma 3, we obtain that there exists a partition $\{\mathcal{J}, \mathcal{J}^c\}$ of $[n]$ with $|\mathcal{J}| = n_1$ such that both the sub-matrices $U_{[n_1], \mathcal{J}}$ and $U_{[n_1+1:n], \mathcal{J}^c}$ are nonsingular. Denote by $U_{1,\mathcal{J}}$ the submatrix of $U_1$ consisting of the $\mathcal{J}$-indexed columns of $U_1$, and define $U_{1,\mathcal{J}^c}$, $U_{2,\mathcal{J}}$, $U_{2,\mathcal{J}^c}$ similarly. Then, by definition, $U_{1,\mathcal{J}} = U_{[n_1], \mathcal{J}}$ and $U_{2,\mathcal{J}^c} = U_{[n_1+1:n], \mathcal{J}^c}$. Therefore, from (73), we have
$$\widehat{x}_1 = x_{\mathcal{J}} U_{1,\mathcal{J}}^{-1}, \qquad \widehat{y}_2 = y_{\mathcal{J}^c} U_{2,\mathcal{J}^c}^{-1}.$$
Substituting these back into (73), we obtain
$$(x_{\mathcal{J}}, x_{\mathcal{J}^c}) = x_{\mathcal{J}} U_{1,\mathcal{J}}^{-1} (U_{1,\mathcal{J}}, U_{1,\mathcal{J}^c}) = \big(x_{\mathcal{J}},\ x_{\mathcal{J}} U_{1,\mathcal{J}}^{-1} U_{1,\mathcal{J}^c}\big)$$
and
$$(y_{\mathcal{J}}, y_{\mathcal{J}^c}) = \big(y_{\mathcal{J}^c} U_{2,\mathcal{J}^c}^{-1} U_{2,\mathcal{J}},\ y_{\mathcal{J}^c}\big).$$
This completes the proof.
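Both ingredients of this argument can be checked numerically on a small example. The following Python sketch (an illustration with an assumed $4 \times 4$ matrix, not taken from the paper) verifies the Laplace expansion along a set of rows and then searches for a column partition making both diagonal blocks nonsingular, as Lemma 3 (with $k = 2$) guarantees.

```python
# Sanity check (illustration, with an assumed 4x4 matrix): the Laplace
# expansion along a set of rows, and the existence of a column partition
# making both diagonal blocks nonsingular (Lemma 3 with k = 2).
from fractions import Fraction
from itertools import combinations, permutations

def det(M):
    # Exact determinant via permutation expansion (fine for tiny matrices).
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        prod = Fraction(1)
        for i in range(n):
            prod *= M[i][perm[i]]
        total += (-1) ** inv * prod
    return total

def sub(M, rows, cols):
    return [[M[r][c] for c in cols] for r in rows]

B = [[Fraction(v) for v in row]
     for row in [[2, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]]
H1, H2 = (0, 1), (2, 3)                  # row partition {H_1, H_2} of [4]

# Laplace expansion along rows H1 (0-based indices give the same sign parity).
laplace = sum((-1) ** (sum(H1) + sum(L))
              * det(sub(B, H1, L))
              * det(sub(B, H2, [c for c in range(4) if c not in L]))
              for L in combinations(range(4), 2))

# Exchange lemma: some column partition {L, L^c} makes both blocks nonsingular.
good = [L for L in combinations(range(4), 2)
        if det(sub(B, H1, L)) != 0
        and det(sub(B, H2, [c for c in range(4) if c not in L])) != 0]
```

Since $\det(B) \ne 0$ and the Laplace expansion expresses it as a sum of block-determinant products, at least one term, and hence one admissible partition, must be nonzero; this is exactly the mechanism behind Lemma 3.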
APPENDIX C
PROOF OF PROPOSITION

Recall that $\mathcal{R}(T_{XY}) = \mathcal{R}^*(T_{XY})$. Furthermore, $\mathcal{R}^*(T_{XY})$ is a closed convex set (see Remark 4). Hence $\mathcal{R}(T_{XY}) = \mathcal{R}^\triangle(T_{XY})$ if and only if
$$\max_{\substack{0 \le \alpha \le 1,\ P_{XY}, Q_{XY}:\\ \alpha P_{XY} + (1-\alpha) Q_{XY} = T_{XY}}} \varphi_\alpha(P_{XY}, Q_{XY}) \le 1, \tag{74}$$
where $\varphi_\alpha(P_{XY}, Q_{XY}) := \frac{\alpha}{\beta_1} H_P(X|Y) + \frac{1-\alpha}{\beta_2} H_Q(Y|X)$ with $\beta_1 := H_T(X|Y)$, $\beta_2 := H_T(Y|X)$. Here the domain of $\varphi_\alpha$ can be taken to be the set of pairs of probability distributions $(P_{XY}, Q_{XY})$ such that $\mathrm{supp}(P_{XY}) = \mathrm{supp}(Q_{XY}) \subseteq \mathrm{supp}(T_{XY})$. Moreover, (74) can be rewritten as: for any $0 \le \alpha \le 1$,
$$\max_{\substack{P_{XY}, Q_{XY}:\\ \alpha P_{XY} + (1-\alpha) Q_{XY} = T_{XY}}} \varphi_\alpha(P_{XY}, Q_{XY}) \le 1. \tag{75}$$
Observe that $\varphi_\alpha(T_{XY}, T_{XY}) = 1$. Hence (75) can be rewritten as: for any $0 < \alpha < 1$, $P_{XY} = Q_{XY} = T_{XY}$ is an optimal solution to the LHS of (75). Next we study for which $T_{XY}$ it holds, for all $0 < \alpha < 1$, that $P_{XY} = Q_{XY} = T_{XY}$ is an optimal solution to the LHS of (75).

Given $0 < \alpha < 1$, observe that $\alpha P_{XY} + (1-\alpha) Q_{XY}$ is linear in $(P_{XY}, Q_{XY})$, and $\varphi_\alpha(P_{XY}, Q_{XY})$ is concave in $(P_{XY}, Q_{XY})$ (which can be shown by the log-sum inequality [27, Theorem 2.7.1]). Hence the LHS of (75) is a linearly-constrained convex optimization problem. This means that showing that the pair $(T_{XY}, T_{XY})$ is an extremum for this convex optimization problem iff $T_{XY}$ satisfies the conditions given in Corollary 1 is equivalent to establishing that $(T_{XY}, T_{XY})$ is an optimum for the convex optimization problem (thus establishing (75) for $0 < \alpha < 1$) iff $T_{XY}$ satisfies the conditions given in Corollary 1.
Since the notion of extremality is local, to show this it suffices to consider the modified version of this convex optimization problem where the domain of $\varphi_\alpha$ is taken to be the set of pairs of probability distributions $(P_{XY}, Q_{XY})$ such that $\mathrm{supp}(P_{XY}) = \mathrm{supp}(Q_{XY}) = \mathrm{supp}(T_{XY})$. We are thus led to consider the Lagrangian
$$L = \varphi_\alpha(P_{XY}, Q_{XY}) + \sum_{(x,y) \in \mathrm{supp}(T_{XY})} \eta(x,y) \big(\alpha P(x,y) + (1-\alpha) Q(x,y) - T(x,y)\big) + \mu_1 \Big(\sum_{(x,y) \in \mathrm{supp}(T_{XY})} P(x,y) - 1\Big) + \mu_2 \Big(\sum_{(x,y) \in \mathrm{supp}(T_{XY})} Q(x,y) - 1\Big),$$
whose extrema are given by the Karush–Kuhn–Tucker (KKT) conditions:
$$\frac{\partial L}{\partial P(x,y)} = -\frac{\alpha}{\beta_1} \log P(x|y) + \alpha \eta(x,y) + \mu_1 = 0, \quad \forall (x,y) \in \mathrm{supp}(T_{XY}), \tag{76}$$
$$\frac{\partial L}{\partial Q(x,y)} = -\frac{1-\alpha}{\beta_2} \log Q(y|x) + (1-\alpha) \eta(x,y) + \mu_2 = 0, \quad \forall (x,y) \in \mathrm{supp}(T_{XY}), \tag{77}$$
$$\alpha P(x,y) + (1-\alpha) Q(x,y) = T(x,y), \quad \forall (x,y) \in \mathrm{supp}(T_{XY}), \tag{78}$$
$$\sum_{(x,y) \in \mathrm{supp}(T_{XY})} P(x,y) = 1, \tag{79}$$
$$\sum_{(x,y) \in \mathrm{supp}(T_{XY})} Q(x,y) = 1, \tag{80}$$
$$P(x,y), Q(x,y) > 0, \quad \forall (x,y) \in \mathrm{supp}(T_{XY}), \tag{81}$$
for some reals $\eta(x,y)$, $\mu_1$, $\mu_2$ with $(x,y) \in \mathrm{supp}(T_{XY})$. Here the conditions in (81) come from the restriction we have imposed on the domain of $\varphi_\alpha$.

We first prove the "if" part. That is, for $T_{XY}$ satisfying the conditions given in Corollary 1, given any $0 < \alpha < 1$, $P_{XY} = Q_{XY} = T_{XY}$ together with some reals $\eta(x,y)$, $\mu_1$, $\mu_2$ must satisfy (76)–(81). To this end, we choose
$$\eta(x,y) = \frac{1}{\beta_1} \log T(x|y) = \frac{1}{\beta_2} \log T(y|x), \qquad \mu_1 = \mu_2 = 0,$$
which satisfy (76) and (77).

We next consider the "only if" part. Substituting $P = Q = T$ and taking expectations with respect to the type $T_{XY}$ on both sides of (76) and (77), we obtain
$$\frac{\mu_1}{\alpha} = \frac{\mu_2}{1-\alpha}. \tag{82}$$
Substituting this back into (76) and (77) yields $T_{X|Y}(x|y)^{1/H_T(X|Y)} = T_{Y|X}(y|x)^{1/H_T(Y|X)}$ for all $x, y$.

APPENDIX D
PROOF OF THEOREM
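The "if" direction admits a quick numerical illustration (not a proof). The sketch below assumes the uniform type on $\{0,1\}^2$, for which $T(x|y) = T(y|x) = 1/2$ so the symmetry condition holds, and checks that $\varphi_\alpha \le 1$ over random feasible decompositions $\alpha P_{XY} + (1-\alpha) Q_{XY} = T_{XY}$, with equality at $P = Q = T$; the distribution and the value $\alpha = 0.3$ are assumptions made purely for the demonstration.

```python
# Numerical illustration (not a proof): for an assumed T_XY satisfying the
# symmetry condition -- the uniform type on {0,1}^2, where
# T(x|y) = T(y|x) = 1/2 -- phi_alpha stays <= 1 over random feasible
# decompositions alpha*P + (1-alpha)*Q = T, with equality at P = Q = T.
import random
from math import log

T = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
beta = log(2)                            # H_T(X|Y) = H_T(Y|X) = ln 2 here

def h_cond(P, given_y):
    # H_P(X|Y) if given_y else H_P(Y|X), in nats.
    s = 0.0
    for (x, y), p in P.items():
        if p > 0:
            m = sum(q for (a, b), q in P.items()
                    if (b == y if given_y else a == x))
            s -= p * log(p / m)
    return s

def phi(alpha, P, Q):
    return alpha * h_cond(P, True) / beta + (1 - alpha) * h_cond(Q, False) / beta

random.seed(0)
alpha = 0.3
samples = []
while len(samples) < 200:
    # Random P; Q is then forced by the decomposition constraint. Keep Q >= 0.
    w = [random.random() for _ in range(4)]
    P = {k: wi / sum(w) for k, wi in zip(T, w)}
    Q = {k: (T[k] - alpha * P[k]) / (1 - alpha) for k in T}
    if min(Q.values()) >= 0:
        samples.append(phi(alpha, P, Q))
```

For this symmetric toy type the bound $\varphi_\alpha \le 1$ is in fact immediate, since each conditional entropy of a binary variable is at most $\ln 2$; the sketch only makes the KKT discussion concrete.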
We first prove that $\Theta(E_1, E_2) \le \Theta^*(E_1, E_2)$ for all $E_1 \in [0, E_{1,\max}]$ and $E_2 \in [0, E_{2,\max}]$, which we call the achievability part of the theorem. Define
$$g_n(E_1, E_2) := \min_{\substack{n\text{-type } Q_{XYW}:\\ D(Q_{X|W} \| P_X | Q_W) \ge E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \ge E_2}} D\big(Q_{XY|W} \| P_{XY} | Q_W\big).$$
Then $g_n(E_1, E_2)$ differs from $\Theta^*(E_1, E_2)$ in that $Q_{XYW}$ is restricted to be a joint $n$-type. Let $Q_{XYW}$ be an optimal joint $n$-type attaining $g_n(E_1, E_2)$. For a fixed sequence $\mathbf{w}$ with type $Q_W$, we choose $A = \mathcal{T}_{Q_{X|W}}(\mathbf{w})$, $B = \mathcal{T}_{Q_{Y|W}}(\mathbf{w})$. Then
$$P_X^n(A) = \big|\mathcal{T}_{Q_{X|W}}(\mathbf{w})\big| e^{n \sum_x Q_X(x) \log P_X(x)} \le e^{n H_Q(X|W)} e^{n \sum_x Q_X(x) \log P_X(x)} = e^{-n D(Q_{X|W} \| P_X | Q_W)} \le e^{-n E_1}.$$
Similarly, $P_Y^n(B) \le e^{-n E_2}$. Furthermore,
$$P_{XY}^n(A \times B) = P_{XY}^n\big(\mathcal{T}_{Q_{X|W}}(\mathbf{w}) \times \mathcal{T}_{Q_{Y|W}}(\mathbf{w})\big) = \sum_{\substack{T_{WXY}:\ T_{WX} = Q_{WX},\\ T_{WY} = Q_{WY}}} P_{XY}^n\big(\mathcal{T}_{T_{XY|W}}(\mathbf{w})\big) \ge P_{XY}^n\big(\mathcal{T}_{Q_{XY|W}}(\mathbf{w})\big) \ge e^{-n D(Q_{XY|W} \| P_{XY} | Q_W) - |\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)},$$
where the last inequality follows from [9, Lemma 2.6]. Therefore,
$$\Theta_n(E_1, E_2) \le g_n(E_1, E_2) + \frac{|\mathcal{W}| |\mathcal{X}| |\mathcal{Y}|}{n} \log(n+1). \tag{83}$$
Taking limits, we have
$$\Theta(E_1, E_2) \le \liminf_{n \to \infty} g_n(E_1, E_2). \tag{84}$$
We next prove that the RHS above is in turn upper bounded by $\Theta^*(E_1, E_2)$.

We first consider the case $E_1 < E_{1,\max}$, $E_2 < E_{2,\max}$. Let $\delta > 0$ be such that $E_1 + \delta < E_{1,\max}$ and $E_2 + \delta < E_{2,\max}$. For any $Q_{XYW}$ attaining $\Theta^*(E_1 + \delta, E_2 + \delta)$, one can find a sequence of joint $n$-types $Q^{(n)}_{WXY}$ that converges to $Q_{XYW}$ (under the TV distance). Observe that $D\big(Q_{X|W} \| P_X | Q_W\big)$, $D\big(Q_{Y|W} \| P_Y | Q_W\big)$, and $D\big(Q_{XY|W} \| P_{XY} | Q_W\big)$ are continuous in $Q_{XYW}$, because we have assumed that $P_X$ and $P_Y$ have full support.
Hence
$$D\big(Q^{(n)}_{X|W} \| P_X | Q^{(n)}_W\big) = D\big(Q_{X|W} \| P_X | Q_W\big) + o(1) \ge E_1 + \delta + o(1), \tag{85}$$
$$D\big(Q^{(n)}_{Y|W} \| P_Y | Q^{(n)}_W\big) = D\big(Q_{Y|W} \| P_Y | Q_W\big) + o(1) \ge E_2 + \delta + o(1), \tag{86}$$
$$D\big(Q^{(n)}_{XY|W} \| P_{XY} | Q^{(n)}_W\big) = D\big(Q_{XY|W} \| P_{XY} | Q_W\big) + o(1) = \Theta^*(E_1 + \delta, E_2 + \delta) + o(1). \tag{87}$$
For fixed $\delta > 0$ and sufficiently large $n$, we have $E_1 + \delta + o(1) \ge E_1$ and $E_2 + \delta + o(1) \ge E_2$, which implies that $\Theta^*(E_1 + \delta, E_2 + \delta) \ge g_n(E_1, E_2) + o(1)$. Combining this with (84) yields
$$\Theta(E_1, E_2) \le \Theta^*(E_1 + \delta, E_2 + \delta). \tag{88}$$
On the other hand, since $\Theta^*(E_1, E_2)$ is convex in $(E_1, E_2)$ and nondecreasing in one parameter given the other, it follows that, given $(E_1, E_2)$, $\delta \mapsto \Theta^*(E_1 + \delta, E_2 + \delta)$ is convex and nondecreasing. Hence $\delta \mapsto \Theta^*(E_1 + \delta, E_2 + \delta)$ is continuous on $[0, \min\{E_{1,\max} - E_1, E_{2,\max} - E_2\})$. Taking $\delta \downarrow 0$, we obtain $\Theta(E_1, E_2) \le \Theta^*(E_1, E_2)$.

We next consider the case $E_1 = E_{1,\max}$, $E_2 < E_{2,\max}$. Let $\delta > 0$ be such that $E_2 + \delta < E_{2,\max}$. Let $\widehat{\mathcal{X}} := \{x : -\log P_X(x) = E_{1,\max}\}$. Since we consider the case $E_1 = E_{1,\max}$, we claim that any optimal $Q_{XYW}$ attaining $\Theta^*(E_{1,\max}, E_2 + \delta)$ satisfies $\mathrm{supp}(Q_X) \subseteq \widehat{\mathcal{X}}$, and that $X$ is a function of $W$ under the distribution $Q_{XYW}$. This is because, on one hand,
$$D(Q_{X|W} \| P_X | Q_W) = -H_Q(X|W) - \mathbb{E}_{Q_X} \log P_X(X) \le -\mathbb{E}_{Q_X} \log P_X(X) \le E_{1,\max}; \tag{89}$$
and on the other hand, we require $D(Q_{X|W} \| P_X | Q_W) \ge E_1 = E_{1,\max}$. Hence, all the inequalities in (89) are in fact equalities, which can happen only under the conditions in the claim above. Hence, our claim holds. By this claim, without loss of optimality, we can restrict $Q_{XYW}$ to be a distribution on $\widehat{\mathcal{X}} \times \mathcal{Y} \times \mathcal{W}$. For any optimal $Q_{XYW}$ on $\widehat{\mathcal{X}} \times \mathcal{Y} \times \mathcal{W}$, one can find a sequence of joint $n$-types $Q^{(n)}_{YW}$ that converges to $Q_{YW}$ (under the TV distance), which in turn implies that $Q^{(n)}_{WY} Q_{X|W}$ converges to $Q_{YWX}$. It is easy to verify that in this case, $D\big(Q_{X|W} \| P_X | Q^{(n)}_W\big) = D\big(Q_{X|W} \| P_X | Q_W\big) = E_{1,\max}$. Moreover, (86) and (87) still hold in this case. Hence, by an argument similar to the one below (87), we obtain $\Theta(E_1, E_2) \le \Theta^*(E_1, E_2)$.

By symmetry, the case $E_1 < E_{1,\max}$, $E_2 = E_{2,\max}$ follows similarly.

We lastly consider the case $E_1 = E_{1,\max}$, $E_2 = E_{2,\max}$. In this case, it is easy to verify that
$$\Theta^{(n)}(E_1, E_2) = \Theta^*(E_1, E_2) = -\log \max_{x \in \widehat{\mathcal{X}},\, y \in \widehat{\mathcal{Y}}} P_{XY}(x, y),$$
where $\widehat{\mathcal{X}} := \{x : -\log P_X(x) = E_{1,\max}\}$ and $\widehat{\mathcal{Y}} := \{y : -\log P_Y(y) = E_{2,\max}\}$.

Combining all the cases above, we complete the achievability proof.

Converse part: Now we prove that
$\Theta(E_1, E_2) \ge \Theta^*(E_1, E_2)$ for all $E_1 \in [0, E_{1,\max}]$ and $E_2 \in [0, E_{2,\max}]$, which we call the converse part. We first provide a multiletter bound. Let $A \subseteq \mathcal{X}^n$ and $B \subseteq \mathcal{Y}^n$ satisfy $P_X^n(A) \le e^{-n E_1}$ and $P_Y^n(B) \le e^{-n E_2}$, respectively. Define
$$Q_{X^n Y^n}(\mathbf{x}, \mathbf{y}) := \frac{P_{XY}^n(\mathbf{x}, \mathbf{y})\, 1\{\mathbf{x} \in A, \mathbf{y} \in B\}}{P_{XY}^n(A \times B)}.$$
Then
$$Q_{X^n}(\mathbf{x}) = \frac{P_{XY}^n(\mathbf{x}, B)\, 1\{\mathbf{x} \in A\}}{P_{XY}^n(A \times B)}, \qquad Q_{Y^n}(\mathbf{y}) = \frac{P_{XY}^n(A, \mathbf{y})\, 1\{\mathbf{y} \in B\}}{P_{XY}^n(A \times B)},$$
and moreover,
$$D(Q_{X^n Y^n} \| P_{XY}^n) = \sum_{\mathbf{x} \in A,\, \mathbf{y} \in B} \frac{P_{XY}^n(\mathbf{x}, \mathbf{y})}{P_{XY}^n(A \times B)} \log \frac{1}{P_{XY}^n(A \times B)} = \log \frac{1}{P_{XY}^n(A \times B)}, \tag{90}$$
i.e., $P_{XY}^n(A \times B) = e^{-D(Q_{X^n Y^n} \| P_{XY}^n)}$. On the other hand, by $D\big(Q_{X^n} \| \widetilde{P}_{X^n}\big) \ge 0$, where $\widetilde{P}_{X^n}(\mathbf{x}) = \frac{P_X^n(\mathbf{x})\, 1\{\mathbf{x} \in A\}}{P_X^n(A)}$, we obtain
$$-\frac{1}{n} \log P_X^n(A) \le \frac{1}{n} D(Q_{X^n} \| P_X^n). \tag{91}$$
Combining this with $P_X^n(A) \le e^{-n E_1}$ yields
$$\frac{1}{n} D(Q_{X^n} \| P_X^n) \ge E_1. \tag{92}$$
Similarly,
$$\frac{1}{n} D(Q_{Y^n} \| P_Y^n) \ge E_2. \tag{93}$$
We now relax $Q_{X^n Y^n}$ to an arbitrary joint distribution with marginals satisfying (92) and (93). Then by (90), we obtain the following multiletter bound:
$$\Theta_n(E_1, E_2) \ge \inf_{\substack{Q_{X^n Y^n}:\ \frac{1}{n} D(Q_{X^n} \| P_X^n) \ge E_1,\\ \frac{1}{n} D(Q_{Y^n} \| P_Y^n) \ge E_2}} \frac{1}{n} D(Q_{X^n Y^n} \| P_{XY}^n). \tag{94}$$
We next single-letterize the multiletter bound in (94). Observe that
$$\frac{1}{n} D(Q_{X^n Y^n} \| P_{XY}^n) = D\big(Q_{X_J Y_J | X^{J-1} Y^{J-1} J} \| P_{XY} | Q_{X^{J-1} Y^{J-1} J}\big) = D\big(Q_{XY|UV} \| P_{XY} | Q_{UV}\big) = D\big(Q_{XY|W} \| P_{XY} | Q_W\big), \tag{95}$$
where
$$J \sim \mathrm{Unif}[n], \quad X := X_J, \quad Y := Y_J, \quad U := (X^{J-1}, J), \quad V := (Y^{J-1}, J), \quad W = (U, V), \tag{96}$$
with $J$ independent of $(X^n, Y^n)$ in $Q_{X^n Y^n J}$.
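The identities (90) and (91) are elementary and can be checked numerically. The following Python sketch uses an assumed toy $P_{XY}$, blocklength $n = 2$, and arbitrary small sets $A, B$ (all choices are for illustration only) to verify that conditioning $P^n_{XY}$ on $A \times B$ gives $P^n_{XY}(A \times B) = e^{-D(Q \| P^n_{XY})}$ and that $-\frac{1}{n}\log P^n_X(A) \le \frac{1}{n} D(Q_{X^n} \| P^n_X)$.

```python
# Numerical check (toy illustration, assumed P_XY) of the identities behind
# (90)-(92): conditioning P^n on A x B gives P^n(A x B) = exp(-D(Q || P^n)),
# and -(1/n) log P^n_X(A) <= (1/n) D(Q_X || P^n_X).
from itertools import product
from math import exp, log

PXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
n = 2
seqs = list(product((0, 1), repeat=n))

def Pn(xs, ys):
    # Product distribution P_XY^n evaluated at a pair of sequences.
    p = 1.0
    for a, b in zip(xs, ys):
        p *= PXY[(a, b)]
    return p

A = [(0, 0), (0, 1)]                     # arbitrary small sets A, B
B = [(0, 0), (1, 1)]
mass = sum(Pn(x, y) for x in A for y in B)

# Q := P^n conditioned on A x B, so D(Q || P^n) = -log P^n(A x B).
D_joint = sum((Pn(x, y) / mass) * log((Pn(x, y) / mass) / Pn(x, y))
              for x in A for y in B)

PnX = {x: sum(Pn(x, y) for y in seqs) for x in seqs}
QX = {x: sum(Pn(x, y) / mass for y in B) for x in A}
D_X = sum(q * log(q / PnX[x]) for x, q in QX.items() if q > 0)
PA = sum(PnX[x] for x in A)
```

The first identity holds exactly because $Q/P^n_{XY}$ is constant on $A \times B$; the second follows from the nonnegativity of $D(Q_{X^n} \| \widetilde{P}_{X^n})$, exactly as in the proof.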
On the other hand,
$$\frac{1}{n} D(Q_{X^n} \| P_X^n) = D\big(Q_{X_J | X^{J-1} J} \| P_X | Q_{X^{J-1} J}\big) = D\big(Q_{X|U} \| P_X | Q_U\big) \le D\big(Q_{X|UV} \| P_X | Q_{UV}\big) = D\big(Q_{X|W} \| P_X | Q_W\big), \tag{97}$$
where the inequality follows by the data processing inequality for the relative entropy (i.e., the relative entropy between two joint distributions is not smaller than that between their marginals). Similarly,
$$\frac{1}{n} D(Q_{Y^n} \| P_Y^n) \le D\big(Q_{Y|W} \| P_Y | Q_W\big). \tag{98}$$
Substituting (95), (97), and (98) into (94), we have
$$\Theta_n(E_1, E_2) \ge \min_{\substack{Q_{XYW}:\ D(Q_{X|W} \| P_X | Q_W) \ge E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \ge E_2}} D\big(Q_{XY|W} \| P_{XY} | Q_W\big). \tag{99}$$
The bound on the alphabet size of $W$ follows by the support lemma in [8].

APPENDIX E
PROOF OF THEOREM

The case $E_1 = 0$ or $E_2 = 0$ can be verified easily. We next consider the case $E_1, E_2 > 0$. We first deal with the achievability part of the claim, i.e., that $\Theta(E_1, E_2) \ge \Theta^*(E_1, E_2)$, and then with the converse part, i.e., that $\Theta(E_1, E_2) \le \Theta^*(E_1, E_2)$.

Achievability part: Define
$$g_n(E_1, E_2) := \max_{\substack{Q_W, Q_{X|W}, Q_{Y|W}:\ Q_{WX}, Q_{WY} \text{ are } n\text{-types},\\ D(Q_{X|W} \| P_X | Q_W) \le E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \le E_2}} \ \min_{Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W})} D\big(Q_{XY|W} \| P_{XY} | Q_W\big).$$
Then $g_n(E_1, E_2)$ differs from $\Theta^*(E_1, E_2)$ in that $Q_{WX}$ and $Q_{WY}$ are restricted to be $n$-types. Hence $g_n(E_1, E_2) \le \Theta^*(E_1, E_2)$. Let $\delta > 0$ be sufficiently small. Let $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$ be such that $Q_{WX}$ and $Q_{WY}$ are optimal joint $n$-types attaining $g_n(E_1 - \delta, E_2 - \delta)$. For a fixed sequence $\mathbf{w}$ with type $Q_W$, we choose $A := \mathcal{T}_{Q_{X|W}}(\mathbf{w})$, $B := \mathcal{T}_{Q_{Y|W}}(\mathbf{w})$.
Then
$$P_X^n(A) = \big|\mathcal{T}_{Q_{X|W}}(\mathbf{w})\big| e^{n \sum_x Q_X(x) \log P_X(x)} \ge \frac{1}{(n+1)^{|\mathcal{W}||\mathcal{X}|}} e^{n H_Q(X|W)} e^{n \sum_x Q_X(x) \log P_X(x)} = \frac{1}{(n+1)^{|\mathcal{W}||\mathcal{X}|}} e^{-n D(Q_{X|W} \| P_X | Q_W)} \ge \frac{1}{(n+1)^{|\mathcal{W}||\mathcal{X}|}} e^{-n(E_1 - \delta)},$$
where the first inequality follows from [9, Lemma 2.5]. For sufficiently large $n$, we have $P_X^n(A) \ge e^{-n E_1}$. Similarly, for sufficiently large $n$, we have $P_Y^n(B) \ge e^{-n E_2}$. Furthermore,
$$P_{XY}^n(A \times B) = P_{XY}^n\big(\mathcal{T}_{Q_{X|W}}(\mathbf{w}) \times \mathcal{T}_{Q_{Y|W}}(\mathbf{w})\big) = \sum_{\substack{Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W}):\\ Q_{WXY} \text{ is an } n\text{-type}}} P_{XY}^n\big(\mathcal{T}_{Q_{XY|W}}(\mathbf{w})\big)$$
$$\le (n+1)^{|\mathcal{W}||\mathcal{X}||\mathcal{Y}|} \max_{\substack{Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W}):\\ Q_{WXY} \text{ is an } n\text{-type}}} P_{XY}^n\big(\mathcal{T}_{Q_{XY|W}}(\mathbf{w})\big) \le e^{-n \min_{Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W})} D(Q_{XY|W} \| P_{XY} | Q_W) + |\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)} = e^{-n g_n(E_1 - \delta, E_2 - \delta) + |\mathcal{W}||\mathcal{X}||\mathcal{Y}| \log(n+1)},$$
where the first inequality follows from [9, Lemma 2.6]. Therefore,
$$\Theta_n(E_1, E_2) \ge g_n(E_1 - \delta, E_2 - \delta) - \frac{|\mathcal{W}| |\mathcal{X}| |\mathcal{Y}|}{n} \log(n+1). \tag{100}$$
Taking limits, we have
$$\Theta(E_1, E_2) \ge \limsup_{n \to \infty} g_n(E_1 - \delta, E_2 - \delta). \tag{101}$$
Observe that $D\big(Q_{X|W} \| P_X | Q_W\big)$ and $D\big(Q_{Y|W} \| P_Y | Q_W\big)$ are continuous in $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$, because we have assumed that $P_X$ and $P_Y$ have full support. We claim that
$$f\big(Q_W, Q_{X|W}, Q_{Y|W}\big) := \min_{Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W})} D\big(Q_{XY|W} \| P_{XY} | Q_W\big)$$
is also continuous in $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$, which follows from the following lemma.

Lemma 4. [28, Lemma 13] Let $P_X, Q_X$ be distributions on $\mathcal{X}$, and $P_Y, Q_Y$ distributions on $\mathcal{Y}$. Then for any $Q_{XY} \in C(Q_X, Q_Y)$, there exists $P_{XY} \in C(P_X, P_Y)$ such that
$$\|P_{XY} - Q_{XY}\| \le \|P_X - Q_X\| + \|P_Y - Q_Y\|. \tag{102}$$

Proof:
Let $Q_{X'X} \in C(P_X, Q_X)$ and $Q_{Y'Y} \in C(P_Y, Q_Y)$. Define $Q_{X'XYY'} = Q_{X'|X} Q_{XY} Q_{Y'|Y}$. Hence $Q_{X'Y'} \in C(P_X, P_Y)$. Obviously,
$$Q_{X'XYY'}\big\{(x', x, y, y') : (x, y) \ne (x', y')\big\} \le Q_{XX'}\big\{(x, x') : x \ne x'\big\} + Q_{YY'}\big\{(y, y') : y \ne y'\big\}. \tag{103}$$
Taking the infimum over all $Q_{X'X} \in C(P_X, Q_X)$ and $Q_{Y'Y} \in C(P_Y, Q_Y)$ on both sides of (103), we have
$$\|Q_{X'Y'} - Q_{XY}\| = \inf_{P_{X'Y'XY} \in C(Q_{X'Y'}, Q_{XY})} P_{X'XYY'}\big\{(x', x, y, y') : (x, y) \ne (x', y')\big\}$$
$$\le \inf_{\substack{Q_{X'X} \in C(P_X, Q_X),\\ Q_{Y'Y} \in C(P_Y, Q_Y)}} Q_{X'XYY'}\big\{(x', x, y, y') : (x, y) \ne (x', y')\big\}$$
$$\le \inf_{Q_{X'X} \in C(P_X, Q_X)} Q_{XX'}\big\{(x, x') : x \ne x'\big\} + \inf_{Q_{Y'Y} \in C(P_Y, Q_Y)} Q_{YY'}\big\{(y, y') : y \ne y'\big\} = \|P_X - Q_X\| + \|P_Y - Q_Y\|.$$
Hence $Q_{X'Y'}$ is a distribution of the desired form.

By Lemma 4, given $(Q_W, Q_{X|W}, Q_{Y|W}, P_W, P_{X|W}, P_{Y|W})$, for any $Q_{XY|W} \in C(Q_{X|W}, Q_{Y|W})$, there exists $P_{XY|W} \in C(P_{X|W}, P_{Y|W})$ such that
$$\|P_{WXY} - Q_{WXY}\| \le \|P_W - Q_W\| + \max_w \big\|P_{X|W=w} - Q_{X|W=w}\big\| + \max_w \big\|P_{Y|W=w} - Q_{Y|W=w}\big\|. \tag{104}$$
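The coupling construction in the proof of Lemma 4 can be made concrete numerically. The following Python sketch (an illustration with assumed toy distributions, not taken from the paper) glues maximal couplings $Q_{X'|X}$ and $Q_{Y'|Y}$ onto a given $Q_{XY}$ and checks that the resulting $Q_{X'Y'}$ has the target marginals and satisfies the total-variation bound (102).

```python
# Numerical illustration of Lemma 4's coupling construction (a sketch with
# assumed toy distributions): glue maximal couplings Q_{X'|X}, Q_{Y'|Y}
# onto Q_XY and check the total-variation bound (102).
from collections import defaultdict

def tv(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

def maximal_kernel(p, q):
    # Kernel K[x][x'] = Pr(X' = x' | X = x) of a maximal coupling,
    # with X ~ q and X' ~ p, so that Pr(X' != X) = tv(p, q).
    keys = set(p) | set(q)
    m = {k: min(p.get(k, 0), q.get(k, 0)) for k in keys}
    s = sum(m.values())
    K = {}
    for x in keys:
        qx = q.get(x, 0)
        if qx == 0:
            continue
        row = {}
        for xp in keys:
            mass = m[x] if xp == x else 0.0
            if s < 1:
                mass += (p.get(xp, 0) - m[xp]) * (qx - m[x]) / (1 - s)
            row[xp] = mass / qx
        K[x] = row
    return K

QXY = {(0, 0): 0.5, (1, 1): 0.5}             # a coupling of Q_X, Q_Y
PX, PY = {0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}  # target marginals
QX, QY = {0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}
KX, KY = maximal_kernel(PX, QX), maximal_kernel(PY, QY)

PXY_new = defaultdict(float)                 # the induced Q_{X'Y'} in C(P_X, P_Y)
for (x, y), pxy in QXY.items():
    for xp, a in KX[x].items():
        for yp, b in KY[y].items():
            PXY_new[(xp, yp)] += pxy * a * b

margX, margY = defaultdict(float), defaultdict(float)
for (xp, yp), p in PXY_new.items():
    margX[xp] += p
    margY[yp] += p
```

Here the probability that the glued chain changes either coordinate is at most the sum of the two maximal-coupling error probabilities, which is exactly the union-bound step (103).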
Hence, for any sequence $\big(P^{(k)}_W, P^{(k)}_{X|W}, P^{(k)}_{Y|W}\big)$ convergent to $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$,
$$\limsup_{k \to \infty} f\big(P^{(k)}_W, P^{(k)}_{X|W}, P^{(k)}_{Y|W}\big) \le f\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$$
and
$$f\big(Q_W, Q_{X|W}, Q_{Y|W}\big) \le \liminf_{k \to \infty} f\big(P^{(k)}_W, P^{(k)}_{X|W}, P^{(k)}_{Y|W}\big).$$
Hence $f\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$ is continuous in $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$. By an argument similar to the one below (84), it is easy to obtain $\Theta(E_1, E_2) \ge \Theta^*(E_1, E_2)$. This completes the achievability proof.

Converse part: We first provide a multiletter bound. Define
$$Q_{X^n}(\mathbf{x}) := \frac{P_X^n(\mathbf{x})\, 1\{\mathbf{x} \in A\}}{P_X^n(A)}, \qquad Q_{Y^n}(\mathbf{y}) := \frac{P_Y^n(\mathbf{y})\, 1\{\mathbf{y} \in B\}}{P_Y^n(B)}.$$
Then, similarly to (90), we have $D(Q_{X^n} \| P_X^n) = -\log P_X^n(A)$. Combining this with $P_X^n(A) \ge e^{-n E_1}$ yields
$$\frac{1}{n} D(Q_{X^n} \| P_X^n) \le E_1. \tag{105}$$
Similarly,
$$\frac{1}{n} D(Q_{Y^n} \| P_Y^n) \le E_2. \tag{106}$$
On the other hand, for any $\widetilde{Q}_{X^n Y^n} \in C(Q_{X^n}, Q_{Y^n})$, similarly to (91), we have
$$-\frac{1}{n} \log P_{XY}^n(A \times B) \le \frac{1}{n} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big),$$
which implies that
$$-\frac{1}{n} \log P_{XY}^n(A \times B) \le \inf_{\widetilde{Q}_{X^n Y^n} \in C(Q_{X^n}, Q_{Y^n})} \frac{1}{n} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big).$$
We now relax $Q_{X^n}, Q_{Y^n}$ to arbitrary distributions satisfying (105) and (106). Then we obtain the following multiletter bound:
$$\Theta_n(E_1, E_2) \le \sup_{\substack{Q_{X^n}, Q_{Y^n}:\ \frac{1}{n} D(Q_{X^n} \| P_X^n) \le E_1,\\ \frac{1}{n} D(Q_{Y^n} \| P_Y^n) \le E_2}} \inf_{\widetilde{Q}_{X^n Y^n} \in C(Q_{X^n}, Q_{Y^n})} \frac{1}{n} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big). \tag{107}$$
We next single-letterize the multiletter bound in (107) by using coupling techniques invented by the first author and Tan [5]. Let $J \sim Q_J := \mathrm{Unif}[n]$ be a random time index.
Without loss of optimality, we can rewrite (107) with all the distributions in the supremization and infimization replaced by the corresponding conditional versions (conditioned on $J$), but restricted to be independent of the parameter $J$ (i.e., the corresponding r.v.'s $X^n$ and/or $Y^n$ are independent of $J$). In this way, we associate the r.v.'s $X^n, Y^n$ with the time index $J$. Hence, we can write
$$\frac{1}{n} D(Q_{X^n} \| P_X^n) = D\big(Q_{X_J | X^{J-1} J} \| P_X | Q_{X^{J-1} J}\big) = D\big(Q_{X|UJ} \| P_X | Q_{UJ}\big),$$
and similarly,
$$\frac{1}{n} D(Q_{Y^n} \| P_Y^n) = D\big(Q_{Y_J | Y^{J-1} J} \| P_Y | Q_{Y^{J-1} J}\big) = D\big(Q_{Y|VJ} \| P_Y | Q_{VJ}\big),$$
where
$$X := X_J, \quad Y := Y_J, \quad U := X^{J-1}, \quad V := Y^{J-1}. \tag{108}$$
Substituting this notation into (107), we have
$$\Theta_n(E_1, E_2) \le \sup_{\substack{Q_{X^n}, Q_{Y^n}:\ D(Q_{X|UJ} \| P_X | Q_{UJ}) \le E_1,\\ D(Q_{Y|VJ} \| P_Y | Q_{VJ}) \le E_2}} \inf_{\widetilde{Q}_{X^n Y^n} \in C(Q_{X^n}, Q_{Y^n})} \frac{1}{n} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big). \tag{109}$$
On the other hand, to single-letterize the minimization part in (109), we need the following "chain rule" for coupling sets.

Lemma 5 ("Chain Rule" for Coupling Sets). [5, Lemma 9] For a pair of conditional distributions $(P_{X^n|W}, P_{Y^n|W})$, we have
$$\prod_{i=1}^n C(P_{X_i | X^{i-1} W}, P_{Y_i | Y^{i-1} W}) \subseteq C(P_{X^n|W}, P_{Y^n|W}), \tag{110}$$
where
$$C(P_{X_i | X^{i-1} W}, P_{Y_i | Y^{i-1} W}) := \big\{Q_{X_i Y_i | X^{i-1} Y^{i-1} W} : Q_{X_i | X^{i-1} Y^{i-1} W} = P_{X_i | X^{i-1} W},\ Q_{Y_i | X^{i-1} Y^{i-1} W} = P_{Y_i | Y^{i-1} W}\big\}, \quad i \in [n], \tag{111}$$
and
$$\prod_{i=1}^n C(P_{X_i | X^{i-1} W}, P_{Y_i | Y^{i-1} W}) := \Big\{\prod_{i=1}^n Q_{X_i Y_i | X^{i-1} Y^{i-1} W} : Q_{X_i Y_i | X^{i-1} Y^{i-1} W} \in C(P_{X_i | X^{i-1} W}, P_{Y_i | Y^{i-1} W}),\ i \in [n]\Big\}. \tag{112}$$
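Lemma 5 admits a quick numerical sanity check. The sketch below (an illustration with assumed toy distributions, $n = 2$ and no $W$) multiplies per-letter couplings — here simply the independent ("product") couplings of $Q_{X_1}$ with $Q_{Y_1}$ and of $Q_{X_2|X_1}$ with $Q_{Y_2|Y_1}$ — and verifies that the result is indeed a coupling of the two joint distributions.

```python
# Quick numerical sanity check of Lemma 5's "chain rule" (an illustration
# with assumed toy distributions, n = 2, no W): multiplying per-letter
# couplings yields a coupling of the two joint distributions.
QX2 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.25}
QY2 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def first(P):
    d = {}
    for (a, b), p in P.items():
        d[a] = d.get(a, 0) + p
    return d

def cond(P):
    # Conditional of the second letter given the first.
    f = first(P)
    return {(a, b): p / f[a] for (a, b), p in P.items()}

fx, fy, cx, cy = first(QX2), first(QY2), cond(QX2), cond(QY2)

# Per-letter independent couplings, multiplied together as in (112).
joint = {}
for x1 in (0, 1):
    for y1 in (0, 1):
        for x2 in (0, 1):
            for y2 in (0, 1):
                joint[(x1, x2, y1, y2)] = fx[x1] * fy[y1] * cx[(x1, x2)] * cy[(y1, y2)]

mX, mY = {}, {}
for (x1, x2, y1, y2), p in joint.items():
    mX[(x1, x2)] = mX.get((x1, x2), 0) + p
    mY[(y1, y2)] = mY.get((y1, y2), 0) + p
```

Any other choice of per-letter couplings in (111) would pass the same marginal check; the point of the lemma is exactly that the product construction never leaves $C(Q_{X^n}, Q_{Y^n})$.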
By this lemma,
$$\inf_{\widetilde{Q}_{X^n Y^n} \in C(Q_{X^n}, Q_{Y^n})} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big) \le \inf_{\widetilde{Q}_{X^n Y^n} \in \prod_{i=1}^n C(Q_{X_i|X^{i-1}}, Q_{Y_i|Y^{i-1}})} D\big(\widetilde{Q}_{X^n Y^n} \| P_{XY}^n\big) = \inf_{\widetilde{Q}_{X_i Y_i | X^{i-1} Y^{i-1}} \in C(Q_{X_i|X^{i-1}}, Q_{Y_i|Y^{i-1}}),\, i \in [n]} \sum_{i=1}^n D_i$$
$$= \inf_{\widetilde{Q}_{X_1 Y_1} \in C(Q_{X_1}, Q_{Y_1})} \Bigg( D_1 + \ldots + \inf_{\widetilde{Q}_{X_{n-1} Y_{n-1} | X^{n-2} Y^{n-2}} \in C(Q_{X_{n-1}|X^{n-2}}, Q_{Y_{n-1}|Y^{n-2}})} \Big( D_{n-1} + \inf_{\widetilde{Q}_{X_n Y_n | X^{n-1} Y^{n-1}} \in C(Q_{X_n|X^{n-1}}, Q_{Y_n|Y^{n-1}})} D_n \Big) \Bigg) \tag{113}$$
$$\le \inf_{\widetilde{Q}_{X_1 Y_1} \in C(Q_{X_1}, Q_{Y_1})} \Bigg( D_1 + \ldots + \sup_{\widetilde{Q}_{X^{n-2} Y^{n-2}} \in C(Q_{X^{n-2}}, Q_{Y^{n-2}})} \inf_{\widetilde{Q}_{X_{n-1} Y_{n-1} | X^{n-2} Y^{n-2}} \in C(Q_{X_{n-1}|X^{n-2}}, Q_{Y_{n-1}|Y^{n-2}})} \Big( D_{n-1} + \sup_{\widetilde{Q}_{X^{n-1} Y^{n-1}} \in C(Q_{X^{n-1}}, Q_{Y^{n-1}})} \inf_{\widetilde{Q}_{X_n Y_n | X^{n-1} Y^{n-1}} \in C(Q_{X_n|X^{n-1}}, Q_{Y_n|Y^{n-1}})} D_n \Big) \Bigg) \tag{114}$$
$$= \sum_{i=1}^n \sup_{\widetilde{Q}_{X^{i-1} Y^{i-1}} \in C(Q_{X^{i-1}}, Q_{Y^{i-1}})} \inf_{\widetilde{Q}_{X_i Y_i | X^{i-1} Y^{i-1}} \in C(Q_{X_i|X^{i-1}}, Q_{Y_i|Y^{i-1}})} D_i = \sum_{i=1}^n \sup_{\widetilde{Q}_{U_i V_i} \in C(Q_{U_i}, Q_{V_i})} \inf_{\widetilde{Q}_{X_i Y_i | U_i V_i} \in C(Q_{X_i|U_i}, Q_{Y_i|V_i})} D\big(\widetilde{Q}_{X_i Y_i | U_i V_i} \| P_{XY} | \widetilde{Q}_{U_i V_i}\big) \tag{115}$$
$$= \sup_{\widetilde{Q}_{U_i V_i} \in C(Q_{U_i}, Q_{V_i}),\, i \in [n]} \inf_{\widetilde{Q}_{X_i Y_i | U_i V_i} \in C(Q_{X_i|U_i}, Q_{Y_i|V_i}),\, i \in [n]} \sum_{i=1}^n D\big(\widetilde{Q}_{X_i Y_i | U_i V_i} \| P_{XY} | \widetilde{Q}_{U_i V_i}\big)$$
$$= n \sup_{\widetilde{Q}_{U_J V_J | J} \in C(Q_{U_J|J}, Q_{V_J|J})} \inf_{\widetilde{Q}_{X_J Y_J | U_J V_J J} \in C(Q_{X_J|U_J J}, Q_{Y_J|V_J J})} D\big(\widetilde{Q}_{X_J Y_J | U_J V_J J} \| P_{XY} | \widetilde{Q}_{U_J V_J | J} Q_J\big) \tag{116}$$
$$= n \sup_{\widetilde{Q}_{UV|J} \in C(Q_{U|J}, Q_{V|J})} \inf_{\widetilde{Q}_{XY|UVJ} \in C(Q_{X|UJ}, Q_{Y|VJ})} D\big(\widetilde{Q}_{XY|UVJ} \| P_{XY} | \widetilde{Q}_{UV|J} Q_J\big), \tag{117}$$
where $D_i := D\big(\widetilde{Q}_{X_i Y_i | X^{i-1} Y^{i-1}} \| P_{XY} | \widetilde{Q}_{X^{i-1} Y^{i-1}}\big)$; (113) follows since the joint minimization is equivalent to the sequential minimizations; (114) follows since, by Lemma 5, $\widetilde{Q}_{X^k Y^k} := \prod_{i=1}^k \widetilde{Q}_{X_i Y_i | X^{i-1} Y^{i-1}}$ with $\widetilde{Q}_{X_i Y_i | X^{i-1} Y^{i-1}} \in C(Q_{X_i|X^{i-1}}, Q_{Y_i|Y^{i-1}})$ lies in the set $C(Q_{X^k}, Q_{Y^k})$; in (115), $U_i := X^{i-1}$, $V_i := Y^{i-1}$; and in (117), $U = U_J$, $V = V_J$.

Substituting (117) into (109), we obtain
$$\Theta_n(E_1, E_2) \le \sup_{\substack{Q_{X^n}, Q_{Y^n}:\ D(Q_{X|UJ} \| P_X | Q_{UJ}) \le E_1,\\ D(Q_{Y|VJ} \| P_Y | Q_{VJ}) \le E_2}} \sup_{\widetilde{Q}_{UV|J} \in C(Q_{U|J}, Q_{V|J})} \inf_{\widetilde{Q}_{XY|UVJ} \in C(Q_{X|UJ}, Q_{Y|VJ})} D\big(\widetilde{Q}_{XY|UVJ} \| P_{XY} | \widetilde{Q}_{UV|J} Q_J\big)$$
$$\le \sup_{\substack{Q_J, Q_{U|J}, Q_{V|J}, Q_{X|UJ}, Q_{Y|VJ}:\\ D(Q_{X|UJ} \| P_X | Q_{UJ}) \le E_1,\\ D(Q_{Y|VJ} \| P_Y | Q_{VJ}) \le E_2}} \sup_{\widetilde{Q}_{UV|J} \in C(Q_{U|J}, Q_{V|J})} \inf_{\widetilde{Q}_{XY|UVJ} \in C(Q_{X|UJ}, Q_{Y|VJ})} D\big(\widetilde{Q}_{XY|UVJ} \| P_{XY} | \widetilde{Q}_{UV|J} Q_J\big) \tag{118}$$
$$= \sup_{\substack{Q_{UVJ} Q_{X|UJ} Q_{Y|VJ}:\\ D(Q_{X|UJ} \| P_X | Q_{UJ}) \le E_1,\\ D(Q_{Y|VJ} \| P_Y | Q_{VJ}) \le E_2}} \inf_{\widetilde{Q}_{XY|UVJ} \in C(Q_{X|UJ}, Q_{Y|VJ})} D\big(\widetilde{Q}_{XY|UVJ} \| P_{XY} | Q_{UVJ}\big) \tag{119}$$
$$= \sup_{\substack{Q_{UVJ} Q_{X|UJ} Q_{Y|VJ}:\\ D(Q_{X|UVJ} \| P_X | Q_{UVJ}) \le E_1,\\ D(Q_{Y|UVJ} \| P_Y | Q_{UVJ}) \le E_2}} \inf_{\widetilde{Q}_{XY|UVJ} \in C(Q_{X|UVJ}, Q_{Y|UVJ})} D\big(\widetilde{Q}_{XY|UVJ} \| P_{XY} | Q_{UVJ}\big) \tag{120}$$
$$= \sup_{\substack{Q_{WXY}:\ D(Q_{X|W} \| P_X | Q_W) \le E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \le E_2}} \inf_{\widetilde{Q}_{XY|W} \in C(Q_{X|W}, Q_{Y|W})} D\big(\widetilde{Q}_{XY|W} \| P_{XY} | Q_W\big) \tag{121}$$
$$= \sup_{\substack{Q_W, Q_{X|W}, Q_{Y|W}:\ D(Q_{X|W} \| P_X | Q_W) \le E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \le E_2}} \inf_{\widetilde{Q}_{XY|W} \in C(Q_{X|W}, Q_{Y|W})} D\big(\widetilde{Q}_{XY|W} \| P_{XY} | Q_W\big), \tag{122}$$
where (119) follows since the distribution tuple $\big(Q_J, Q_{U|J}, Q_{V|J}, \widetilde{Q}_{UV|J}, Q_{X|UJ}, Q_{Y|VJ}\big)$ with $\widetilde{Q}_{UV|J} \in C(Q_{U|J}, Q_{V|J})$ and the joint distribution $Q_{UVJ} Q_{X|UJ} Q_{Y|VJ}$ mutually determine each other; (120) follows since $Q_{X|UVJ} = Q_{X|UJ}$, $Q_{Y|UVJ} = Q_{Y|VJ}$, and $C\big(Q_{X|UVJ}, Q_{Y|UVJ}\big) = C\big(Q_{X|UJ}, Q_{Y|VJ}\big)$; and in (121), $W := (U, V, J)$.

Observe that (122) can be rewritten as
$$\sup_{\substack{Q_W, Q_{X|W}, Q_{Y|W}:\ D(Q_{X|W} \| P_X | Q_W) \le E_1,\\ D(Q_{Y|W} \| P_Y | Q_W) \le E_2}} \sum_w Q_W(w)\, g\big(Q_{X|W=w}, Q_{Y|W=w}\big), \tag{123}$$
where
$$g\big(Q_{X|W=w}, Q_{Y|W=w}\big) := \inf_{\widetilde{Q}_{XY|W=w} \in C(Q_{X|W=w}, Q_{Y|W=w})} D\big(\widetilde{Q}_{XY|W=w} \| P_{XY}\big). \tag{124}$$
The infimum in (124) is in fact a minimum, since given $\big(Q_{X|W=w}, Q_{Y|W=w}\big)$, the set $C\big(Q_{X|W=w}, Q_{Y|W=w}\big)$ is a compact subset of the probability simplex, and moreover, $D\big(\widetilde{Q}_{XY|W=w} \| P_{XY}\big)$ is continuous in $\widetilde{Q}_{XY|W=w}$. For the supremum in (123), by the support lemma [8], without loss of optimality, the alphabet size of $W$ can be assumed to be bounded by a fixed constant. Under this restriction, the region of feasible solutions in (123) is compact. Moreover, the objective function in (123) is continuous in $\big(Q_W, Q_{X|W}, Q_{Y|W}\big)$. Therefore, the supremum in (123) is in fact a maximum.

Remark. The proof of the converse part here is almost the same as that for the common information in [5]. However, it is somewhat surprising that in the proof here, the combination of the two suprema in (118) leads to the tight converse, which seems impossible for the common information problem in [5].
REFERENCES

[1] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes, volume 16. Elsevier, 1977.
[2] L. Yu and V. Y. F. Tan. On non-interactive simulation of binary random variables. IEEE Trans. Inf. Theory, 2021.
[3] T. S. Han and K. Kobayashi. Maximal rectangular subsets contained in the set of partially jointly typical sequences for dependent random variables. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 70(1):15–32, 1985.
[4] O. Ordentlich, Y. Polyanskiy, and O. Shayevitz. A note on the probability of rectangles for correlated binary strings. IEEE Trans. Inf. Theory, 2020.
[5] L. Yu and V. Y. F. Tan. On exact and ∞-Rényi common informations. IEEE Trans. Inf. Theory, 66(6):3366–3406, 2020.
[6] Y. Polyanskiy and A. Samorodnitsky. Improved log-Sobolev inequalities, hypercontractivity and uncertainty principle on the hypercube. Journal of Functional Analysis, 277(11):108280, 2019.
[7] N. Kirshner and A. Samorodnitsky. A moment ratio bound for polynomials and some extremal properties of Krawchouk polynomials and Hamming spheres. arXiv preprint arXiv:1909.11929, 2019.
[8] A. El Gamal and Y.-H. Kim. Network Information Theory. Cambridge University Press, 2011.
[9] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[10] L. Yu and V. Y. F. Tan. Rényi resolvability and its applications to the wiretap channel. IEEE Trans. Inf. Theory, 65(3):1862–1897, 2018.
[11] E. Mossel, R. O'Donnell, O. Regev, J. E. Steif, and B. Sudakov. Non-interactive correlation distillation, inhomogeneous Markov chains, and the reverse Bonami-Beckner inequality. Israel Journal of Mathematics, 154(1):299–336, 2006.
[12] R. O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
[13] E. Nelson. The free Markoff field. Journal of Functional Analysis, 12(2):211–227, 1973.
[14] C. Borell. Positivity improving operators and hypercontractivity. Mathematische Zeitschrift, 180(3):225–234, 1982.
[15] S. Kamath and V. Anantharam. Non-interactive simulation of joint distributions: the Hirschfeld–Gebelein–Rényi maximal correlation and the hypercontractivity ribbon. In Fiftieth Annual Allerton Conference, pages 1057–1064. IEEE, 2012.
[16] C. Nair. Equivalent formulations of hypercontractivity using information measures. In International Zurich Seminar, 2014.
[17] S. Kamath and V. Anantharam. On non-interactive simulation of joint distributions. IEEE Trans. Inf. Theory, 62(6):3419–3435, 2016.
[18] S. Kamath. Reverse hypercontractivity using information measures. pages 627–633. IEEE, 2015.
[19] P. Gács and J. Körner. Common information is far less than mutual information. Problems of Control and Information Theory, 2(2):149–162, 1973.
[20] H. S. Witsenhausen. On sequences of pairs of dependent random variables. SIAM Journal on Applied Mathematics, 28(1):100–113, 1975.
[21] F.-W. Fu, V. K. Wei, and R. W. Yeung. On the minimum average distance of binary codes: Linear programming approach. Discrete Applied Mathematics, 111(3):263–281, 2001.
[22] L. Yu and V. Y. F. Tan. An improved linear programming bound on the average distance of a binary code. arXiv preprint arXiv:1910.09416, 2019.
[23] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. pages 68–80. IEEE, 1988.
[24] E. Mossel and R. O'Donnell. Coin flipping from a cosmic source: On error correction of truly random bits. Random Structures & Algorithms, 26(4):418–436, 2005.
[25] P. Austrin, P. Kaski, M. Koivisto, and J. Nederlof. Sharper upper bounds for unbalanced uniquely decodable code pairs. IEEE Trans. Inf. Theory, 64(2):1368–1373, 2017.
[26] C. Greene and T. L. Magnanti. Some abstract pivot algorithms. SIAM Journal on Applied Mathematics, 29(3):530–539, 1975.
[27] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
[28] L. Yu. Asymptotics of Strassen's optimal transport problem. arXiv preprint arXiv:1912.02051.