Lower bound on Wyner's Common Information
Erixhen Sula
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
erixhen.sula@epfl.ch
Michael Gastpar
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
michael.gastpar@epfl.ch
Abstract—An important notion of common information between two random variables is due to Wyner. In this paper, we derive a lower bound on Wyner's common information for continuous random variables. The new bound improves on the only other general lower bound on Wyner's common information, which is the mutual information. We also show that the new lower bound is tight for the so-called "Gaussian channels" case, namely, when the joint distribution of the random variables can be written as the sum of a single underlying random variable and Gaussian noises. We motivate this work from the recent variations of Wyner's common information and applications to network data compression problems such as the Gray-Wyner network.
I. INTRODUCTION
Extracting and assessing common features amongst multiple variables is a natural task occurring in many different problem settings. Wyner's common information [1] provides one answer to this; it was originally defined for finite alphabets as
$C(X;Y) = \inf_{W :\, X - W - Y} I(X,Y;W)$.  (1)
For a pair of random variables, it seeks the most compact third variable that makes the pair conditionally independent. Compactness is measured in terms of the mutual information between the pair and the third variable. In [1], Wyner also identifies two operational interpretations. The first concerns a source coding network often referred to as the Gray-Wyner network. For this scenario, Wyner's common information characterizes the smallest common rate required to enable two decoders to recover X and Y, respectively, in a lossless fashion. The second operational interpretation pertains to the distributed simulation of common randomness. Here, Wyner's common information characterizes the smallest number of random bits that need to be shared between the processors. In subsequent work, Wyner's common information was extended to continuous random variables and was computed for a pair of Gaussian random variables [2], [3] and for a pair of additive "Gaussian channel" distributions [4]. Other related works include [5], [6]. Wyner's common information has many applications, including to communication networks [1], to caching [7, Section III.C], to source coding [8], and to feature extraction [9].
In this paper, we derive a new lower bound on Wyner's common information for continuous random variables. The proof is based on a method known as factorization of convex envelopes, which was originally introduced in [10]. The proof strategy is fundamentally different from the techniques that were used to solve Wyner's original common information problem. Specifically, for the latter, the generic approach is to first characterize the class of variables that enable conditional independence, and then to find the optimal variable inside this class. By contrast, we lower bound the Wyner's common information problem by a convex problem, which we can then solve by direct optimization.
We illustrate the promise of the new lower bound by considering Gaussian mixture distributions and Laplace distributions. We also establish that the new lower bound is tight for a simple case of the so-called "Gaussian channels" distribution. Here, X and Y can be written as the sum of a single arbitrary random variable and jointly Gaussian noises. We note that for this special case, Wyner's common information was previously found, using different methods, in [4].
We use the following notation. Random variables are denoted by uppercase letters $X, Y, Z$ and their realizations by lowercase letters $x, y, z$. For the cross-covariance matrix of $X$ and $Y$, we use the shorthand notation $K_{XY}$, and for the covariance matrix of a random vector $X$ we use the shorthand notation $K_X := K_{XX}$. Let $p_X(x)$ denote the probability density function of random variable $X$ at realization $x$. Let $\mathcal{N}(m, \sigma^2)$ be the Gaussian probability density function with mean $m$ and variance $\sigma^2$.
II. MAIN RESULT

Here we present our lower bound on Wyner's common information. The bound is given in terms of the differential entropy of the pair, together with the entropy and the Wyner's common information of Gaussian random variables with the same covariance. The theorem reads as follows.
Theorem 1:
Let $(X,Y)$ have a probability density function $p_{(X,Y)}$ that satisfies the covariance constraint $K_{(X,Y)}$, and let $(X_g, Y_g) \sim \mathcal{N}(0, K_{(X,Y)})$. Then
$C(X;Y) \ge \max\{\, C(X_g;Y_g) + h(X,Y) - h(X_g,Y_g),\; 0 \,\}$,  (2)
where
$C(X_g;Y_g) = \frac{1}{2} \log \frac{1+|\rho|}{1-|\rho|}$,  (3)
and $\rho$ is the correlation coefficient between $X$ and $Y$.
The proof is given in Section V. A similar argument is used for the max-entropy bound, where the probability density functions have covariance constraints. Interestingly, once we plug in Gaussian random variables or additive "Gaussian channel" distributions, the bound is attained with equality.
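For concreteness, the bound of Theorem 1 can be evaluated numerically once the correlation coefficient and the joint differential entropy of the pair are available. The following minimal Python sketch (the function names are illustrative, not from any library) implements (2) and (3); for a Gaussian pair, where $h(X,Y) = h(X_g,Y_g)$, it recovers (3) exactly.

```python
# A minimal sketch of evaluating the bound (2): given the correlation
# coefficient rho and the joint differential entropy h(X,Y) in nats,
# compute max{C(X_g;Y_g) + h(X,Y) - h(X_g,Y_g), 0}.
import numpy as np

def gaussian_wyner_ci(rho):
    """Wyner's common information of a bivariate Gaussian pair, eq. (3)."""
    return 0.5 * np.log((1 + abs(rho)) / (1 - abs(rho)))

def gaussian_joint_entropy(rho, var_x=1.0, var_y=1.0):
    """h(X_g, Y_g) for a zero-mean Gaussian with the same covariance."""
    det = var_x * var_y * (1 - rho ** 2)
    return 0.5 * np.log((2 * np.pi * np.e) ** 2 * det)

def wyner_lower_bound(rho, h_xy, var_x=1.0, var_y=1.0):
    """Lower bound (2) on C(X;Y), in nats."""
    gap = h_xy - gaussian_joint_entropy(rho, var_x, var_y)
    return max(gaussian_wyner_ci(rho) + gap, 0.0)

# Sanity check: for a Gaussian pair the bound is tight and equals (3).
rho = 0.6
print(wyner_lower_bound(rho, gaussian_joint_entropy(rho)))  # = 0.5*log(1.6/0.4)
```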
Remark 1: In [1], it is shown that $C(X;Y) \ge I(X;Y)$. In Sections III and IV we show that our lower bound from Theorem 1 can be tighter.
Remark 2: The bound of Theorem 1 can be expressed equivalently as
$C(X;Y) \ge C(X_g;Y_g) - D\bigl( p_{(X,Y)} \,\|\, p_{(X_g,Y_g)} \bigr)$.  (4)
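The equivalence between (2) and (4) (up to the maximum with zero) rests on a standard identity: $\log p_{(X_g,Y_g)}$ is a quadratic form, so its expectation under $p_{(X,Y)}$ depends only on the first and second moments, which $(X,Y)$ and $(X_g,Y_g)$ share. A short sketch of this step:

```latex
% Sketch: why h(X,Y) - h(X_g,Y_g) = -D(p_{(X,Y)} || p_{(X_g,Y_g)}) when the
% means and covariances of (X,Y) and (X_g,Y_g) agree.
\begin{align*}
D\bigl(p_{(X,Y)} \,\|\, p_{(X_g,Y_g)}\bigr)
  &= -h(X,Y) - \int p_{(X,Y)}(x,y)\,\log p_{(X_g,Y_g)}(x,y)\,dx\,dy \\
  &= -h(X,Y) - \int p_{(X_g,Y_g)}(x,y)\,\log p_{(X_g,Y_g)}(x,y)\,dx\,dy \\
  &= h(X_g,Y_g) - h(X,Y),
\end{align*}
% hence C(X_g;Y_g) + h(X,Y) - h(X_g,Y_g) = C(X_g;Y_g) - D(p_{(X,Y)} || p_{(X_g,Y_g)}).
```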
Remark 3: Without the correction (the maximum with zero), the bound of Theorem 1 can be negative. If we choose $X$ and $Y$ to be independent, then $X_g$ and $Y_g$ will be independent as well. Thus, the bound in (4) becomes
$C(X;Y) \ge -D(p_X \| p_{X_g}) - D(p_Y \| p_{Y_g})$,  (5)
which is nonpositive by the nonnegativity of the Kullback-Leibler divergence.
In the following sections, we consider specific pairs of random variables and compute our lower bound on Wyner's common information to illustrate the usefulness of the derived bound.

III. ADDITIVE "GAUSSIAN CHANNEL" DISTRIBUTIONS
In this section, we consider the distributions that are described as follows. Let $(\hat X, \hat Y)$ be jointly Gaussian with mean zero and covariance matrix
$K_{(\hat X, \hat Y)} = \begin{pmatrix} 1 & \hat\rho \\ \hat\rho & 1 \end{pmatrix}$.  (6)
Then, we consider the two-dimensional source given by
$\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \hat X \\ \hat Y \end{pmatrix} + \begin{pmatrix} A \\ B \end{pmatrix}$.  (7)
Let $(A,B)$ be arbitrary random variables with mean zero and covariance
$K_{(A,B)} = \begin{pmatrix} \sigma_A^2 & r\sigma_A\sigma_B \\ r\sigma_A\sigma_B & \sigma_B^2 \end{pmatrix}$,  (8)
where $\sigma_A = \sigma_B$ and $(A,B)$ is independent of the pair $(\hat X, \hat Y)$. For this particular distribution, we evaluate the lower bound of Theorem 1 and also provide an upper bound.

A. Lower Bound
We have that $E[X] = E[Y] = 0$ and
$E[X^2] = E[\hat X^2] + E[A^2] = 1 + \sigma_A^2$,  (9)
$E[XY] = E[\hat X \hat Y] + E[AB] = \hat\rho + r\sigma_A^2$.  (10)
By symmetry, $E[Y^2] = E[X^2]$ and
$\rho = \frac{E[XY]}{\sqrt{E[X^2]\,E[Y^2]}} = \frac{\hat\rho + r\sigma_A^2}{1 + \sigma_A^2}$.  (11)
Therefore, the formula given in Theorem 1 evaluates to
$C(X;Y) \ge C(X_g;Y_g) + h(X,Y) - h(X_g,Y_g)$  (12)
$= \frac{1}{2}\log\frac{1+\rho}{1-\rho} + h(X,Y) - \frac{1}{2}\log\Bigl( (2\pi e)^2 \bigl( (1+\sigma_A^2)^2 - (\hat\rho + r\sigma_A^2)^2 \bigr) \Bigr)$  (13)
$= h(X,Y) - \log\bigl( 2\pi e \bigl( (1-\hat\rho) + (1-r)\sigma_A^2 \bigr) \bigr)$,  (14)
where (13) follows from substituting for $K_{(X,Y)}$ and (14) follows from substituting for $\rho$ computed in (11).
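The simplification from (13) to (14) can be double-checked symbolically: after the $2\pi e$ factors cancel, it is equivalent to the polynomial identity $\frac{1+\rho}{1-\rho}\bigl((1-\hat\rho)+(1-r)\sigma_A^2\bigr)^2 = (1+\sigma_A^2)^2 - (\hat\rho + r\sigma_A^2)^2$. The short sympy sketch below (symbol names are ad hoc) verifies this identity.

```python
# Symbolic sanity check that step (13) equals step (14).
import sympy as sp

s2, rhohat, r = sp.symbols('s2 rhohat r', positive=True)   # s2 = sigma_A^2
rho = (rhohat + r * s2) / (1 + s2)                          # correlation (11)
det_K = (1 + s2) ** 2 - (rhohat + r * s2) ** 2              # det of K_{(X,Y)}

ratio = (1 + rho) / (1 - rho) * ((1 - rhohat) + (1 - r) * s2) ** 2 / det_K
print(sp.simplify(ratio))   # prints 1, confirming (13) = (14)
```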
B. Upper Bound

Next we give an upper bound on Wyner's common information for the example of this section. To accomplish this, rewrite the pair $(\hat X, \hat Y)$ as
$\hat X = \sqrt{\hat\rho}\, V + Z_x, \quad \hat Y = \sqrt{\hat\rho}\, V + Z_y$,  (15)
where $V, Z_x, Z_y$ are mutually independent, $V \sim \mathcal{N}(0,1)$ and $Z_x, Z_y \sim \mathcal{N}(0, 1-\hat\rho)$. Then, a valid choice of $W$ that makes $X$ and $Y$ conditionally independent is $W = (\sqrt{\hat\rho}\, V + A, \sqrt{\hat\rho}\, V + B)$. By combining (7) and (15) we can rewrite the pair $(X,Y)$ as
$X = \sqrt{\hat\rho}\, V + A + Z_x, \quad Y = \sqrt{\hat\rho}\, V + B + Z_y$,  (16)
where $W$ is independent of $Z_x$ and $Z_y$. So we have
$I(X;Y|W) = I(\sqrt{\hat\rho}\, V + A + Z_x;\, \sqrt{\hat\rho}\, V + B + Z_y \mid W)$  (17)
$= I(Z_x; Z_y \mid W)$  (18)
$= I(Z_x; Z_y)$  (19)
$= 0$,  (20)
where (18) follows by subtracting the parts that are in the conditioning, recalling that $W = (\sqrt{\hat\rho}\, V + A, \sqrt{\hat\rho}\, V + B)$; (19) follows from the independence of $W$ and $(Z_x, Z_y)$; and (20) follows from the independence of $Z_x$ and $Z_y$.
Thus, the upper bound is
$C(X;Y) \le I(X,Y;W)$  (21)
$= h(X,Y) - h(\sqrt{\hat\rho}\, V + A + Z_x,\, \sqrt{\hat\rho}\, V + B + Z_y \mid W)$  (22)
$= h(X,Y) - h(Z_x, Z_y \mid W)$  (23)
$= h(X,Y) - h(Z_x, Z_y)$  (24)
$= h(X,Y) - \log\bigl( 2\pi e (1-\hat\rho) \bigr)$,  (25)
where (21) follows from the definition of $C(X;Y)$, since $W$ satisfies $X - W - Y$; (22) follows by rewriting the mutual information; (23) follows by subtracting the parts that are in the conditioning; and (24) follows from the independence of $W$ and $(Z_x, Z_y)$.

C. Example 1

Lemma 1: For the additive "Gaussian channel" distributions described in (7) with $A = B$, we have
$C(X;Y) = h(X,Y) - \log\bigl( 2\pi e (1-\hat\rho) \bigr)$.  (26)
The proof follows from the fact that the lower bound (14) and the upper bound (25) coincide when $A = B$, which means $r = 1$. The same result is derived by a different approach in [4]. To illustrate Lemma 1, let $A$ be binary, taking the values $\pm\sigma_A$ with equal probability. Then, we get Figure 1.

Fig. 1. The o-line is the exact Wyner's common information $C(X;Y)$ for the specified Gaussian mixture distribution. The dashed line is the mutual information $I(X;Y)$. In this setup we plot $C(X;Y)$ and $I(X;Y)$ in nats versus $\sigma_A$ for $\hat\rho = 0.$
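For a single value of $\sigma_A$, the quantities plotted in Figure 1 can be reproduced numerically: with $A = B$ binary, $(X,Y)$ is a two-component Gaussian mixture whose joint entropy can be integrated directly, and Lemma 1 then gives $C(X;Y)$ in closed form. The sketch below uses illustrative parameter values ($\hat\rho = 0.5$, $\sigma_A = 1$), which are not taken from the figure.

```python
# Numerical sketch for Example 1: exact C(X;Y) via Lemma 1, eq. (26),
# and the mutual information I(X;Y), for binary A = B = +/- sigma_A.
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm

rho_hat, sigma_A = 0.5, 1.0
K_hat = [[1.0, rho_hat], [rho_hat, 1.0]]
comps = [multivariate_normal(mean=[s, s], cov=K_hat) for s in (sigma_A, -sigma_A)]

def p_joint(x, y):
    # two-component Gaussian mixture density of (X, Y)
    return 0.5 * sum(c.pdf([x, y]) for c in comps)

def p_marg(x):
    # marginal of X (and of Y, by symmetry)
    return 0.5 * (norm.pdf(x, loc=sigma_A) + norm.pdf(x, loc=-sigma_A))

L = 8 + sigma_A  # integration box; the tails beyond it are negligible
h_XY, _ = integrate.dblquad(lambda y, x: -p_joint(x, y) * np.log(p_joint(x, y)),
                            -L, L, -L, L)
h_X, _ = integrate.quad(lambda x: -p_marg(x) * np.log(p_marg(x)), -L, L)

C_exact = h_XY - np.log(2 * np.pi * np.e * (1 - rho_hat))   # Lemma 1, eq. (26)
I_XY = 2 * h_X - h_XY                                        # mutual information
print(f"C(X;Y) = {C_exact:.4f} nats, I(X;Y) = {I_XY:.4f} nats")
```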
D. Example 2

Another example is to choose $(A,B)$ to be a doubly symmetric binary pair with $p_{(A,B)}(A = B = \sigma_A) = p_{(A,B)}(A = B = -\sigma_A) = \frac{1+r}{4}$ and $p_{(A,B)}(A = -B = \sigma_A) = p_{(A,B)}(A = -B = -\sigma_A) = \frac{1-r}{4}$. Note that for these choices, the covariance matrix of $A$ and $B$ is given by Equation (8). If we select $A = B$, that is, $r = 1$, this model is precisely the model studied in Example 1. A numerical evaluation is shown in Figure 2.
Fig. 2. The ∗-line is the lower bound on $C(X;Y)$ from Theorem 1 and the ⋄-line is the upper bound on $C(X;Y)$ from Section III-B. The dashed line is the mutual information $I(X;Y)$. In this setup we plot the bounds on $C(X;Y)$ in nats versus $\sigma_A$ for $\hat\rho = 0.$ and $r = 0.$
IV. LAPLACE DISTRIBUTIONS
In this section, we consider the case when $(X,Y)$ is distributed according to the bivariate Laplace distribution described in [11, Section 5.1.3] by
$p_{(X,Y)}(x,y) = \frac{1}{\pi\sqrt{1-\rho_\ell^2}}\, K_0\!\left( \sqrt{ \frac{2\bigl( x^2 - 2\rho_\ell x y + y^2 \bigr)}{1-\rho_\ell^2} } \right)$,  (27)
where $K_0$ is the modified Bessel function of the second kind, given by
$K_0(z) = \frac{1}{2} \int_{-\infty}^{\infty} \frac{e^{izt}}{\sqrt{t^2+1}}\, dt$.  (28)
The variances of $X$ and $Y$ are unity and the correlation coefficient is $\rho_\ell$. Define the entropy power of $(X,Y)$ as
$N(X,Y) = \frac{1}{2\pi e} \exp\bigl( h(X,Y) \bigr)$.  (29)
Then, the bound of Theorem 1 can be expressed as
$C(X;Y) \ge \log \frac{N(X,Y)}{1-\rho_\ell}$.  (30)
Computation of the joint entropy $h(X,Y)$ as well as the mutual information $I(X;Y)$ leads to the curves in Figure 3, further illustrating the potential of the new bound.

Fig. 3. The ∗-line is the lower bound on $C(X;Y)$ from Theorem 1 and the dashed line is the mutual information $I(X;Y)$ for the described Laplace distribution. In this setup we plot the bounds on $C(X;Y)$ in nats versus $\rho_\ell$.
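A rough numerical sketch of evaluating the bound (30) for one value of $\rho_\ell$: the joint entropy of (27) is computed by two-dimensional quadrature using scipy's Bessel function $K_0$, and the marginals of (27) are unit-variance Laplace, whose differential entropy is $1 + \log\sqrt{2}$ nats. The value $\rho_\ell = 0.5$ and the integration box are illustrative choices, not taken from the figure.

```python
# Numerical sketch of the lower bound (30) for the bivariate Laplace density (27).
import numpy as np
from scipy import integrate
from scipy.special import k0

rho_l = 0.5

def p_laplace(x, y):
    quad_form = (x * x - 2 * rho_l * x * y + y * y) / (1 - rho_l ** 2)
    # tiny floor avoids evaluating K_0 exactly at zero (integrable singularity)
    return k0(np.sqrt(max(2 * quad_form, 1e-300))) / (np.pi * np.sqrt(1 - rho_l ** 2))

L = 12.0  # integration box; the Laplace tails beyond it are negligible
h_XY, _ = integrate.dblquad(lambda y, x: -p_laplace(x, y) * np.log(p_laplace(x, y)),
                            -L, L, -L, L)

N_XY = np.exp(h_XY) / (2 * np.pi * np.e)          # entropy power, eq. (29)
lower_bound = np.log(N_XY / (1 - rho_l))          # eq. (30)
h_marginal = 1 + np.log(np.sqrt(2))               # unit-variance Laplace marginal
I_XY = 2 * h_marginal - h_XY
print(f"lower bound = {lower_bound:.4f} nats, I(X;Y) = {I_XY:.4f} nats")
```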
V. PROOF OF THEOREM 1

A. Preliminary

Theorem 2 (Theorem 2 in [12]):
For $K \succeq 0$ and $0 < \lambda < 1$, there exist $0 \preceq K' \preceq K$ and $(X',Y') \sim \mathcal{N}(0,K')$ such that, for $(X,Y)$ with distribution $p_{(X,Y)}$ satisfying the covariance constraint $K$, the following inequality holds:
$\inf_W h(Y|W) + h(X|W) - (1+\lambda)\, h(X,Y|W) \;\ge\; h(Y') + h(X') - (1+\lambda)\, h(X',Y')$.  (31)
Proof 1: The theorem is a consequence of [12, Theorem 2], for the specific choice $p = \lambda + 1$. The proof of the existence of the infimum, which is missing in [12], is given in [13].
Before we go into the details, it is important to realize that $\inf_W h(Y|W) + h(X|W) - (1+\lambda) h(X,Y|W)$ is indeed the lower convex envelope of $h(Y) + h(X) - (1+\lambda) h(X,Y)$, by thinking of $W$ as a time-sharing random variable. In other words, under a covariance constraint on the pair $(X,Y)$,
$\inf_{(X,Y)} \inf_W h(Y|W) + h(X|W) - (1+\lambda) h(X,Y|W) = \inf_{(X,Y)} h(Y) + h(X) - (1+\lambda) h(X,Y)$.  (32)
The next lemma, an optimization problem over the covariance matrix constraint for Gaussian random variables, is needed for the proof of the theorems.
For $(X',Y') \sim \mathcal{N}(0,K')$, the following inequality holds:
$\min_{K' :\, 0 \preceq K' \preceq \left(\begin{smallmatrix} 1 & \rho \\ \rho & 1 \end{smallmatrix}\right)} h(X') + h(Y') - (1+\lambda)\, h(X',Y') \;\ge\; \frac{1}{2} \log \frac{1}{1-\lambda^2} - \frac{\lambda}{2} \log\Bigl( (2\pi e)^2 (1-\rho)^2 \frac{1+\lambda}{1-\lambda} \Bigr)$,  (33)
where $\lambda \le \rho$.
Proof 2: The proof outline is given in Appendix A. For the full proof, refer to [13].
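As a quick sanity check of Lemma 2 (a sketch, not a proof), one can randomly sample covariance matrices $K'$ satisfying $0 \preceq K' \preceq \left(\begin{smallmatrix} 1 & \rho \\ \rho & 1 \end{smallmatrix}\right)$ and verify that the Gaussian objective never drops below the closed form (33). The test values of $\rho$ and $\lambda$ below are arbitrary choices satisfying $\lambda \le \rho$.

```python
# Monte Carlo sanity check of Lemma 2: sampled feasible points should never
# fall below the closed-form lower bound (33).
import numpy as np

rho, lam = 0.7, 0.4
K = np.array([[1.0, rho], [rho, 1.0]])

def gaussian_objective(Kp):
    sx2, sy2 = Kp[0, 0], Kp[1, 1]
    h_x = 0.5 * np.log(2 * np.pi * np.e * sx2)
    h_y = 0.5 * np.log(2 * np.pi * np.e * sy2)
    h_xy = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(Kp))
    return h_x + h_y - (1 + lam) * h_xy

def feasible(Kp):
    return (np.all(np.linalg.eigvalsh(Kp) >= 0)
            and np.all(np.linalg.eigvalsh(K - Kp) >= 0))

rng = np.random.default_rng(0)
best = np.inf
for _ in range(100000):
    sx2, sy2 = rng.uniform(1e-3, 1, size=2)
    q = rng.uniform(-1, 1)
    c = q * np.sqrt(sx2 * sy2)
    Kp = np.array([[sx2, c], [c, sy2]])
    if feasible(Kp):
        best = min(best, gaussian_objective(Kp))

closed_form = (0.5 * np.log(1 / (1 - lam ** 2))
               - 0.5 * lam * np.log((2 * np.pi * np.e) ** 2 * (1 - rho) ** 2
                                    * (1 + lam) / (1 - lam)))
print(f"sampled minimum = {best:.4f}  >=  closed form (33) = {closed_form:.4f}")
```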
B. Lower Bound on (Relaxed) Wyner's Common Information
Here, we consider a slightly more general case; that is, we give a lower bound on the relaxed Wyner's common information in Theorem 3, and Theorem 1 then follows as a special case. Let us define the relaxed Wyner's common information as in [13], [14]. For jointly continuous random variables $X$ and $Y$ with joint distribution $p(x,y)$, we define
$C_\gamma(X;Y) = \inf_{W :\, I(X;Y|W) \le \gamma} I(X,Y;W)$,  (34)
where the constraint of conditional independence is relaxed into an upper bound on the conditional mutual information. For $\gamma = 0$, we have $C_0(X;Y) = C(X;Y)$, the standard Wyner's common information. A lower bound on the relaxed Wyner's common information is given in the following theorem.
Let $(X,Y)$ have a probability density function $p_{(X,Y)}$ that satisfies the covariance constraint $K_{(X,Y)}$, and let $(X_g,Y_g) \sim \mathcal{N}(0, K_{(X,Y)})$. Then
$C_\gamma(X;Y) \ge \max\{\, C_\gamma(X_g;Y_g) + h(X,Y) - h(X_g,Y_g),\; 0 \,\}$,  (35)
where
$C_\gamma(X_g;Y_g) = \frac{1}{2} \log\left( \frac{1+|\rho|}{1-|\rho|} \cdot \frac{1-\sqrt{1-e^{-2\gamma}}}{1+\sqrt{1-e^{-2\gamma}}} \right)$,  (36)
and $\rho$ is the correlation coefficient between $X$ and $Y$.
Proof 3: Note that the mean of the random variables does not affect Wyner's common information or its relaxed variant; thus, we assume mean zero for both $X$ and $Y$. Moreover, the relaxed Wyner's common information is invariant to scaling of $X$ and $Y$. Thus, without loss of generality, we assume $X$ and $Y$ have mean zero, unit variance, and correlation coefficient $\rho$, and we proceed as follows:
$C_\gamma(X;Y) = \inf_{W :\, I(X;Y|W) \le \gamma} I(X,Y;W)$  (37)
$\ge \inf_W (1+\mu) I(X,Y;W) - \mu I(X;W) - \mu I(Y;W) + \mu I(X;Y) - \mu\gamma$  (38)
$= \mu \Bigl[ \inf_W h(X|W) + h(Y|W) - \bigl(1 + \tfrac{1}{\mu}\bigr) h(X,Y|W) \Bigr] + h(X,Y) - \mu\gamma$  (39)
$\ge \mu \Bigl[ \min_{K' :\, 0 \preceq K' \preceq \left(\begin{smallmatrix} 1 & \rho \\ \rho & 1 \end{smallmatrix}\right)} h(X') + h(Y') - \bigl(1 + \tfrac{1}{\mu}\bigr) h(X',Y') \Bigr] + h(X,Y) - \mu\gamma$  (40)
$\ge h(X,Y) + \frac{\mu}{2} \log \frac{\mu^2}{\mu^2-1} - \frac{1}{2} \log\Bigl( (2\pi e)^2 (1-\rho)^2 \frac{\mu+1}{\mu-1} \Bigr) - \mu\gamma$  (41)
$\ge h(X,Y) - h(X_g,Y_g) + C_\gamma(X_g;Y_g)$,  (42)
where (38) follows from weak duality and the bound is valid for all $\mu \ge 0$; (39) follows from simplification; (40) follows from Theorem 2 under the assumption that $\mu > 1$, where $(X',Y') \sim \mathcal{N}(0,K')$; (41) follows from Lemma 2 under the assumption $\mu \ge 1/\rho$; and (42) follows by maximizing the function
$g(\mu) = h(X,Y) - \mu\gamma + \frac{\mu}{2} \log \frac{\mu^2}{\mu^2-1} - \frac{1}{2} \log\Bigl( (2\pi e)^2 (1-\rho)^2 \frac{\mu+1}{\mu-1} \Bigr)$,  (43)
for $\mu \ge 1/\rho$. Now we need to solve $\max_{\mu \ge 1/\rho} g(\mu)$. The function $g$ is concave in $\mu$,
$\frac{\partial^2 g}{\partial \mu^2} = -\frac{1}{\mu(\mu^2-1)} < 0$,  (44)
and by studying the monotonicity we obtain
$\frac{\partial g}{\partial \mu} = -\frac{1}{2} \log \frac{\mu^2-1}{\mu^2} - \gamma$.  (45)
Since the function is concave, the maximum is attained when the first derivative vanishes. This leads to the optimal solution $\mu^\ast = \frac{1}{\sqrt{1-e^{-2\gamma}}}$, where $\mu^\ast$ has to satisfy $\mu^\ast \ge 1/\rho$. Substituting the optimal solution, we get
$C_\gamma(X;Y) \ge g\Bigl( \tfrac{1}{\sqrt{1-e^{-2\gamma}}} \Bigr)$  (46)
$= h(X,Y) - h(X_g,Y_g) + C_\gamma(X_g;Y_g)$.  (47)
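The optimization step can be checked numerically: for a Gaussian pair (so that $h(X,Y) = h(X_g,Y_g)$), maximizing $g(\mu)$ in (43) over $\mu \ge 1/\rho$ should reproduce the closed form (36). The sketch below uses arbitrary test values of $\rho$ and $\gamma$ for which $\mu^\ast \ge 1/\rho$ holds.

```python
# Numerical check that max over mu of g(mu) matches the closed form (36)
# in the Gaussian case.
import numpy as np
from scipy.optimize import minimize_scalar

rho, gamma = 0.8, 0.1
h_gauss = 0.5 * np.log((2 * np.pi * np.e) ** 2 * (1 - rho ** 2))  # h(X_g, Y_g)

def g(mu):
    return (h_gauss - mu * gamma
            + 0.5 * mu * np.log(mu ** 2 / (mu ** 2 - 1))
            - 0.5 * np.log((2 * np.pi * np.e) ** 2 * (1 - rho) ** 2
                           * (mu + 1) / (mu - 1)))

res = minimize_scalar(lambda mu: -g(mu), bounds=(1 / rho, 50), method='bounded')
mu_star = 1 / np.sqrt(1 - np.exp(-2 * gamma))
s = np.sqrt(1 - np.exp(-2 * gamma))
closed_form = 0.5 * np.log((1 + rho) / (1 - rho) * (1 - s) / (1 + s))  # eq. (36)
print(f"numerical max g = {-res.fun:.6f}, closed form (36) = {closed_form:.6f}, "
      f"optimizer mu = {res.x:.4f} vs mu* = {mu_star:.4f}")
```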
VI. VECTOR WYNER'S COMMON INFORMATION

It is well known that for $n$ independent pairs of random variables, we have
$C(X^n; Y^n) = \sum_{i=1}^n C(X_i; Y_i)$.  (48)
For the proof, see [13, Lemma 2] with $\gamma = 0$. By making use of Theorem 1 and (48), we can lower bound the Wyner's common information of $n$ independent pairs of random variables as
$C(X^n; Y^n) \ge \sum_{i=1}^n C(X_{g_i}; Y_{g_i}) + h(X_i, Y_i) - h(X_{g_i}, Y_{g_i})$.  (49)
An interesting problem is to find a bound for arbitrary $(X^n, Y^n)$, allowing any dependencies between $X^n$ and $Y^n$. This is not studied here and is left for future investigation.

APPENDIX A
PROOF OUTLINE OF LEMMA 2

Rewrite $K'$ as
$K' = \begin{pmatrix} \sigma_X^2 & q\sigma_X\sigma_Y \\ q\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix} \succeq 0$.
By substituting, we obtain
$\min_{K' :\, 0 \preceq K' \preceq \left(\begin{smallmatrix} 1 & \rho \\ \rho & 1 \end{smallmatrix}\right)} h(X') + h(Y') - (1+\lambda)\, h(X',Y') = \min_{(\sigma_X^2, \sigma_Y^2, q) \in \mathcal{A}_\rho} \frac{1}{2} \log\bigl( (2\pi e)^2 \sigma_X^2 \sigma_Y^2 \bigr)$  (50)
$- \frac{1+\lambda}{2} \log\bigl( (2\pi e)^2 \sigma_X^2 \sigma_Y^2 (1-q^2) \bigr)$,  (51)
where the set $\mathcal{A}_\rho$ is
$\mathcal{A}_\rho = \Bigl\{ (\sigma_X^2, \sigma_Y^2, q) :\ \begin{pmatrix} \sigma_X^2 - 1 & q\sigma_X\sigma_Y - \rho \\ q\sigma_X\sigma_Y - \rho & \sigma_Y^2 - 1 \end{pmatrix} \preceq 0 \Bigr\}$.  (52)
Another way of rewriting $\mathcal{A}_\rho$ is
$\mathcal{A}_\rho = \bigl\{ (\sigma_X^2, \sigma_Y^2, q) :\ \sigma_X^2 + \sigma_Y^2 \le 2,\ (1-q^2)\sigma_X^2\sigma_Y^2 + 2\rho q \sigma_X\sigma_Y + 1 - \rho^2 - (\sigma_X^2 + \sigma_Y^2) \ge 0 \bigr\}$.  (53)
Let us define
$\mathcal{B}_\rho = \bigl\{ (\sigma_X^2, \sigma_Y^2, q) :\ \sigma_X\sigma_Y \le 1,\ (1-q^2)\sigma_X^2\sigma_Y^2 + 2\rho q \sigma_X\sigma_Y + 1 - \rho^2 - 2\sigma_X\sigma_Y \ge 0 \bigr\}$,  (54)
and the inequality $\sigma_X^2 + \sigma_Y^2 \ge 2\sigma_X\sigma_Y$ implies that $\mathcal{A}_\rho \subseteq \mathcal{B}_\rho$. By reparametrizing $\sigma^2 = \sigma_X\sigma_Y$, the set $\mathcal{B}_\rho$ becomes
$\mathcal{D}_\rho = \bigl\{ (\sigma^2, q) :\ \sigma^2 \le 1,\ \bigl(\sigma^2(1-q) - (1-\rho)\bigr)\bigl(\sigma^2(1+q) - (1+\rho)\bigr) \ge 0 \bigr\}$.  (55)
The set $\mathcal{D}_\rho$ can be rewritten as
$\mathcal{D}_\rho = \bigl\{ (\sigma^2, q) :\ \sigma^2(1-q) \le 1-\rho \text{ for } \rho \ge q,\ \ \sigma^2(1+q) \le 1+\rho \text{ for } \rho < q \bigr\}$.  (56)
Since $\mathcal{A}_\rho \subseteq \mathcal{B}_\rho$ and the objective depends on $(\sigma_X^2, \sigma_Y^2)$ only through the product $\sigma_X\sigma_Y$, we have
$\min_{(\sigma_X^2, \sigma_Y^2, q) \in \mathcal{A}_\rho} \frac{1}{2} \log\bigl( (2\pi e)^2 \sigma_X^2 \sigma_Y^2 \bigr)$  (57)
$- \frac{1+\lambda}{2} \log\bigl( (2\pi e)^2 \sigma_X^2 \sigma_Y^2 (1-q^2) \bigr) \ge \min_{(\sigma^2, q) \in \mathcal{D}_\rho} f(\lambda, \sigma^2, q)$,  (58)
where
$f(\lambda, \sigma^2, q) = \frac{1}{2} \log\bigl( (2\pi e)^2 \sigma^4 \bigr) - \frac{1+\lambda}{2} \log\bigl( (2\pi e)^2 \sigma^4 (1-q^2) \bigr)$.  (59)
• Let us consider the case $\rho \ge q$, for positive $\rho$. Then, by weak duality we have
$\min_{(\sigma^2, q) \in \mathcal{D}_\rho} f(\lambda, \sigma^2, q) \ge \min_{\sigma^2, q} f(\lambda, \sigma^2, q) + \mu\bigl( \sigma^2(1-q) - (1-\rho) \bigr)$,  (60)
for any $\mu \ge 0$. By applying the Karush-Kuhn-Tucker (KKT) conditions to (60) we get
$\frac{\partial}{\partial \sigma^2}:\ -\frac{\lambda}{\sigma^2} + \mu(1-q) = 0$,  (61)
$\frac{\partial}{\partial q}:\ \frac{(1+\lambda)q}{1-q^2} - \mu\sigma^2 = 0$,  (62)
$\mu\bigl( \sigma^2(1-q) - (1-\rho) \bigr) = 0$.  (63)
The optimal solutions satisfying the KKT conditions are
$q^\ast = \lambda, \quad \mu^\ast = \frac{\lambda}{1-\rho}, \quad \sigma^{2\ast} = \frac{1-\rho}{1-\lambda}$.  (64)
Since the KKT conditions are satisfied by $q^\ast$, $\sigma^{2\ast}$ and $\mu^\ast$, strong duality holds; thus
$\min_{(\sigma^2, q) \in \mathcal{D}_\rho} f(\lambda, \sigma^2, q)$  (65)
$= \max_{\mu} \min_{\sigma^2, q} f(\lambda, \sigma^2, q) + \mu\bigl( \sigma^2(1-q) - (1-\rho) \bigr)$  (66)
$= f\bigl( \lambda, \tfrac{1-\rho}{1-\lambda}, \lambda \bigr)$  (67)
$= \frac{1}{2} \log \frac{1}{1-\lambda^2} - \frac{\lambda}{2} \log\Bigl( (2\pi e)^2 (1-\rho)^2 \frac{1+\lambda}{1-\lambda} \Bigr)$.  (68)
By combining (51), (58), (65) and (68) we get the desired lower bound.
• For the case $\rho < q$ we omit the details due to lack of space. The optimal solutions are
$q^\ast = \rho, \quad \sigma^{2\ast} = \frac{1+\rho}{1+q}$.  (69)
To conclude, we show that $f\bigl( \lambda, \tfrac{1-\rho}{1-\lambda}, \lambda \bigr) \le f(\lambda, 1, \rho)$ for $\lambda \le \rho$. The argument goes through also for the case when $\rho$ is negative, which completes the proof.
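The KKT computation in (61)-(64) and the resulting value (68) can also be verified symbolically; the following sympy sketch checks stationarity, complementary slackness, and (numerically, at one test point) the value of $f$ at the candidate optimum. The symbol names are ad hoc, and $c$ stands for $2\pi e$.

```python
# Symbolic/numeric check of the KKT point (64) and the value (68).
import sympy as sp

lam, rho, sig2, q, mu, c = sp.symbols('lambda rho sigma2 q mu c', positive=True)
f = sp.Rational(1, 2) * sp.log(c**2 * sig2**2) \
    - (1 + lam) / 2 * sp.log(c**2 * sig2**2 * (1 - q**2))      # (59), c = 2*pi*e
lagr = f + mu * (sig2 * (1 - q) - (1 - rho))                    # Lagrangian in (60)

sol = {q: lam, mu: lam / (1 - rho), sig2: (1 - rho) / (1 - lam)}  # candidate (64)
print(sp.simplify(sp.diff(lagr, sig2).subs(sol)))                 # (61): expect 0
print(sp.simplify(sp.diff(lagr, q).subs(sol)))                    # (62): expect 0
print(sp.simplify((sig2 * (1 - q) - (1 - rho)).subs(sol)))        # (63): expect 0

rhs = sp.Rational(1, 2) * sp.log(1 / (1 - lam**2)) \
      - lam / 2 * sp.log(c**2 * (1 - rho)**2 * (1 + lam) / (1 - lam))  # (68)
pt = {lam: sp.Rational(2, 5), rho: sp.Rational(7, 10), c: 2 * sp.pi * sp.E}
print(sp.N((f.subs(sol) - rhs).subs(pt)))                         # expect ~0
```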
ACKNOWLEDGMENT

This work was supported in part by the Swiss National Science Foundation under Grant 169294.

REFERENCES

[1] A. Wyner, "The common information of two dependent random variables," IEEE Transactions on Information Theory, vol. 21, no. 2, pp. 163–179, March 1975.
[2] G. Xu, W. Liu, and B. Chen, "Wyner's common information for continuous random variables - a lossy source coding interpretation," in Annual Conference on Information Sciences and Systems, Baltimore, MD, USA, March 2011.
[3] ——, "A lossy source coding interpretation of Wyner's common information," IEEE Transactions on Information Theory, vol. 62, no. 2, pp. 754–768, 2016.
[4] P. Yang and B. Chen, "Wyner's common information in Gaussian channels," in IEEE International Symposium on Information Theory, Honolulu, HI, USA, 2014, pp. 3112–3116.
[5] G. O. Veld and M. Gastpar, "Total correlation of Gaussian vector sources on the Gray-Wyner network," in Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, September 2016.
[6] A. Lapidoth and M. Wigger, "Conditional and relevant common information," in IEEE International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, November 2016.
[7] C.-Y. Wang, S. H. Lim, and M. Gastpar, "Information-theoretic caching: Sequential coding for computing," IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6393–6406, August 2016.
[8] S. Satpathy and P. Cuff, "Gaussian secure source coding and Wyner's common information," in IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, October 2015.
[9] E. Sula and M. Gastpar, "Common information components analysis," Entropy, Special Issue on The Role of Signal Processing and Information Theory in Modern Machine Learning, vol. 23, no. 2, 2021.
[10] Y. Geng and C. Nair, "The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages," IWCIT, vol. 60, no. 4, April 2014.
[11] S. Kotz, T. J. Kozubowski, and K. Podgórski, The Laplace Distribution and Generalizations. Birkhäuser, Boston, MA, 2001.
[12] C. Nair, "An extremal inequality related to hypercontractivity of Gaussian random variables," in Proceedings of the Information Theory and Applications Workshop (ITA), San Diego, CA, USA, February 2014, pp. 1–7.
[13] E. Sula and M. Gastpar, "On Wyner's common information in the Gaussian case," CoRR, vol. abs/1912.07083, 2019. [Online]. Available: http://arxiv.org/abs/1912.07083
[14] M. Gastpar and E. Sula, "Relaxed Wyner's common information," in