Entropy under disintegrations⋆

Juan Pablo Vigneaux
[email protected]
Abstract.
We consider the differential entropy of probability measures absolutely continuous with respect to a given σ-finite reference measure on an arbitrary measurable space. We state the asymptotic equipartition property in this general case; the result is part of the folklore, but our presentation is to some extent novel. Then we study a general framework under which such entropies satisfy a chain rule: disintegrations of measures. We give an asymptotic interpretation for conditional entropies in this case. Finally, we apply our result to Haar measures in canonical relation.

Keywords:
Generalized entropy · Differential entropy · AEP · Concentration of measure · Chain rule · Disintegration · Topological group · Haar measure.
It is part of the "folklore" of information theory that given any measurable space $(E, \mathcal{B})$ with a σ-finite measure $\mu$, and a probability measure $\rho$ on $E$ that is absolutely continuous with respect to $\mu$ (i.e. $\rho \ll \mu$), one can define a differential entropy $S_\mu(\rho) = -\int_E \log\left(\frac{d\rho}{d\mu}\right) d\rho$ that gives the exponential growth rate of the $\mu^{\otimes n}$-volume of a typical set of realizations of $\rho^{\otimes n}$. Things are rarely treated at this level of generality in the literature, so the first purpose of this article is to state the asymptotic equipartition property (AEP) for $S_\mu(\rho)$. This constitutes a unified treatment of the discrete and Euclidean cases, which shows (again) that the differential entropy introduced by Shannon is not an unjustified ad hoc device, as some claim.

Then we concentrate on a question that has been largely neglected: what is the most general framework in which one can make sense of the chain rule? This is at least possible for any disintegration of a measure [2].

Definition 1 (Disintegration).
Let $T : (E, \mathcal{B}) \to (E_T, \mathcal{B}_T)$ be a measurable map, $\nu$ a σ-finite measure on $(E, \mathcal{B})$, and $\xi$ a σ-finite measure on $(E_T, \mathcal{B}_T)$. The measure $\nu$ has a disintegration $\{\nu_t\}_{t \in E_T}$ with respect to $T$ and $\xi$, or a $(T, \xi)$-disintegration, if

1. $\nu_t$ is a σ-finite measure on $\mathcal{B}$ concentrated on $\{T = t\}$, which means that $\nu_t(\{T \neq t\}) = 0$ for $\xi$-almost every $t$;

⋆ Parts of this work were written while the author worked at the IMJ-PRG in Paris and at the Max Planck Institute for Mathematics in the Sciences in Leipzig.
2. for each measurable nonnegative function $f : E \to \mathbb{R}$,
   (a) $t \mapsto \int_E f \, d\nu_t$ is measurable,
   (b) $\int_E f \, d\nu = \int_{E_T} \left( \int_E f(x) \, d\nu_t(x) \right) d\xi(t)$.

We shall see that if the reference measure $\mu$ has a $(T, \xi)$-disintegration $\{\mu_t\}_{t \in E_T}$, then any probability measure $\rho$ absolutely continuous with respect to it has a $(T, T_*\rho)$-disintegration; each $\rho_t$ is absolutely continuous with respect to $\mu_t$, and its density can be obtained by normalizing the restriction of $\frac{d\rho}{d\mu}$ to $\{T = t\}$. Moreover, the following chain rule holds:

$$S_\mu(\rho) = S_\xi(T_*\rho) + \int_{E_T} S_{\mu_t}(\rho_t) \, dT_*\rho(t). \tag{1}$$

We study the meaning of $\int_{E_T} S_{\mu_t}(\rho_t) \, dT_*\rho(t)$ in terms of asymptotic volumes. Finally, we show that our generalized chain rule can be applied to Haar measures in canonical relation.

Let $(E_X, \mathcal{B})$ be a measurable space, supposed to be the range of some random variable $X$, and let $\mu$ be a σ-finite measure on it. In applications, several examples appear:

1. $E_X$ a countable set, $\mathcal{B}$ the corresponding atomic σ-algebra, and $\mu$ the counting measure;
2. $E_X$ a Euclidean space, $\mathcal{B}$ its Borel σ-algebra, and $\mu$ the Lebesgue measure;
3. more generally, $E_X$ a locally compact topological group, $\mathcal{B}$ its Borel σ-algebra, and $\mu$ some Haar measure;
4. $(E_X, \mathcal{B})$ arbitrary and $\mu$ a probability measure on it, which might be a prior in a Bayesian setting or an initial state in a physical/PDE setting.

The reference measure $\mu$ gives the relevant notion of volume. Let $\rho$ be a probability measure on $(E_X, \mathcal{B})$ absolutely continuous with respect to $\mu$, and $f$ a representative of the Radon-Nikodym derivative $\frac{d\rho}{d\mu} \in L^1(E_X, \mu)$. The generalized differential entropy of $\rho$ with respect to (w.r.t.) $\mu$ is defined as

$$S_\mu(\rho) := \mathbb{E}_\rho\left( -\log \frac{d\rho}{d\mu} \right) = -\int_{E_X} f(x) \log f(x) \, d\mu(x). \tag{2}$$

This was introduced by Csiszár in [5]; see also Eq. (8) in [7].
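To make the definition concrete, here is a small self-contained sketch (our own illustration, not from the paper) that evaluates $S_\mu(\rho)$ in the first two reference-measure examples above: the counting measure on a finite set and the Lebesgue measure on an interval.

```python
import math

# Generalized differential entropy S_mu(rho) = -∫ f log f dmu (natural log),
# illustrated for two classical choices of the reference measure mu.
# All names and helper functions here are our own.

def entropy_counting(p):
    """S_mu(rho) for mu = counting measure on a countable set:
    the familiar Shannon entropy -sum p log p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_lebesgue(f, a, b, n=100_000):
    """S_mu(rho) for mu = Lebesgue measure on [a, b]: midpoint-rule
    approximation of -∫ f log f dx for a density f."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        fx = f(a + (i + 0.5) * dx)
        if fx > 0:
            total -= fx * math.log(fx) * dx
    return total

# Discrete case: the uniform law on 4 points has entropy log 4 ≈ 1.386.
print(entropy_counting([0.25] * 4))

# Euclidean case: the uniform density on [0, 1/2] has entropy
# -log 2 ≈ -0.693, showing the generalized entropy may be negative.
print(entropy_lebesgue(lambda x: 2.0, 0.0, 0.5))
```

The second value is negative, anticipating the discussion of negative differential entropies below.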
Remark that the set where $f = 0$, hence $\log f = -\infty$, is $\rho$-negligible. Let $\{X_i : (\Omega, \mathcal{F}, \mathbb{P}) \to (E_X, \mathcal{B}, \mu)\}_{i \in \mathbb{N}}$ be a collection of i.i.d. random variables with law $\rho$. The density of the joint variable $(X_1, \ldots, X_n)$ w.r.t. $\mu^{\otimes n}$ is given by $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i)$. If the Lebesgue integral in (2) is finite, then

$$-\frac{1}{n} \log f_{X_1, \ldots, X_n}(X_1, \ldots, X_n) \to S_\mu(\rho) \tag{3}$$

$\mathbb{P}$-almost surely (resp. in probability) as a consequence of the strong (resp. weak) law of large numbers. The convergence in probability is enough to establish the following result.

Proposition 1 (Asymptotic Equipartition Property).
Let $(E_X, \mathcal{B}, \mu)$ be a σ-finite measure space, and $\rho$ a probability measure on $(E_X, \mathcal{B})$ such that $\rho \ll \mu$ and $S_\mu(\rho)$ is finite. For every $\delta > 0$, set

$$A_\delta^{(n)}(\rho; \mu) := \left\{ (x_1, \ldots, x_n) \in E_X^n \;:\; \left| -\frac{1}{n} \log f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) - S_\mu(\rho) \right| \leq \delta \right\}.$$

Then,

1. for every $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, $\mathbb{P}\left( A_\delta^{(n)}(\rho; \mu) \right) > 1 - \varepsilon$;
2. for every $n \in \mathbb{N}$, $\mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)) \leq \exp\{n(S_\mu(\rho) + \delta)\}$;
3. for every $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, $\mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)) \geq (1 - \varepsilon) \exp\{n(S_\mu(\rho) - \delta)\}$.

We proved these claims in [11, Ch. 12]; our proofs are very similar to the standard ones for (Euclidean) differential entropy, see [4, Ch. 8]. Below, we write $A_\delta^{(n)}$ if $\rho$ and $\mu$ are clear from context.

When $E_X$ is a countable set and $\mu$ the counting measure, every probability law $\rho$ on $E_X$ is absolutely continuous with respect to $\mu$; if $p : E_X \to \mathbb{R}$ is its density, $S_\mu(\rho)$ corresponds to the familiar expression $-\sum_{x \in E_X} p(x) \log p(x)$.

If $E_X = \mathbb{R}^n$, $\mu$ is the corresponding Lebesgue measure, and $\rho$ a probability law such that $\rho \ll \mu$, then the derivative $d\rho/d\mu \in L^1(\mathbb{R}^n)$ corresponds to the elementary notion of density, and the quantity $S_\mu(\rho)$ is the differential entropy that was also introduced by Shannon in [10]. He remarked that the covariance of the differential entropy under diffeomorphisms is consistent with the measurement of randomness "relative to an assumed standard." For example, consider a linear automorphism of $\mathbb{R}^n$, $\varphi(x_1, \ldots, x_n) = (y_1, \ldots, y_n)$, represented by a matrix $A$. Set $\mu = dx_1 \cdots dx_n$ and $\nu = dy_1 \cdots dy_n$. It can be easily deduced from the change-of-variables formula that $\nu(\varphi(V)) = |\det A| \, \mu(V)$. Similarly, $\varphi_*\rho$ has density $f(\varphi^{-1}(y)) \, |\det A|^{-1}$ w.r.t. $\nu$, and this implies that $S_\nu(\varphi_*\rho) = S_\mu(\rho) + \log |\det A|$, cf. [4, Eq. 8.71]. Hence

$$\left| -\frac{1}{n} \log \prod_{i=1}^n \frac{d\varphi_*\rho}{d\nu}(y_i) - S_\nu(\varphi_*\rho) \right| = \left| -\frac{1}{n} \log \prod_{i=1}^n \frac{d\rho}{d\mu}(\varphi^{-1}(y_i)) - S_\mu(\rho) \right|, \tag{4}$$

from which we deduce that $A_\delta^{(n)}(\varphi_*\rho; \nu) = \varphi^{\times n}(A_\delta^{(n)}(\rho; \mu))$ and consequently

$$\nu^{\otimes n}(A_\delta^{(n)}(\varphi_*\rho; \nu)) = |\det A|^n \, \mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)), \tag{5}$$
which is consistent with the corresponding estimates given by Proposition 1.

In the discrete case one could also work with any multiple of the counting measure, $\nu = \alpha\mu$ for $\alpha > 0$. In this case, the chain rule for Radon-Nikodym derivatives (see [6, Sec. 19.40]) gives

$$\frac{d\rho}{d\mu} = \frac{d\rho}{d\nu} \frac{d\nu}{d\mu} = \alpha \frac{d\rho}{d\nu}, \tag{6}$$

and therefore $S_\mu(\rho) = S_\nu(\rho) - \log \alpha$. Hence the discrete entropy depends on the choice of reference measure, contrary to what is usually stated. This function is invariant under a bijection of finite sets, provided one takes on both sides the counting measure as reference measure. The proper analogue of this in the Euclidean case is a measure-preserving transformation (e.g. $|\det A| = 1$ above), under which the differential entropy is invariant.

For any $E_X$, if $\mu$ is a probability law, the expression $S_\mu(\rho)$ is the opposite of the Kullback-Leibler divergence: $D_{KL}(\rho \,\|\, \mu) := -S_\mu(\rho)$. The positivity of the divergence follows from a customary application of Jensen's inequality, or from the asymptotic argument given in the next subsection.

The asymptotic relationship between volume and entropy given by the AEP can be summarized as follows:
Corollary 1. $\lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log \mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)) = S_\mu(\rho)$.

Proposition 1 gives a meaning to the divergence and the positivity/negativity of $S_\mu(\rho)$.

1. Discrete case: let $E_X$ be a countable set and $\mu$ be the counting measure. Irrespective of $\rho$, the quantity $\mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu))$—the cardinality of the typical set—is at least 1, hence the limit in Corollary 1 is always nonnegative, which establishes $S_\mu(\rho) \geq 0$. The case $S_\mu(\rho) = 0$ corresponds to certainty: if $\rho = \delta_{x_0}$ for some $x_0 \in E_X$, then $A_\delta^{(n)} = \{(x_0, \ldots, x_0)\}$.

2. Euclidean case: $E_X$ a Euclidean space, $\mu$ the Lebesgue measure. The differential entropy is negative if the volume of the typical set is (asymptotically) smaller than 1. Moreover, the divergence of the differential entropy to $-\infty$ corresponds to asymptotic concentration on a $\mu$-negligible set. For instance, if $\rho$ has density $\mu(B(x_0, \varepsilon))^{-1} \chi_{B(x_0, \varepsilon)}$, then $S_{\lambda_d}(\rho) = \log |B(x_0, \varepsilon)| = \log(c_d \varepsilon^d)$, where $c_d$ is a constant characteristic of each dimension $d$. By part 2 of Proposition 1, $\mu^{\otimes n}(A_\delta^{(n)}) \leq \exp(nd \log \varepsilon + Cn)$, which means that, for fixed $n$, the volume goes to zero as $\varepsilon \to 0$, as intuition would suggest. Hence the divergence of the entropy to $-\infty$ is necessary to obtain the correct volume estimates.
3. Whereas the positivity of the (discrete) entropy arises from a lower bound on the volume of typical sets, the positivity of the Kullback-Leibler divergence is of a different nature: it comes from an upper bound. In fact, when $\mu$ and $\rho$ are probability measures such that $\rho \ll \mu$, the inequality $\mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)) \leq 1$, combined with part 3 of Proposition 1, gives $S_\mu(\rho) \leq \delta$ for every $\delta > 0$, which translates into $S_\mu(\rho) \leq 0$, i.e. $D_{KL}(\rho \,\|\, \mu) \geq 0$.

We summarize in this section some fundamental results on disintegrations as presented in [2]. Throughout it, $(E, \mathcal{B})$ and $(E_T, \mathcal{B}_T)$ are measurable spaces equipped with σ-finite measures $\nu$ and $\xi$, respectively, and $T : (E, \mathcal{B}) \to (E_T, \mathcal{B}_T)$ is a measurable map.

Definition 1 is partly motivated by the following observation: when $E_T$ is finite and $\mathcal{B}_T$ is its algebra of subsets $2^{E_T}$, we can associate to any probability measure $P$ on $(E, \mathcal{B})$ a $(T, T_*P)$-disintegration given by the conditional measures $P_t : \mathcal{B} \to \mathbb{R}$, $B \mapsto P(B \cap \{T = t\}) / P(T = t)$, indexed by $t \in E_T$. In particular,

$$P(B) = \sum_{t \in E_T} P(T = t) \, P_t(B). \tag{7}$$

Remark that $P_t$ is only well defined on the maximal set of $t \in E_T$ such that $T_*P(t) > 0$, but only these $t$ play a role in the disintegration (7).

General disintegrations give regular versions of conditional expectations. Let $\nu$ be a probability measure, $\xi = T_*\nu$, and $\{\nu_t\}$ the corresponding $(T, \xi)$-disintegration. Then the function $x \in E \mapsto \int_E \chi_B \, d\nu_{T(x)}$—where $\chi_B$ denotes the characteristic function—is $\sigma(T)$-measurable and a regular version of the conditional probability $\nu(B \mid \sigma(T))$ as defined by Kolmogorov.

Disintegrations exist under very general hypotheses. For instance, if $\nu$ is Radon, $T_*\nu \ll \xi$, and $\mathcal{B}_T$ is countably generated and contains all the singletons $\{t\}$, then $\nu$ has a $(T, \xi)$-disintegration. The resulting measures $\nu_t$ are uniquely determined up to an almost sure equivalence. See [2, Thm. 1].

As we explained in the introduction, a disintegration of a reference measure induces disintegrations of all measures absolutely continuous with respect to it.

Proposition 2.
Let $\nu$ have a $(T, \xi)$-disintegration $\{\nu_t\}$ and let $\rho$ be absolutely continuous with respect to $\nu$ with finite density $r(x)$, with each of $\nu$, $\xi$ and $\rho$ σ-finite.

1. The measure $\rho$ has a $(T, \xi)$-disintegration $\{\tilde\rho_t\}$ where each $\tilde\rho_t$ is dominated by the corresponding $\nu_t$, with density $r(x)$.
2. The image measure $T_*\rho$ is absolutely continuous with respect to $\xi$, with density $\int_E r \, d\nu_t$.
3. The measures $\{\tilde\rho_t\}$ are finite for $\xi$-almost all $t$ if and only if $T_*\rho$ is σ-finite.
4. The measures $\{\tilde\rho_t\}$ are probabilities for $\xi$-almost all $t$ if and only if $\xi = T_*\rho$.
5. If $T_*\rho$ is σ-finite, then $0 < \int_E r \, d\nu_t < \infty$ for $T_*\nu$-almost every $t$, and the measures $\{\rho_t\}$ given by

$$\int_E f \, d\rho_t = \frac{\int_E f r \, d\nu_t}{\int_E r \, d\nu_t}$$

are probabilities that give a $(T, T_*\rho)$-disintegration of $\rho$.

Example 1 (Product spaces). Suppose that $(E, \mathcal{B}, \nu)$ is the product of two measure spaces $(E_T, \mathcal{B}_T, \xi)$ and $(E_S, \mathcal{B}_S, \lambda)$, with $\xi$ and $\lambda$ both σ-finite. Let $\nu_t$ be the image of $\lambda$ under the inclusion $s \mapsto (t, s)$. Then Fubini's theorem implies that $\{\nu_t\}$ is a $(T, \xi)$-disintegration of $\nu$. (Remark that in general $\xi \neq T_*\nu$; the measure $T_*\nu$ need not even be σ-finite.) If $r(t, s)$ is the density of a probability measure $\rho$ on $(E, \mathcal{B})$, then $\tilde\rho_t \ll \nu_t$ with density $r(t, s)$—the value of $t$ being fixed—and $\rho_t$ is a probability supported on $\{T = t\}$ with density $r(t, s) / \int_{E_S} r(t, s') \, d\lambda(s')$.

Any disintegration gives a chain rule for entropy.
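When everything is finite, Proposition 2 and Example 1 can be checked directly. The following sketch (our own, with made-up weights) takes counting measures on a product of two finite sets and verifies that the normalized conditionals $\rho_t$ are probabilities and satisfy the disintegration property of Definition 1.

```python
import random
random.seed(0)

# Finite sanity check of Proposition 2 in the setting of Example 1:
# E = E_T x E_S, nu = counting measure on the product, rho given by
# a density r(t, s). Variable names are our own.

E_T, E_S = range(3), range(4)
w = {(t, s): random.random() for t in E_T for s in E_S}
Z = sum(w.values())
r = {ts: v / Z for ts, v in w.items()}          # density of rho w.r.t. nu

# Density of T_* rho w.r.t. xi (Prop. 2.2): t -> integral of r d nu_t.
marg = {t: sum(r[(t, s)] for s in E_S) for t in E_T}

# Conditional probabilities rho_t (Prop. 2.5), densities w.r.t. nu_t.
cond = {t: {s: r[(t, s)] / marg[t] for s in E_S} for t in E_T}
assert all(abs(sum(c.values()) - 1) < 1e-12 for c in cond.values())

# Disintegration property: ∫ f d rho = ∫_{E_T} ( ∫ f d rho_t ) d T_* rho(t).
f = {(t, s): random.random() for t in E_T for s in E_S}
lhs = sum(f[ts] * r[ts] for ts in r)
rhs = sum(marg[t] * sum(f[(t, s)] * cond[t][s] for s in E_S) for t in E_T)
assert abs(lhs - rhs) < 1e-12
print("disintegration property verified")
```

The same computation with non-uniform weights in place of the counting measure illustrates the dependence on the reference measure discussed above.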
Proposition 3 (Chain rule for general disintegrations).
Let $T : (E_X, \mathcal{B}_X) \to (E_Y, \mathcal{B}_Y)$ be a measurable map between arbitrary measurable spaces, $\mu$ (respectively $\nu$) a σ-finite measure on $(E_X, \mathcal{B}_X)$ (resp. $(E_Y, \mathcal{B}_Y)$), and $\{\mu_y\}$ a $(T, \nu)$-disintegration of $\mu$. Then any probability measure $\rho$ absolutely continuous w.r.t. $\mu$, with density $r$, has a $(T, \nu)$-disintegration $\{\tilde\rho_y\}_{y \in E_Y}$ such that for each $y$, $\tilde\rho_y = r \cdot \mu_y$. Additionally, $\rho$ has a $(T, T_*\rho)$-disintegration $\{\rho_y\}_{y \in E_Y}$ such that each $\rho_y$ is a probability measure with density $r / \int_{E_X} r \, d\mu_y$ w.r.t. $\mu_y$, and the following chain rule holds:

$$S_\mu(\rho) = S_\nu(T_*\rho) + \int_{E_Y} S_{\mu_y}(\rho_y) \, dT_*\rho(y). \tag{8}$$

Proof.
For convenience, we use here linear-functional notation: $\int_X f(x) \, d\mu(x)$ is denoted $\mu(f)$, or $\mu^x(f(x))$ if we want to emphasize the variable integrated.

Almost everything is a restatement of Proposition 2. Remark that $y \mapsto \mu_y(r)$ is the density of $T_*\rho$ with respect to $\nu$. Equation (8) is established as follows:

$$S_\mu(\rho) \overset{\text{(def)}}{=} \rho\left( -\log \frac{d\rho}{d\mu} \right) = T_*\rho^y \left( \rho_y \left( -\log \frac{d\rho}{d\mu} \right) \right) \tag{9}$$
$$= T_*\rho^y \left( \rho_y \left( -\log \frac{d\rho_y}{d\mu_y} - \log \mu_y(r) \right) \right) \tag{10}$$
$$= T_*\rho^y \left( \rho_y \left( -\log \frac{d\rho_y}{d\mu_y} \right) \right) + T_*\rho^y \left( -\log \frac{dT_*\rho}{d\nu} \right) \tag{11}$$
$$= T_*\rho^y \left( S_{\mu_y}(\rho_y) \right) + S_\nu(T_*\rho), \tag{12}$$

where (9) is the fundamental property of the $T$-disintegration $\{\rho_y\}_y$ and (10) is justified by the equalities $\frac{d\rho}{d\mu} = \frac{d\tilde\rho_y}{d\mu_y} = \mu_y(r) \frac{d\rho_y}{d\mu_y}$.

Example 2.
From the computations of Example 1, it is easy to see that if $E_X = \mathbb{R}^n \times \mathbb{R}^m$, $\mu$ is the Lebesgue measure, and $T$ is the projection onto the $\mathbb{R}^n$ factor, then (8) corresponds to the familiar chain rule for Shannon's differential entropy.

Example 3 (Chain rule in polar coordinates).
Let $E_X = \mathbb{R}^2 \setminus \{0\}$, $\mu$ be the Lebesgue measure $dx\,dy$ on $\mathbb{R}^2$, and $\rho = f \, dx\,dy$ a probability measure. Every point $\vec v \in E_X$ can be parametrized by Cartesian coordinates $(x, y)$ or polar coordinates $(r, \theta)$, i.e. $\vec v = \vec v(x, y) = \vec v(r, \theta)$. The parameter $r$ takes values in the set $E_R = \,]0, \infty[$, and $\theta$ in $E_\Theta = [0, 2\pi[$; the functions $R : E_X \to E_R$, $\vec v \mapsto r(\vec v)$, and $\Theta : E_X \to E_\Theta$, $\vec v \mapsto \theta(\vec v)$, can be seen as random variables with laws $R_*\rho$ and $\Theta_*\rho$, respectively. We equip $E_R$ (resp. $E_\Theta$) with the Lebesgue measure $\mu_R = dr$ (resp. $\mu_\Theta = d\theta$).

The measure $\mu$ has an $(R, \mu_R)$-disintegration $\{r \, d\theta\}_{r \in E_R}$; here $r \, d\theta$ is the uniform measure on $R^{-1}(r)$ of total mass $2\pi r$. This is a consequence of the change-of-variables formula:

$$\int_{\mathbb{R}^2} \varphi(x, y) \, dx\,dy = \int_{]0,\infty[} \left( \int_0^{2\pi} \varphi(r, \theta) \, r \, d\theta \right) dr, \tag{13}$$

which is precisely the disintegration property. Hence, according to Proposition 2, $\rho$ disintegrates into probability measures $\{\rho_r\}_{r \in E_R}$, with each $\rho_r$ concentrated on $\{R = r\}$, absolutely continuous w.r.t. $\mu_r = r \, d\theta$ and with density $f / \int_0^{2\pi} f(r, \theta) \, r \, d\theta$. The exact chain rule (8) holds in this case.

This should be compared with Lemma 6.16 in [8]. There, the random vector $(R, \Theta)$ is considered as an $\mathbb{R}^2$-valued random variable, and the reference measure is taken to be $\nu = dr\,d\theta$. The change-of-variables formula implies that $(R, \Theta)$ has density $r f(r, \theta)$ with respect to $\nu$, so $S_\nu(\rho) = S_\mu(\rho) - \mathbb{E}_\rho(\log R)$. Then the standard chain rule is applied to $S_\nu(\rho)$, i.e. as in Examples 1 and 2, to obtain a deformed chain rule for $S_\mu(\rho)$:

$$S_\mu(\rho) = S_{\mu_R}(R_*\rho) + \int_0^\infty \left( -\int_0^{2\pi} \log\left( \frac{f}{\int_0^{2\pi} f \, d\theta} \right) \frac{f \, d\theta}{\int_0^{2\pi} f \, d\theta} \right) dR_*\rho(r) + \mathbb{E}_\rho(\log R). \tag{14}$$

Our term $\int_{E_R} S_{\mu_r}(\rho_r) \, dR_*\rho(r)$ comprises the last two terms in the previous equation.

Remark 1.
Formula (13) is a particular case of the coarea formula [1, Thm. 2.93], which gives a disintegration of the Hausdorff measure $\mathcal{H}^N$ restricted to a countably $\mathcal{H}^N$-rectifiable subset $E$ of $\mathbb{R}^M$ with respect to a Lipschitz map $f : \mathbb{R}^M \to \mathbb{R}^k$ (with $k \leq N$) and the Lebesgue measure on $\mathbb{R}^k$. So the argument of the previous example also applies to the extra term $-\mathbb{E}_{(x,y)}[\log J^E p_y(x, y)]$ in the chain rule of [7, Thm. 41], which could be avoided by an adequate choice of reference measures.

Combining Corollary 1 and the preceding proposition, we get a precise interpretation of the conditional term in terms of the asymptotic growth of the volume of slices of the typical set.
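Before turning to the asymptotic statement, the chain rule (8) can also be verified numerically. The sketch below (our own, with assumed covariance entries) instantiates Example 2 with a bivariate Gaussian, where all three entropies have standard closed forms, cf. [4].

```python
import math

# Numerical instance of the chain rule (8) for T = projection onto the
# second coordinate, mu = Lebesgue measure on R^2, rho a centered
# bivariate Gaussian. Covariance entries below are assumed values.

sx2, sy2, sxy = 2.0, 3.0, 1.2          # Var(X), Var(Y), Cov(X, Y)
det = sx2 * sy2 - sxy**2               # determinant of the covariance matrix

h_joint = 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)   # S_mu(rho)
h_marg  = 0.5 * math.log(2 * math.pi * math.e * sy2)          # S_nu(T_* rho)
# Each conditional rho_y is Gaussian with the same variance
# sx2 - sxy^2 / sy2, so the integral of S_{mu_y}(rho_y) against
# T_* rho reduces to a single entropy:
h_cond  = 0.5 * math.log(2 * math.pi * math.e * (sx2 - sxy**2 / sy2))

assert abs(h_joint - (h_marg + h_cond)) < 1e-9
print("chain rule (8) verified for the Gaussian case")
```

The check reduces to the determinant identity $\det \Sigma = \sigma_Y^2 (\sigma_X^2 - \sigma_{XY}^2 / \sigma_Y^2)$, which is why it holds exactly.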
Proposition 4.
Keeping the setting of the previous proposition,

$$\lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log \left( \frac{\nu^{\otimes n}\left[ \mu^{\otimes n}_y(A_\delta^{(n)}(\rho; \mu)) \right]}{\nu^{\otimes n}(A_\delta^{(n)}(T_*\rho; \nu))} \right) = T_*\rho^y\left( S_{\mu_y}(\rho_y) \right).$$

Proof.
It is easy to prove that if $\{\mu_y\}_y$ is a $(T, \nu)$-disintegration of $\mu$, then $\{\mu^{\otimes n}_y\}_y$ is a $(T^{\times n}, \nu^{\otimes n})$-disintegration of $\mu^{\otimes n}$. The disintegration property reads

$$\mu^{\otimes n}(A) = \nu^{\otimes n}\left( \mu^{\otimes n}_y(A) \right) \tag{15}$$

for any measurable set $A$. Hence

$$\log \mu^{\otimes n}(A_\delta^{(n)}(\rho; \mu)) = \log \nu^{\otimes n}(A_\delta^{(n)}(T_*\rho; \nu)) + \log \frac{\nu^{\otimes n}\left( \mu^{\otimes n}_y(A_\delta^{(n)}(\rho; \mu)) \right)}{\nu^{\otimes n}(A_\delta^{(n)}(T_*\rho; \nu))}. \tag{16}$$

The result follows from the application of $\lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n}$ to this equality and comparison of the result with the chain rule.

In connection with this result, remark that $(T_*\rho)^{\otimes n}$ concentrates on $A_\delta^{(n)}(T_*\rho; \nu)$ and has approximately density $1 / \nu^{\otimes n}(A_\delta^{(n)}(T_*\rho; \nu))$ there, so $\nu^{\otimes n}[\mu^{\otimes n}_y(A_\delta^{(n)}(\rho; \mu))] / \nu^{\otimes n}(A_\delta^{(n)}(T_*\rho; \nu))$ is close to an average of the volume of a "typical part" of the fiber $T^{-1}(y)$ according to the "true" law $(T_*\rho)^{\otimes n}$.

Given a locally compact topological group $G$, there is a unique left-invariant positive measure (left Haar measure) up to a multiplicative constant [3, Thms. 9.2.2 & 9.2.6]. A particular choice of left Haar measure on $G$ will be denoted $\lambda^G$. The disintegration of Haar measures is given by Weil's formula.

Proposition 5 (Weil's formula).
Let $G$ be a locally compact group and $H$ a closed normal subgroup of $G$. Given Haar measures on two of the groups among $G$, $H$ and $G/H$, there is a Haar measure on the third one such that, for any integrable function $f : G \to \mathbb{R}$,

$$\int_G f(x) \, d\lambda^G(x) = \int_{G/H} \left( \int_H f(xy) \, d\lambda^H(y) \right) d\lambda^{G/H}(xH). \tag{17}$$

The three measures are then said to be in canonical relation, which is written $\lambda^G = \lambda^{G/H} \lambda^H$.

For a proof of Proposition 5, see pp. 87-88 and Theorem 3.4.6 of [9]. For any element $[g]$ of $G/H$, representing a left coset $gH$, let us denote by $\lambda^H_{[g]}$ the image of $\lambda^H$ under the map $\iota_g : H \to G$, $h \mapsto gh$. This is well defined, i.e. does not depend on the chosen representative $g$: the image of $\iota_g$ depends only on the coset $gH$, and if $g_1, g_2$ are two elements of $G$ such that $g_1 H = g_2 H$, and $A$ is a subset of $G$, the translation $h \mapsto g_2^{-1} g_1 h$ establishes a bijection $\iota_{g_1}^{-1}(A) \xrightarrow{\sim} \iota_{g_2}^{-1}(A)$; the left invariance of the Haar measure implies that $\lambda^H(\iota_{g_1}^{-1}(A)) = \lambda^H(\iota_{g_2}^{-1}(A))$, i.e. $(\iota_{g_1})_* \lambda^H = (\iota_{g_2})_* \lambda^H$, as claimed. Proposition 5 then shows that $\{\lambda^H_{[g]}\}_{[g] \in G/H}$ is a $(T, \lambda^{G/H})$-disintegration of $\lambda^G$, where $T : G \to G/H$ denotes the canonical projection. In view of this and Proposition 3, the following result follows.

Proposition 6 (Chain rule, Haar case). Let $G$ be a locally compact group, $H$ a closed normal subgroup of $G$, and $\lambda^G$, $\lambda^H$, $\lambda^{G/H}$ Haar measures in canonical relation. Let $\rho$ be a probability measure on $G$, and denote by $T : G \to G/H$ the canonical projection. Then there is a $(T, T_*\rho)$-disintegration $\{\rho_{[g]}\}_{[g] \in G/H}$ of $\rho$ such that each $\rho_{[g]}$ is a probability measure, and

$$S_{\lambda^G}(\rho) = S_{\lambda^{G/H}}(T_*\rho) + \int_{G/H} S_{\lambda^H_{[g]}}(\rho_{[g]}) \, dT_*\rho([g]). \tag{18}$$

References
1. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford Science Publications, Clarendon Press (2000)
2. Chang, J.T., Pollard, D.: Conditioning as disintegration. Statistica Neerlandica 51(3), 287–317 (1997)
3. Cohn, D.: Measure Theory: Second Edition. Birkhäuser Advanced Texts Basler Lehrbücher, Springer New York (2013)
4. Cover, T., Thomas, J.: Elements of Information Theory. A Wiley-Interscience publication, Wiley (2006)
5. Csiszár, I.: Generalized entropy and quantization problems. In: Transactions of the 6th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Academia, Prague) (1973)
6. Hewitt, E., Stromberg, K.: Real and Abstract Analysis: A Modern Treatment of the Theory of Functions of a Real Variable. Springer Berlin Heidelberg (1965)
7. Koliander, G., Pichler, G., Riegler, E., Hlawatsch, F.: Entropy and source coding for integer-dimensional singular random variables. IEEE Transactions on Information Theory 62(11), 6124–6154 (Nov 2016)
8. Lapidoth, A., Moser, S.M.: Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels. IEEE Transactions on Information Theory 49(10), 2426–2467 (2003)
9. Reiter, H., Stegeman, J.D.: Classical Harmonic Analysis and Locally Compact Groups. No. 22 in London Mathematical Society Monographs, New Series, Clarendon Press (2000)
10. Shannon, C.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656 (1948)
11. Vigneaux, J.P.: Topology of statistical systems: A cohomological approach to information theory. Ph.D. thesis, Université de Paris (2019)