Tropical diagrams of probability spaces
R. MATVEEV AND J. W. PORTEGIES
Abstract.
After endowing the space of diagrams of probability spaces with an entropy distance, we study its large-scale geometry by identifying the asymptotic cone as a closed convex cone in a Banach space. We call this cone the tropical cone, and its elements tropical diagrams of probability spaces. Given that the tropical cone has a rich structure, while tropical diagrams are rather flexible objects, we expect the theory of tropical diagrams to be useful for information optimization problems in information theory and artificial intelligence. In a companion article, we give a first application to derive a statement about the entropic cone.

1. Introduction
With [MP18] we started a research program aiming for a systematic approach to a class of information optimization problems in information theory and artificial intelligence. A prototypical example of such a problem, still wide open, is the characterization of the entropic cone, the closure of the set of all vectors in $\mathbb{R}^{2^N-1}$ that are entropically representable. Other information optimization problems arise for instance in causal inference [SA15], artificial intelligence [VDP13], information decomposition [BRO+…] … tropical cone, and to derive some of its basic properties. In [MP19b] we apply the theory to derive a statement about the entropic cone.

Before outlining the construction of the tropical cone, let us mention that for our purposes the language of random variables proved inconvenient, which is why we work with diagrams of probability spaces instead.

Diagrams of probability spaces are commutative diagrams in the category of probability spaces, with (equivalence classes of) measure-preserving maps as morphisms, such as

(1.1) [Figure: three examples of commutative diagrams of probability spaces]
Collections of $n$ random variables give rise to a special type of diagram that includes, besides the target spaces of the random variables themselves, the target space of every joint variable. Such diagrams have a particular combinatorial type. The first and the last diagrams in (1.1) are examples of such special types of diagrams in the case of two and three random variables, respectively. The description of other diagrams using the language of random variables is much less transparent.

We construct the tropical cone as the asymptotic cone in the space of diagrams of probability spaces endowed with the intrinsic entropy distance [KŠŠ12, Vid12, MP18]. The asymptotic cone captures the large-scale geometry of a metric space. As a particularly neat application, A’Campo gave an elegant construction of the real numbers as an asymptotic cone in a metric space of sequences of integers [A’C03]. We will call elements of the tropical cone tropical diagrams of probability spaces.

The reason for the name tropical cone is the following. In algebraic geometry, for instance, tropical varieties are, roughly speaking, divergent sequences of classical varieties, renormalized on a log scale with an increasing base. The adjective ‘tropical’ carries little semantics, but was introduced in honor of the Brazilian mathematician and computer scientist Imre Simon, who worked on the subject of tropical mathematics. Analogously, we construct the asymptotic cone from certain divergent sequences with respect to the intrinsic entropy distance. As the intrinsic entropy distance is entropy-based, we achieve a similar type of renormalization as in algebraic geometry.

The tropical cone has a rich algebraic structure. Indeed, we show that it is a closed, convex cone in a Banach space. In particular, one can take convex combinations of tropical diagrams.
Other useful operations and constructions can be carried through for tropical diagrams, whereas they do not have an equivalent in the classical context of probability spaces, see [MP19a]. All in all, from some perspective, tropical diagrams are easier to deal with than diagrams of probability spaces, since only rough, asymptotic relations between probability spaces are preserved under tropicalization, similar to how all complicated features of the landscape disappear when looking at the Earth from outer space.

The structure of the present article is as follows. In Section 2, we first give a general construction of an asymptotic cone in an abstract setting. We believe that this abstract setting makes the construction more transparent and easier to follow. The results we present in that section are probably quite standard, but we find it beneficial to gather them “under one roof.” In Section 3 we show how, under certain conditions, the asymptotic cone can be interpreted as a closed convex cone in a Banach space. We specialize to the case of diagrams of probability spaces in Section 4, and reformulate the Asymptotic Equipartition Property proved in [MP18] in terms of tropical diagrams in Section 5. We conclude with a simple characterization of the tropical cone for special types of diagrams in Section 6.

2. Asymptotic Cones of Metric Abelian Monoids
In this section we define the asymptotic cone in the setting of an abstract metric Abelian monoid. In a later section, we will specialize to the case of diagrams of probability spaces.

2.1. Metric and pseudo-metric spaces.
A pseudo-metric space $(X,d)$ is a set $X$ equipped with a pseudo-distance $d$, a bivariate function satisfying all the axioms of a distance function, except that it is allowed to vanish on pairs of non-identical points. An isometry of such spaces is a distance-preserving map such that for any point in the target space there is a point in the image at zero distance from it. Given a pseudo-metric space $(X,d)$ one can always construct an isometric metric space $(X/_{\stackrel{d}{=}}, d)$, the metric quotient, by identifying all pairs of points that are distance zero apart.

Any property formulated in terms of the pseudo-metric holds simultaneously for a pseudo-metric space and its metric quotient. It will be convenient for us to construct pseudo-metrics on spaces instead of passing to the quotient spaces.

For a pair of points $x,y\in X$ in a pseudo-metric space $(X,d)$ we will write $x \stackrel{d}{=} y$ if $d(x,y)=0$. We call such a pair of points ($d$-)metrically equivalent.

Many metric-topological notions such as (Lipschitz-)continuity, compactness, $\varepsilon$-nets, dense subsets, etc., extend to the setting of pseudo-metric spaces, and, exercising certain care, one may switch between a pseudo-metric space and its metric quotient, replacing the $\stackrel{d}{=}$-sign with equality.

2.2. Metric Abelian Monoids.
A monoid is a set equipped with a bivariate associative operation and a neutral element. The operation is usually called multiplication, or addition if it is commutative. We call a monoid with pseudo-distance $(\Gamma,+,d)$ a metric Abelian monoid if it satisfies:
(1) For any $\gamma,\gamma'\in\Gamma$ we have $\gamma+\gamma' \stackrel{d}{=} \gamma'+\gamma$.
(2) The binary operation is 1-Lipschitz with respect to each argument: for any $\gamma,\gamma',\gamma''\in\Gamma$
\[ d(\gamma+\gamma',\gamma+\gamma'') \le d(\gamma',\gamma''). \]
The following proposition is elementary.
Proposition 2.1. Let $(\Gamma,+,d)$ be a metric Abelian monoid. Then:
(1) The translation maps $T_\eta\colon\Gamma\to\Gamma$, $\gamma\mapsto\gamma+\eta$ are non-expanding for any $\eta\in\Gamma$.
(2) For any quadruple $\gamma_1,\gamma_2,\gamma_3,\gamma_4\in\Gamma$
\[ d(\gamma_1+\gamma_2,\gamma_3+\gamma_4) \le d(\gamma_1,\gamma_3)+d(\gamma_2,\gamma_4). \]
(3) For every $n\in\mathbb{N}$ and $\gamma_1,\gamma_2\in\Gamma$
\[ d(n\cdot\gamma_1,n\cdot\gamma_2) \le n\cdot d(\gamma_1,\gamma_2). \] ⊠

A metric Abelian monoid $(\Gamma,+,\delta)$ will be called homogeneous if it satisfies
\[ \delta(n\cdot\gamma_1,n\cdot\gamma_2) = n\cdot\delta(\gamma_1,\gamma_2) \tag{2.1} \]
for all $n\in\mathbb{N}$ and $\gamma_1,\gamma_2\in\Gamma$.

A homogeneous metric Abelian monoid is called an $\mathbb{R}_{\ge 0}$-semi-module $(\Gamma,+,\cdot,\delta)$ if in addition there is a doubly distributive $\mathbb{R}_{\ge 0}$-action such that for any $\lambda,\lambda',\lambda_1,\lambda_2\in\mathbb{R}_{\ge 0}$ and $\gamma,\gamma',\gamma_1,\gamma_2\in\Gamma$
\[ \lambda_1\cdot(\lambda_2\cdot\gamma) \stackrel{\delta}{=} (\lambda_1\lambda_2)\cdot\gamma \]
\[ \lambda\cdot(\gamma_1+\gamma_2) \stackrel{\delta}{=} \lambda\cdot\gamma_1+\lambda\cdot\gamma_2 \]
\[ (\lambda+\lambda')\cdot\gamma \stackrel{\delta}{=} \lambda\cdot\gamma+\lambda'\cdot\gamma \]
\[ \delta(\lambda\cdot\gamma,\lambda\cdot\gamma') = \lambda\cdot\delta(\gamma,\gamma'). \]

A convex cone in a normed vector space is a typical example of an $\mathbb{R}_{\ge 0}$-semi-module. The intersection of a convex cone in $\mathbb{R}^n$ with the integer lattice is an example of a monoid that does not admit a semi-module structure.

The following proposition asserts that if a metric Abelian monoid is homogeneous, then the pseudo-distance is translation invariant and, in particular, satisfies a cancellation property. This result was communicated to us by Tobias Fritz, see also [Fri], [MP18, Proposition 3.7].

Proposition 2.2. Let $(\Gamma,+,\delta)$ be a homogeneous metric Abelian monoid. Then the pseudo-distance function $\delta$ is translation invariant, that is, it satisfies for any $\gamma_1,\gamma_2,\eta\in\Gamma$
\[ \delta(\gamma_1+\eta,\gamma_2+\eta) = \delta(\gamma_1,\gamma_2). \]
In particular, the following cancellation property holds in $\Gamma$: if $\gamma_1+\eta \stackrel{\delta}{=} \gamma_2+\eta$, then $\gamma_1 \stackrel{\delta}{=} \gamma_2$. ⊠

2.3. Asymptotic Cones (Tropicalization) of Monoids.
In our construction, points of the asymptotic cone of $(\Gamma,+,d)$ will be sequences of points in $\Gamma$ that grow almost linearly in a certain sense described below.

2.3.1. Admissible functions.
Admissible functions will be used to measure the deviation of a sequence from being linear. We call a function $\varphi\colon\mathbb{R}_{\ge 0}\to\mathbb{R}_{\ge 0}$ admissible if
(1) the function $\varphi$ is non-decreasing;
(2) there exists a constant $D_\varphi\ge 0$ such that
\[ 8s\cdot\int_s^\infty \frac{\varphi(t)}{t^2}\,\mathrm{d}t \le D_\varphi\cdot\varphi(s) \]
for any $s\ge 1$. In particular, the function $\varphi$ is summable against $\mathrm{d}t/t^2$.

For example, the function $\varphi(t) := t^\alpha$ is admissible for any $0\le\alpha<1$. Any admissible function is necessarily sub-linear, that is, $\varphi(t)/t\to 0$ as $t\to\infty$. A linear combination of admissible functions with non-negative coefficients is also admissible.

2.3.2. Quasi-linear sequences.
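Before moving on, the admissibility condition of Section 2.3.1 can be checked numerically for the example $\varphi(t)=t^\alpha$. The sketch below is an illustration only: the function name `admissibility_ratio`, the truncation `x_max` and the step size are assumptions made for this example. It approximates the ratio $s\int_s^\infty \varphi(t)\,t^{-2}\,\mathrm{d}t\,/\,\varphi(s)$, which condition (2) requires to stay bounded in $s$ (any fixed numerical prefactor only rescales $D_\varphi$), using the substitution $t=s\,e^x$.

```python
import math

def admissibility_ratio(phi, s, x_max=200.0, step=0.01):
    """Approximate s * int_s^inf phi(t)/t^2 dt / phi(s).

    With t = s * exp(x) the integral becomes int_0^inf phi(s e^x) e^{-x} dx,
    which is evaluated here by the trapezoid rule on [0, x_max].
    """
    n = int(x_max / step)
    total = 0.0
    for i in range(n + 1):
        x = i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * phi(s * math.exp(x)) * math.exp(-x)
    return total * step / phi(s)

alpha = 0.5
phi = lambda t: t ** alpha

# The ratio should not depend on s; its closed form is 1 / (1 - alpha).
ratios = [admissibility_ratio(phi, s) for s in (1.0, 10.0, 1000.0)]
```

For $\varphi(t)=t^\alpha$ the integral can be computed by hand, $s\int_s^\infty t^{\alpha-2}\,\mathrm{d}t = s^\alpha/(1-\alpha)$, so the ratio is the constant $1/(1-\alpha)$ and blows up as $\alpha\to 1$, in line with the restriction $\alpha<1$.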
Let $(\Gamma,+,d)$ be a metric Abelian monoid and $\varphi$ be an admissible function. A sequence $\bar\gamma=\{\gamma(i)\}\in\Gamma^{\mathbb{N}_0}$ will be called quasi-linear with defect bounded by $\varphi$ if for every $m,n\in\mathbb{N}$ the following bound is satisfied:
\[ d(\gamma(m+n),\gamma(m)+\gamma(n)) \le \varphi(m+n). \]
For technical reasons we also require $\gamma(0)=0$. Sequences that are quasi-linear with defect bounded by $\varphi\equiv 0$ are called linear sequences.

For an admissible function $\varphi$ we will write $\mathrm{QL}_\varphi(\Gamma,d)$ for the space of all quasi-linear sequences with defect bounded by $C\cdot\varphi$ for some constant $C\ge 0$ (depending on the sequence). We will also use the notation $\mathrm{L}(\Gamma,d) := \mathrm{QL}_0(\Gamma,d)$ for the space of linear sequences.

2.3.3. Asymptotic distance.
Given two quasi-linear sequences $\bar\gamma_1\in\mathrm{QL}_{\varphi_1}(\Gamma,d)$ and $\bar\gamma_2\in\mathrm{QL}_{\varphi_2}(\Gamma,d)$, the sequence of distances $a(n) := d(\gamma_1(n),\gamma_2(n))$ is $\varphi$-subadditive, where $\varphi := \varphi_1+\varphi_2$ is also admissible, i.e.
\[ a(m+n) \le a(m)+a(n)+\varphi(m+n) \]
for any $n,m\in\mathbb{N}$. By the generalization of Fekete’s Lemma by De Bruijn and Erdős [dBE52, Theorem 23], it follows that the following limit exists and is finite:
\[ \hat d(\bar\gamma_1,\bar\gamma_2) := \lim_{n\to\infty}\frac{1}{n}\,d(\gamma_1(n),\gamma_2(n)). \]
We call the quantity $\hat d(\bar\gamma_1,\bar\gamma_2)$ the asymptotic distance between $\bar\gamma_1,\bar\gamma_2\in\mathrm{QL}_\varphi(\Gamma,d)$. It is easy to verify that $\hat d$ indeed satisfies all axioms of a pseudo-distance. Even if $d$ was a proper distance function, the corresponding asymptotic distance may vanish on some pairs of non-identical elements. We call two sequences $\bar\gamma_1\in\mathrm{QL}_\varphi(\Gamma,d)$, $\bar\gamma_2\in\mathrm{QL}_\varphi(\Gamma,d)$ asymptotically equivalent if $\hat d(\bar\gamma_1,\bar\gamma_2)=0$, and in that case write $\bar\gamma_1 \stackrel{\hat d}{=} \bar\gamma_2$.

2.3.4. Quasi-homogeneity.
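As a toy illustration of quasi-linear sequences and the asymptotic distance defined so far, consider the metric Abelian monoid $(\mathbb{R}_{\ge 0},+,|\cdot|)$. This monoid, the two sequences and the helper names below are assumptions chosen for the example and do not appear in the article. The sequence $\gamma_1(n)=2n+\sqrt{n}$ is quasi-linear with defect bounded by the admissible function $\varphi(t)=t^{1/2}$, while $\gamma_2(n)=n/2$ is linear.

```python
import math

def defect(gamma, m, n):
    """Deviation of gamma from additivity at (m, n), for d(x, y) = |x - y|."""
    return abs(gamma(m + n) - (gamma(m) + gamma(n)))

def asymptotic_distance(gamma1, gamma2, n=10**8):
    """Finite-n proxy for lim (1/n) * d(gamma1(n), gamma2(n))."""
    return abs(gamma1(n) - gamma2(n)) / n

gamma1 = lambda n: 2.0 * n + math.sqrt(n)  # quasi-linear, gamma1(0) = 0
gamma2 = lambda n: 0.5 * n                 # linear, gamma2(0) = 0

# Largest observed value of defect / phi(m + n) with phi(t) = sqrt(t).
worst = max(defect(gamma1, m, n) / math.sqrt(m + n)
            for m in range(1, 200) for n in range(1, 200))

# The square-root term disappears in the limit; only the slopes 2 and 1/2 remain.
approx = asymptotic_distance(gamma1, gamma2)
```

The sub-linear perturbation $\sqrt{n}$ is invisible to $\hat d$: the sequence $\gamma_1$ is asymptotically equivalent to the linear sequence $n\mapsto 2n$, and its asymptotic distance to $\gamma_2$ is the difference of the slopes.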
We will show that quasi-linear sequences are also quasi-homogeneous in the sense of the following lemma.

Lemma 2.3. Let $\bar\gamma\in\Gamma^{\mathbb{N}_0}$ be a sequence with $\varphi$-bounded defect. Then for any $m,n\in\mathbb{N}$
\[ d(\gamma(m\cdot n),m\cdot\gamma(n)) \le 8\cdot m\cdot n\cdot\int_n^{2\cdot m\cdot n}\frac{\varphi(t)}{t^2}\,\mathrm{d}t. \] ⊠

Proof: Define the function $\psi$ related to $\varphi$ as follows:
\[ \psi(s) := \varphi(e^s)/e^s \quad\text{or}\quad \varphi(t) =: t\cdot\psi(\ln t). \]
The conclusion of the lemma in terms of $\psi$ then reads
\[ d(\gamma(m\cdot n),m\cdot\gamma(n)) \le 8\cdot m\cdot n\cdot\int_{\ln n}^{\ln(2\cdot m\cdot n)}\psi(s)\,\mathrm{d}s \]
and it is in this form that it will be proven below.

Due to the monotonicity of $\varphi$, the function $\psi$ satisfies, for any $0\le s_0\le s$,
\[ \psi(s_0) \le \psi(s)\cdot e^{s-s_0}, \qquad \psi(s_0) \le 2\int_{s_0}^{s_0+\ln 2}\psi(s)\,\mathrm{d}s. \tag{2.2} \]
We proceed by induction with respect to $m$, keeping $n$ fixed. The conclusion of the lemma is obvious for $m=1$. For the induction step let $m = 2m'+\varepsilon\ge 2$, where $m'=\lfloor m/2\rfloor$ and $\varepsilon\in\{0,1\}$. Then using bound (2.2) we estimate
\[ d(\gamma(m\cdot n),m\cdot\gamma(n)) = d\big(\gamma(m'\cdot n+m'\cdot n+\varepsilon\cdot n),\ m'\cdot\gamma(n)+m'\cdot\gamma(n)+\varepsilon\cdot\gamma(n)\big) \]
\[ \le 2\,d(\gamma(m'\cdot n),m'\cdot\gamma(n)) + 2\varphi(m\cdot n) \]
\[ \le 16\,m'\cdot n\cdot\int_{\ln n}^{\ln(2m'\cdot n)}\psi(s)\,\mathrm{d}s + 2\,m\cdot n\cdot\psi(\ln(m\cdot n)) \]
\[ \le 8\,m\cdot n\left(\int_{\ln n}^{\ln(2m'\cdot n)}\psi(s)\,\mathrm{d}s + \int_{\ln(m\cdot n)}^{\ln(2m\cdot n)}\psi(s)\,\mathrm{d}s\right) \le 8\,m\cdot n\cdot\int_{\ln n}^{\ln(2m\cdot n)}\psi(s)\,\mathrm{d}s. \] ⊠

Applying bound (2) in the definition of admissible functions we obtain the following corollary.
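Corollary 2.4. Let $\bar\gamma$ be a sequence with $\varphi$-bounded defect. Then for any $m,n\in\mathbb{N}$
\[ d(\gamma(m\cdot n),m\cdot\gamma(n)) \le 8\cdot m\cdot n\cdot\int_n^\infty\frac{\varphi(t)}{t^2}\,\mathrm{d}t \le D_\varphi\cdot m\cdot\varphi(n). \] ⊠

2.3.5. The semi-module structure.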
The operation $+$ on $\Gamma$ induces a $\hat d$-continuous (in fact, 1-Lipschitz) operation on $\mathrm{QL}_\varphi(\Gamma,d)$ by adding sequences element-wise. Thus $(\mathrm{QL}_\varphi(\Gamma,d),+,\hat d)$ is also a metric Abelian monoid. In addition, it carries the structure of an $\mathbb{R}_{\ge 0}$-semi-module, as explained below.

The validity of the following constructions is very easy to verify, so we omit the proofs. Let $\varphi>0$. The set $\mathrm{QL}_\varphi(\Gamma,d)$ admits an action of the multiplicative semigroup $(\mathbb{R}_{\ge 0},\cdot)$ defined in the following way. Let $\lambda\in\mathbb{R}_{\ge 0}$ and $\bar\gamma=\{\gamma(n)\}\in\mathrm{QL}_\varphi(\Gamma,d)$. Then define the action of $\lambda$ on $\bar\gamma$ by
\[ \lambda\cdot\bar\gamma := \{\gamma(\lfloor\lambda\cdot n\rfloor)\}_{n\in\mathbb{N}_0}. \tag{2.3} \]
This is only an action up to asymptotic equivalence. Similarly, in the constructions that follow we tacitly assume that they are valid up to asymptotic equivalence.

The action
\[ \cdot\,\colon \mathbb{R}_{\ge 0}\times\mathrm{QL}_\varphi(\Gamma,d)\to\mathrm{QL}_\varphi(\Gamma,d) \]
is continuous with respect to $\hat d$ and, moreover, it is a homothety (dilation), that is
\[ \hat d(\lambda\cdot\bar\gamma_1,\lambda\cdot\bar\gamma_2) = \lambda\cdot\hat d(\bar\gamma_1,\bar\gamma_2). \]
The semigroup structure on $\mathrm{QL}_\varphi(\Gamma,d)$ is distributive with respect to the $\mathbb{R}_{\ge 0}$-action:
\[ \lambda\cdot(\bar\gamma_1+\bar\gamma_2) = \lambda\cdot\bar\gamma_1+\lambda\cdot\bar\gamma_2 \]
\[ (\lambda_1+\lambda_2)\cdot\bar\gamma \stackrel{\hat d}{=} \lambda_1\cdot\bar\gamma+\lambda_2\cdot\bar\gamma. \]
In particular, for $n\in\mathbb{N}$ and $\bar\gamma\in\mathrm{QL}_\varphi(\Gamma,d)$
\[ \underbrace{\bar\gamma+\cdots+\bar\gamma}_{n} \stackrel{\hat d}{=} n\cdot\bar\gamma. \]

2.3.6. Completeness.
Here, we introduce additional conditions on a metric Abelian monoid $(\Gamma,+,d)$ that guarantee that $(\mathrm{QL}_\varphi(\Gamma,d),\hat d)$ is a complete metric space.

Suppose $\varphi$ is an admissible function and $(\Gamma,+,d)$ is a metric Abelian monoid satisfying the following additional property: there exists a constant $C>0$ such that for any quasi-linear sequence $\bar\gamma\in\mathrm{QL}_\varphi(\Gamma,d)$ there exists an asymptotically equivalent quasi-linear sequence $\bar\gamma'$ with defect bounded by $C\varphi$. Note that, contrary to the situation in the definition of $\mathrm{QL}_\varphi(\Gamma,d)$, the constant $C$ is now not allowed to depend on the sequence. If this is the case, we say that $\mathrm{QL}_\varphi(\Gamma,d)$ has the ($C$-)uniformly bounded defect property.

Proposition 2.5. Suppose a metric Abelian monoid $(\Gamma,+,\delta)$ and an admissible function $\varphi>0$ are such that $(\mathrm{QL}_\varphi(\Gamma,\delta),\hat\delta)$ has the uniformly bounded defect property and the distance function $\delta$ is homogeneous. Then the space $(\mathrm{QL}_\varphi(\Gamma,\delta),\hat\delta)$ is complete. ⊠

Proof:
Given a Cauchy sequence $\{\bar\gamma_i\}$ of elements in $(\mathrm{QL}_\varphi(\Gamma,\delta),\hat\delta)$ we need to find a limit element $\bar\eta\in\mathrm{QL}_\varphi(\Gamma,\delta)$. We will construct $\bar\eta$ by a diagonal argument. First we replace each element of the sequence $\{\bar\gamma_i\}$ by an asymptotically equivalent element with defect bounded by $C\varphi$, according to the assumption of the proposition. We will still call the new sequence $\{\bar\gamma_i\}$. In fact, we may without loss of generality assume that $C=1$.

Consider two elements $\bar\gamma_i$ and $\bar\gamma_j$ of the sequence. By homogeneity of $\delta$ and Corollary 2.4, it holds for any $n,k\in\mathbb{N}$ that
\[ k\cdot\delta(\gamma_i(n),\gamma_j(n)) = \delta(k\cdot\gamma_i(n),k\cdot\gamma_j(n)) \le \delta(\gamma_i(k\cdot n),\gamma_j(k\cdot n)) + 2k\cdot D_\varphi\cdot\varphi(n). \]
Dividing by $k$ and passing to the limit $k\to\infty$, while keeping $n$ fixed, we obtain
\[ \delta(\gamma_i(n),\gamma_j(n)) \le n\cdot\hat\delta(\bar\gamma_i,\bar\gamma_j) + 2D_\varphi\cdot\varphi(n). \]
Since the sequence $(\bar\gamma_i)_{i\in\mathbb{N}}$ is Cauchy, it follows that for any $n\in\mathbb{N}$ there is a number $i(n)\in\mathbb{N}$ such that for any $i,j\ge i(n)$
\[ \hat\delta(\bar\gamma_i,\bar\gamma_j) \le \frac{1}{n}. \]
We may choose $i(n)$ to be non-decreasing in $n$. Then for any $i,j,n\in\mathbb{N}$ with $i,j\ge i(n)$ we have the following bound:
\[ \delta(\gamma_i(n),\gamma_j(n)) \le 2D_\varphi\cdot\varphi(n)+1. \tag{2.4} \]
Define the sequence $\bar\eta$ by setting
\[ \eta(n) := \gamma_{i(n)}(n). \]
First we verify that $\bar\eta$ is quasi-linear. For $m,n\in\mathbb{N}$, we have
\[ \delta(\eta(n+m),\eta(n)+\eta(m)) = \delta\big(\gamma_{i(n+m)}(n+m),\ \gamma_{i(n)}(n)+\gamma_{i(m)}(m)\big) \]
\[ \le \delta\big(\gamma_{i(n+m)}(n+m),\ \gamma_{i(n+m)}(n)+\gamma_{i(n+m)}(m)\big) + \delta\big(\gamma_{i(n+m)}(n)+\gamma_{i(n+m)}(m),\ \gamma_{i(n)}(n)+\gamma_{i(m)}(m)\big) \]
\[ \le \varphi(n+m) + 2D_\varphi\cdot\varphi(n)+1 + 2D_\varphi\cdot\varphi(m)+1 \le (4D_\varphi+1)\,\varphi(n+m)+2 \le C'\cdot\varphi(n+m) \]
for some constant $C'>0$.

The convergence of $\bar\gamma_i$ to $\bar\eta$ is shown as follows. For $n,k\in\mathbb{N}$ let $q_n,r_n\in\mathbb{N}_0$ be the quotient and the remainder of the division of $n$ by $k$, that is $n = q_n\cdot k+r_n$ and $0\le r_n<k$. Fix $k\in\mathbb{N}$ and let $i\ge i(k)$, then
\[ \hat\delta(\bar\gamma_i,\bar\eta) = \lim_{n\to\infty}\frac{1}{n}\,\delta(\gamma_i(n),\eta(n)) = \lim_{n\to\infty}\frac{1}{n}\,\delta\big(\gamma_i(q_n\cdot k+r_n),\ \gamma_{i(n)}(q_n\cdot k+r_n)\big) \]
\[ \le \limsup_{n\to\infty}\frac{1}{n}\Big(q_n\cdot\delta(\gamma_i(k),\gamma_{i(n)}(k)) + \delta(\gamma_i(r_n),\gamma_{i(n)}(r_n)) + 2q_n\cdot D_\varphi\cdot\varphi(k) + 2\varphi(n)\Big) \]
\[ \le \limsup_{n\to\infty}\frac{1}{n}\Big(q_n\cdot(2D_\varphi\cdot\varphi(k)+1) + (2D_\varphi\cdot\varphi(r_n)+1) + 2q_n\cdot D_\varphi\cdot\varphi(k) + 2\varphi(n)\Big) \le C''\cdot\varphi(k)/k. \]
Since $k\in\mathbb{N}$ is arbitrary and $\varphi$ is sub-linear we have
\[ \lim_{i\to\infty}\hat\delta(\bar\gamma_i,\bar\eta) = 0. \] ⊠

2.3.7. On the density of linear sequences.
For a metric Abelian monoid $(\Gamma,+,d)$ together with an admissible function $\varphi$ we say that $\mathrm{QL}_\varphi(\Gamma,d)$ has the vanishing defect property if for every $\varepsilon>0$ and every $\bar\gamma\in\mathrm{QL}_\varphi(\Gamma,d)$ there exists an asymptotically equivalent quasi-linear sequence $\bar\gamma'$ with defect bounded by another admissible function $\psi$ such that $\int_1^\infty\frac{\psi(t)}{t^2}\,\mathrm{d}t<\varepsilon$.

The proposition below gives a sufficient condition under which the linear sequences are dense in the space of quasi-linear sequences.

Proposition 2.6. Suppose $(\Gamma,+,d)$ and an admissible function $\varphi$ have the vanishing defect property. Then $\mathrm{L}(\Gamma,d)$ is dense in $(\mathrm{QL}_\varphi(\Gamma,d),\hat d)$. ⊠

Proof: Let $\bar\gamma=\{\gamma(n)\}$ be a quasi-linear sequence. For any $i\in\mathbb{N}$ select a sequence $\bar\gamma_i$ asymptotically equivalent to $\bar\gamma$ with defect bounded by an admissible function $\varphi_i$ such that $\int_1^\infty\frac{\varphi_i(t)}{t^2}\,\mathrm{d}t<1/i$, according to the “vanishing defect” assumption of the proposition. Define $\bar\eta_i$ by
\[ \eta_i(n) := n\cdot\gamma_i(1). \]
Then, by Lemma 2.3,
\[ \hat d(\bar\gamma,\bar\eta_i) = \hat d(\bar\gamma_i,\bar\eta_i) = \lim_{n\to\infty}\frac{1}{n}\,d(\gamma_i(n),\eta_i(n)) = \lim_{n\to\infty}\frac{1}{n}\,d(\gamma_i(n),n\cdot\gamma_i(1)) \le 8\int_1^\infty\frac{\varphi_i(t)}{t^2}\,\mathrm{d}t \le \frac{8}{i}. \]
Thus, any quasi-linear sequence can be approximated by linear sequences. ⊠

2.3.8. Asymptotic distance on original monoid.
Starting with an element $\gamma\in\Gamma$ one can construct a linear sequence $\vec\gamma = \{i\cdot\gamma\}_{i\in\mathbb{N}_0}$. In view of Proposition 2.1, the map
\[ \vec{\,\cdot\,}\colon(\Gamma,d)\to(\mathrm{L}(\Gamma,d),\hat d) \tag{2.5} \]
is a contraction.

By the inclusion in (2.5) we have an induced pseudo-metric $\delta$ on $\Gamma$, satisfying for any $\gamma_1,\gamma_2\in\Gamma$
\[ \delta(\gamma_1,\gamma_2) \le d(\gamma_1,\gamma_2) \tag{2.6} \]
and the following homogeneity condition:
\[ \delta(n\cdot\gamma_1,n\cdot\gamma_2) = n\cdot\delta(\gamma_1,\gamma_2) \tag{2.7} \]
for all $n\in\mathbb{N}$.

Note that if $d$ was homogeneous to begin with, then $\delta$ coincides with $d$ on $\Gamma$. By virtue of the bound $\delta\le d$, sequences that are quasi-linear with respect to $d$ are also quasi-linear with respect to $\delta$. Since $\delta$ is homogeneous, the associated asymptotic distance $\hat\delta$ coincides with $\delta$ on $\Gamma$. We will show (in Lemma 2.7 below) that $\hat\delta$ also coincides with $\hat d$ on $d$-quasi-linear sequences.

Let $\varphi$ be an admissible function. In order to organize all these statements, and to be more precise, let us include the spaces in the following commutative diagram.

(2.8) [Commutative diagram relating the spaces $(\mathrm{L}(\Gamma,d),\hat d)$, $(\mathrm{QL}_\varphi(\Gamma,d),\hat d)$, $(\Gamma,d)$, $(\mathrm{L}(\Gamma,\delta),\hat\delta)$ and $(\mathrm{QL}_\varphi(\Gamma,\delta),\hat\delta)$ by maps labelled $\imath$, $f$ and $f'$]

The maps $f$, $f'$ and $\imath$ are isometries. The maps […] and […] are isometric embeddings. The next lemmas show that $\imath$ is also an isometric embedding, and that it has dense image.

Lemma 2.7.
Let $\varphi$ be a positive, admissible function. Then the natural inclusion
\[ \imath\colon(\mathrm{QL}_\varphi(\Gamma,d),\hat d)\hookrightarrow(\mathrm{QL}_\varphi(\Gamma,\delta),\hat\delta) \]
is an isometric embedding with dense image. ⊠

Proof: First we show that the map $\imath$ is an isometric embedding. Let $\bar\gamma_1,\bar\gamma_2\in\mathrm{QL}_\varphi(\Gamma,d)$ be two $\varphi$-quasi-linear sequences with respect to the distance function $d$. We have to show that the two numbers
\[ \hat d(\bar\gamma_1,\bar\gamma_2) = \lim_{n\to\infty}\frac{1}{n}\,d(\gamma_1(n),\gamma_2(n)) \quad\text{and}\quad \hat\delta(\bar\gamma_1,\bar\gamma_2) = \lim_{n\to\infty}\frac{1}{n}\,\delta(\gamma_1(n),\gamma_2(n)) \]
are equal. Since shifts are non-expanding maps, we have $\delta\le d$ and it follows immediately that
\[ \hat\delta(\bar\gamma_1,\bar\gamma_2) \le \hat d(\bar\gamma_1,\bar\gamma_2), \]
and we are left to show the opposite inequality. Fix $n>0$, then
\[ \hat d(\bar\gamma_1,\bar\gamma_2) = \lim_{k\to\infty}\frac{1}{k\cdot n}\,d(\gamma_1(k\cdot n),\gamma_2(k\cdot n)) \le \lim_{k\to\infty}\frac{1}{k\cdot n}\Big(d(k\cdot\gamma_1(n),k\cdot\gamma_2(n)) + 2k\cdot D_\varphi\cdot\varphi(n)\Big) \]
\[ = \frac{1}{n}\,\delta(\gamma_1(n),\gamma_2(n)) + \frac{2D_\varphi\,\varphi(n)}{n}. \]
Passing to the limit with respect to $n$ gives the required inequality
\[ \hat d(\bar\gamma_1,\bar\gamma_2) \le \hat\delta(\bar\gamma_1,\bar\gamma_2). \]

Now we will show that the image of $\imath$ is dense. Given an element $\bar\gamma=\{\gamma(n)\}$ in $\mathrm{QL}_\varphi(\Gamma,\delta)$ we have to find a $\hat\delta$-approximating sequence $\bar\gamma_i=\{\gamma_i(n)\}$ in $\mathrm{QL}_\varphi(\Gamma,d)$. Define
\[ \gamma_i(n) := \lfloor n/i\rfloor\cdot\gamma(i). \]
We have to show that each $\bar\gamma_i$ is $d$-quasi-linear and that $\hat\delta(\bar\gamma_i,\bar\gamma)\to 0$ as $i\to\infty$. The first statement follows from
\[ d(\gamma_i(m+n),\gamma_i(m)+\gamma_i(n)) = d\big(\lfloor\tfrac{m+n}{i}\rfloor\cdot\gamma(i),\ \lfloor\tfrac{m}{i}\rfloor\cdot\gamma(i)+\lfloor\tfrac{n}{i}\rfloor\cdot\gamma(i)\big) \le d(\gamma(i),0) \le C_i\cdot\varphi(m+n) \]
for some $C_i>0$. It is worth noting that the defect of $\bar\gamma_i$ may not be bounded uniformly with respect to $i$. Finally, it holds that
\[ \hat\delta(\bar\gamma_i,\bar\gamma) = \lim_{n\to\infty}\frac{1}{n}\,\delta(\gamma_i(n),\gamma(n)) = \lim_{n\to\infty}\frac{1}{n}\,\delta\big(\lfloor\tfrac{n}{i}\rfloor\cdot\gamma(i),\gamma(n)\big) \]
\[ \le \lim_{n\to\infty}\Big[\frac{1}{n}\,\delta\big(\gamma(i\lfloor\tfrac{n}{i}\rfloor),\gamma(n)\big) + \frac{1}{n}\lfloor\tfrac{n}{i}\rfloor\cdot D_\varphi\cdot\varphi(i)\Big] \]
\[ \le \lim_{n\to\infty}\Big[\frac{1}{n}\max_{k=0,\dots,i-1}\delta(\gamma(k),0) + \frac{1}{n}\varphi(n)\Big] + \frac{D_\varphi\,\varphi(i)}{i} = \frac{D_\varphi\,\varphi(i)}{i} \xrightarrow{\ i\to\infty\ } 0. \] ⊠

The difference between the two distance functions $\hat d$ and $\hat\delta$ is very small: $\hat d$ is defined on a dense subset of the domain of definition of $\hat\delta$, and they coincide whenever both are defined. From now on we will not use the notation $\hat\delta$.

3. Grothendieck construction
Given an Abelian monoid with a cancellation property, there is a minimal Abelian group (called the Grothendieck group of the monoid) into which it isomorphically embeds. Similarly, an $\mathbb{R}_{\ge 0}$-semi-module naturally embeds into a normed vector space. A nice example of this construction applied to the semi-module of convex sets in $\mathbb{R}^n$ (with the Minkowski sum and the Hausdorff distance) can be found in [Råd52].

Proposition 3.1. Let $(\Gamma,+,\cdot,\delta)$ be a complete metric Abelian monoid with $\mathbb{R}_{\ge 0}$-action (an $\mathbb{R}_{\ge 0}$-semi-module) with homogeneous pseudo-metric $\delta$. Then there exists a Banach space $(B,\|\cdot\|)$ and a distance-preserving homomorphism $f\colon\Gamma\to B$ such that the image of $f$ is a closed convex cone. ⊠

If $\delta$ is a proper pseudo-metric (not a metric), then the map $f$ is not injective.

Proof: By Proposition 2.2 the pseudo-metric $\delta$ is translation invariant. We can therefore apply the Grothendieck construction to define a normed vector space $B_0$. Define
\[ B_0 := \{(x,y) : x,y\in\Gamma\}/\sim \]
where $(x,y)\sim(x',y')$ if there are $z,z'\in\Gamma$ such that $(x+z,y+z) \stackrel{\delta}{=} (x'+z',y'+z')$. Define also addition, multiplication by a scalar and a norm on $B_0$ by setting for all $x,y,x',y'\in\Gamma$ and $\lambda\in\mathbb{R}$
\[ (x,y)+(x',y') := (x+x',y+y') \]
\[ (-1)\cdot(x,y) := (y,x) \]
\[ \lambda\cdot(x,y) := \operatorname{sign}(\lambda)\cdot(|\lambda|\cdot x,|\lambda|\cdot y) \]
\[ \|(x,y)\| := \delta(x,y). \]
These operations respect the equivalence relation and turn $(B_0,+,\cdot,\|\cdot\|)$ into a normed vector space. The map $f$ defined by
\[ f\colon\Gamma\to B_0,\quad x\mapsto(x,0) \]
is a well-defined distance-preserving homomorphism. That $f(\Gamma)$ is closed follows immediately, as $\Gamma$ is complete and $f$ is distance-preserving.

In general, the space $B_0$ is not complete. We define the Banach space $B$ as the completion of the normed vector space $B_0$. ⊠

4. Tropical probability spaces and their diagrams
4.1. Diagrams of probability spaces.

We will now briefly describe the construction of diagrams of probability spaces; see [MP18] for a more detailed discussion. By a finite probability space we will mean a set (not necessarily finite) with a probability measure such that the support of the measure is finite. For such a probability space $X$ we denote by $|X|$ the cardinality of the support of the probability measure, and the expression $x\in X$ will mean that $x$ is an atom in $X$, that is, a point of positive weight in the underlying set.

We will consider commutative diagrams of finite probability spaces, where arrows are equivalence classes of measure-preserving maps. Two maps are considered equivalent if they coincide on a set of full measure, and such equivalence classes will be called reductions.

Three examples of diagrams of probability spaces are pictured in (1.1). The combinatorial structure of such a commutative diagram can be recorded by an object $G$, which could be equivalently considered as a special type of category, a finite poset, or a directed acyclic graph (DAG) with additional properties. We will call such objects simply indexing categories. Below we briefly recall the definition.

An indexing category is a finite category such that for any pair of objects there exists at most one morphism between them in either direction, and such that it satisfies the following property. For any pair of objects $i,j$ in an indexing category $G$ there exists a least common ancestor, i.e. an object $k$ such that there are morphisms $k\to i$ and $k\to j$ in $G$ and such that for any other object $l$ admitting morphisms $l\to i$ and $l\to j$, there is also a morphism $l\to k$.

By $[[G]]$ we denote the number of objects in the indexing category, or equivalently the number of vertices in the DAG or the number of points in the poset $G$. An important class of examples of indexing categories are the so-called full categories $\Lambda_n$, which correspond to the poset of non-empty subsets of the set $\{1,\dots,n\}$ ordered by inclusion. If $n=2$, we call the category $\Lambda_2 = (O_1\leftarrow O_{\{1,2\}}\rightarrow O_2)$ a fan.

The space of all commutative diagrams of a fixed combinatorial type will be denoted $\mathrm{Prob}\langle G\rangle$. A morphism between two diagrams $\mathcal{X},\mathcal{Y}\in\mathrm{Prob}\langle G\rangle$ is defined to be a collection of morphisms between corresponding individual spaces in $\mathcal{X}$ and $\mathcal{Y}$ that commute with the morphisms within the diagrams $\mathcal{X}$ and $\mathcal{Y}$.

The construction of forming commutative diagrams can be iterated, producing diagrams of diagrams. Especially important will be two-fans of $G$-diagrams, the space of which will be denoted $\mathrm{Prob}\langle G\rangle\langle\Lambda_2\rangle$.

A two-fan $\mathcal{X}$ will be called minimal if for any morphism of $\mathcal{X}$ to another two-fan $\mathcal{Y}$ the following holds: if the induced morphisms on the feet are isomorphisms, then the top morphism is also an isomorphism. Any $G$-diagram will be called minimal if for any sub-diagram, which is a two-fan, it contains a minimal two-fan with the same feet.

Given an $n$-tuple $(X_1,\dots,X_n)$ of finite-valued random variables, one can construct a minimal $\Lambda_n$-diagram $\mathcal{X}=\{X_I;\chi_{IJ}\}$ by setting for any $\emptyset\ne I\subset\{1,\dots,n\}$
\[ X_I = \prod_{i\in I}X_i, \]
where, by a slight abuse of notation, the factor $X_i$ denotes the target space of the random variable $X_i$, and the probabilities are the induced distributions. For the diagram constructed in such a way we will write $\mathcal{X}=\langle X_1,\dots,X_n\rangle$. On the other hand, any $\Lambda_n$-diagram gives rise to an $n$-tuple of random variables with the domain of definition being the initial space and the targets being the terminal spaces.

The tensor product $\mathcal{X}\otimes\mathcal{Y}$ of two $G$-diagrams is defined by taking the tensor product of corresponding probability spaces and the Cartesian product of maps.

The special $G$-diagram in which all the spaces are isomorphic to a single probability space $X$ will be denoted by $X^G$.

For a diagram $\mathcal{X}\in\mathrm{Prob}\langle G\rangle$ one can evaluate the entropies of the individual spaces. The corresponding map will be denoted
\[ \mathrm{Ent}_*\colon\mathrm{Prob}\langle G\rangle\to\mathbb{R}^G, \]
where the target space is the space of $\mathbb{R}$-valued functions on objects in $G$, equipped with the $\ell^1$-norm.

For a two-fan $\mathcal{F}=(\mathcal{X}\leftarrow\mathcal{Z}\rightarrow\mathcal{Y})$ of $G$-diagrams define the entropy distance
\[ \mathrm{kd}(\mathcal{F}) := \|\mathrm{Ent}_*\mathcal{Z}-\mathrm{Ent}_*\mathcal{X}\|_1 + \|\mathrm{Ent}_*\mathcal{Z}-\mathrm{Ent}_*\mathcal{Y}\|_1. \]
We interpret $\mathrm{kd}(\mathcal{F})$ as a measure of deviation of $\mathcal{F}$ from being an isomorphism between the diagrams $\mathcal{X}$ and $\mathcal{Y}$. Indeed, $\mathrm{kd}(\mathcal{F})=0$ if and only if the two morphisms in $\mathcal{F}$ are isomorphisms.

We define the intrinsic entropy distance $\mathbf{k}$ on the space $\mathrm{Prob}\langle G\rangle$ by
\[ \mathbf{k}(\mathcal{X},\mathcal{Y}) := \inf\{\mathrm{kd}(\mathcal{F}) : \mathcal{F}=(\mathcal{X}\leftarrow\mathcal{Z}\rightarrow\mathcal{Y})\in\mathrm{Prob}\langle G\rangle\langle\Lambda_2\rangle\}. \]
The tensor product is 1-Lipschitz with respect to $\mathbf{k}$, thus $(\mathrm{Prob}\langle G\rangle,\otimes,\mathbf{k})$ is a metric Abelian monoid and
\[ \mathrm{Ent}_*\colon(\mathrm{Prob}\langle G\rangle,\otimes,\mathbf{k})\to(\mathbb{R}^G,\|\cdot\|_1) \]
is a 1-Lipschitz homomorphism. For proofs and a more detailed discussion the reader is referred to [MP18].

4.2. Tropical diagrams.
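Before tropicalizing, the entropy distance of Section 4.1 can be made concrete in the simplest case, where $G$ has a single object and a two-fan $X\leftarrow Z\rightarrow Y$ is given by a joint distribution $Z$ on pairs $(x,y)$ with the coordinate projections as reductions. The sketch below rests on these assumptions; the function names and the choice of the natural logarithm are illustrative and not taken from the article.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (natural logarithm) of a dict {atom: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Push a joint distribution on pairs forward along a coordinate projection."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)

def kd_two_fan(joint):
    """kd(X <- Z -> Y) = (Ent Z - Ent X) + (Ent Z - Ent Y) for single spaces."""
    h_z = entropy(joint)
    return (h_z - entropy(marginal(joint, 0))) + (h_z - entropy(marginal(joint, 1)))

# Two fans with the same feet (fair bits) but different top spaces.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
diagonal = {(0, 0): 0.5, (1, 1): 0.5}

kd_independent = kd_two_fan(independent)  # 2 * ln 2
kd_diagonal = kd_two_fan(diagonal)        # 0: both projections are isomorphisms
```

Since the intrinsic distance is an infimum over all two-fans with the given feet, the diagonal coupling shows that the distance between two fair bits is zero, whereas the independent coupling only gives the upper bound $2\ln 2$.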
Applying the construction of the previous section we obtain its tropicalization, a semi-module $(\mathrm{Prob}[G],+,\cdot,\kappa)$. The restriction of the asymptotic distance to the original monoid can be defined independently as
\[ \kappa(\mathcal{X},\mathcal{Y}) := \lim_{n\to\infty}\frac{1}{n}\,\mathbf{k}(\mathcal{X}^n,\mathcal{Y}^n). \]
One of the main tools for the estimation of the (asymptotic) distance is the so-called Slicing Lemma and the following consequence of it.

Proposition 4.1. Let $G$ be an indexing category, $\mathcal{X},\mathcal{Y}\in\mathrm{Prob}\langle G\rangle$ and $U\in\mathrm{Prob}$.
(1) Let $\mathcal{X}\to U$ be a reduction, then
\[ \mathbf{k}(\mathcal{X},\mathcal{Y}) \le \int_U\mathbf{k}(\mathcal{X}|u,\mathcal{Y})\,\mathrm{d}p_U(u) + [[G]]\cdot\mathrm{Ent}(U). \]
(2) For a “co-fan” $\mathcal{X}\to U\leftarrow\mathcal{Y}$
\[ \mathbf{k}(\mathcal{X},\mathcal{Y}) \le \int_U\mathbf{k}(\mathcal{X}|u,\mathcal{Y}|u)\,\mathrm{d}p_U(u). \] ⊠

The statements and the proofs of the Slicing Lemma and its consequences can be found in [MP18].

We will show below that $(\mathrm{Prob}\langle G\rangle,\otimes,\kappa)$ has the uniformly bounded and vanishing defect properties. For this purpose we need to develop some technical tools.

4.3. Mixtures.
The input data for the mixture operation is a family of $G$-diagrams, parameterized by a probability space. As a result one obtains another $G$-diagram with pre-specified conditionals. One particular instance of a mixture is when one mixes two diagrams $\mathcal{X}$ and $\{\bullet\}^G$, the latter being a constant $G$-diagram of one-point probability spaces. This operation will be used as a substitute for taking radicals “$\mathcal{X}^{1/n}$” below.

4.3.1. Definition of mixtures.
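Let $G$ be an indexing category and $\Theta$ be a probability space. By $\Theta^G$ we denote the constant $G$-diagram, the diagram such that all spaces in it are $\Theta$ and all morphisms are identity morphisms. Let $\{\mathcal{X}_\theta\}_{\theta\in\Theta}$ be a family of $G$-diagrams parameterized by $\Theta$. The mixture of the family $\{\mathcal{X}_\theta\}$ is the reduction
\[ \mathrm{Mix}\{\mathcal{X}_\theta\} = (\mathcal{Y}\to\Theta^G) \]
such that
\[ \mathcal{Y}|\theta \cong \mathcal{X}_\theta \quad\text{for any } \theta\in\Theta. \tag{4.1} \]
The mixture exists and is uniquely defined by property (4.1) up to an isomorphism which is the identity on $\Theta^G$.

We denote the top diagram of the mixture by
\[ \mathcal{Y} =: \bigoplus_{\theta\in\Theta}\mathcal{X}_\theta \]
and also call it the mixture of the family $\{\mathcal{X}_\theta\}$.

When $\Theta=\Lambda_\alpha := (\{\square,\blacksquare\};\ p(\blacksquare)=\alpha)$ is a binary space we write simply
\[ \mathcal{X}_\blacksquare\oplus_{\Lambda_\alpha}\mathcal{X}_\square \]
for the mixture. The diagram subindexed by $\blacksquare$ will always be the first summand.

The entropy of the mixture can be evaluated by the following formula:
\[ \mathrm{Ent}_*\Big(\bigoplus_{\theta\in\Theta}\mathcal{X}_\theta\Big) = \int_\Theta\mathrm{Ent}_*(\mathcal{X}_\theta)\,\mathrm{d}p(\theta) + \mathrm{Ent}_*(\Theta^G). \]
Mixtures satisfy the distributive law with respect to the tensor product:
\[ \mathrm{Mix}(\{\mathcal{X}_\theta\}_{\theta\in\Theta})\otimes\mathrm{Mix}(\{\mathcal{Y}_{\theta'}\}_{\theta'\in\Theta'}) \cong \mathrm{Mix}(\{\mathcal{X}_\theta\otimes\mathcal{Y}_{\theta'}\}_{(\theta,\theta')\in\Theta\otimes\Theta'}) \]
\[ \Big(\bigoplus_{\theta\in\Theta}\mathcal{X}_\theta\Big)\otimes\Big(\bigoplus_{\theta'\in\Theta'}\mathcal{Y}_{\theta'}\Big) \cong \bigoplus_{(\theta,\theta')\in\Theta\otimes\Theta'}(\mathcal{X}_\theta\otimes\mathcal{Y}_{\theta'}). \]

4.3.2. The distance estimates for the mixtures.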
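The entropy formula for mixtures from Section 4.3.1 can be sanity-checked for single probability spaces, that is, when $G$ has one object. In the sketch below a mixture is modelled, as an assumption made for this illustration, by placing the components on disjoint supports and scaling their atoms by the weights of $\Theta$; the function names are hypothetical and the natural logarithm is used.

```python
import math

def entropy(probs):
    """Shannon entropy (natural logarithm) of a list of atom weights."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mixture(weights, components):
    """Atom weights of the mixture: each component sits on its own support,
    scaled by the probability of its parameter theta."""
    return [w * p for w, comp in zip(weights, components) for p in comp]

weights = [0.25, 0.75]                      # the parameter space Theta
components = [[0.5, 0.5], [0.2, 0.3, 0.5]]  # the spaces X_theta

lhs = entropy(mixture(weights, components))
rhs = sum(w * entropy(c) for w, c in zip(weights, components)) + entropy(weights)
```

The two sides agree by the chain rule for entropy: conditioning the mixture on $\theta$ recovers $X_\theta$, so the entropy splits into the average conditional entropy plus the entropy of $\Theta$.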
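Recall that for a diagram category $G$ we denote by $\{\bullet\}=\{\bullet\}^G$ the constant $G$-diagram of one-point spaces. The mixture of a $G$-diagram with $\{\bullet\}^G$ may serve as a substitute for taking radicals of the diagram. The following lemma provides a justification of this by some distance estimates related to mixtures and will be used below.

Lemma 4.2. Let $G$ be a complete diagram category and $\mathcal{X},\mathcal{Y}\in\mathrm{Prob}\langle G\rangle$. Then
(1) $\kappa(\mathcal{X},\ \mathcal{X}^n\oplus_{\Lambda_{1/n}}\{\bullet\}) \le \mathrm{Ent}(\Lambda_{1/n})$
(2) $\kappa(\mathcal{X},\ (\mathcal{X}\oplus_{\Lambda_{1/n}}\{\bullet\})^n) \le n\cdot\mathrm{Ent}(\Lambda_{1/n})$
(3) $\kappa\big((\mathcal{X}\otimes\mathcal{Y})\oplus_{\Lambda_{1/n}}\{\bullet\},\ (\mathcal{X}\oplus_{\Lambda_{1/n}}\{\bullet\})\otimes(\mathcal{Y}\oplus_{\Lambda_{1/n}}\{\bullet\})\big) \le \mathrm{Ent}(\Lambda_{1/n})$
(4) $\kappa\big(\mathcal{X}\oplus_{\Lambda_{1/n}}\{\bullet\},\ \mathcal{Y}\oplus_{\Lambda_{1/n}}\{\bullet\}\big) \le \frac{1}{n}\,\kappa(\mathcal{X},\mathcal{Y})$ ⊠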
Note that the distance estimates in the lemma above are with respect tothe asymptotic distance. This is essential, since from the perspective of theintrinsic distance mixtures are very badly behaved.
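The bounds in Lemma 4.2 are governed by the entropy of the binary space $\Lambda_{1/n}$. The following sketch tabulates it under the assumption that entropy is measured with the natural logarithm; the function name `binary_entropy` is chosen for this illustration.

```python
import math

def binary_entropy(alpha):
    """Entropy of the binary space Lambda_alpha with p(black) = alpha."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha)

# Ent(Lambda_{1/n}) tends to zero, while n * Ent(Lambda_{1/n}) keeps growing.
table = [(n, binary_entropy(1.0 / n), n * binary_entropy(1.0 / n))
         for n in (2, 10, 100, 1000)]
for n, h, nh in table:
    print(n, h, nh)
```

The quantity $\mathrm{Ent}(\Lambda_{1/n})$ decays roughly like $(\ln n)/n$, so the right-hand side of estimate (1) becomes small for large $n$, whereas the right-hand side of estimate (2) grows roughly like $\ln n + 1$.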
Proof:
For λ ∈ Λ N / n , define q ( λ ) to be the number of black squares in thesequence λ . It is a binomially distributed random variable with mean N / n and variance Nn ( − n ) .The first claim is then proven by the following calculation κ (X , X n ⊕ Λ / n {●}) = lim N →∞ N k (X N , (X n ⊕ Λ / n {●}) N )= lim N →∞ N k ⎛⎜⎝X N , ⊕ λ ∈ Λ N / n X n ⋅ q ( λ ) ⎞⎟⎠≤ Ent ( Λ / n ) + lim N →∞ N ∫ λ ∈ Λ n / n k (X N , X n ⋅ q ( λ ) ) d p ( λ )≤ Ent ( Λ / n ) + ∥ Ent ∗ (X )∥ ⋅ lim N →∞ nN ⋅ ∫ λ ∈ Λ N / n ∣ N / n − q ( λ )∣ d p ( λ )≤ Ent ( Λ / n ) + ∥ Ent ∗ (X )∥ ⋅ lim N →∞ nN ⋅ √ N ⋅ n ( − n ) = Ent ( Λ / n ) where we used Proposition 4.1(1) for the inequality on the third line above. The second claim is proven similarly and the third follows from the secondand the 1-Lipschitz property of the tensor product: κ ((X ⊗ Y) ⊕ Λ / n {●} , (X ⊕ Λ / n {●}) ⊗ (Y ⊕ Λ / n {●}))≤ κ ((X ⊗ Y) ⊕ Λ / n {●} , X ⊗ Y) + Ent ( Λ / n )≤ Ent ( Λ / n ) Finally, the fourth follows from Proposition 4.1(2), by slicing both argumentsalong Λ / n . ⊠ Vanishing defect property and completeness of the tropical cone.Lemma 4.3.
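For every admissible function $\varphi$, every $\bar{\mathcal{X}} \in QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$ and every $k \in \mathbb{N}$, there exists an asymptotically equivalent sequence $\bar{\mathcal{Y}}$ with defect bounded by the admissible function $\varphi_k$ defined by
\[
\varphi_k(s) := \operatorname{Ent}(\Lambda_{1/k}) + \frac{1}{k}\, \varphi(k \cdot s).
\]
⊠

Proof: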
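Let $\bar{\mathcal{X}} = \{\mathcal{X}(i)\}$ be a quasi-linear sequence with defect bounded by $\varphi$ and let $k \in \mathbb{N}$. Define a new sequence $\bar{\mathcal{Y}} = \{\mathcal{Y}(i)\}$ by
\[
\mathcal{Y}(i) := \mathcal{X}(k \cdot i) \oplus_{\Lambda_{1/k}} \{\bullet\}.
\]
First we verify that the sequences $\bar{\mathcal{X}}$ and $\bar{\mathcal{Y}}$ are asymptotically equivalent, that is,
\[
\hat{\kappa}(\bar{\mathcal{X}}, \bar{\mathcal{Y}}) := \lim_{i\to\infty} \frac{1}{i}\, \kappa(\mathcal{X}(i), \mathcal{Y}(i)) = 0.
\]
We estimate the distance between corresponding terms of $\bar{\mathcal{X}}$ and $\bar{\mathcal{Y}}$ using Lemma 4.2 and Corollary 2.4 as follows:
\[
\begin{aligned}
\kappa(\mathcal{X}(i), \mathcal{Y}(i))
&= \kappa\big(\mathcal{X}(i),\ \mathcal{X}(k \cdot i) \oplus_{\Lambda_{1/k}} \{\bullet\}\big) \\
&\le \kappa\big(\mathcal{X}(i),\ \mathcal{X}(i)^k \oplus_{\Lambda_{1/k}} \{\bullet\}\big) + \kappa\big(\mathcal{X}(i)^k \oplus_{\Lambda_{1/k}} \{\bullet\},\ \mathcal{X}(k \cdot i) \oplus_{\Lambda_{1/k}} \{\bullet\}\big) \\
&\le \operatorname{Ent}(\Lambda_{1/k}) + D_\varphi \cdot \varphi(i).
\end{aligned}
\]
Thus $\hat{\kappa}(\bar{\mathcal{X}}, \bar{\mathcal{Y}}) = 0$.

Next we show that $\bar{\mathcal{Y}}$ is $\kappa$-quasi-linear and evaluate its defect, also using Lemma 4.2. Let $i, j \in \mathbb{N}$, then
\[
\begin{aligned}
\kappa(\mathcal{Y}(i+j),\ \mathcal{Y}(i) \otimes \mathcal{Y}(j))
&= \kappa\big(\mathcal{X}(k i + k j) \oplus_{\Lambda_{1/k}} \{\bullet\},\ (\mathcal{X}(k i) \oplus_{\Lambda_{1/k}} \{\bullet\}) \otimes (\mathcal{X}(k j) \oplus_{\Lambda_{1/k}} \{\bullet\})\big) \\
&\le \kappa\big((\mathcal{X}(k i) \otimes \mathcal{X}(k j)) \oplus_{\Lambda_{1/k}} \{\bullet\},\ (\mathcal{X}(k i) \oplus_{\Lambda_{1/k}} \{\bullet\}) \otimes (\mathcal{X}(k j) \oplus_{\Lambda_{1/k}} \{\bullet\})\big) + \frac{1}{k}\, \varphi(k \cdot (i+j)) \\
&\le \operatorname{Ent}(\Lambda_{1/k}) + \frac{1}{k}\, \varphi(k \cdot (i+j)).
\end{aligned}
\]
⊠

Corollary 4.4.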
For any indexing category $\mathbf{G}$ and for the admissible function $\varphi$ given by $\varphi(t) = t^\alpha$, $\alpha \in [0, 1)$, the space $QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$ has the uniformly bounded and vanishing defect properties. ⊠

Proof:
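Let $\bar{\mathcal{X}} \in QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$. By Lemma 4.3 there exists an asymptotically equivalent sequence $\bar{\mathcal{Y}}$ with defect bounded by $\varphi_k$ defined by
\[
\varphi_k(t) := \operatorname{Ent}(\Lambda_{1/k}) + \frac{1}{k}\, C \varphi(k \cdot t) = \operatorname{Ent}(\Lambda_{1/k}) + \frac{1}{k}\, C\, (k \cdot t)^\alpha.
\]
Since $\operatorname{Ent}(\Lambda_{1/k}) \to 0$ and $k^{\alpha-1} \to 0$ as $k \to \infty$, there exists a sequence $c_k \to 0$ such that for any $t \ge 1$
\[
\varphi_k(t) \le c_k\, t^\alpha,
\]
showing the uniformly bounded and vanishing defect property. ⊠

Diagrams of tropical probability spaces.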
By applying the general setup in the previous section to the metric Abelian monoids $(\mathbf{Prob}\langle\mathbf{G}\rangle, \otimes, \mathbf{k})$ and $(\mathbf{Prob}\langle\mathbf{G}\rangle, \otimes, \kappa)$ and using Corollary 4.4 we obtain the following theorem.

Theorem 4.5.
Fix an admissible function $\varphi$ and consider the commutative diagram

(4.2) [Commutative diagram. Top row: $(L(\mathbf{Prob}\langle\mathbf{G}\rangle, \mathbf{k}), \kappa)$ and $(QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \mathbf{k}), \kappa)$. Bottom row: $(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$, $(L(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa), \hat{\kappa})$ and $(QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa), \hat{\kappa})$. Arrow labels: ı, ı, f, f′.]

Then the following statements hold:
(1) The maps f, f′, ı are isometries.
(2) The maps ı , , are isometric embeddings and each map has a dense image in the corresponding target space.
(3) The space in the lower-right corner, $(QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa), \hat{\kappa})$, is complete.
⊠

We would like to conjecture that all maps in the diagram above are isometries.

Since $QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$ is complete and has $L(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$ as a dense subset for any $\varphi > 0$, it follows that $QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa)$ does not depend (up to isometry of pseudo-metric spaces) on the choice of admissible $\varphi > 0$. From now on we will choose the particular function $\varphi(t) := t^{3/4}$. The reason for this choice will become clear when we formulate the Asymptotic Equipartition Property for diagrams.

We may finally define the space of tropical $\mathbf{G}$-diagrams as the space in the lower-right corner of the diagram:
\[
\mathbf{Prob}[\mathbf{G}] := (QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle, \kappa), \otimes, \cdot, \hat{\kappa}).
\]
By Theorem 4.5 above, this space is complete.

The entropy function
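$\operatorname{Ent}_* : \mathbf{Prob}\langle\mathbf{G}\rangle \to \mathbb{R}^{\mathbf{G}}$ extends to a linear functional
\[
\operatorname{Ent}_* : \mathbf{Prob}[\mathbf{G}] \to (\mathbb{R}^{\mathbf{G}}, \|\cdot\|)
\]
of norm one, defined by
\[
\operatorname{Ent}_*(\bar{\mathcal{X}}) = \lim_{n\to\infty} \frac{1}{n} \operatorname{Ent}_*(\mathcal{X}(n)).
\]

AEP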
Homogeneous diagrams. A $\mathbf{G}$-diagram $\mathcal{X}$ is called homogeneous if the automorphism group $\operatorname{Aut}(\mathcal{X})$ acts transitively on every space in $\mathcal{X}$. Homogeneous probability spaces are uniform. For more complex indexing categories this simple description is not sufficient. The subcategory of all homogeneous $\mathbf{G}$-diagrams will be denoted $\mathbf{Prob}\langle\mathbf{G}\rangle_h$. This space is invariant under the tensor product, thus it is a metric Abelian monoid.

5.1.1. Universal construction of homogeneous diagrams.
Examples of homogeneous diagrams can be constructed in the following manner. Fix a finite group $G$ and consider a $\mathbf{G}$-diagram $\{H_i; \alpha_{ij}\}_{i\in\mathbf{G}}$ of subgroups of $G$, where the morphisms $\alpha_{ij}$ are inclusions. The $\mathbf{G}$-diagram of probability spaces $\{X_i; f_{ij}\}$ is constructed by setting $X_i = (G/H_i, \mathrm{unif})$ and taking $f_{ij}$ to be the natural projection $G/H_i \to G/H_j$, whenever $H_i \subset H_j$. The resulting diagram $\mathcal{X}$ will be minimal if and only if for any $i, j \in \mathbf{G}$ there is $k \in \mathbf{G}$ such that $H_k = H_i \cap H_j$. In fact, any homogeneous diagram arises this way, see [MP18].

5.2. Asymptotic Equipartition Property.
In [MP18] the following theorem is proven.

Theorem 5.1.
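Suppose $\mathcal{X} \in \mathbf{Prob}\langle\mathbf{G}\rangle$ is a $\mathbf{G}$-diagram of probability spaces for some fixed indexing category $\mathbf{G}$. Then there exists a sequence $\bar{\mathcal{H}} = (\mathcal{H}_n)_{n\in\mathbb{N}}$ of homogeneous $\mathbf{G}$-diagrams such that
\[
\frac{1}{n}\, \mathbf{k}(\mathcal{X}^{\otimes n}, \mathcal{H}_n) \le C(|X|, [[\mathbf{G}]]) \cdot \sqrt{\frac{\ln n}{n}} \tag{5.1}
\]
where $C(|X|, [[\mathbf{G}]])$ is a constant depending only on $|X|$ and $[[\mathbf{G}]]$. ⊠

Defining
\[
\mathbf{Prob}[\mathbf{G}]_h := QL_\varphi(\mathbf{Prob}\langle\mathbf{G}\rangle_h, \kappa),
\]
the Asymptotic Equipartition Property can be reformulated as in Theorem 5.2 below.

Theorem 5.2.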
For any indexing category $\mathbf{G}$ the image of the natural inclusion
\[
\mathbf{Prob}[\mathbf{G}]_h \hookrightarrow \mathbf{Prob}[\mathbf{G}]
\]
is dense. ⊠

Proof:
By Theorem 5.1, every linear sequence can be approximated by a homogeneous sequence. It follows from the bound (5.1) that the defect of the approximating homogeneous sequence is bounded by a constant times $\varphi$, defined by $\varphi(t) = t^{3/4}$. Moreover, the linear sequences are dense by Theorem 4.5. This finishes the proof. ⊠

The tropical cone for probability spaces and chains
Although for general indexing categories $\mathbf{G}$ the space of tropical $\mathbf{G}$-diagrams is infinite-dimensional, it has a very simple, finite-dimensional description if $\mathbf{G}$ consists of a single object, or if it is a special type of indexing category called a chain.

The chain of length $k$, denoted by $\mathbf{C}_k$, is the indexing category with $k$ objects $O_1, \dots, O_k$, and a morphism from $O_i$ to $O_j$ whenever $i \ge j$. A $\mathbf{C}_k$-diagram of probability spaces is then a chain of reductions
\[
X_k \to X_{k-1} \to \cdots \to X_1.
\]
Recall that homogeneous probability spaces are (isomorphic to) probability spaces with a uniform distribution.
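To make the notion of a chain concrete, here is a small Python sketch, assuming finite probability spaces represented as dictionaries from points to weights; the helper names `pushforward`, `entropy` and `chain_entropies` are hypothetical and chosen only for this illustration. It builds a chain of reductions $X_k \to \cdots \to X_1$ by successive pushforwards and returns the entropy vector $(\operatorname{Ent}(X_1), \dots, \operatorname{Ent}(X_k))$. Since a reduction cannot increase entropy, the entries are nonnegative and nondecreasing.

```python
import math
from collections import defaultdict


def pushforward(dist, f):
    """Image measure of a finite distribution {point: weight} under the map f."""
    out = defaultdict(float)
    for point, weight in dist.items():
        out[f(point)] += weight
    return dict(out)


def entropy(dist):
    """Shannon entropy (natural logarithm) of a finite distribution."""
    return -sum(w * math.log(w) for w in dist.values() if w > 0)


def chain_entropies(top, maps):
    """Entropies (Ent(X_1), ..., Ent(X_k)) of the chain X_k -> ... -> X_1,
    where `top` is X_k and `maps` lists the reductions from the top down."""
    spaces = [top]
    for f in maps:
        spaces.append(pushforward(spaces[-1], f))
    return [entropy(space) for space in reversed(spaces)]


# A homogeneous chain U_12 -> U_6 -> U_2 built from residue maps.
top = {x: 1 / 12 for x in range(12)}
print(chain_entropies(top, [lambda x: x % 6, lambda x: x % 2]))
```

In this example every space in the chain is uniform, so the chain is of the homogeneous kind discussed next, and the entropies are the logarithms of the cardinalities.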
Homogeneous chains have a very simple description as well. A chain $\mathcal{H} \in \mathbf{Prob}\langle\mathbf{C}_k\rangle$ is homogeneous if and only if the individual probability spaces are homogeneous, i.e. if and only if the individual probability spaces are (isomorphic to) probability spaces with a uniform measure.

Based on this simple description we derive the following theorem.

Theorem 6.1.
For $k \in \mathbb{N}$, the tropical cone $\mathbf{Prob}[\mathbf{C}_k]$ is isomorphic to the following cone in $(\mathbb{R}^k, |\cdot|)$:
\[
\left\{ \begin{pmatrix} x_k \\ \vdots \\ x_1 \end{pmatrix} \in \mathbb{R}^k \ \middle|\ 0 \le x_1 \le \cdots \le x_k \right\}.
\]
In particular, the algebraic structure and the pseudo-distance are preserved under the isomorphism. ⊠

In the case of single probability spaces, Theorem 6.1 is a direct consequence of the asymptotic equipartition property and Lemma 6.2 below. For chains, a similar argument works.
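Lemma 6.2 below compares two uniform spaces $U_n$ and $U_m$ through a two-fan $U_n \leftarrow U_{nm} \rightarrow U_m$ given by integer division. The following Python sketch illustrates that construction numerically. It rests on two assumptions that should be read as such: the function names are hypothetical, and the quantity $2\operatorname{Ent}(Z) - \operatorname{Ent}(U_n) - \operatorname{Ent}(U_m)$ is taken as the bound on the intrinsic distance that a two-fan with top space $Z$ provides, following the definition of the intrinsic entropy distance in [MP18]. Here $Z$ is the image of $U_{nm}$ in $U_n \times U_m$, that is, the minimal reduction of the fan.

```python
import math
from collections import Counter


def two_fan_image(n, m):
    """Multiplicities of the pairs (f(k), g(k)) = (k // m, k // n)
    for k in U_nm = {0, ..., n*m - 1}; the keys form the space Z."""
    return Counter((k // m, k // n) for k in range(n * m))


def fan_bound(n, m):
    """Return (|Z|, 2 Ent(Z) - ln n - ln m, 2 ln 2 + |ln(n/m)|)."""
    counts = two_fan_image(n, m)
    total = n * m
    ent_z = -sum(c / total * math.log(c / total) for c in counts.values())
    lhs = 2 * ent_z - math.log(n) - math.log(m)
    rhs = 2 * math.log(2) + abs(math.log(n / m))
    return len(counts), lhs, rhs


size, lhs, rhs = fan_bound(5, 8)
print(size, lhs, rhs)
```

Both coordinates of the pair $(\lfloor k/m \rfloor, \lfloor k/n \rfloor)$ are nondecreasing in $k$ and change only when $k$ crosses a multiple of $m$ or of $n$, which is why the image has at most $n + m$ points.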
Lemma 6.2.
Denote by $U_n$ a finite uniform probability space of cardinality $n$. Then
\[
\mathbf{k}(U_n, U_m) \le 2\ln 2 + \Big| \ln \frac{n}{m} \Big| \tag{6.1}
\]
and
\[
\kappa(U_n, U_m) = |\operatorname{Ent}(U_n) - \operatorname{Ent}(U_m)|. \tag{6.2}
\]
⊠

Proof:
We will construct a specific two-fan $U_n \xleftarrow{f} U_{nm} \xrightarrow{g} U_m$. Identify $U_\ell$ with $\{0, \dots, \ell - 1\}$. Let $k \in U_{nm}$. Then $k$ can be written uniquely as
\[
\begin{cases}
k = i_1 \cdot m + j_1 & \text{with } i_1 \in U_n,\ j_1 \in U_m \\
k = i_2 \cdot n + j_2 & \text{with } i_2 \in U_m,\ j_2 \in U_n
\end{cases}
\]
and we set $f(k) := i_1$ and $g(k) := i_2$.

Now that we have constructed a two-fan $U_n \xleftarrow{f} U_{nm} \xrightarrow{g} U_m$, let $U_n \leftarrow Z \rightarrow U_m$ be its minimal reduction. We estimate $|Z| \le n + m$, which implies that
\[
\begin{aligned}
\mathbf{k}(U_n, U_m) &\le 2\operatorname{Ent}(Z) - \operatorname{Ent}(U_n) - \operatorname{Ent}(U_m) \\
&\le 2\ln(n + m) - \ln n - \ln m \\
&\le 2\ln 2 + 2\ln\max\{n, m\} - \ln n - \ln m \\
&\le 2\ln 2 + \Big| \ln \frac{n}{m} \Big|,
\end{aligned}
\]
thus establishing inequality (6.1).

To show equality (6.2), recall that the entropy is 1-Lipschitz with respect to $\mathbf{k}$. Therefore, we have
\[
|\operatorname{Ent}(U_n) - \operatorname{Ent}(U_m)| \le \mathbf{k}(U_n, U_m) \le |\operatorname{Ent}(U_n) - \operatorname{Ent}(U_m)| + 2\ln 2
\]
and hence
\[
\kappa(U_n, U_m) = \lim_{\ell\to\infty} \frac{1}{\ell}\, \mathbf{k}(U_n^{\ell}, U_m^{\ell}) = |\operatorname{Ent}(U_n) - \operatorname{Ent}(U_m)|.
\]
⊠

References

[ABD+08] Nihat Ay, Nils Bertschinger, Ralf Der, Frank Güttler, and Eckehard Olbrich. Predictive information and explorative behavior of autonomous robots. The European Physical Journal B, 63(3):329–339, 2008.
[A’C03] Norbert A’Campo. A natural construction for the real numbers. arXiv Mathematics e-prints, page math/0301015, Jan 2003.
[BRO+14] Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014.
[dBE52] Nicolaas Govert de Bruijn and Paul Erdős. Some linear and some quadratic recursion formulas. II. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen: Series A: Mathematical Sciences, 14:152–163, 1952.
[Fri] Tobias Fritz. Resource efficiency and metric commutative monoids. In preparation.
[Fri09] Karl Friston. The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301, 2009.
[KŠŠ12] Mladen Kovačević, Ivan Stanojević, and Vojin Šenk. On the hardness of entropy minimization and related problems. In , pages 512–516. IEEE, 2012.
[KW13] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[Mat07] Frantisek Matus. Infinitely many information inequalities. In Information Theory, 2007. ISIT 2007. IEEE International Symposium on, pages 41–44. IEEE, 2007.
[MP18] Rostislav Matveev and Jacobus W. Portegies. Asymptotic dependency structure of multiple signals. Information Geometry, 1(2):237–285, 2018.
[MP19a] Rostislav Matveev and Jacobus W. Portegies. Conditioning in tropical probability theory. arXiv e-prints, page arXiv:1905.05596, May 2019.
[MP19b] Rostislav Matveev and Jacobus W. Portegies. Tropical probability theory and an application to the entropic cone. arXiv e-prints, page arXiv:1905.05351, May 2019.
[Råd52] Hans Rådström. An embedding theorem for spaces of convex sets. Proceedings of the American Mathematical Society, 3(1):165–169, 1952.
[SA15] Bastian Steudel and Nihat Ay. Information-theoretic inference of common ancestors. Entropy, 17(4):2304–2327, 2015.
[VDP13] Sander G. Van Dijk and Daniel Polani. Informational constraints-driven organization in goal-directed behavior. Advances in Complex Systems, 16(02n03):1350016, 2013.
[Vid12] Mathukumalli Vidyasagar. A metric between probability distributions on finite sets of different cardinalities and applications to order reduction.