From Thermodynamic Sufficiency to Information Causality
Peter Harremoës
Received: date / Accepted: date
Abstract
The principle called information causality has been used to deduce Tsirelson's bound. In this paper we derive information causality from monotonicity of divergence and relate it to more basic principles related to measurements on thermodynamic systems. This principle is more fundamental in the sense that it can be formulated for both unipartite and multipartite systems, while information causality is only defined for multipartite systems. Thermodynamic sufficiency is a strong condition that puts severe restrictions on the shape of the state space, to the extent that we conjecture that under very weak regularity conditions it can be used to deduce the complex Hilbert space formalism of quantum theory. Since the notion of sufficiency is relevant for all convex optimization problems, there are many examples where it does not apply.
Keywords
Bregman divergence · multipartite system · information causality · thermodynamic sufficiency

P. Harremoës
Copenhagen Business College
Nørre Voldgade 34
København K
Denmark
Tel.: +45-39564171
E-mail: [email protected]

1 Introduction

Entanglement is a resource that may allow agents to solve certain game problems in a more efficient way than what is possible without entanglement. Such tasks could be solved even more efficiently if the agents had access to a fictive resource called PR-boxes. Such boxes cannot be used for signaling, but they can create correlations that are stronger than the correlations that can be created using entanglement. To be more precise, all quantum mechanical correlations satisfy Tsirelson's bound, while PR-boxes can violate Tsirelson's bound. The goal is to explain Tsirelson's bound and other bounds on correlations from more basic physical principles. One such principle is called information causality, and it may be formulated as "one bit of communication cannot create more than one bit of correlation". This principle was introduced in [16], where it was proved that it can be used to derive Tsirelson's bound. In [16] information causality was formulated and derived from the existence of a conditional mutual information function that is assumed to satisfy some basic properties. In [17] two ways of defining entropy were specified, and they were used to formulate the principle of information causality.

In this paper we use properties of Bregman divergences rather than entropy or mutual information as the basic principle. These divergences have several advantages compared with entropy and mutual information. To each convex optimization problem one can associate a Bregman divergence. If the optimization problem is energy extraction in thermodynamics, the Bregman divergence is proportional to quantum relative entropy, which has some very desirable properties. These properties may be violated if one looks at different optimization problems. Therefore one may ask what is so special about energy extraction in thermodynamics, but this important problem will not be covered in the present paper. One advantage of studying divergence (and entropy) rather than conditional mutual
information is that divergence and its properties can be studied for unipartite systems, while conditional mutual information only makes sense for multipartite systems. This is important because we do not have a canonical way of forming product spaces in generalized probabilistic theories. Bregman divergences with nice properties can be defined on Jordan algebras, and the existence of a nice Bregman divergence rules out most other convex bodies as potential state spaces. Finally, both entropy and conditional mutual information may be considered as derived concepts based on divergence. This aspect will be the focus of the present paper.

The paper is organized as follows. In Section 2 we specify concepts like state space and measurement, and we fix notation. Jordan algebras and their most important properties are described in Section 3. In Section 4 it is proved that several different ways of defining entropy coincide for Jordan algebras. Bregman divergences and their relation to optimization are described in Section 5, where several conditions related to the notion of sufficiency are defined; for Jordan algebras these conditions are equivalent, and the Bregman divergence is generated by the entropy function. In Section 6 we define conditional mutual information based on a Bregman divergence, and we demonstrate that it has the properties that are needed for information causality to be satisfied. We conclude with Section 7, where we summarize our results and state some open problems.
2 States and measurements

Let $P$ denote a set of preparations of a physical experiment. A mixed preparation is a formal mixture $\sum s_i \cdot p_i$ where the $p_i$ are preparations and $(s_i)_i$ is a probability vector. The mixture $\sum s_i \cdot p_i$ is identified with the preparation where $p_i$ is chosen with probability $s_i$. A measurement $m$ maps each preparation in $P$ into a probability measure on the set of possible outcomes of the experiment. We assume that $m(\sum s_i \cdot p_i) = \sum s_i \cdot m(p_i)$.

Let $M$ denote the set of measurements that can be performed by an observer (or a group of observers). If $m(p_1) = m(p_2)$ for all measurements $m \in M$ then we say that $p_1$ and $p_2$ represent the same state. The set of states is called the state space, and with this Bayesian definition of a state the state space will depend on the set of feasible measurements. In particular, the state spaces of two different observers may be different because they may have different sets of measurements. A group of observers may have a different state space than any of the individual observers, because the set of joint measurements may be larger than the set of measurements that can be performed by any of the individual observers.

For simplicity we will assume that the state spaces are convex bodies $\Omega$, i.e. convex compact sets spanned by finitely many elements. The extreme points are called pure states. Any convex body can be embedded in the pointed cone $\Omega_+$ consisting of formal products $t \cdot \sigma$ where $\sigma$ is a state and $t$ is a positive real number called the trace of $t \cdot \sigma$. The notation is $\mathrm{tr}(t \cdot \sigma) = t$. The elements of the cone are called positive operators or un-normalized states, and the cone is called the state cone. Positive elements can be added by
$$t_1 \cdot \sigma_1 + t_2 \cdot \sigma_2 = (t_1 + t_2) \cdot \left( \frac{t_1}{t_1 + t_2} \sigma_1 + \frac{t_2}{t_1 + t_2} \sigma_2 \right).$$
The state cone spans a partially ordered vector space $V_\Omega$, and the trace extends linearly to $V_\Omega$. Thus, the states may be considered as positive elements of an ordered vector space with trace 1.

Let $m \in M$ denote a measurement with values $v$ in some set $V$. If $\sigma$ is a state then the measurement is given by a probability measure $m(\sigma)$ over $V$. Thus for each $v \in V$ we have a probability $m(\sigma)(v) \in [0,1]$. For each $v$ the measurement $m$ maps $\Omega$ into $[0,1]$; such a mapping is called a test, and it is an element of $\Omega^*_+$, i.e. the dual cone of the positive elements. In the literature on generalized probabilistic theories a test is often called an effect, but in this paper it is called a test, which is the well established term in the statistical literature. The test that maps $x \in V_\Omega$ into $\lambda\,\mathrm{tr}(x)$ will be denoted $\lambda$. In particular the test $1$ maps $\Omega$ into $1$. Since the total probability of a measurement is 1 we have $\sum_v m(\cdot)(v) = 1$. A measurement can be represented as a test valued measure. In the Hilbert space formalism the tests are given by positive operators and the measurements are given by positive operator valued measures (POVM). We say that two states $\rho$ and $\sigma$ are mutually singular if there exists a test $\phi$ such that $\phi(\rho) = 0$ and $\phi(\sigma) = 1$.

Let $m_1, m_2 \in M$ with values in $V_1$ and $V_2$. If $M : V_1 \to V_2$ is some map such that
$$m_2(\cdot)(v_2) = \sum_{v_1 : M(v_1) = v_2} m_1(\cdot)(v_1)$$
then the measurement $m_1$ is at least as informative about the state as $m_2$, and $m_1$ is called a fine-graining of $m_2$. If $m_1(\cdot)(v_1) \propto m_2(\cdot)(v_2)$ for all values $v_1$ for which $M(v_1) = v_2$, then the fine-graining is said to be trivial. A measurement is fine-grained if all its fine-grainings are trivial. Note that a measurement $m$ is fine-grained if all tests $m(\cdot)(v)$ lie on extreme rays of $\Omega^*_+$. Therefore any measurement has a fine-graining that is fine-grained when the state space $\Omega$ is a convex body.
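To make the definitions concrete, here is a small Python sketch (not from the paper; all numbers are illustrative) that represents a classical measurement as a matrix of tests and constructs the coarse-graining of which the original measurement is a fine-graining:

```python
import numpy as np

# A measurement on a 3-state classical system: row v1 holds the test
# m1(.)(v1), an affine functional on the simplex with values in [0, 1].
m1 = np.array([[0.5, 0.0, 0.1],
               [0.5, 0.2, 0.3],
               [0.0, 0.8, 0.6]])
assert np.allclose(m1.sum(axis=0), 1.0)   # the tests sum to the unit test

# A map M : V1 -> V2 merging outcomes 0 and 1; m1 is a fine-graining of m2.
M = {0: 0, 1: 0, 2: 1}
m2 = np.zeros((2, 3))
for v1, v2 in M.items():
    m2[v2] += m1[v1]                      # m2(.)(v2) = sum over M(v1) = v2

state = np.array([0.2, 0.5, 0.3])
print(m1 @ state)   # outcome distribution under the fine-graining
print(m2 @ state)   # coarse-grained outcome distribution
```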
Let $\Omega_1$ and $\Omega_2$ denote two state spaces. An affine map $\Phi : \Omega_1 \to \Omega_2$ is called an affinity. Let $S : \Omega_1 \to \Omega_2$ and $R : \Omega_2 \to \Omega_1$ denote affinities. If $R \circ S = \mathrm{id}_{\Omega_1}$ then $S$ is called a section and $R$ is called a retraction. A frame is a section $S : \Omega_1 \to \Omega_2$ where $\Omega_1$ is a simplex.

Let $\Omega$ denote the state space of a group of observers. The set of measurements $M_A$ of a single observer Alice is a subset of the set of all measurements $M$ of the whole group of observers. Therefore there is a surjective affinity $E_A : \Omega \to \Omega_A$. Assume that Alice and Bob are observers that can perform measurements independently. Further assume that the choice of measurement made by Alice does not influence the outcome of a measurement made by Bob, and that a choice of measurement made by Bob does not influence the outcome of a measurement made by Alice. This is called the no-signaling condition. If Alice performs the measurement $m_A$ and Bob performs the measurement $m_B$, then the joint measurement is denoted $m_A \otimes m_B$. Further assume that Alice and Bob can communicate. Then Alice and Bob can perform any measurement of the form $\sum s_i \cdot m_{A,i} \otimes m_{B,i}$. If Alice and Bob together can only perform measurements of the form $\sum s_i \cdot m_{A,i} \otimes m_{B,i}$, their joint state space is a subset of $V_{\Omega_A} \otimes V_{\Omega_B}$. Assume further that Alice and Bob can prepare states individually. If Alice prepares the state $\sigma_A$ and Bob prepares the state $\sigma_B$, then their joint state is $\sigma_A \otimes \sigma_B \in V_{\Omega_A} \otimes V_{\Omega_B}$. The convex hull of $\{\sigma_A \otimes \sigma_B \mid \sigma_A \in \Omega_A \text{ and } \sigma_B \in \Omega_B\}$ is denoted $\Omega_A \otimes_{\min} \Omega_B$, and its elements are called separable states. We assume that $\Omega_A \otimes_{\min} \Omega_B \subseteq \Omega$.

3 Jordan algebras

Here we recall some facts and concepts related to Jordan algebras. A more detailed exposition can be found in [14,2]. In the Hilbert space formalism of quantum physics the states are represented as density matrices on a complex Hilbert space. Classical probability distributions can be identified with density matrices that are diagonal. In the set of self adjoint matrices one may define a product $\bullet$ by
$$A \bullet B = \tfrac{1}{2}(AB + BA).$$
This product makes the set of Hermitean matrices into an algebra over the real numbers, and the product $\bullet$ satisfies
$$A \bullet (B \bullet (A \bullet A)) = (A \bullet B) \bullet (A \bullet A). \qquad (1)$$
With this equation fulfilled it is possible to define $A^n = A \bullet A \bullet \dots \bullet A$ without specifying where the parentheses have to be placed. Further we have that
$$\sum_i A_i^2 = 0 \qquad (2)$$
if and only if $A_i = 0$ for all $i$. The dimension of the algebra is defined as the dimension of the Jordan algebra as a real vector space. A finite dimensional algebra over the real numbers with a product $\bullet$ satisfying properties (1) and (2) is called a Euclidean Jordan algebra. Elements in a Euclidean Jordan algebra of the form $A \bullet A$ are called positive elements, and they form a pointed cone. Further, a Euclidean Jordan algebra has a trace $\mathrm{tr}$ that maps positive elements into positive numbers and such that $\mathrm{tr}((A \bullet B) \bullet C) = \mathrm{tr}(A \bullet (B \bullet C))$. A state in a Jordan algebra is a positive element of trace 1. The rank of a Jordan algebra is the Caratheodory rank of the state space of the algebra. A Euclidean Jordan algebra has an inner product defined by
$$\langle A, B \rangle = \mathrm{tr}(A \bullet B).$$
With this inner product the positive cone becomes self dual. An element $E$ of a Jordan algebra is idempotent if $E^2 = E$. Elements $A$ and $B$ are orthogonal if $A \bullet B = 0$.
With these definitions any element $A$ has a spectral decomposition
$$A = \sum \lambda_i E_i$$
where the $E_i$ are orthogonal idempotents. If the spectral values $\lambda_i$ are all different, the decomposition is unique. Therefore one can define
$$f(A) = \sum f(\lambda_i) E_i.$$
The associative Euclidean Jordan algebras correspond to classical probability theory, where the state space is a simplex. Any Euclidean Jordan algebra $J$ can be written as a direct sum $\bigoplus J_i$ of Jordan algebras where each of the Jordan algebras $J_i$ is simple. The simple Euclidean Jordan algebras belong to one of the following five types.

– $M_n(\mathbb{R})$: real valued Hermitean $n \times n$ matrices.
– $M_n(\mathbb{C})$: complex valued Hermitean $n \times n$ matrices.
– $M_n(\mathbb{H})$: quaternionic valued Hermitean $n \times n$ matrices.
– $M_3(\mathbb{O})$: octonionic valued Hermitean $3 \times 3$ matrices.
– $\mathrm{Jspin}(d)$: spin factors, where the state space has the shape of a $d$-dimensional solid ball.
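The spectral decomposition above is what makes functions of states computable in practice. A minimal NumPy sketch for the special Jordan algebra of complex Hermitean matrices (illustrative, not part of the formal development):

```python
import numpy as np

def spectral_function(A, f):
    # A = sum_i lambda_i E_i  ->  f(A) = sum_i f(lambda_i) E_i
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * f(eigvals)) @ eigvecs.conj().T

rho = np.array([[0.7, 0.2], [0.2, 0.3]])   # a state (density matrix)
log_rho = spectral_function(rho, np.log)
print(-np.trace(rho @ log_rho).real)       # -<rho, ln(rho)>, used in Section 4
```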
The Jordan algebra $M_3(\mathbb{O})$ is called the exceptional Jordan algebra, and Jordan algebras that do not contain such an exceptional component are called special Jordan algebras. All special Jordan algebras appear as sections of $M_n(\mathbb{C})$ for some value of $n$. In this sense all special Jordan algebras have representations as physical systems. If a section of the set of complex valued Hermitean matrices is required to be completely positive, then the section can be represented as a set of complex valued Hermitean matrices.

It is an important question why exactly the complex valued Hermitean matrices are so good at modeling quantum physics compared with the other simple Jordan algebras. Adler has attempted to model quantum theory using quaternions [1], and there have been a number of attempts to let the exceptional Jordan algebra play an active role in modeling physics [6,13]. One important property that singles out the complex valued Hermitean matrices is that there is a canonical tensor product construction within the category of complex valued Hermitean matrices with completely positive maps as morphisms [3].

Example 1
Assume that the whole state space $\Omega$ can be represented as the real non-negative definite $4 \times 4$ matrices with trace 1. The dimension of this state space is 9. Let $A$ and $B$ denote $2 \times 2$ real Hermitean matrices. Then $A \otimes B$ can be embedded in $\Omega$ as
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \otimes \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{pmatrix}.$$
The vector space of real Hermitean $2 \times 2$ matrices has dimension 3. Therefore the tensor product has dimension 9, and the set of tensors with trace 1 has dimension 8, so it has a lower dimension than the set of states on the whole space. Therefore there are joint states on the whole space that cannot be distinguished by local measurements. Hence the tomography condition is not fulfilled.

There are a number of ways to characterize Jordan algebras. Above we have defined the Jordan algebras algebraically. A classic result is that a real vector space with a self-dual homogeneous cone can be represented as a Jordan algebra [11]. A new result is that a state space that is spectral and where any pair of frames can be mapped into each other can be represented by a Jordan algebra [4].

For Jordan algebras it is possible to define a well-behaved entropy function and an associated divergence function. In [10] it was proved that if a state space has rank 2 and it has a monotone Bregman divergence, then it can be represented as a Jordan algebra (spin factor). Similar representation theorems for state spaces of higher rank are not yet available, so in this paper we focus on other consequences of the existence of an entropy function or a Bregman divergence.

4 Entropy

In generalized probabilistic theories there are two ways of defining entropy [17]. The decomposition entropy of a state $\sigma$ is given by
$$\breve{H}(\sigma) = \inf_{\sum p_i \cdot \sigma_i = \sigma} H((p_i)_i).$$
Here the infimum is taken over all mixtures $\sum p_i \cdot \sigma_i = \sigma$ where the $\sigma_i$ are pure states, and $H((p_i)_i)$ denotes the Shannon entropy of the probability vector $(p_i)_i$. Versions of this definition can also be found in [8], but they date back to [18]. Note that the definition of spectral entropy in [12] is closely related but slightly different.

Following [17] one can define the fine-grained entropy of a state in a generalized probabilistic theory by
$$\hat{H}(\sigma) = \inf_m H(m(\sigma))$$
where the infimum is taken over all fine-grained measurements $m$ on $\Omega$. The fine-grained entropy is a strictly concave function.
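For density matrices the spectral decomposition is itself a decomposition into orthogonal pure states, and by Lemma 1 below it attains the infimum in the decomposition entropy. A numerical sketch (illustrative numbers):

```python
import numpy as np

def shannon_entropy(p):
    p = p[p > 1e-12]                 # drop zero eigenvalues
    return -np.sum(p * np.log(p))

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
spectral_weights = np.linalg.eigvalsh(rho)
print(shannon_entropy(spectral_weights))   # value of the decomposition entropy

# Any other pure-state decomposition of rho has a weight vector majorized by
# the spectrum, hence at least as large a Shannon entropy.
```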
Lemma 1 If the state space $\Omega$ is spectral, then a decomposition that minimizes the decomposition entropy is spectral.

Proof This was essentially proved in [8], although the terminology regarding spectrality was slightly different. ⊓⊔
Theorem 1 If the state space $\Omega$ is spectral, then for any state $\sigma$ the following inequality holds:
$$\hat{H}(\sigma) \le \breve{H}(\sigma).$$
Proof Let $\sigma = \sum p_i \sigma_i$ be a decomposition of $\sigma$ where the states $\sigma_i$ are pure. To this decomposition there corresponds a measurement $m$ such that $m(\sigma)(i) = p_i$. Since this measurement is fine-grained we have
$$\hat{H}(\sigma) \le H(m(\sigma)) = H((p_i)_i).$$
Therefore
$$\hat{H}(\sigma) \le \inf_{\sum p_i \sigma_i = \sigma} H((p_i)_i) = \breve{H}(\sigma). $$
⊓⊔
Theorem 2 If the state space $\Omega$ is spectral and the cone $\Omega_+$ is self dual, then
$$\hat{H}(\sigma) = \breve{H}(\sigma) = -\langle \sigma, \ln(\sigma) \rangle. \qquad (3)$$
Proof Let $M$ denote a fine-grained measurement. The measurement is given by a positive test valued measure, i.e. there exist $\rho_j \ge 0$ such that $\sum \rho_j = 1$ and such that
$$M(\rho)(j) = \langle \rho_j, \rho \rangle.$$
Since the measurement is fine-grained, the $\rho_j$ must be states. Let $\sigma = \sum_i p_i \cdot \sigma_i$ denote a spectral decomposition of $\sigma$ into $r$ orthogonal pure states. Then
$$M(\sigma)(j) = \langle \rho_j, \sigma \rangle = \Big\langle \rho_j, \sum_i p_i \sigma_i \Big\rangle = \sum_i p_i \langle \rho_j, \sigma_i \rangle.$$
If $\tilde\sigma$ is the state $\sum_i \frac{1}{r} \cdot \sigma_i$ then
$$M(\tilde\sigma)(j) = \Big\langle \rho_j, \sum_i \tfrac{1}{r} \cdot \sigma_i \Big\rangle = \tfrac{1}{r} \langle \rho_j, 1 \rangle = \tfrac{1}{r},$$
where we have used that the orthogonal pure states of a spectral decomposition sum to the order unit $1$. Thus the Markov kernel $(p_i)_i \to (\sum_i p_i \langle \rho_j, \sigma_i \rangle)_j$ maps the uniform distribution $(\frac{1}{r})_i$ into the uniform distribution $(\frac{1}{r})_j$, i.e. the Markov kernel is bi-stochastic. Since bi-stochastic Markov kernels increase entropy we have
$$H(M(\sigma)) = H\big((\langle \rho_j, \sigma \rangle)_j\big) \ge H((p_i)_i) = -\langle \sigma, \ln(\sigma) \rangle.$$
Therefore
$$-\langle \sigma, \ln(\sigma) \rangle \le \hat{H}(\sigma). \qquad (4)$$
Now the result is obtained by combining Lemma 1 and Theorem 1 with inequality (4). ⊓⊔
Definition 1 The entropy $H$ of a state $\sigma$ in a Jordan algebra is given as the common value of the expressions in Equation (3).
Corollary 1 In a finite Euclidean Jordan algebra the entropy $-\langle \sigma, \ln(\sigma) \rangle$ is a concave function.

Proof Concavity of $H$ follows because $H$ equals the fine-grained entropy, and the fine-grained entropy is concave [17]. ⊓⊔

Concavity of the entropy function $H$ on Jordan algebras was proved in [8] with a more involved proof.

5 Bregman divergences

We consider an optimization problem where we want to optimize some quantity defined on the state space. In thermodynamics the goal is typically to extract energy from the system by some feasible interaction with the system. Our approach makes sense for any convex optimization problem, and in principle the objective function may represent other quantities such as the amount of money one may obtain by trading or the code length that is obtained after using a certain data compression procedure. Various examples of such optimization problems are given in [7]. In this paper the objective function will be energy.

Assume that the system is in state $\rho \in \Omega$ and that we apply some action $a$ from a set of feasible actions $\mathcal{A}$. The mean energy that we extract will be denoted $\langle a, \rho \rangle$, and it is an affine function of the state $\rho$. An action $a$ will be identified with the function $\rho \to \langle a, \rho \rangle$, so that the actions are considered as elements of the dual space of the state space. We define the free energy of the state $\rho$ as
$$F(\rho) = \sup_{a \in \mathcal{A}} \langle a, \rho \rangle.$$
In thermodynamics the Helmholtz free energy is given as $F = U - TS$, so that the free energy is an affine function minus a term that is proportional to the entropy function. The function $F$ is a convex function of $\rho$. The regret of doing action $a$ if the state is $\rho$ is defined as
$$D_F(\rho, a) = F(\rho) - \langle a, \rho \rangle.$$
The interpretation of the regret function is as follows. Assume that the system is in state $\rho$ but one uses a suboptimal action $a$. Then the regret measures the difference between the energy $F(\rho)$ that one could have extracted and the energy that one extracts using action $a$. For simplicity we will assume that $F$ is differentiable, so that for each state $\rho$ there exists a unique action $a_\rho$ such that $F(\rho) = \langle a_\rho, \rho \rangle$. For states $\rho, \sigma \in \Omega$ the Bregman divergence is defined as
$$D_F(\rho, \sigma) = D_F(\rho, a_\sigma).$$
It measures the regret of acting as if the state were $\sigma$ when it actually is $\rho$. The Bregman divergence is given by
$$D_F(\rho, \sigma) = F(\rho) - \left( F(\sigma) + \frac{\mathrm{d}}{\mathrm{d}t} F((1-t)\sigma + t\rho) \Big|_{t=0} \right).$$
The formula for the Bregman divergence is often written in terms of the gradient:
$$D_F(\rho, \sigma) = F(\rho) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \rho - \sigma \rangle \big).$$
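A Python sketch of this gradient formula (illustrative; the generator below is the negative Shannon entropy, so the divergence reproduces the Kullback-Leibler divergence of Example 2):

```python
import numpy as np

def bregman(F, grad_F, rho, sigma):
    # D_F(rho, sigma) = F(rho) - ( F(sigma) + <grad F(sigma), rho - sigma> )
    return F(rho) - (F(sigma) + np.dot(grad_F(sigma), rho - sigma))

F = lambda p: np.sum(p * np.log(p))      # negative Shannon entropy
grad_F = lambda p: np.log(p) + 1.0

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
print(bregman(F, grad_F, p, q))          # Bregman divergence
print(np.sum(p * np.log(p / q)))         # equals Kullback-Leibler D(p||q)
```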
Proposition 1 ([8, Lemma 17]) For Hermitean matrices $A$ and $B$ we have
$$\frac{\mathrm{d}}{\mathrm{d}t} \big( \mathrm{tr}(f(A + tB)) \big) \Big|_{t=0} = \langle f'(A), B \rangle.$$
Example 2 Assume that the state space can be represented as the state space of a Jordan algebra. Let $F(\sigma) = \langle \sigma, \ln(\sigma) \rangle$ denote the negative of the entropy. The Bregman divergence corresponding to $F$ can be computed as
$$\begin{aligned} D_F(\rho, \sigma) &= F(\rho) - \left( F(\sigma) + \frac{\mathrm{d}}{\mathrm{d}t} F((1-t)\sigma + t\rho) \Big|_{t=0} \right) \\ &= \langle \rho, \ln(\rho) \rangle - \big( \langle \sigma, \ln(\sigma) \rangle + \langle \ln(\sigma) + 1, \rho - \sigma \rangle \big) \\ &= \langle \rho, \ln(\rho) - \ln(\sigma) \rangle - \mathrm{tr}(\rho - \sigma). \end{aligned} \qquad (5)$$
We call this quantity the information divergence and denote it $D(\rho \| \sigma)$. Note that the last term vanishes if $\rho$ and $\sigma$ are states. If the Jordan algebra is associative we get the Kullback-Leibler divergence given by
$$D(P \| Q) = \sum p_i \ln \frac{p_i}{q_i}.$$
If the Jordan algebra is a C*-algebra, then $F$ is minus the von Neumann entropy and the information divergence equals the quantum information divergence (quantum relative entropy) given by
$$D(\rho \| \sigma) = \mathrm{tr}(\rho(\ln \rho - \ln \sigma)).$$
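A corresponding numerical sketch of the quantum relative entropy for complex Hermitean matrices (assuming full-rank states so the matrix logarithms exist):

```python
import numpy as np
from scipy.linalg import logm

def relative_entropy(rho, sigma):
    # D(rho || sigma) = tr( rho (ln rho - ln sigma) )
    return np.trace(rho @ (logm(rho) - logm(sigma))).real

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.eye(2) / 2                    # maximally mixed state
print(relative_entropy(rho, sigma))
```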
There are a number of conditions that regret functions and Bregman divergences may or may not satisfy.

Definition 2
The Bregman divergence $D_F$ is monotone if
$$D_F(\Phi(\rho), \Phi(\sigma)) \le D_F(\rho, \sigma)$$
for any affinity $\Phi : \Omega \to \Omega$.

We note that monotonicity is associated with the decrease of free energy for a closed thermodynamic system. It is possible to define the regret $D_F(\rho, \sigma)$ even if the function $F$ is not differentiable, but if such a regret function is monotone then $F$ is automatically differentiable [7]. In the rest of this paper we shall focus entirely on the case where $F$ is differentiable and the regret between states is given by the Bregman divergence.
Theorem 3 Information divergence is monotone on special Jordan algebras.
Proof Let $\Omega$ denote the state space of a special Jordan algebra. Then there exists a section $S : \Omega \to M_n(\mathbb{C})$ with a corresponding retraction $R : M_n(\mathbb{C}) \to \Omega$. Let $\Phi : \Omega \to \Omega$ denote some affinity. Then $S \circ \Phi \circ R$ is an affinity $M_n(\mathbb{C}) \to M_n(\mathbb{C})$, and
$$D(\Phi(\rho) \| \Phi(\sigma)) = D(S(\Phi(\rho)) \| S(\Phi(\sigma))) = D\big( (S \circ \Phi \circ R)(S(\rho)) \,\big\|\, (S \circ \Phi \circ R)(S(\sigma)) \big) \le D(S(\rho) \| S(\sigma)) = D(\rho \| \sigma).$$
Here we have used that information divergence is monotone on $M_n(\mathbb{C})$ [15]. ⊓⊔

It is not known whether information divergence is monotone on the exceptional Jordan algebra.

Let $(\rho_\theta)$ denote a family of states and let $\Phi$ denote an affinity $\Phi : \Omega \to \Omega$. Then $\Phi$ is said to be sufficient for $(\rho_\theta)$ if there exists a recovery map $\Psi : \Omega \to \Omega$, i.e. an affinity such that $\Psi(\Phi(\rho_\theta)) = \rho_\theta$.
Definition 3 A Bregman divergence $D_F$ is said to satisfy sufficiency if $D_F(\Phi(\rho), \Phi(\sigma)) = D_F(\rho, \sigma)$ whenever $\Phi$ is sufficient for $\rho, \sigma$.

It is easy to prove that monotonicity implies sufficiency (apply monotonicity to both $\Phi$ and its recovery map). Further, it is easy to prove that sufficiency implies the property called statistical locality, defined below.
Definition 4
A Bregman divergence $D_F$ satisfies statistical locality if $\rho \perp \sigma_1$ and $\rho \perp \sigma_2$ imply
$$D_F(\rho, (1-t) \cdot \rho + t \cdot \sigma_1) = D_F(\rho, (1-t) \cdot \rho + t \cdot \sigma_2).$$
Proposition 2 In a Euclidean Jordan algebra, information divergence satisfies statistical locality.

Proof Assume that $\rho$, $\sigma_1$, and $\sigma_2$ are states and that $\rho \perp \sigma_i$. Then
$$D(\rho \| (1-t) \cdot \rho + t \cdot \sigma_i) = \langle \rho, \ln(\rho) - \ln((1-t) \cdot \rho + t \cdot \sigma_i) \rangle = \langle \rho, \ln(\rho) - \ln((1-t) \cdot \rho) \rangle = -\ln(1-t),$$
which is the same for $i = 1, 2$. ⊓⊔
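For mutually singular density matrices the states in this computation commute, so a classical (diagonal) sketch suffices to check the closed form $-\ln(1-t)$ numerically (illustrative numbers):

```python
import numpy as np

t = 0.3
rho    = np.array([1.0, 0.0, 0.0])          # diagonal density matrices:
sigma1 = np.array([0.0, 1.0, 0.0])          # rho is singular to sigma1
sigma2 = np.array([0.0, 0.5, 0.5])          # and to sigma2

def div(p, q):                              # D(p || q) on the support of p
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

print(div(rho, (1 - t) * rho + t * sigma1))  # -ln(1 - t)
print(div(rho, (1 - t) * rho + t * sigma2))  # same value
print(-np.log(1 - t))
```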
Theorem 4 If the state space $\Omega$ can be represented as the state space of a Jordan algebra of rank at least 3, then a statistically local Bregman divergence $D_F$ is proportional to the information divergence given by Equation (5). There exists a constant $c > 0$ such that the function $F$ equals $c \cdot \langle \rho, \ln \rho \rangle$ plus an affine function on $\Omega$.

Proof The theorem was proved for finite C*-algebras in [7], but the proof is the same for more general Jordan algebras. ⊓⊔

The theorem implies that under certain conditions the following conditions are equivalent:

– monotonicity,
– sufficiency,
– statistical locality,
– the Bregman divergence is proportional to information divergence,
– the objective function $F$ is proportional to entropy plus an affine function.

If the state space has rank 2 these conditions are not equivalent, and this special case was studied in great detail in [10].

6 Conditional mutual information

Consider a bipartite system with Alice and Bob as observers. We assume that the no-signaling condition and local tomography are fulfilled, so that a joint state can be described as an element of the tensor product of the local vector spaces. Let $U_A$ and $U_B$ denote the order units of Alice and Bob. Let $F$ denote some payoff function on the joint system with regret function $D_F$. We will assume that the regret function $D_F$ satisfies monotonicity. Then $F$ is differentiable and $D_F$ is a Bregman divergence. Therefore $D_F$ is given by
$$D_F(\rho, \sigma) = F(\rho) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \rho - \sigma \rangle \big).$$
The following proposition is well known when the affine combination is a convex combination.
Proposition 3 If $\sum_i t_i = 1$ and the affine combination $\bar\rho = \sum_i t_i \cdot \rho_i$ is a state, then the Bregman identity holds:
$$\sum_i t_i \cdot D_F(\rho_i, \sigma) = \sum_i t_i \cdot D_F(\rho_i, \bar\rho) + D_F(\bar\rho, \sigma). \qquad (6)$$
Proof We expand the right hand side of (6) and get
$$\sum_i t_i \cdot D_F(\rho_i, \bar\rho) + D_F(\bar\rho, \sigma) = \sum_i t_i \cdot \big( F(\rho_i) - ( F(\bar\rho) + \langle \nabla F(\bar\rho) \mid \rho_i - \bar\rho \rangle ) \big) + F(\bar\rho) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \bar\rho - \sigma \rangle \big).$$
We can re-arrange the terms and use that $\bar\rho = \sum_i t_i \cdot \rho_i$ to get
$$\begin{aligned} &\sum_i t_i \cdot F(\rho_i) - \left( \sum_i t_i \cdot F(\bar\rho) + \Big\langle \nabla F(\bar\rho) \,\Big|\, \sum_i t_i \cdot \rho_i - \bar\rho \Big\rangle \right) + F(\bar\rho) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \bar\rho - \sigma \rangle \big) \\ &\quad = \sum_i t_i \cdot F(\rho_i) - \big( F(\bar\rho) + \langle \nabla F(\bar\rho) \mid \bar\rho - \bar\rho \rangle \big) + F(\bar\rho) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \bar\rho - \sigma \rangle \big). \end{aligned}$$
Therefore the right hand side of Equation (6) reduces to
$$\sum_i t_i \cdot F(\rho_i) - \big( F(\sigma) + \langle \nabla F(\sigma) \mid \bar\rho - \sigma \rangle \big) = \sum_i t_i \cdot \big( F(\rho_i) - ( F(\sigma) + \langle \nabla F(\sigma) \mid \rho_i - \sigma \rangle ) \big) = \sum_i t_i \cdot D_F(\rho_i, \sigma),$$
which is the left hand side of Equation (6). This completes the proof. ⊓⊔
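The identity (6) holds for affine weights that need not be convex, which is easy to check numerically in the Kullback-Leibler case (a sketch with illustrative numbers; note that the second weight is negative):

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

rho1 = np.array([0.5, 0.3, 0.2])
rho2 = np.array([0.1, 0.6, 0.3])
t = np.array([1.4, -0.4])                   # affine weights summing to 1
rho_bar = t[0] * rho1 + t[1] * rho2         # still a probability vector here
sigma = np.array([0.3, 0.4, 0.3])

lhs = t[0] * kl(rho1, sigma) + t[1] * kl(rho2, sigma)
rhs = t[0] * kl(rho1, rho_bar) + t[1] * kl(rho2, rho_bar) + kl(rho_bar, sigma)
print(lhs, rhs)                             # agree up to rounding
```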
Theorem 5 Assume that $\Omega \subset V_A \otimes V_B$. If $\rho_1, \rho_2 \in \Omega_A$ and $\sigma_1, \sigma_2 \in \Omega_B$ and $D_F$ satisfies sufficiency, then
$$D_F(\rho_1 \otimes \sigma_1, \rho_2 \otimes \sigma_1) = D_F(\rho_1 \otimes \sigma_2, \rho_2 \otimes \sigma_2).$$

Proof To see this, define
$$\Phi(\pi) = E_A(\pi) \otimes \sigma_1, \qquad \Psi(\pi) = E_A(\pi) \otimes \sigma_2.$$
Then
$$\Phi(\rho_i \otimes \sigma_2) = \rho_i \otimes \sigma_1, \qquad \Psi(\rho_i \otimes \sigma_1) = \rho_i \otimes \sigma_2,$$
so $\Psi$ is a recovery map for $\Phi$ on these states. The result is obtained by sufficiency of $D_F$. ⊓⊔

If $\rho_1, \rho_2 \in \Omega_A$ we may write $D_F(\rho_1, \rho_2)$ as an abbreviation for $D_F(\rho_1 \otimes \sigma, \rho_2 \otimes \sigma)$, where some arbitrary state $\sigma \in \Omega_B$ is used.
Definition 5 Let $\sigma$ denote a state on a system with a bipartite subsystem composed of subsystems labeled $A$ and $B$. Then the mutual information between subsystem $A$ and subsystem $B$ is defined as
$$I_\sigma(A; B) = D_F(\sigma_{AB}, \sigma_A \otimes \sigma_B). \qquad (7)$$
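For a classical joint distribution, Definition 5 with information divergence gives the usual mutual information; a quick numerical sketch (illustrative numbers):

```python
import numpy as np

def kl(p, q):
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

p_ab = np.array([[0.3, 0.1],       # joint distribution of (A, B)
                 [0.2, 0.4]])
p_a = p_ab.sum(axis=1)
p_b = p_ab.sum(axis=0)

# I(A;B) = D( p_AB || p_A (x) p_B )
print(kl(p_ab.ravel(), np.outer(p_a, p_b).ravel()))
```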
Theorem 6 If the Bregman divergence $D_F$ is monotone, then mutual information satisfies the following two conditions.

Consistency If the system has a bipartite subsystem consisting of two classical subsystems $A$ and $B$, then the mutual information restricted to the bipartite subsystem is proportional to classical mutual information.

Data processing inequality If $\Phi : V_B \to V_B$ is a positive trace conserving affinity, then
$$I_\sigma(A; B) \ge I_{(\mathrm{id} \otimes \Phi)(\sigma)}(A; B).$$
Proof Consistency If the subsystems defined by Alice and Bob are classical and non-trivial, then the rank of their joint state space is at least $2 \times 2 = 4$. When the rank of the state space is at least 3, the function $F$ is a linear function of the Shannon entropy, and therefore the mutual information defined by (7) is proportional to the classical mutual information.
Data processing inequality Assume that $\Phi : V_B \to V_B$ is a positive trace conserving affinity. Then $\tilde\Phi = \mathrm{id} \otimes \Phi$ is given by $\tilde\Phi(\sigma_A \otimes \sigma_B) = \sigma_A \otimes \Phi(\sigma_B)$, and
$$I_\sigma(A; B) = D_F(\sigma_{AB}, \sigma_A \otimes \sigma_B) \ge D_F\big( \tilde\Phi(\sigma_{AB}), \tilde\Phi(\sigma_A \otimes \sigma_B) \big) = D_F\big( \tilde\Phi(\sigma_{AB}), \sigma_A \otimes \Phi(\sigma_B) \big) = I_{\tilde\Phi(\sigma)}(A; B),$$
which completes the proof. ⊓⊔

In probability theory one may define entropy as self-information via
$$H(A) = I(A; A).$$
This is not possible in quantum theory because the different subsystems in a tensor product decomposition have to be distinct. In probability theory this is not a problem, and cloning is allowed, i.e. one is allowed to form identical copies of a state. In probability theory one gets
$$H(AB) = I(AB; AB) = I(A; AB) + I(B; AB \mid A) \ge I(A; AB) = I(A; A) + I(A; B \mid A) \ge I(A; A) = H(A).$$
Therefore in probability theory the entropy of a subsystem is at most the entropy of the full system.
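A classical illustration of the data processing inequality (a sketch; the channel below is an arbitrary column-stochastic matrix acting on subsystem B):

```python
import numpy as np

def kl(p, q):
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

def mutual_information(p_ab):
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    return kl(p_ab.ravel(), np.outer(p_a, p_b).ravel())

p_ab = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

Phi = np.array([[0.9, 0.3],        # column-stochastic channel on B
                [0.1, 0.7]])
p_ab_after = p_ab @ Phi.T          # (id (x) Phi)(sigma)

print(mutual_information(p_ab))        # I_sigma(A;B)
print(mutual_information(p_ab_after))  # never larger than the line above
```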
Definition 6
A Bregman divergence $D_F$ on a bipartite system is additive if
$$D_F(\rho_A \otimes \rho_B, \sigma_A \otimes \sigma_B) = D_F(\rho_A, \sigma_A) + D_F(\rho_B, \sigma_B).$$
Theorem 7 If the state spaces $\Omega_A$ and $\Omega_B$ can be represented as state spaces of Jordan algebras $J_A$ and $J_B$, and if $D_F$ satisfies sufficiency, then $D_F$ is additive.

Proof Let $c_A$ and $c_B$ denote the states that maximize the fine-grained entropy in each of the algebras. Then $D_F$ equals $D_{\tilde F}$ where $\tilde F(\sigma) = D_F(\sigma, c_A \otimes c_B)$. Let $\rho_A$ and $\rho_B$ denote states in the state spaces $\Omega_A$ and $\Omega_B$. Then $\rho_A$ and $\rho_B$ generate associative sub-algebras $A_A \subseteq J_A$ and $A_B \subseteq J_B$ with classical state spaces. Now the restriction of $D_F$ to $A_A \otimes A_B$ satisfies sufficiency, and according to Theorem 4, $D_F$ is proportional to information divergence. Therefore
$$D_F(\rho_A \otimes \rho_B, c_A \otimes c_B) = D_F(\rho_A, c_A) + D_F(\rho_B, c_B)$$
because information divergence is additive on classical state spaces. Define
$$\tilde F_A(\rho_A) = D_F(\rho_A, c_A), \qquad \tilde F_B(\rho_B) = D_F(\rho_B, c_B).$$
With this notation $\tilde F(\rho_A \otimes \rho_B) = \tilde F_A(\rho_A) + \tilde F_B(\rho_B)$. Thus
$$\begin{aligned} D_F(\rho_A \otimes \rho_B, \sigma_A \otimes \sigma_B) &= \tilde F(\rho_A \otimes \rho_B) - \big( \tilde F(\sigma_A \otimes \sigma_B) + \langle \nabla \tilde F(\sigma_A \otimes \sigma_B) \mid \rho_A \otimes \rho_B - \sigma_A \otimes \sigma_B \rangle \big) \\ &= \tilde F_A(\rho_A) + \tilde F_B(\rho_B) - \big( \tilde F_A(\sigma_A) + \tilde F_B(\sigma_B) + \langle \nabla \tilde F_A(\sigma_A) + \nabla \tilde F_B(\sigma_B) \mid \rho_A \otimes \rho_B - \sigma_A \otimes \sigma_B \rangle \big) \\ &= \tilde F_A(\rho_A) - \big( \tilde F_A(\sigma_A) + \langle \nabla \tilde F_A(\sigma_A) \mid \rho_A \otimes \rho_B - \sigma_A \otimes \sigma_B \rangle \big) \\ &\quad + \tilde F_B(\rho_B) - \big( \tilde F_B(\sigma_B) + \langle \nabla \tilde F_B(\sigma_B) \mid \rho_A \otimes \rho_B - \sigma_A \otimes \sigma_B \rangle \big) \\ &= D_F(\rho_A, \sigma_A) + D_F(\rho_B, \sigma_B). \end{aligned}$$
⊓⊔
Example 3 If tensor products of $2 \times 2$ real Hermitean matrices are embedded in real $4 \times 4$ Hermitean matrices as in Example 1, then mutual information is additive.
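Additivity can be checked numerically for the quantum relative entropy on product states (a sketch assuming full-rank states so the matrix logarithms exist):

```python
import numpy as np
from scipy.linalg import logm

def qre(rho, sigma):
    return np.trace(rho @ (logm(rho) - logm(sigma))).real

rho_a = np.array([[0.8, 0.1], [0.1, 0.2]])
rho_b = np.array([[0.6, 0.0], [0.0, 0.4]])
sig_a = np.eye(2) / 2
sig_b = np.array([[0.7, 0.2], [0.2, 0.3]])

lhs = qre(np.kron(rho_a, rho_b), np.kron(sig_a, sig_b))
rhs = qre(rho_a, sig_a) + qre(rho_b, sig_b)
print(lhs, rhs)   # equal up to numerical precision
```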
Lemma 2 An additive monotone Bregman divergence satisfies the following identity:
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = D_F(\sigma_{AB}, \sigma_A \otimes \rho_B) + D_F(\sigma_A, \rho_A). \qquad (8)$$

Proof Any state $\sigma_{AB}$ can be written as an affine combination of tensor products, $\sigma_{AB} = \sum t_i \cdot \pi_{A,i} \otimes \pi_{B,i}$. Then
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \rho_A \otimes \rho_B) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}).$$
Using additivity it can be rewritten as
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = \sum t_i \cdot \big( D_F(\pi_{A,i}, \rho_A) + D_F(\pi_{B,i}, \rho_B) \big) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}) = \sum t_i \cdot D_F(\pi_{A,i}, \rho_A) + \sum t_i \cdot D_F(\pi_{B,i}, \rho_B) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}).$$
The Bregman identity (6) gives
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = \sum t_i \cdot D_F(\pi_{A,i}, \sigma_A) + D_F(\sigma_A, \rho_A) + \sum t_i \cdot D_F(\pi_{B,i}, \rho_B) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}).$$
This can be re-arranged as
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = \sum t_i \cdot \big( D_F(\pi_{A,i}, \sigma_A) + D_F(\pi_{B,i}, \rho_B) \big) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}) + D_F(\sigma_A, \rho_A).$$
Now additivity leads to
$$D_F(\sigma_{AB}, \rho_A \otimes \rho_B) = \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_A \otimes \rho_B) - \sum t_i \cdot D_F(\pi_{A,i} \otimes \pi_{B,i}, \sigma_{AB}) + D_F(\sigma_A, \rho_A) = D_F(\sigma_{AB}, \sigma_A \otimes \rho_B) + D_F(\sigma_A, \rho_A).$$
⊓⊔
Definition 7 We define the conditional mutual information on a tripartite system as
$$I_\sigma(A; B \mid C) = D_F(\sigma_{ABC}, \sigma_A \otimes \sigma_B \otimes \sigma_C) - D_F(\sigma_{AC}, \sigma_A \otimes \sigma_C) - D_F(\sigma_{BC}, \sigma_B \otimes \sigma_C).$$

In our definition of conditional mutual information the subsystems
$A$, $B$, and $C$ should be distinct, so that the tensor products are defined. If the state space is a simplex, i.e. the system is classical, then one may allow the subsystems to overlap.
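For classical tripartite distributions Definition 7 agrees with the usual conditional mutual information; a numerical sketch (all names and numbers illustrative):

```python
import numpy as np

def kl(p, q):
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2)); p /= p.sum()      # joint distribution of (A, B, C)

pa, pb, pc = p.sum((1, 2)), p.sum((0, 2)), p.sum((0, 1))
pac, pbc = p.sum(1), p.sum(0)

# Definition 7 for a classical state:
i_ab_c = (kl(p.ravel(), np.einsum('a,b,c->abc', pa, pb, pc).ravel())
          - kl(pac.ravel(), np.outer(pa, pc).ravel())
          - kl(pbc.ravel(), np.outer(pb, pc).ravel()))

# Standard classical conditional mutual information for comparison.
ref = np.sum(p * np.log(p * pc[None, None, :]
                        / (pac[:, None, :] * pbc[None, :, :])))
print(i_ab_c, ref)   # agree up to floating point error
```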
Definition 8 A function $I_\sigma$ on a multipartite system is called a separoid function [5,9] if it satisfies the following three properties:

Positivity $I_\sigma(A; B \mid C) \ge 0$.

Symmetry $I_\sigma(A; B \mid C) = I_\sigma(B; A \mid C)$.

Chain rule
$$I_\sigma(A; BC \mid D) = I_\sigma(A; B \mid D) + I_\sigma(A; C \mid BD). \qquad (9)$$
Theorem 8 Assume that $D_F$ is a monotone and additive Bregman divergence. Then conditional mutual information is a separoid function.

Proof Positivity Conditional mutual information can be rewritten as
$$\begin{aligned} I_\sigma(A; B \mid C) &= D_F(\sigma_{ABC}, \sigma_A \otimes \sigma_B \otimes \sigma_C) - D_F(\sigma_{AC}, \sigma_A \otimes \sigma_C) - D_F(\sigma_{BC}, \sigma_B \otimes \sigma_C) \\ &= D_F(\sigma_{ABC}, \sigma_B \otimes \sigma_{AC}) - D_F(\sigma_{BC}, \sigma_B \otimes \sigma_C) \\ &= D_F(\sigma_{ABC}, \sigma_B \otimes \sigma_{AC}) - D_F(\sigma_A \otimes \sigma_{BC}, \sigma_A \otimes \sigma_B \otimes \sigma_C). \end{aligned}$$
Let $\Phi$ denote the affinity $\Phi(\rho) = \sigma_A \otimes E_{BC}(\rho)$. Then
$$\Phi(\sigma_{ABC}) = \sigma_A \otimes \sigma_{BC}, \qquad \Phi(\sigma_B \otimes \sigma_{AC}) = \sigma_A \otimes \sigma_B \otimes \sigma_C.$$
Therefore monotonicity implies that $I_\sigma(A; B \mid C)$ cannot be negative.
Symmetry It follows directly from the definition that conditional mutual information is symmetric.
Chain rule
To prove the chain rule we expand the left hand side of Equation (9) as
$$I_\sigma(A; BC \mid D) = D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_{BC} \otimes \sigma_D) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - D_F(\sigma_{BCD}, \sigma_{BC} \otimes \sigma_D).$$
Next we use Equation (8) to get
$$I_\sigma(A; BC \mid D) = \big( D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{BC}, \sigma_B \otimes \sigma_C) \big) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - \big( D_F(\sigma_{BCD}, \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{BC}, \sigma_B \otimes \sigma_C) \big).$$
The left hand side reduces to
$$I_\sigma(A; BC \mid D) = D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - D_F(\sigma_{BCD}, \sigma_B \otimes \sigma_C \otimes \sigma_D). \qquad (10)$$
Similarly, we expand the right hand side of Equation (9) as
$$I_\sigma(A; B \mid D) + I_\sigma(A; C \mid BD) = D_F(\sigma_{ABD}, \sigma_A \otimes \sigma_B \otimes \sigma_D) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - D_F(\sigma_{BD}, \sigma_B \otimes \sigma_D) + D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_C \otimes \sigma_{BD}) - D_F(\sigma_{ABD}, \sigma_A \otimes \sigma_{BD}) - D_F(\sigma_{BCD}, \sigma_C \otimes \sigma_{BD}).$$
We use Equation (8) to rewrite the three last terms as
$$\begin{aligned} I_\sigma(A; B \mid D) + I_\sigma(A; C \mid BD) &= D_F(\sigma_{ABD}, \sigma_A \otimes \sigma_B \otimes \sigma_D) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - D_F(\sigma_{BD}, \sigma_B \otimes \sigma_D) \\ &\quad + \big( D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{BD}, \sigma_B \otimes \sigma_D) \big) \\ &\quad - \big( D_F(\sigma_{ABD}, \sigma_A \otimes \sigma_B \otimes \sigma_D) - D_F(\sigma_{BD}, \sigma_B \otimes \sigma_D) \big) \\ &\quad - \big( D_F(\sigma_{BCD}, \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{BD}, \sigma_B \otimes \sigma_D) \big). \end{aligned}$$
The right hand side reduces to
$$I_\sigma(A; B \mid D) + I_\sigma(A; C \mid BD) = D_F(\sigma_{ABCD}, \sigma_A \otimes \sigma_B \otimes \sigma_C \otimes \sigma_D) - D_F(\sigma_{AD}, \sigma_A \otimes \sigma_D) - D_F(\sigma_{BCD}, \sigma_B \otimes \sigma_C \otimes \sigma_D). \qquad (11)$$
Since the left hand side (10) and the right hand side (11) are equal, we have proved the chain rule (9). ⊓⊔

7 Conclusion

We have carefully described concepts like state space and introduced state spaces of Jordan algebras as the most important example. In generalized probabilistic theories there are different ways of defining the entropy of a state, but these different definitions coincide on Jordan algebras. For any optimization problem there is an associated Bregman divergence, but with extra constraints like monotonicity, sufficiency, or statistical locality, a Bregman divergence on a Jordan algebra is proportional to the Bregman divergence generated by the uniquely defined entropy function. A monotone Bregman divergence on a Jordan algebra is automatically additive. For composite systems an additive and monotone Bregman divergence can be used to define conditional mutual information, and this quantity will satisfy consistency, the data processing inequality and the chain rule. In [16] it was proved that if conditional mutual information can be defined in a way such that consistency, the data processing inequality and the chain rule are satisfied, then the system will satisfy the condition called information causality. In [16] it was also proved that a system that satisfies information causality cannot have super-quantum correlations, i.e. correlations that violate Tsirelson's bound. The conclusion is that the existence of a monotone Bregman divergence implies that super-quantum correlations do not exist.

The results work out nicely on Jordan algebras, but maybe they will work in any generalized probabilistic theory. For instance it would be interesting if the following conjecture holds.
Conjecture 1
All monotone Bregman divergences are additive.

A careful inspection of the proofs also reveals that the results involving Jordan algebras only use that the cone is self dual and that a Euclidean Jordan algebra is strongly spectral in the sense that $f(\sigma)$ is well defined for any function $f$. Apparently monotonicity of a Bregman divergence implies spectrality, but the only solid result in this direction is the following theorem.
Theorem 9 ([10]) If a state space has rank 2 and it has a strict and monotone Bregman divergence, then the state space can be represented as a spin factor. In particular, the state space is strongly spectral.
For most convex bodies it is not possible to define a monotone Bregman divergence, and it is not known whether it is possible to define a monotone Bregman divergence on any convex body that cannot be represented by a Jordan algebra. It would be highly desirable to classify state spaces with monotone Bregman divergences in cases where the rank exceeds 2.
Conflict of interest
The corresponding author states that there is no con-flict of interest.
References
1. Adler, S.: Quaternionic Quantum Mechanics and Quantum Fields. Oxford Univ. Press, New York, Oxford (1995)
2. Baes, M.: Convexity and differentiability properties of spectral functions and spectral mappings on Euclidean Jordan algebras. Linear Algebra and its Applications 422, 664–700 (2007). DOI 10.1016/j.laa.2006.11.025
3. Barnum, H., Graydon, M., Wilce, A.: Composites and categories of Euclidean Jordan algebras (2016). ArXiv preprint arXiv:1606.09331
4. Barnum, H., Hilgert, J.: Strongly symmetric spectral convex bodies are Jordan algebra state spaces (2019)
5. Dawid, A.P.: Separoids: A mathematical framework for conditional independence and irrelevance. Ann. Math. Artif. Intell. 32, 335–372 (2001)
6. Günaydin, M., Gürsey, F.: Quark structure and octonions. J. Math. Phys. 14(11), 1651–1667 (1973)
7. Harremoës, P.: Divergence and sufficiency for convex optimization. Entropy 19(5), Article no. 206 (2017). URL https://doi.org/10.3390/e19050206
8. Harremoës, P.: Maximum entropy and sufficiency. AIP Conference Proceedings 1853(1), 040001 (2017). URL https://doi.org/10.1063/1.4985352
9. Harremoës, P.: Entropy inequalities for lattices. Entropy 20(10), 784 (2018). DOI 10.3390/e20100784
10. Harremoës, P.: Entropy on spin factors. In: N. Ay, P. Gibilisco, F. Matúš (eds.) Information Geometry and Its Applications, Springer Proceedings in Mathematics & Statistics, vol. 252, pp. 247–278. Springer (2018)
11. Jordan, P., von Neumann, J., Wigner, E.: On an algebraic generalization of the quantum mechanical formalism. Annals of Mathematics 35(1), 29–64 (1934). DOI 10.2307/1968117. JSTOR 1968117
12. Krumm, M., Barnum, H., Barrett, J., Müller, M.P.: Thermodynamics and the structure of quantum theory. New Journal of Physics 19(4), 043025 (2017). DOI 10.1088/1367-2630/aa68ef
13. Manogue, C.A., Dray, T.: Octonions, E6, and particle physics. J. Phys.: Conf. Ser. 254, 012005 (2010)
14. McCrimmon, K.: A Taste of Jordan Algebras. Springer (2004)
15. Müller-Hermes, A., Reeb, D.: Monotonicity of the quantum relative entropy under positive maps. Annales Henri Poincaré 18(5), 1777–1788 (2017). URL https://doi.org/10.1007/s00023-017-0550-9
16. Pawlowski, M., Paterek, T., Kaszlikowski, D., Scarani, V., Winter, A., Zukowski, M.: Information causality as a physical principle. Nature 461, 1101–1104 (2009). DOI 10.1038/nature08400
17. Short, A.J., Wehner, S.: Entropy in general physical theories. New J. Phys. 12, 033023 (2010)