Natural information measures in Cox' approach for contextual probabilistic theories

Abstract

In this article we provide, from a novel perspective, arguments that support the idea that, in the wake of Cox' approach to probability theory, von Neumann's entropy should be the natural one in Quantum Mechanics. We also generalize the pertinent reasoning to more general orthomodular lattices, which reveals the structure of a general non-Boolean information theory.

Federico Holik, Angel Plastino, and Manuel Sáenz

September 29, 2018

1- Universidad Nacional de La Plata, Instituto de Física (IFLP-CCT-CONICET), C.C. 727, 1900 La Plata, Argentina
2- Department of Mathematics, University of Buenos Aires & CONICET.
3- Universitat de les Illes Balears and IFISC-CSIC, 07122 Palma de Mallorca, Spain.

Key words: von Neumann entropy, Information Theory, Lattice Theory, Non-Boolean Algebras

The problem of characterizing information measures has puzzled people since the very beginning of information theory. As an example, this intriguing character is expressed in the very origin of the term 'entropy' for Shannon's measure. In Shannon's words:

"My greatest concern was what to call it. I thought of calling it an "information", but the word was overly used, so I decided to call it an "uncertainty". When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, "You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have an advantage"." [1] (see also [2], page 35).

This entropic mystery has only grown since the advent of quantum information theory (QIT) [3], in which von Neumann's entropy (VNE) plays a significant role (as, for example, in the quantum coding theorem presented in [4, 5]). Of course, von Neumann's measure is not 'alone': there are many other entropic measures which also play a significant role in QM and QIT, such as the Tsallis [6, 7] and Rényi [8] entropies, and even more general ones [9, 10, 11]. This fact led to an important debate about which is the correct information measure for the quantal realm (see for example [12] and [13]). Many studies attempted to characterize Shannon's entropy first [14, 15, 16, 17], but also von Neumann's [18]. In this article, we offer a novel perspective which considers VNE as the natural information measure for a non-Boolean probability calculus.

Classical information theory (CIT) relies on the notion of probability: as stressed by Shannon, the probabilistic aspect of the source is at the basis of his seminal work [19]. It is widely accepted that classical probability theory can be axiomatized using Kolmogorov's postulates [20].
But it turns out that there exists another approach to classical probability, namely, that of R. T. Cox [21, 22]. In Cox' approach, probabilities are considered as an inference calculus on a Boolean algebra of propositions: a rational agent, intending to make inferences using classical logic (wherefrom the Boolean structure emerges), must compute the plausibility that certain events occur. It turns out that the only measure of plausibility compatible with the algebraic symmetries of the Boolean algebra of propositions is, up to rescaling, equivalent to Kolmogorov's probability theory [23, 24, 25]. In this way, the plausibility calculus is considered as a direct extension of classical deduction theory to an inference theory: the extension of rationality applied to the calculus of plausibility.

In his preliminary works, Cox also conjectured that Shannon's entropy [21] was the natural information measure for classical probability distributions. This approach was considerably developed and improved in [23, 24, 25, 26, 27, 28, 29]. In this way, Shannon's and Hartley's entropies have been characterized as the only entropies that can be used for the purposes of inquiry, in the sense that other entropies will lead to inconsistencies with the Boolean character of the lattice of assertions [27].

In [30], we presented a derivation of the axioms of non-commutative probabilities in Quantum Mechanics (QM) by appealing to the non-distributive (non-Boolean) character of the lattice of projections of the Hilbert space (cf. Appendix A of this work for elementary notions of lattice theory). This was done by extending Cox' approach for the orthomodular lattice of projection operators to i) the quantal case and ii) more general non-Boolean algebras. In this work we complement the approach by deriving the VNE as the most natural information measure in the quantum context. As in [30], we extend our results to more general (atomic) orthomodular lattices.
This is done by exploiting the fact that Cox' derivation of Shannon's entropy can be applied to all possible maximal Boolean subalgebras of an arbitrary atomistic orthomodular lattice. Thus, according to our extension of Cox' approach to the non-commutative realm, the VNE and Measurement Entropy (ME) [31, 32, 33] arise as the most natural information measures.

The results presented in this work pave the way for a new way of conceiving information theory. While classical probabilities give rise to Shannon's theory, and thus lead to CIT, non-Boolean probabilities give rise to VNE and QIT. Thus, QIT could be conceived as a non-Boolean generalization of CIT. This opens the door to a new way of exploring physical theories from the informational point of view, because probabilistic theories more general than the ones appearing in classical and (standard) quantum mechanics can be conceived.

Indeed, during the 1930s von Neumann developed a theory of rings of operators [34] (today known as von Neumann algebras [35, 36]), and subsequently, in joint work with Murray, provided a classification of factors [37, 38, 39, 40]. While quantum systems with finitely many degrees of freedom (as is the case, for example, in standard non-relativistic QM) can be described using Type I factors, more general factors are needed for more general theories: it can be shown that Type III factors must be used in relativistic quantum field theory, and Type II factors may appear in quantum statistical mechanics of infinite systems [41, 35]. Thus, it is expected that theories yet to be developed, such as a quantum theory of gravity, may very likely require more general probabilistic models (perhaps not contained in the above examples).

The most general framework for studying probabilistic models up to now is provided by the Convex Operational Models (COM) approach [44, 45].
In a general probabilistic model, the probabilities will not necessarily be Kolmogorovian (as is the case for the probabilities appearing in Type I, II and III factors). Thus, we envisage the development of a non-Kolmogorovian (or generalized) information theory. (For the case of negative probabilities, see for example [42, 43].) A clear example of the fact that such an entity does exist can be found in studies focusing on the validity of informational notions such as [31, 46, 32, 47] (cf. [45] for more references on the subject). In this work, we show that CIT and QIT are just particular cases of this approach; the first would be the Boolean case, and the second, the one represented by the Type I factors of the Murray-von Neumann classification theory, i.e., the algebras of bounded operators acting on separable Hilbert spaces. In this context, it is pertinent to mention that quantum algorithms were shown to exploit the non-Boolean character of the lattice of projection operators in quantum mechanics [48]. The generalization of the Bayesian Cox approach presented here (and in [30]) provides a unifying formal framework for dealing with possible physical theories. It also provides a possible interpretation of VNE and ME as natural measures of information for non-commutative event structures; in other words, as natural information measures for theories exhibiting a highly contextual character, like standard quantum mechanics [49, 50].

Before concluding this Introduction, we remark that an important advantage of extending Cox' approach to non-Boolean settings is that it offers a novel argument in favor of the use of the logarithmic functional form appearing in the VNE and the ME. As we have remarked above, there is a debate around the question of why one should use the VNE instead of more general quantum information measures (such as the quantum versions of Rényi and Tsallis entropies) in the quantum realm.
Moreover, while the ME was introduced in References [31], [32] and [33], no conceptual argument is presented there in favor of using that functional form instead of more general ones (apart from observing that it possesses some of the 'desired properties' shared by Shannon's measure and the VNE). In fact, in Section 4 we show that, from a 'contextual rational agent' perspective, the functional forms appearing in the VNE and the ME can be considered as the most reasonable choices compatible with the algebraic structure of a contextual inquiry calculus. Notice that this approach also allows for a very intuitive interpretation of the VNE and the ME, which was not present in previous works.

The paper is organized as follows. In Section 2 we start by reviewing classical probability theory (in the approaches of Kolmogorov and Cox) and the probabilities appearing in QM, emphasizing the differences with the classical case. In Section 3 we present a digression on Cox' [22] and more recent [25] derivations of Shannon's entropy as the natural information measure for Boolean algebras. Next, in Section 4, we show how VNE arises as a natural measure of information for the Hilbertian projection lattice. In Section 5, we discuss generalized probabilistic models and ME. Finally, in Section 6 we draw some conclusions. Elementary notions of lattice theory can be found in Appendix A.

Let $\Sigma$ represent a sigma-algebra of subsets of a given outcome set. To fix ideas, consider the example of a die. For this case, the outcome set $\Omega = \{1, 2, 3, 4, 5, 6\}$ is the set of all possible results, and $\Sigma = \mathcal{P}(\Omega)$ is the set of all subsets of $\Omega$; each element of $\Sigma$ represents a possible event (for example, the event "the result is even" is represented by the set $\{2, 4, 6\}$, and so on). Kolmogorov's axioms can be presented in the form of conditions on a measure $\mu$ over $\Sigma$ as follows [20]:

$$
\begin{aligned}
&\mu : \Sigma \to [0, 1], \\
&\mu(\emptyset) = 0, \\
&\mu(A^{c}) = 1 - \mu(A), \\
&\mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i} \mu(A_i) \quad \text{for any pairwise disjoint denumerable family } \{A_i\}_{i \in I},
\end{aligned}
\tag{1}
$$

where $(\ldots)^{c}$ stands for the set-theoretical complement. With this minimal axiomatic basis the whole building of classical probability theory can be erected.

A random variable is defined as a (measurable) function $X : \Omega \to \mathbb{R}$ that assigns real values to the possible outcomes in $\Omega$. Random variables are intended to describe properties of the system under study that depend on the different possible outcomes that may result from a given experiment. A random variable is discrete if its set of possible values is countable, and continuous if there exists a function $f$ (its density) which determines its probability distribution according to

$$P(X \in B) = \int_{B} f(x)\, dx \tag{2}$$

Although not necessarily, the formalization of probability given by Kolmogorov's axioms is usually associated with an objectivist (or frequentist) interpretation of probability theory, in which probabilities represent a property of the system under study, and are therefore capable of being subject to experimental test.

Alternatively, in Cox' approach probabilities are interpreted in a subjective manner: they do not represent properties of physical systems, but rather they are related to the information one possesses about them. The aim of Cox was to establish probability theory as a form of induction arising as an extension of classical logic to situations of incomplete knowledge. As will be shown, by doing so, Cox arrives at the same results as the ones obtained from Kolmogorov's axioms. However, these two approaches differ significantly at the conceptual level. In this section, although we will follow Cox' original deductions (presented in [21, 22]), for the sake of clarity we will somewhat change Cox' notation.
In [51] and [52], a more detailed discussion on Cox' work, together with implications and criticisms, can be found.

Let us call $\mathcal{P}$ the set of propositions that a rational agent uses to describe a system under study, and "$\neg$", "$\vee$" and "$\wedge$" the logical negation, disjunction, and conjunction, respectively. Cox starts by postulating the existence of a function $\varphi_h : \mathcal{P} \to \mathbb{R}$ that represents the plausibility of the propositions in $\mathcal{P}$ on the basis of a special knowledge possessed by the agent. Such knowledge is that of a proposition, called $h$ (usually called hypothesis), that i) happens to be true and ii) satisfies:

• $\forall a \in \mathcal{P}$, $\varphi_h(\neg a) = f(\varphi_h(a))$, for some function $f : \mathbb{R} \to \mathbb{R}$.
• $\forall a, b \in \mathcal{P}$, $\varphi_h(a \vee b) = g[\varphi_h(a), \varphi_h(b)]$, for some function $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$.

It is now possible to derive the calculus of probabilities by imposing on this structure the symmetries of a Boolean algebra. (By classical logic one refers to the propositional calculus endowed with the operations "$\neg$", "$\vee$" and "$\wedge$". It is widely known that the algebraic structures corresponding to this propositional calculus are closely related to Boolean algebras.) On such a basis one arrives at results analogous to the ones obtained from Kolmogorov's axioms.

By imposing coherence of the function $\varphi_h(\cdot)$ with the associativity of conjunction ($a \wedge (b \wedge c) = (a \wedge b) \wedge c$), Cox showed that the function $g(x, y)$ must satisfy the functional equation

$$g[x, g(y, z)] = g[g(x, y), z] \tag{3}$$

Using the theory developed in [15], it can be shown that after a re-scaling and a proper definition of the probability $P(a|h)$ in terms of $\varphi_h(a)$, this equation's solutions lead to the product rule of probability theory

$$P(a \wedge b|h) = C\, P(a|h \wedge b)\, P(b|h) \tag{4}$$

where $C$ is a constant. The definition of $P(a|h)$ in terms of $\varphi_h(a)$ is omitted, as in actual computations one ends up using only the function $P(a|h)$ and never $\varphi_h(a)$.
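As a small numerical illustration of the functional equation (3), the sketch below checks associativity on a grid of sample values. It is not part of Cox' derivation: the helper `is_associative`, the sample grid, and the two candidate rules are assumptions chosen only for illustration. Ordinary multiplication, the form that underlies the product rule (4), passes the check, whereas the arithmetic mean (symmetric, but not associative) fails it.

```python
import itertools
import math


def is_associative(g, samples, tol=1e-12):
    """Check g[x, g(y, z)] == g[g(x, y), z] on every triple drawn from samples."""
    return all(
        math.isclose(g(x, g(y, z)), g(g(x, y), z), rel_tol=tol, abs_tol=tol)
        for x, y, z in itertools.product(samples, repeat=3)
    )


# Hypothetical plausibility values in [0, 1], used only as test points.
samples = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]


def product(x, y):
    """Candidate rule g(x, y) = x * y (the form behind the product rule)."""
    return x * y


def mean(x, y):
    """Candidate rule g(x, y) = (x + y) / 2, which violates Eq. (3)."""
    return (x + y) / 2


print(is_associative(product, samples))  # True
print(is_associative(mean, samples))     # False
```

A finite grid can only refute a candidate rule, never prove that it solves Eq. (3) in general; the general solution is the subject of the functional-equation theory cited above.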
On the other hand, imposing coherence with i) the law of double negation ($\neg\neg a = a$) and ii) De Morgan's law for disjunction ($\neg(a \vee b) = \neg a \wedge \neg b$), Cox arrives at a functional equation for $f(\cdot)$ which has solutions in terms of $P(a|h)$ given (up to re-scaling) by

$$P(a|h)^{r} + P(\neg a|h)^{r} = 1 \tag{5}$$

The seemingly arbitrary choice of a value for the constant $r$ can be avoided by re-scaling probability so as to absorb the exponent $r$, that is, by defining probability as $P'(a|h) \equiv P^{r}(a|h)$ instead of $P(a|h)$. Cox decides to take $r = 1$ and thus obtains the usual rule for computing the probabilities of complementary outcomes. Finally, using results (4) and (5), and imposing coherence with i) the law of double negation and ii) De Morgan's law for conjunction ($\neg(a \wedge b) = \neg a \vee \neg b$), Cox deduces the sum rule of probability theory:

$$P(a \vee b|h) = P(a|h) + P(b|h) - P(a \wedge b|h) \tag{6}$$

It can be easily shown from equations (4) and (6) that, if normalized to 1, $P(a|h)$ satisfies all the properties of a Kolmogorovian probability (Eqs. 1).

In [53] R. P. Feynman defines probabilities as follows:

"I should say, that in spite of the implication of the title of this talk the concept of probability is not altered in quantum mechanics. When I say the probability of a certain outcome of an experiment is p, I mean the conventional thing, that is, if the experiment is repeated many times one expects that the fraction of those which give the outcome in question is roughly p. I will not be at all concerned with analyzing or defining this concept in more detail, for no departure from the concept used in classical statistics is required. What is changed, and changed radically, is the method of calculating probabilities."

Feynman asserts that while the concept of probability is not altered in QM, the method of calculating probabilities changes radically. What does this mean? In order to clarify, let us write things down in a more technical way.
To begin with, a general state in QM can be represented by a density operator, i.e., a trace-class positive Hermitian operator of trace one [54, 55]. Let $\mathcal{P}(\mathcal{H})$ be the orthomodular lattice of projection operators of a Hilbert space $\mathcal{H}$ (cf. App. A). Due to the spectral theorem, every physical event (i.e., the outcome of any conceivable experiment) can be represented as a projection operator in $\mathcal{P}(\mathcal{H})$. If $P$ is a projection representing an event and the state of the system is represented by the density operator $\rho$, then the probability $p_\rho(P)$ that the event $P$ occurs is given by the formula

$$p_\rho(P) = \mathrm{tr}(\rho P) \tag{7}$$

which is known as Born's rule [55]. Given an event $P$ and state $\rho$, if the experiment is repeated many times, Born's rule assigns a number which coincides with the fraction mentioned in Feynman's quotation. Gleason's theorem [56] ensures that density operators are in bijective correspondence with measures $s$ of the form

$$
\begin{aligned}
&s : \mathcal{P}(\mathcal{H}) \to [0, 1], \\
&s(\mathbf{0}) = 0 \quad (\mathbf{0} \text{ is the null subspace}), \\
&s(P^{\perp}) = 1 - s(P), \\
&s\Big(\sum_{j} P_j\Big) = \sum_{j} s(P_j) \quad \text{for a denumerable and pairwise orthogonal family of projections } P_j.
\end{aligned}
\tag{8}
$$

Thus, given a state $\rho$, a measure $s_\rho$ satisfying Eqs. 8 is uniquely determined in such a way that, for each outcome of each experiment represented by a projection operator $P$, it coincides with the probability defined in Feynman's quotation. In this way, the probabilities appearing in QM (which are governed by the density matrix and Born's rule) can be axiomatized using Eqs. 8.

How is all of this related to Feynman's quotation above? What is the technical meaning of the radical difference mentioned by Feynman? While Eqs. 8 may look unfamiliar, it is instructive to consider a quantum probability distribution, such as $s$, as a collection of classical probability distributions. Let us make some important definitions in order to see how this works. Let $E := \{P_i\}_{i \in \mathbb{N}}$ be a collection of projections such that $\bigvee_i P_i = \mathbf{1}_{\mathcal{H}}$ and $P_i \perp P_j$ (i.e., $P_i P_j = 0$) whenever $i \neq j$.
We call $E$ an experiment. The intuitive idea of an experiment refers to the set of events defined by a concrete experimental setup. Each one of these events is in bijective correspondence with a possible measurement outcome. Thus, the elements of $E$ can be regarded as part of an outcome set $\Omega_E$. As an example, measuring the spin of a particle in a definite direction defines an experiment. Measuring it in another direction defines a new experiment, incompatible with the first one. Notice that any experiment defines a maximal Boolean subalgebra of $\mathcal{P}(\mathcal{H})$, which is isomorphic to $\mathcal{P}(\Omega_E)$. The state of the system defines a classical probability distribution on this Boolean subalgebra, satisfying Eqs. 1.

We call an orthonormal complete set of projectors of the form $\{|\varphi_i\rangle\langle\varphi_i|\}_{i \in \mathbb{N}}$ in $\mathcal{H}$ (where the $|\varphi_i\rangle$ are unit vectors) a frame. Notice that any frame is also an experiment. A frame represents a maximal experiment on the system, in the sense that it cannot be refined by any finer measurement (cf. [57], Chapter 2). We call $\mathcal{F}_{\mathcal{H}}$ the set of all possible frames in $\mathcal{H}$. Frames are irreducible experiments, in the sense that no outcome is degenerate.

Each experiment $E$ defines a maximal Boolean subalgebra $\mathcal{B}_E \subset \mathcal{P}(\mathcal{H})$. Again, if we restrict the state $\rho$ to $\mathcal{B}_E$, we obtain a measure $\rho_{\mathcal{B}_E}$ on $\mathcal{B}_E$ satisfying Kolmogorov's probability theory (defined by Eqs. 1).

Indeed, if we restrict to frames, for each orthonormal basis $\{|\phi_i\rangle\}_{i \in \mathbb{N}}$ of $\mathcal{H}$ representing a particular irreducible experiment, the state $\rho$ assigns to it a classical probability distribution represented by the vector $(p_{|\phi_1\rangle}, p_{|\phi_2\rangle}, \ldots)$, where $p_{|\phi_i\rangle} = \mathrm{tr}(\rho\,|\phi_i\rangle\langle\phi_i|)$. Indeed, the set $\{|\phi_i\rangle\langle\phi_i|\}_{i \in \mathbb{N}}$ generates a maximal Boolean algebra, and the measure $s_\rho$ defines a classical probability measure on it just as in Eqs. 1.
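To make the frame picture concrete, the following sketch evaluates $p_{|\phi_i\rangle} = \mathrm{tr}(\rho\,|\phi_i\rangle\langle\phi_i|) = \langle\phi_i|\rho|\phi_i\rangle$ for a single qubit in two different frames. The state, the two bases and the function name `born_probabilities` are illustrative assumptions, not taken from the text; the code uses only the Python standard library.

```python
import math


def born_probabilities(rho, basis):
    """Return p_i = <phi_i| rho |phi_i> for every vector phi_i of the basis."""
    probs = []
    for phi in basis:
        n = len(phi)
        p = sum(
            phi[r].conjugate() * rho[r][c] * phi[c]
            for r in range(n)
            for c in range(n)
        )
        probs.append(p.real)  # the imaginary part vanishes for Hermitian rho
    return probs


# Hypothetical mixed qubit state (positive, Hermitian, trace one).
rho = [[0.75, 0.0],
       [0.0, 0.25]]

s = 1.0 / math.sqrt(2.0)
z_frame = [(1.0, 0.0), (0.0, 1.0)]  # one orthonormal basis
x_frame = [(s, s), (s, -s)]         # a second, incompatible basis

p_z = born_probabilities(rho, z_frame)
p_x = born_probabilities(rho, x_frame)
print(p_z, sum(p_z))
print(p_x, sum(p_x))
```

Each frame yields its own classical distribution, approximately (0.75, 0.25) in the first basis and (0.5, 0.5) in the second, and each sums to one, as required by Eqs. 1.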
Thus, the quantum probabilities originating in a given state can be considered as a (non-denumerable) family

$$
\begin{array}{l}
\{(p_{|\phi_1\rangle}, p_{|\phi_2\rangle}, p_{|\phi_3\rangle}, \ldots)\} \\
\{(p_{|\phi'_1\rangle}, p_{|\phi'_2\rangle}, p_{|\phi'_3\rangle}, \ldots)\} \\
\{(p_{|\phi''_1\rangle}, p_{|\phi''_2\rangle}, p_{|\phi''_3\rangle}, \ldots)\} \\
\qquad \vdots
\end{array}
\tag{9}
$$

where $\{|\phi_i\rangle\}$, $\{|\phi'_i\rangle\}$, etc., range over all possible orthonormal bases of $\mathcal{H}$. (The construction of $\mathcal{B}_E$ is trivial: it is the smallest Boolean subalgebra of $\mathcal{P}(\mathcal{H})$ containing $E$.)

Thus, a quantum state can be seen as a collection of classical probability distributions ranging over each possible experiment. Since in QM different experiments can be incompatible (i.e., some of them cannot be simultaneously performed), a quantum state does not determine a single classical probability distribution: due to Gleason's theorem, this fact is correctly axiomatized by Eqs. 8. We thus see how the meaning of the expression "radically changed" in Feynman's quote can be expressed in a clear technical (but also conceptual) sense. In classical probability theory the rational agent is confronted with an event structure represented by a single Boolean algebra (only one context). This is the content of Cox' approach to probability theory (and of other similar approaches as well, such as the ones presented in Section II of [58]): the Boolean structure of propositions representing classical events determines the possible measures of degrees of belief. In other words, if the agent wants to avoid inconsistencies, he must compute probabilities according to rules compatible with the Boolean structure of classical logic.

In the quantum realm, due to the existence of complementary contexts, a single Boolean algebra is no longer sufficient to cogently (and fully) describe physical phenomena, and thus the orthomodular structure of $\mathcal{P}(\mathcal{H})$ emerges. This is the case for more general theories as well, such as algebraic relativistic quantum field theory or quantum mechanics with infinitely many degrees of freedom, and this involves the use of more general algebraic structures (more on this in the next Section). Notice that these considerations do not imply that classical logic should be abandoned; quite the contrary, the experimenter is always confronted with concrete experiments for which a Boolean algebra is perfectly defined. But no a priori principle grants that the complete description of all possible phenomena will be exhausted within a single Boolean context. Here we encounter the radical difference in computing probabilities that quantum mechanics forces on us: non-Boolean event structures do appear in nature, and in this case, new rules for computing probabilities must be invoked. In [30] Cox' construction is generalized by showing that when the experimenter is confronted with events represented by a non-Boolean algebra such as $\mathcal{P}(\mathcal{H})$, the plausibility measures must obey Eqs. 8 in order to avoid inconsistencies.

Measures on lattices more general than the sigma-algebra of the classical case and $\mathcal{P}(\mathcal{H})$ can be constructed [35, 36]. They can be axiomatized as conditions on a measure $s$ as follows:

$$
\begin{aligned}
&s : \mathcal{L} \to [0, 1] \quad (\mathcal{L} \text{ standing for the lattice of all events}), \\
&s(\mathbf{0}) = 0, \\
&s(E^{\perp}) = 1 - s(E), \\
&s\Big(\sum_{j} E_j\Big) = \sum_{j} s(E_j) \quad \text{for a denumerable and pairwise orthogonal family of events } E_j.
\end{aligned}
\tag{10}
$$

See [61] regarding the conditions for the existence of such measures. Eqs. 1 and 8 are just particular cases of this general approach. There do exist concrete examples of measures on lattices, coming from Type II and Type III factors, which do not reduce to Eqs. 1 and 8 [35, 41].

Define an experiment as a set of propositions $A := \{a_i\}_{i \in \mathbb{N}}$ such that $a_i \perp a_j$ for $i \neq j$ and $\bigvee_i a_i = \mathbf{1}$. Call $\mathcal{E}$ the set of all possible experiments. A frame in $\mathcal{L}$ will be an orthogonal set $\{a_i\}_{i \in \mathbb{N}}$ of atoms such that $\bigvee_i a_i = \mathbf{1}$. Notice that frames are also experiments here.
Given an event structure (i.e., a set of propositions referring to events) represented by an atomic Boolean lattice $\mathcal{B}$, Cox defines a question as the set of assertions that answer it. If a proposition $x \in \mathcal{B}$ answers question $Q$ (notice that according to Cox' definition this means $x \in Q$), and if $y$ implies $x$ (in lattice-theoretical notation: $y \leq x$), then $y$ should also answer $Q$ (and thus $y \in Q$). Any set of propositions in $\mathcal{B}$ with this property will be called a down-set (see [25]). Thus, any question $Q$ in the set of questions $\mathcal{Q}(\mathcal{B})$ defined by $\mathcal{B}$ is a down-set. $\mathcal{Q}(\mathcal{B})$ forms a lattice with set-theoretical inclusion as partial order, intersection as conjunction and set union as disjunction. Notwithstanding, $\mathcal{Q}(\mathcal{B})$ will fail to be Boolean, due to the failure of orthocomplementation.

Following [59], define an ideal $I$ of a lattice $\mathcal{L}$ as a non-empty subset satisfying the following conditions:

• If $x \leq y$ and $y \in I$, then $x \in I$.
• If $x, y \in I$, then $x \vee y \in I$.

Thus, any ideal is also a down-set. Given an element $a \in \mathcal{L}$, a set of the form $I(a) = \{x \in \mathcal{L} \mid x \leq a\}$ is an ideal, and it is called a principal ideal of $\mathcal{L}$. An important theorem due to Birkhoff [59] asserts that the set $\hat{\mathcal{L}}$ of all ideals forms a lattice and the set $\hat{\mathcal{L}}_p$ of all principal ideals forms a sublattice, which is isomorphic to $\mathcal{L}$ (we denote this fact by $\hat{\mathcal{L}}_p \sim \mathcal{L}$).

For an arbitrary atomic Boolean algebra $\mathcal{B}$, any $a \in \mathcal{B}$ can be written in the form $a = \bigvee_i a_i$, for some atoms $a_i$. (Notice that maximal Boolean subalgebras of $\mathcal{P}(\mathcal{H})$ satisfy these conditions and that the disjunction may be infinite but denumerable.) We can also form the lattices of ideals $\hat{\mathcal{B}}$ and $\hat{\mathcal{B}}_p$, with $\hat{\mathcal{B}}_p \subseteq \hat{\mathcal{B}}$ and $\hat{\mathcal{B}}_p \sim \mathcal{B}$ (as lattices).

We can also form the lattice of questions $\mathcal{Q}(\mathcal{B})$ (which will not necessarily be suitably orthocomplemented). Notice that while each ideal in $\mathcal{B}$ belongs to $\mathcal{Q}(\mathcal{B})$, not every element in $\mathcal{Q}(\mathcal{B})$ is an ideal (because a system of assertions does not necessarily satisfy the join condition of the definition of ideal [28]). Thus, in order to stress the difference, let us call $\hat{\mathcal{Q}}(\mathcal{B})$ the set of ideal-questions (i.e., questions that are represented by ideals of $\mathcal{B}$). It should be clear that $\hat{\mathcal{Q}}(\mathcal{B}) \subseteq \mathcal{Q}(\mathcal{B})$. For any question $Q \in \mathcal{Q}(\mathcal{B})$, if $a \in Q$, then the ideal $I(a)$ of $a$ in $\mathcal{B}$ satisfies $I(a) \subseteq Q$ (because $Q$ must contain all the $x$ such that $x \leq a$). From this, it follows that $Q = \bigcup_{a \in Q} I(a)$.

One more step is needed in order to guarantee that our questions be real. A real question must satisfy the condition of being answerable by a true statement [25]. This is elegantly done by requiring that all atoms must belong to a question in order for it to be considered real. Thus, let $\mathcal{R}(\mathcal{B})$ be the set of real questions. It can be shown that, in the general case, $\mathcal{R}(\mathcal{B})$ will not be Boolean because of the failure of orthocomplementation. We will not use this lattice here, but only consider $\mathcal{Q}(\mathcal{B})$ and $\hat{\mathcal{Q}}(\mathcal{B})$.

There exists a quantity analogous to probability, called relevance [25], which quantifies the degree to which one question answers another (the technical details of the construction of the relevance function are similar to those presented in Section 2). We repeat that the term relevance refers to the computation of the extent to which one question answers another. From the mathematical point of view, this task is completely analogous to that of assigning plausibility to $\mathcal{B}$, but applied now to $\mathcal{Q}(\mathcal{B})$. As explained in [25], in order to assign relevances, i) the algebraic properties of the question lattice $\mathcal{Q}(\mathcal{B})$ and ii) the probability assigned to $\mathcal{B}$ using Cox' method must be taken into account. The objective is thus to assign relevances to the ideal-questions (the rest can be computed using the inquiry calculus derived using Cox' method, see Knuth [25]).
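The difference between questions (down-sets) and ideal-questions drawn above can be checked by brute force on a very small example. The sketch below is an illustration under stated assumptions: it takes the four-element Boolean lattice $\mathcal{P}(\{a, b\})$ ordered by inclusion as a stand-in for $\mathcal{B}$, counts only non-empty down-sets as questions, and uses the hypothetical helper names `subsets`, `is_down_set` and `is_ideal`.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


# Assumed toy lattice B = P({a, b}); the order is set inclusion, the join is union.
elements = list(subsets(("a", "b")))


def is_down_set(q):
    """x in q and y <= x must imply y in q."""
    return all(y in q for x in q for y in elements if y <= x)


def is_ideal(q):
    """A non-empty down-set that is also closed under joins."""
    return bool(q) and is_down_set(q) and all((x | y) in q for x in q for y in q)


questions = [q for q in subsets(elements) if q and is_down_set(q)]
ideal_questions = [q for q in questions if is_ideal(q)]

print(len(questions), len(ideal_questions))  # 5 4
```

The only question that is not an ideal here is the down-set containing the bottom element and the two atoms but not their join, which is exactly the failure of the join condition mentioned in the text. The four ideals are the principal ideals $I(x)$, one for each element of the lattice, in line with the isomorphism between the lattice of principal ideals and the lattice itself.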
With the question algebra well-defined, Knuth extends the ordering relation to a quantity that describes the degree to which one question answers another. This is done by defining a bi-valuation on the lattice that takes two questions and returns a real number $d \in [0, c]$, where $c$ is the maximal relevance. Precisely, Knuth calls this bi-valuation the relevance [25]. This procedure can be applied to $\mathcal{Q}(\mathcal{B})$ and $\hat{\mathcal{Q}}(\mathcal{B})$, and thus we have a function $d(\cdot|\cdot)$ with properties analogous to those of a plausibility function, but now defined on the lattices of questions.

Following [25], we assume that the extent to which the top question $\hat{1}$ answers a join-irreducible question $I(a_i)$ depends only on the probability of the assertion $a_i$ from which the question $I(a_i)$ was generated. More abstractly, $d(I(a_i)|\hat{1}) = H(p(a_i))$, $H$ being a function to be determined in such a way that it satisfies compatibility with the algebraic properties of the lattice and the probabilities assigned in $\mathcal{B}$ (by using Cox' method). Now, let us review the properties of $d(\cdot|\cdot)$ according to Knuth's inquiry calculus. First, we will have subadditivity

$$d(a \vee b|c) \leq d(a|c) + d(b|c) \tag{11}$$

which is a straightforward consequence of the sigma-additivity condition

$$d\Big(\bigvee_i x_i \,\Big|\, c\Big) = \sum_i d(x_i|c) \tag{12}$$

for pairwise disjoint questions $\{x_i\}_{i \in \mathbb{N}}$. Commutativity of "$\vee$" implies that

$$d(x_1 \vee x_2 \vee \ldots \vee x_n|c) = d(x_{\pi(1)} \vee x_{\pi(2)} \vee \ldots \vee x_{\pi(n)}|c) \tag{13}$$

for any permutation $\pi$. Now suppose that to a certain collection of questions $\{x_1, x_2, \ldots, x_n\}$ we add a new question $y = I(x)$ and that we know in advance that the assertion $x$ is false. Then, $y$ collapses to $\hat{0} \in \mathcal{Q}(\mathcal{B})$. Thus, we should have the expansibility condition

$$d(x_1 \vee x_2 \vee \ldots \vee x_n \vee y|c) = d(x_1 \vee x_2 \vee \ldots \vee x_n|c) \tag{14}$$

Suppose now that a question $X$ in $\mathcal{Q}(\mathcal{B})$ can be written as $X = \bigvee_i I(a_i)$, where the $\{I(a_i)\}$ are ideal questions with $I(a_i) \wedge I(a_j) = \hat{0}$ for $i \neq j$.
Then, we will have

$$d\Big(\bigvee_i I(a_i) \,\Big|\, \hat{1}\Big) = \sum_i d(I(a_i)|\hat{1}) = \sum_i H(p(a_i)). \tag{15}$$

Let us cast the above equation as

$$d\Big(\bigvee_i I(a_i) \,\Big|\, \hat{1}\Big) = K(p(a_i)), \tag{16}$$

where we have introduced the function $K(p(a_i))$, which depends on the $p(a_i)$ only. If the $\{I(a_i)\}$ form a finite set (of $n$ elements), we can write $K(p(a_i)) = K_n(p(a_1), \ldots, p(a_n))$. It turns out that $K_n(p(a_1), \ldots, p(a_n))$ satisfies subadditivity, additivity, symmetry and expansibility. A well known result [14, 25] implies that

$$K_n(p(a_1), \ldots, p(a_n)) = A\, H_n(p(a_1), \ldots, p(a_n)) + B\, H^{0}_n(p(a_1), \ldots, p(a_n)), \tag{17}$$

where $A$ and $B$ are arbitrary constants, and $H_n(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i \ln p_i$ and $H^{0}_n = \ln(n)$ are the Shannon and Hartley entropies, respectively. For information-theoretical purposes related to the continuity of the measure of information [14, 25], it is very natural to set $A = 1$ and $B = 0$, and thus $K_n(p(a_1), \ldots, p(a_n)) = -\sum_{i=1}^{n} p(a_i) \ln p(a_i)$. When the terms $\{I(a_i)\}$ in the decomposition form an infinite denumerable set, by continuity, we will have $K(p(a_i)) = -\sum_{i=1}^{\infty} p(a_i) \ln p(a_i)$. The discussion in this Section allows us to discard the restriction to finite Boolean algebras and turn to more general ones.

As was done in Cox' approach to the Boolean case in order to justify the use of Shannon's measure, we now look for a natural information measure for $\mathcal{P}(\mathcal{H})$, i.e., a function depending on the non-commutative measure defined by Eqs. 8. In other words, by appealing to Gleason's theorem, we look for a function $S(\rho)$ depending only on the state $\rho$ and, at the same time, compatible with the algebraic structure of $\mathcal{P}(\mathcal{H})$. Notice that it is not a priori obvious whether a variant of Cox' method can be applied to the non-Boolean structure of $\mathcal{P}(\mathcal{H})$ and used to justify the choice of the VNE.
In this Section we will see that, according to Cox' approach, the VNE appears as the most rational choice.

Let us denote by $\mathcal{B}_{\mathcal{P}(\mathcal{H})}$ the set of all maximal Boolean sublattices of $\mathcal{P}(\mathcal{H})$. For each $B \in \mathcal{B}_{\mathcal{P}(\mathcal{H})}$, we can consider its dual lattice of ideals $\hat{B}$.

Notice that when $\mathcal{H}$ is finite dimensional, its maximal Boolean subalgebras will be finite. As an example, consider $\mathcal{P}(\mathbb{C}^2)$, i.e., the set of all possible linear subspaces of a two dimensional complex Hilbert space. Then, each maximal Boolean subalgebra will be of the form $\{\mathbf{0}, P, P^{\perp}, \mathbb{C}^2\}$, with $P = |\varphi\rangle\langle\varphi|$ for some unit norm vector $|\varphi\rangle$ and $\neg P = P^{\perp} = |\varphi^{\perp}\rangle\langle\varphi^{\perp}|$ (with $\langle\varphi|\varphi^{\perp}\rangle = 0$). In a similar way, for $\mathcal{P}(\mathbb{C}^3)$, a maximal Boolean subalgebra will be isomorphic to $\mathcal{P}(\{a,b,c\}) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, \{a,b,c\}\}$. More specifically, for this last example, given three orthogonal rays in $\mathbb{C}^3$ represented by unit vectors $|\varphi_1\rangle$, $|\varphi_2\rangle$ and $|\varphi_3\rangle$, the set $\{\mathbf{0}, P_1, P_2, P_3, P_{12}, P_{13}, P_{23}, \mathbb{C}^3\}$, where $P_i = |\varphi_i\rangle\langle\varphi_i|$ ($i = 1, 2, 3$) and $P_{ij} = |\varphi_i\rangle\langle\varphi_i| + |\varphi_j\rangle\langle\varphi_j|$ ($i, j = 1, 2, 3$, $i \neq j$), forms a maximal Boolean subalgebra (and all maximal Boolean subalgebras are of this form). Notice that in these examples, the sets of atoms $\{|\varphi_1\rangle\langle\varphi_1|; |\varphi_2\rangle\langle\varphi_2|; |\varphi_3\rangle\langle\varphi_3|\}$ (with mutually orthogonal $|\varphi_i\rangle\langle\varphi_i|$) and $\{|\varphi\rangle\langle\varphi|; |\varphi^{\perp}\rangle\langle\varphi^{\perp}|\}$ i) form frames, and ii) generate the above mentioned Boolean subalgebras of $\mathcal{P}(\mathcal{H})$.

Now, it is important to notice that if we restrict a state $\rho$ to $B$, we will have a classical probability measure such as the one defined by Eqns. 1, and a concomitant inquiry set $Q(B)$ can be defined as in [22, 23] (see Section 3 of this work). In what follows, our strategy will be to construct a suitable information measure, just as we did in Section 3, for each maximal Boolean subalgebra of $\mathcal{P}(\mathcal{H})$. For each frame $F = \{|\varphi_i\rangle\langle\varphi_i|\}_{i \in \mathbb{N}} \subset B$ representing a complete experiment, the state $\rho$ assigns probabilities $p_i = \mathrm{tr}(\rho|\varphi_i\rangle\langle\varphi_i|)$ to each possible outcome of $F$. By following Cox' spirit [22, 23, 26] and the procedure sketched in Section 3, we can guarantee (by choosing suitable coefficients $A$ and $B$ in Eqn. 17) that for each maximal Boolean subalgebra $B$ there exists a canonical information measure $H_F(\rho)$ such that for each frame $F \subseteq B$:

$$H_F(\rho) = -\sum_i p_i \ln p_i = -\sum_i \mathrm{tr}(\rho|\varphi_i\rangle\langle\varphi_i|) \ln(\mathrm{tr}(\rho|\varphi_i\rangle\langle\varphi_i|)). \tag{18}$$

The above construction can be carried out for any $B \in \mathcal{B}_{\mathcal{P}(\mathcal{H})}$. Thus, for any $\rho$, each $B \in \mathcal{B}_{\mathcal{P}(\mathcal{H})}$ and each frame $F \subseteq B$, we have a measure $H_F(\rho)$. It is important to note that this family of measures, although only defined on the maximal Boolean sublattices, does cover the whole $\mathcal{P}(\mathcal{H})$ lattice. This is so because, as shown in [60], every orthomodular lattice is the union of its maximal Boolean sublattices.

Our point is that we need a measure that depends only on $\rho$ and not on the particular choice of complete experiment (represented by a particular frame).
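The dependence of $H_F(\rho)$ on the frame can be made concrete with a small worked example. The qubit state and the two frames below are chosen here only for illustration and are not taken from the references; $|0\rangle, |1\rangle$ denote an orthonormal basis of $\mathbb{C}^2$ and $|\pm\rangle$ the rotated basis built from it.

```latex
\rho = \tfrac{3}{4}\,|0\rangle\langle 0| + \tfrac{1}{4}\,|1\rangle\langle 1| ,
\qquad
|\pm\rangle = \tfrac{1}{\sqrt{2}}\,\bigl(|0\rangle \pm |1\rangle\bigr)

% Frame F_1 = \{ |0\rangle\langle 0| , |1\rangle\langle 1| \}: probabilities (3/4, 1/4)
H_{F_1}(\rho) = -\tfrac{3}{4}\ln\tfrac{3}{4} - \tfrac{1}{4}\ln\tfrac{1}{4} \approx 0.562

% Frame F_2 = \{ |+\rangle\langle +| , |-\rangle\langle -| \}:
% \mathrm{tr}(\rho\,|\pm\rangle\langle\pm|) = \tfrac{1}{2}\bigl(\tfrac{3}{4} + \tfrac{1}{4}\bigr) = \tfrac{1}{2}
H_{F_2}(\rho) = -\tfrac{1}{2}\ln\tfrac{1}{2} - \tfrac{1}{2}\ln\tfrac{1}{2} = \ln 2 \approx 0.693
```

The same state thus yields two different values of Eq. (18), one per complete experiment, which is why a frame-independent prescription is needed.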
Among the family of measures $H_F(\rho)$, it is natural (according to Cox' approach) to take the one which attains the minimum value: the one with the least Shannon entropy (i.e., we are looking for the frame in which the uncertainty about the outcomes is minimal and, thus, the information is maximal). This means that it is natural to define

$$H(\rho) := \inf_{F \in \mathcal{F}_{\mathcal{H}}} H_F(\rho). \tag{19}$$

Given that $\rho$ is self-adjoint, let us consider its set of eigenprojectors $F_\rho = \{|\rho_i\rangle\langle\rho_i|\}_{i \in \mathbb{N}}$, with $\rho_i \in \mathbb{R}$ satisfying $\rho|\rho_i\rangle = \rho_i|\rho_i\rangle$ and $\rho = \sum_i \rho_i |\rho_i\rangle\langle\rho_i|$. It should be clear that if $\rho$ is non-degenerate, $F_\rho$ is a frame. If $\rho$ is degenerate, it is equally easy to find a frame out of its eigenprojections, by choosing an orthonormal basis within each eigenspace. Accordingly, without loss of generality we can suppose that $\rho$ defines a frame. Now consider the maximal Boolean algebra $B_{F_\rho}$ generated by $F_\rho$. Using Eq. 18, it follows that the canonical measure $H$, when restricted to $F_\rho$, satisfies

$$H_{F_\rho}(\rho) = -\sum_i \mathrm{tr}(\rho|\rho_i\rangle\langle\rho_i|) \ln(\mathrm{tr}(\rho|\rho_i\rangle\langle\rho_i|)) = -\sum_i \rho_i \ln \rho_i = -\mathrm{tr}(\rho \ln(\rho)) \tag{20}$$

which is nothing but the VNE. But it is well known that $H_F(\rho)$ attains its minimum value at $F_\rho$ (cf. Reference [2]):

$$-\mathrm{tr}(\rho \ln(\rho)) \leq H_F(\rho), \quad \forall F \in \mathcal{F}_{\mathcal{H}} \tag{21}$$

Thus, we have shown that $H(\rho) = -\mathrm{tr}(\rho \ln(\rho))$. In other words, von Neumann's entropy is the only function which emerges canonically as the minimum of all measures compatible with the algebraic structure of $\mathcal{P}(\mathcal{H})$. Notice that we are deriving the VNE out of the algebraic symmetries of the lattice. The above considerations show the VNE as a natural measure of information of $\mathcal{P}(\mathcal{H})$, as a consequence of Shannon's entropy being the natural information measure of a Boolean algebra following Cox' method. Notice that our derivation covers both the finite and infinite dimensional cases.

After deriving the VNE using the Cox method, we now advance a step further and investigate whether this procedure can be extended to more general contextual theories.
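The inequality (21) behind the VNE derivation can be probed numerically for a qubit. The sketch below is only illustrative and rests on stated assumptions: the density matrix `rho` is hypothetical, and only the one-parameter family of real frames $\{|\varphi(\theta)\rangle, |\varphi^{\perp}(\theta)\rangle\}$ with $|\varphi(\theta)\rangle = (\cos\theta, \sin\theta)$ is swept, so it samples the infimum of Eq. (19) rather than computing it over all frames.

```python
import math

def shannon(probs):
    """Shannon entropy -sum p ln p (natural log), with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical qubit state: real, symmetric, unit trace, positive.
rho = [[0.7, 0.2],
       [0.2, 0.3]]

def frame_probs(theta):
    """Probabilities tr(rho |phi><phi|) for the real frame at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    p = c * c * rho[0][0] + 2.0 * c * s * rho[0][1] + s * s * rho[1][1]
    return [p, 1.0 - p]

# Eigenvalues of a 2x2 symmetric matrix from its trace and determinant.
tr = rho[0][0] + rho[1][1]
det = rho[0][0] * rho[1][1] - rho[0][1] ** 2
gap = math.sqrt(tr * tr - 4.0 * det)
eigenvalues = [(tr + gap) / 2.0, (tr - gap) / 2.0]

# Von Neumann entropy: Shannon entropy of the eigenvalues, as in Eq. (20).
vne = shannon(eigenvalues)

# Sweep real frames over [0, pi); for this rho the eigenframe sits at theta = pi/8,
# which is one of the sampled angles (k = 45 of 360).
steps = 360
entropies = [shannon(frame_probs(k * math.pi / steps)) for k in range(steps)]
best = min(entropies)

print(vne, best)
```

In this sweep no sampled frame falls below the VNE, and the smallest sampled value is reached at the eigenframe of `rho`, in agreement with Eqs. (20) and (21).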
Concretely, we now briefly discuss what happens if $\mathcal{L}$ is an arbitrary atomic orthomodular lattice and $\mu$ is a measure obeying Eqs. 10. We show that the procedure of the previous Section can be extended to this case. Let $\mathcal{B}_{\mathcal{L}}$ be the set of all possible maximal Boolean subalgebras of $\mathcal{L}$. For each $B \in \mathcal{B}_{\mathcal{L}}$, Cox' construction applies as in Section 3, and we have a Shannon function $H_F(\mu)$ defined for each frame $F = \{a_i\}_{i \in \mathbb{N}} \in E$ (see Section 2):

$$H_F(\mu) = -\sum_{a_i \in F} \mu(a_i) \ln(\mu(a_i)). \tag{22}$$

As in the previous Section, we define:

$$H(\mu) := \inf_{F \in E} H_F(\mu). \tag{23}$$

Notice that, when restricted to a single frame $F$, $H_F(\mu)$ coincides with the Shannon measure derived using Cox' method. Thus, by construction, $H(\mu)$ does the job of representing the canonical measure of information, as Shannon's entropy and the VNE did in the classical and quantum cases, respectively.

The results of this Section show that it is indeed possible to generalize Cox' method to probabilistic theories more general than a Boolean algebra. Notice that, when $\mathcal{L}$ is a Boolean algebra, we recover Cox' construction, and when $\mathcal{L} = \mathcal{P}(\mathcal{H})$, we recover our construction for the VNE. Indeed, by looking at Eq. 23, the reader will soon recognize that our derivation coincides with the measurement entropy (ME) introduced in [31, 32, 33]. The main difference between our approach and the one of these references is that: i) we derive the same measures by using Cox' approach, and thus we provide a novel intuitive interpretation for them; and ii) by means of our derivation, we discard other possible functional forms, such as the ones appearing in the Tsallis or Rényi entropies, justifying in this way the usage of the logarithmic form of the VNE and the ME.
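For a structure with finitely many frames, the infimum in Eq. (23) reduces to a minimum and can be evaluated directly. The following sketch uses a hypothetical toy model: two contexts of three atoms each that share one atom, with the measure `mu` and the atom names invented purely for illustration. It is assumed, not derived, that these two frames exhaust the set $E$ of the toy model.

```python
import math

def shannon(probs):
    """Shannon entropy -sum p ln p (natural log), with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical measure on the atoms of a toy structure whose two frames
# (contexts) share the atom "c".
mu = {"a": 0.7, "b": 0.1, "c": 0.2, "d": 0.4, "e": 0.4}
frames = [("a", "b", "c"), ("c", "d", "e")]

# Consistency check: the measure must sum to 1 on every frame.
normalized = all(math.isclose(sum(mu[x] for x in frame), 1.0) for frame in frames)

# Eq. (22): one Shannon value per frame.
per_frame = {frame: shannon([mu[x] for x in frame]) for frame in frames}

# Eq. (23): with finitely many frames the infimum is a minimum.
measurement_entropy = min(per_frame.values())
best_frame = min(per_frame, key=per_frame.get)

print(best_frame, measurement_entropy)
```

If the two frames coincide (a single context), the minimum is trivially the Shannon entropy of that context, which mirrors the Boolean limit mentioned above.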
If a rational agent deals with a Boolean algebra of assertions, representing physical events, a plausibility calculus can be derived in such a way that the plausibility function yields a theory which is formally equivalent to that of Kolmogorov for classical probabilities [21, 22, 29, 25]. A similar result holds if the rational agent deals with an atomic orthomodular lattice [30], as is the case with the contextual character of the lattice of projections representing events of a quantum system. For the latter case, non-Kolmogorovian probabilities (Eqs. 8) arise as the only ones compatible with the non-commutative (non-Boolean) character of quantum complementarity.

In Cox' approach, Shannon's information measure relies on the axiomatic structure of Kolmogorovian probability theory. We have shown in Section 4 that, according to our extension of Cox' method, the VNE emerges as its non-commutative version. The VNE thus arises as a natural measure of information derived from the non-Boolean character of the underlying lattice $\mathcal{P}(\mathcal{H})$. The different entropies discussed in this work are summarized in Table 1.

The fact that this kind of construction can be extended to more general probabilistic models (as we have shown in Section 5, where we have deduced the ME as a natural measure of information) implies that CIT and QIT can be considered as particular cases of a more general non-commutative information theory.

These results allow for an interpretation of the VNE and the measurement entropy as the natural measures of information for an experimenter who deals with a non-Boolean (contextual) event structure. This is the case for standard quantum mechanics, in which quantum complementarity expresses itself in the existence of non-compatible measurement set-ups and, consequently, in the different contexts of $\mathcal{P}(\mathcal{H})$ (maximal Boolean subalgebras) and non-commutative observables.
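|         | Classical | Quantum | General |
|---------|-----------|---------|---------|
| Lattice | $\mathcal{P}(\Gamma)$ | $\mathcal{P}(\mathcal{H})$ | $\mathcal{L}$ |
| Entropy | $-\sum_i p(i) \ln(p(i))$ | $-\mathrm{tr}\,\rho \ln(\rho)$ | $\inf_{F \in E} H_F(\mu)$ |

Table 1: Table comparing the differences between the classical, quantal, and general cases.

Acknowledgements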

We acknowledge CONICET and UNLP. We thank the anonymous Referees for useful comments and suggestions.

References

[1] M. Tribus and E. C. McIrvine. Energy and Information. Sci. Am., 225(3):179–188, 1971.
[2] I. Bengtsson and K. Życzkowski. Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, Cambridge, 2006.
[3] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
[4] Benjamin Schumacher. Quantum coding. Phys. Rev. A, 51:2738–2747, Apr 1995.
[5] Richard Jozsa and Benjamin Schumacher. A new proof of the quantum noiseless coding theorem. Journal of Modern Optics, 41(12):2343–2349, 1994.
[6] Constantino Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1-2):479–487, 1988.
[7] Alexey Rastegin. Tests for quantum contextuality in terms of q-entropies. Quantum Information and Computation, 14:0996–1013, September 2014.
[8] Alfréd Rényi. On Measures of Entropy and Information. University of California Press, Berkeley, Calif., 1961.
[9] R. Rossignoli, N. Canosa, and L. Ciliberti. Generalized entropic measures of quantum correlations. Phys. Rev. A, 82:052342, Nov 2010.
[10] S. Zozor, G. M. Bosyk, and M. Portesi. General entropy-like uncertainty relations in finite dimensions. Journal of Physics A: Mathematical and Theoretical, 47(49):495302, 2014.
[11] M. Müller-Lennert, F. Dupuis, O. Szehr, S. Fehr, and M. Tomamichel. On quantum Rényi entropies: A new generalization and some properties. Journal of Mathematical Physics, 54(12), 2013.
[12] Časlav Brukner and Anton Zeilinger. Conceptual inadequacy of the Shannon information in quantum measurements. Phys. Rev. A, 63:022113, Jan 2001.
[13] C. G. Timpson. On a supposed conceptual inadequacy of the Shannon information in quantum mechanics. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 34(3):441–468, 2003. Quantum Information and Computation.
[14] J. Aczél, B. Forte, and C. T. Ng. Why the Shannon and Hartley entropies are 'natural'. Advances in Appl. Probability, 6:131–146, 1974.
[15] J. Aczél. Lectures on functional equations and their applications. Mathematics in Science and Engineering, Vol. 19. Academic Press, New York-London, 1966. Translated by Scripta Technica, Inc. Supplemented by the author. Edited by Hansjorg Oser.
[16] W. Ochs. A unique characterization of the generalized Boltzmann-Gibbs-Shannon entropy. Phys. Lett. A, 54(3):189–190, 1975.
[17] W. Ochs. A unique characterization of the generalized Boltzmann-Gibbs-Shannon entropy. Rep. Mathematical Phys., 9(3):331–354, 1976.
[18] W. Ochs. A new axiomatic characterization of the von Neumann entropy. Rep. Mathematical Phys., 8(1):109–120, 1975.
[19] Claude E. Shannon. A mathematical theory of communication, part I. Bell Syst. Tech. J., 27:379–423, 1948.
[20] A. N. Kolmogorov. Foundations of Probability Theory. Julius Springer, Berlin, Germany, 1933.
[21] R. T. Cox. Probability, frequency and reasonable expectation. American Journal of Physics, 14(1), 1946.
[22] R. T. Cox. The Algebra of Probable Inference. The Johns Hopkins Press, Baltimore, MD, USA, 1961.
[23] Kevin H. Knuth. Deriving laws from ordering relations. In Bayesian inference and maximum entropy methods in science and engineering, volume 707 of AIP Conf. Proc., pages 204–235. Amer. Inst. Phys., Melville, NY, 2004.

[45] Federico Holik, Cesar Massri, Manuel Sáenz, and Angel Plastino. Generalized probabilities in statistical theories. arXiv:1406.0913 [stat.OT], 2014.
[46] Howard Barnum, Jonathan Barrett, Matthew Leifer, and Alexander Wilce. Cloning and Broadcasting in Generic Probabilistic Theories. 2006.
[47] Alexander Holevo. Probabilistic and statistical aspects of quantum theory, volume 1 of Quaderni. Monographs. Edizioni della Normale, Pisa, second edition, 2011. With a foreword from the second Russian edition by K. A. Valiev.
[48] Jeffrey Bub. Quantum computation from a quantum logical perspective. Quantum Information and Computation, 7(4):281–296, May 2007.
[49] Otfried Gühne, Matthias Kleinmann, Adán Cabello, Jan-Ake Larsson, Gerhard Kirchmair, Florian Zähringer, Rene Gerritsma, and Christian F. Roos. Compatibility and noncontextuality for sequential measurements. Phys. Rev. A, 81:022121, Feb 2010.
[50] Adán Cabello. Proposal for revealing quantum nonlocality via local contextuality. Phys. Rev. Lett., 104:220401, Jun 2010.
[51] Kevin S. Van Horn. Constructing a logic of plausible inference: a guide to Cox's theorem. International Journal of Approximate Reasoning, 34(1):3–24, 2003.
[52] Stefan Arnborg and Gunnar Sjodin. On the foundations of Bayesianism. In AIP Conference Proceedings, pages 61–71. IOP Institute of Physics Publishing Ltd, 2001.
[53] Richard P. Feynman. The Concept of Probability in Quantum Mechanics. University of California Press, Berkeley, Calif., 1951.
[54] F. Holik and A. Plastino. Convex polytopes and quantum separability. Phys. Rev. A, 84:062327, Dec 2011.
[55] Federico Holik, César Massri, A. Plastino, and Leandro Zuberman. On the lattice structure of probability spaces in quantum mechanics. International Journal of Theoretical Physics, 52(6):1836–1876, 2013.
[56] Andrew M. Gleason. Measures on the closed subspaces of a Hilbert space. In C. A. Hooker, editor, The Logico-Algebraic Approach to Quantum Mechanics, volume 5a of The University of Western Ontario Series in Philosophy of Science, pages 123–133. Springer Netherlands, 1975.
[57] Asher Peres. Quantum Theory: Concepts and Methods, volume 72 of Fundamental Theories of Physics.
[58] Christopher A. Fuchs and Rüdiger Schack. Quantum-Bayesian coherence. Rev. Mod. Phys., 85:1693–1715, Dec 2013.
[59] Garrett Birkhoff. Lattice theory, volume 25 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, R.I., third edition, 1979.
[60] Mirko Navara and Vladimir Rogalewicz. The pasting constructions for orthomodular posets. Mathematische Nachrichten, 154(1):157–168, 1991.
[61] Enrico G. Beltrametti and Gianni Cassinelli. The logic of quantum mechanics, volume 15 of Encyclopedia of Mathematics and its Applications. Addison-Wesley Publishing Co., Reading, Mass., 1981. With a foreword by Peter A. Carruthers.
[62] Garrett Birkhoff and John von Neumann. The logic of quantum mechanics. Ann. of Math. (2), 37(4):823–843, 1936.

Lattices

• A lattice $\mathcal{L}$ is a partially ordered set (i.e., a set endowed with a partial order relationship "$\leq$") such that for every $a, b \in \mathcal{L}$ there exists a unique supremum, the least upper bound "$a \vee b$", called their join, and a unique infimum, the greatest lower bound "$a \wedge b$", called their meet. A bounded lattice has a greatest and a least element, denoted $\mathbf{1}$ and $\mathbf{0}$ (also called top and bottom, respectively).

• For any bounded lattice, an orthocomplementation is a unary operation "$\neg(\ldots)$" satisfying:

$$\neg(\neg(a)) = a \tag{24a}$$

$$a \leq b \longrightarrow \neg b \leq \neg a \tag{24b}$$

$a \vee \neg a$ and $a \wedge \neg a$ exist, and

$$a \vee \neg a = \mathbf{1} \tag{24c}$$

$$a \wedge \neg a = \mathbf{0} \tag{24d}$$

hold.

• If $\mathcal{L}$ has a null element $\mathbf{0}$, then an element $x$ of $\mathcal{L}$ is an atom if $\mathbf{0} < x$ and there exists no element $y$ of $\mathcal{L}$ such that $\mathbf{0} < y < x$. $\mathcal{L}$ is atomic if, for every nonzero element $x$ of $\mathcal{L}$, there exists an atom $a$ of $\mathcal{L}$ such that $a \leq x$.

• A modular lattice is one that satisfies the modular law: $x \leq b$ implies $x \vee (a \wedge b) = (x \vee a) \wedge b$, where $\leq$ is the partial order, and $\vee$ and $\wedge$ (join and meet, respectively) are the operations of the lattice. An orthomodular lattice is an orthocomplemented lattice satisfying the orthomodular law: $a \leq b$ and $\neg a \leq c$ implies $a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$.

• Distributive lattices are lattices for which the operations of join and meet are distributive over each other. Distributive orthocomplemented lattices are called Boolean. The collection of subsets of a given set, with set intersection as meet, set union as join and set complement as orthocomplementation, forms a complete bounded lattice which is also Boolean.

• Any quantum system represented by a separable Hilbert space $\mathcal{H}$ has an associated lattice $\mathcal{P}(\mathcal{H})$ formed by all its closed subspaces, where $\mathbf{0}$ is the null subspace, $\mathbf{1}$ is the total space $\mathcal{H}$, $\vee$ is the closure of the linear span, $\wedge$ is subspace intersection, and $\neg(S)$ is the orthogonal complement $S^{\perp}$ of a subspace $S$ [36]. This lattice was called "Quantum Logic" by Birkhoff and von Neumann; it is modular if the Hilbert space is finite dimensional, and orthomodular in the infinite dimensional case. The set of projection operators on $\mathcal{H}$ forms a lattice which is isomorphic to $\mathcal{P}(\mathcal{H})$ (and thus, they can be identified).
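The statement that the subsets of a given set form a Boolean lattice can be verified exhaustively for a small case. The sketch below is illustrative only; the three-element set and the helper names `join`, `meet` and `neg` are hypothetical choices. It checks distributivity and the orthocomplementation conditions (24a) to (24d) on the power set of $\{a, b, c\}$.

```python
from itertools import combinations

# Hypothetical three-element set; its power set has 2**3 = 8 elements.
universe = frozenset({"a", "b", "c"})
elements = [frozenset(c)
            for r in range(len(universe) + 1)
            for c in combinations(sorted(universe), r)]

def join(x, y):
    """Join is set union."""
    return x | y

def meet(x, y):
    """Meet is set intersection."""
    return x & y

def neg(x):
    """Orthocomplement is the set complement."""
    return universe - x

# Distributivity of meet over join, checked on all triples.
distributive = all(
    meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
    for x in elements for y in elements for z in elements
)

# Conditions (24a), (24c), (24d) on every element, and (24b) on every ordered pair.
orthocomplemented = all(
    neg(neg(x)) == x
    and join(x, neg(x)) == universe
    and meet(x, neg(x)) == frozenset()
    for x in elements
) and all(
    neg(y) <= neg(x)
    for x in elements for y in elements if x <= y
)

# Atoms are the singletons: nothing lies strictly between them and the empty set.
atoms = [x for x in elements if len(x) == 1]

print(len(elements), distributive, orthocomplemented, len(atoms))
```

The same exhaustive check is not available for $\mathcal{P}(\mathcal{H})$, which has infinitely many elements and fails distributivity as soon as the dimension is at least two.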