A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement
Ryan G. James, Jeffrey Emenheiser, and James P. Crutchfield
Complexity Sciences Center and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616
(Dated: August 28, 2018)

Recently, the partial information decomposition emerged as a promising framework for identifying the meaningful components of the information contained in a joint distribution. Its adoption and practical application, however, have been stymied by the lack of a generally accepted method of quantifying its components. Here, we briefly discuss the bivariate (two-source) partial information decomposition and two implicitly directional interpretations used to intuitively motivate alternative component definitions. Drawing parallels with secret key agreement rates from information-theoretic cryptography, we demonstrate that these intuitions are mutually incompatible and suggest that this underlies the persistence of competing definitions and interpretations. Having highlighted this hitherto unacknowledged issue, we outline several possible solutions.
PACS numbers: 05.45.-a 89.75.Kd 89.70.+c 02.50.-r
Keywords: information theory, partial information decomposition, secret key agreement, cryptography
I. INTRODUCTION
Consider a joint distribution over "source" variables X_1 and X_2 and "target" Y. Such distributions arise in many settings: sensory integration, logical computing, neural coding, functional network inference, and many others. One promising approach to understanding how the information shared between X_1, X_2, and Y is organized is the partial information decomposition (PID) [1]. This decomposition seeks to quantify how much of the information shared between X_1, X_2, and Y is done so redundantly, how much is uniquely attributable to X_1, how much is uniquely attributable to X_2, and finally how much arises synergistically by considering both X_1 and X_2 together. Unfortunately, the lack of a commonly accepted method of quantifying these components has hindered PID's adoption. In point of fact, several proposed axioms are not mutually consistent. And, to date, there is little agreement as to which should hold. Here, we take a step toward rectifying these issues by bringing to light a potentially fundamental inconsistency in the intuitions commonly and often implicitly brought to bear upon information decomposition. We make the intuitions quantitative by appealing to information-theoretic cryptography. Taken together, our observations suggest that the context in which PID is applied should determine how its components are quantified.

Our development proceeds as follows. Section II briefly describes the two-source PID. Section III calls out the two distinct intuitions often used in interpreting PID. Section IV introduces a prototype distribution that highlights the issues, and we interpret it through the lenses of the two intuitions. Section V defines secret key agreement rates and computes them for the prototype distribution. Section VI then discusses how the two intuitions relate to secret key agreement rates and identifies when the latter result in viable decompositions.
Finally, Section VII summarizes our findings and speculates as to how future developments can bring consistency to PID.

II. PARTIAL INFORMATION DECOMPOSITION
Two-source PID seeks to decompose the mutual information I[X_1 X_2 : Y] between "sources" X_1 and X_2 and a "target" Y into four nonnegative components. The components identify information that is redundant, uniquely associated with X_1, uniquely associated with X_2, and synergistic:

    I[X_1 X_2 : Y] = I_∂[X_1 · X_2 → Y]   (redundant)
                   + I_∂[X_1 → Y \ X_2]   (unique from X_1)
                   + I_∂[X_2 → Y \ X_1]   (unique from X_2)
                   + I_∂[X_1 X_2 → Y] .   (synergistic)

Furthermore, the mutual information between X_1 and Y is decomposed into two components:

    I[X_1 : Y] = I_∂[X_1 · X_2 → Y]     (redundant)
               + I_∂[X_1 → Y \ X_2] .   (unique from X_1)

And, similarly:

    I[X_2 : Y] = I_∂[X_1 · X_2 → Y]     (redundant)
               + I_∂[X_2 → Y \ X_1] .   (unique from X_2)

In this way, PID relates the four component informations. However, it does not uniquely determine how to quantify them. To do this, a definition must be supplied for one of them and then the others follow. This allows for a range of choices. In the case that one wishes to directly quantify the unique informations I_∂[X_1 → Y \ X_2] and I_∂[X_2 → Y \ X_1], a consistency relation must hold when they are computed independently:

    I_∂[X_1 → Y \ X_2] + I[X_2 : Y] = I_∂[X_2 → Y \ X_1] + I[X_1 : Y] .   (1)

III. THE CAMEL AND THE ELEPHANT
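This bookkeeping is easy to mechanize. The following sketch (plain Python; the function name and dictionary keys are our own, not taken from any PID library) derives the remaining components from the three mutual informations and a candidate redundancy value, and checks the consistency relation, Eq. (1):

```python
def pid_from_redundancy(i_x1y, i_x2y, i_x12y, redundancy):
    """Derive the remaining PID components from a candidate redundancy.

    i_x1y, i_x2y : I[X1:Y] and I[X2:Y] in bits
    i_x12y       : I[X1 X2 : Y] in bits
    redundancy   : a chosen value for the redundant component
    """
    unique_1 = i_x1y - redundancy   # I_d[X1 -> Y \ X2]
    unique_2 = i_x2y - redundancy   # I_d[X2 -> Y \ X1]
    synergy = i_x12y - redundancy - unique_1 - unique_2
    # Consistency relation, Eq. (1): automatically satisfied when both
    # unique informations are derived from a single redundancy value.
    assert abs((unique_1 + i_x2y) - (unique_2 + i_x1y)) < 1e-12
    return {"redundant": redundancy, "unique_1": unique_1,
            "unique_2": unique_2, "synergistic": synergy}

# For the pointwise unique distribution of Section IV, I[X1:Y] = I[X2:Y]
# = 1/2 bit and I[X1 X2 : Y] = 1 bit.  The camel intuition amounts to
# choosing redundancy = 0, the elephant intuition to redundancy = 1/2:
camel = pid_from_redundancy(0.5, 0.5, 1.0, 0.0)
elephant = pid_from_redundancy(0.5, 0.5, 1.0, 0.5)
```

The danger illustrated by Eq. (1) arises only when the two unique informations are quantified independently of one another; fixing a single redundancy value, as above, makes consistency automatic.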
There are two common ways of thinking about PID. These approaches differ only in the implied directionality of cause and effect, a property unspecified by PID. In the first approach, one thinks of X_1 and X_2 as "inputs" that, when combined, produce Y, an "output". While seemingly helpful labels, their use already imports an unwarranted semantics to the relationship between the three random variables. In this, it inadvertently begs the main issue we wish to raise here, while at the same time illustrating the issue. When taking this view of PID, one generally asks questions such as "How much information in X_1 is uniquely conveyed to Y?". From this vantage, considering the role of the individual channels X_1 → Y and X_2 → Y might or might not help develop intuition. Recalling the aphorism "a camel is a horse designed by committee", we call this the camel intuition, as particular input events X_1 and X_2 come together to describe an output Y.

In the second approach, one considers X_1 and X_2 as "noisy observations" or "representations" of a single underlying object Y. When taking this view, one might ask a question such as "How much information in Y is uniquely captured by X_1?". Under this, the individual channels Y → X_1 and Y → X_2 take on primary importance. After the parable of the blind men describing an elephant, we call this the elephant intuition, since particular objects Y may be described by various, possibly partial, representations X_1 and X_2.

    X_1  X_2  Y   Pr
     0    1   1   1/4
     1    0   1   1/4
     0    2   2   1/4
     2    0   2   1/4

TABLE I. The pointwise unique distribution.
IV. THE POINTWISE UNIQUE DISTRIBUTION
The pointwise unique distribution [2] is given by the events and probabilities displayed in Table I: at any time exactly one of X_1 or X_2 is a '1' or '2' and matches Y, while the other is '0'. Let's now interpret this distribution by adopting the camel and elephant intuitions in turn. We will see that they provide contradictory interpretations of the relationships between the variables.

Adopting the camel intuition, we consider the ways in which X_1 influences Y. It is easy to see that half of the time (Table I's 1st and 3rd rows) X_1 is unable to say anything about the state of Y. The other half of the time (the 2nd and 4th rows) X_1 and Y are perfectly correlated, while X_2 is ignorant as to their state. Analogously, this is true when considering how X_2 influences Y. In this way, we interpret the distribution's PID as consisting entirely of unique informations. The camel intuition is summarized in Table II.

When adopting the elephant intuition, however, a strikingly different picture emerges. Taking the viewpoint of Y, both single-channel distributions p(X_1 | Y) and p(X_2 | Y) are identical. So, any information shared with one must be redundantly shared with the other. These channels do not allow one to determine the states of either X_1 or X_2. What is learned, however, is that exactly one of them matches Y, while the other is '0'. Furthermore, removing the remaining uncertainty in the values of X_1 and X_2 requires observing one of them, a synergistic effect. The resulting elephant analysis is also summarized in Table II.

In short, the two directional PID interpretations lead to contradictory quantifications. From the viewpoint of camels, elephant approaches create redundancy where there is none. From the vantage of elephants, camels draw distinctions where none exist. This has been discussed by Ref. [3] regarding whether or not unique information should depend on I[X_1 : X_2].
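The mutual informations underlying both tallies can be computed directly from Table I. A self-contained sketch (plain Python; the helper names are ours):

```python
from collections import defaultdict
from math import log2

# Pointwise unique distribution: (x1, x2, y) -> probability (Table I).
dist = {(0, 1, 1): 0.25, (1, 0, 1): 0.25,
        (0, 2, 2): 0.25, (2, 0, 2): 0.25}

def entropy(pmf):
    """Shannon entropy in bits of an {outcome: probability} mapping."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(dist, indices):
    """Marginalize the joint pmf onto the given coordinate indices."""
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[tuple(outcome[i] for i in indices)] += p
    return dict(out)

def mutual_information(dist, idx_a, idx_b):
    """I[A : B] = H[A] + H[B] - H[A, B], in bits."""
    return (entropy(marginal(dist, idx_a)) + entropy(marginal(dist, idx_b))
            - entropy(marginal(dist, idx_a + idx_b)))

print(mutual_information(dist, (0,), (2,)))    # I[X1 : Y]    = 0.5 bit
print(mutual_information(dist, (1,), (2,)))    # I[X2 : Y]    = 0.5 bit
print(mutual_information(dist, (0, 1), (2,)))  # I[X1 X2 : Y] = 1.0 bit
print(mutual_information(dist, (0,), (1,)))    # I[X1 : X2]   = 1.0 bit
```

Both columns of Table II respect these values: each column sums to I[X_1 X_2 : Y] = 1 bit, and redundancy plus each unique information equals the corresponding 1/2-bit marginal mutual information.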
From the camel's point of view, ignoring the dependence on I[X_1 : X_2] as a constraint may "artificially correlate" X_1 and X_2 and thereby inflate redundancy. This viewpoint can be more directly illustrated by considering the intermediate distribution from which I_BROJA [4], an elephant measure, computes unique information for the pointwise unique distribution:

    X_1  X_2  Y   Pr
     0    0   1   1/4
     1    1   1   1/4
     0    0   2   1/4
     2    2   2   1/4

From the elephant's view, I[X_1 : X_2] is irrelevant.

    Decompositions by Intuition
                           camel     elephant
    I_∂[X_1 · X_2 → Y]     0 bit     1/2 bit
    I_∂[X_1 → Y \ X_2]     1/2 bit   0 bit
    I_∂[X_2 → Y \ X_1]     1/2 bit   0 bit
    I_∂[X_1 X_2 → Y]       0 bit     1/2 bit

TABLE II. Camel and elephant intuitions applied to Table I's pointwise unique distribution. The camel intuition takes the view that X_1 and X_2 supply Y with unique informations, though only one of them at a time. The elephant intuition takes the view that Y provides both X_1 and X_2 with the same information, but it gets erased on the way to exactly one of them.

V. SECRET KEY AGREEMENT
Secret key agreement is a fundamental concept within information-theoretic cryptography [5]. The central question is as follows: if three parties, Alice, Bob, and Eve, observe a joint probability distribution ABE ∼ p(a, b, e), where Alice has access only to a, Bob only to b, and Eve only to e, is it possible for Alice and Bob to agree upon a secret key of which Eve has no knowledge? The degree to which they may generate such a secret key depends immediately upon the structure of the joint distribution ABE. It also depends upon whether Alice and Bob are allowed to publicly communicate.

Concretely, consider Alice, Bob, and Eve each receiving n independent, identically distributed samples from ABE: Alice receiving A^n, Bob B^n, and Eve E^n. A secret key agreement scheme consists of functions f and g, as well as a protocol h for public communication allowing either Alice, Bob, neither, or both to communicate. In the case of a single party being permitted to communicate, say Alice, she constructs C = h(A^n) and then broadcasts it to all parties. In the case that both parties are permitted communication, they take turns constructing and broadcasting messages of the form C_i = h_i(A^n, C_[0..i-1]) (Alice) and C_i = h_i(B^n, C_[0..i-1]) (Bob) [6].

Formally, a secret key agreement scheme is considered R-achievable if for all ε > 0:

    K_A = f(A^n, C)              (i)
    K_B = g(B^n, C)              (ii)
    p(K_A = K_B = K) ≥ 1 − ε     (iii)
    I[K : C E^n] ≤ ε             (iv)
    (1/n) H[K] ≥ R − ε           (v)

where (i) and (ii) denote the method by which Alice and Bob construct their keys K_A and K_B, respectively, (iii) states that their keys must agree with arbitrarily high probability, (iv) states that the information about the key available to Eve, armed with both her private information E^n and the public communication C, must be arbitrarily small, and (v) states that the key consists of approximately R bits per sample.

The greatest rate R such that an achievable scheme exists is known as the secret key agreement rate. Notational variations indicate which parties are permitted to communicate. In the case that Alice and Bob are not allowed to communicate, their rate of secret key agreement is denoted S(A : B || E).
When only Alice is allowed to communicate, their secret key agreement rate is S(A → B || E); similarly S(B → A || E) when only Bob is permitted to communicate. When both Alice and Bob are allowed to communicate, their secret key agreement rate is denoted S(A ↔ B || E). In this, we have modified the standard notation for secret key agreement rates to emphasize which party or parties communicate.

In the case of no communication, S(A : B || E) is given by [7]:

    S(A : B || E) = H[A ⊓ B | E]   (2)

where X ⊓ Y denotes the Gács-Körner common random variable [8]. It is worth noting that this quantity does not vary continuously with the distribution and generically vanishes.

In the case of one-way communication, S(A → B || E) is given by [9]:

    S(A → B || E) = max { I[B : K | C] − I[E : K | C] }   (3)

where the maximum is taken over all variables C and K such that the Markov condition C − K − A − (B, E) holds. It suffices to consider K and C such that |K| ≤ |A| and |C| ≤ |A|.

No such closed-form expression is known for S(A ↔ B || E); however, both upper and lower bounds are known [6].

    Secret Key Agreement Rates
    S(X_1 : Y || X_2)    0 bit
    S(X_2 : Y || X_1)    0 bit
    S(Y → X_1 || X_2)    0 bit
    S(Y → X_2 || X_1)    0 bit
    S(X_1 → Y || X_2)    1/2 bit
    S(X_2 → Y || X_1)    1/2 bit
    S(X_1 ↔ Y || X_2)    1/2 bit
    S(X_2 ↔ Y || X_1)    1/2 bit

TABLE III. The variety of secret sharing schemes and their rates for the pointwise unique distribution of Table I.
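Equation (2) can be made concrete for the pointwise unique distribution. The Gács-Körner common random variable of A and B labels the connected components of the bipartite graph on their supports, with an edge (a, b) wherever p(a, b) > 0. A sketch (plain Python; the helper names are ours) computes H[A ⊓ B | E] with A = X_1, B = Y, and E = X_2:

```python
from collections import defaultdict
from math import log2

# Pointwise unique distribution: (x1, x2, y) -> probability (Table I).
dist = {(0, 1, 1): 0.25, (1, 0, 1): 0.25,
        (0, 2, 2): 0.25, (2, 0, 2): 0.25}

def gk_meet(support_pairs):
    """Label each a-value with its connected component in the bipartite
    graph having an edge (a, b) wherever p(a, b) > 0.  That label is the
    Gacs-Korner common random variable of A and B."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in support_pairs:       # union the two sides of each edge
        parent[find(('a', a))] = find(('b', b))
    return {a: find(('a', a)) for tag, a in list(parent) if tag == 'a'}

# A = X1, B = Y, E = X2.
meet = gk_meet({(x1, y) for (x1, x2, y) in dist})

# S(X1 : Y || X2) = H[A ⊓ B | E] = H[meet, X2] - H[X2].
joint_me, marg_e = defaultdict(float), defaultdict(float)
for (x1, x2, y), p in dist.items():
    joint_me[(meet[x1], x2)] += p
    marg_e[x2] += p

def h(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

cond_h = h(joint_me) - h(marg_e)
print(cond_h)  # 0.0: x1 = 0 co-occurs with both y = 1 and y = 2, so the
               # support graph is connected and the common variable is trivial
```

The same computation with A = X_2 gives S(X_2 : Y || X_1) = 0 bit, matching the first two rows of Table III.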
Let us now consider the pointwise unique distribution of Table I and the ability of X_1 and Y to agree upon a secret key while X_2 eavesdrops. This can be interpreted four different ways. First, neither X_1 nor Y may be allowed to communicate. Second, only Y can communicate. Third, only X_1 is permitted to communicate. Finally, both X_1 and Y may be allowed to communicate. Note that the eavesdropper X_2 is not allowed to communicate in any secret sharing scheme here. Looking at this distribution, a general strategy becomes clear: both X_1 and Y need some scheme to determine when they agree (the 2nd and 4th rows of Table I).

Broadly, the only way in which both X_1 and Y can come to understand whether they match is if X_1 is permitted to broadcast whether or not she observed a '0'. Therefore, in the instances where X_1 is not communicating there is no ability to agree upon a key: S(X_1 : Y || X_2) = S(Y → X_1 || X_2) = 0 bit. However, when X_1 is allowed communication a key can be agreed upon: S(X_1 → Y || X_2) = S(X_1 ↔ Y || X_2) = 1/2 bit. These rates are summarized in Table III.
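The 1/2-bit rate can be sanity-checked by simulating the strategy just described (plain Python; this illustrates the achieving protocol for this particular distribution, not a general secret key agreement algorithm):

```python
import random

# Sample the pointwise unique distribution (Table I).
rows = [(0, 1, 1), (1, 0, 1), (0, 2, 2), (2, 0, 2)]
random.seed(0)
samples = [random.choice(rows) for _ in range(100_000)]

key_x1, key_y, eve_view = [], [], []
for x1, x2, y in samples:
    announced = (x1 != 0)    # X1's public message: "I saw a non-zero"
    if announced:
        key_x1.append(x1)    # X1 keys on her own symbol
        key_y.append(y)      # Y keys on his symbol, which matches X1's
        eve_view.append(x2)  # what X2 observes in key-generating rounds

# The keys always agree ...
assert key_x1 == key_y
# ... X2's view in those rounds is constantly 0, so she learns nothing ...
assert set(eve_view) == {0}
# ... and a key round occurs about half the time, each contributing one
# uniformly distributed symbol from {1, 2}, i.e. one bit:
print(len(key_x1) / len(samples))  # ~0.5, so roughly 1/2 bit per sample
```

Conversely, when X_1 stays silent, Y's broadcasts cannot tell X_1 anything that X_2 does not equally learn, which is why the rates without X_1's communication vanish.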
VI. DIRECTIONALITY, NATURALNESS, AND CONSISTENCY
We are now in a position to integrate the two intuitions with the results of secret key agreement rates. The camel intuition, with the channels X_1 → Y and X_2 → Y taking center stage, most closely aligns with the one-way secret key agreement rates S(X_1 → Y || X_2) and S(X_2 → Y || X_1). This also agrees with Section IV's quantification (compare Tables II and III):

    I_∂[X_1 → Y \ X_2] = S(X_1 → Y || X_2) and
    I_∂[X_2 → Y \ X_1] = S(X_2 → Y || X_1) .

Secret key agreement rates have been associated with unique informations before: an upper bound on S(A ↔ B || E), the intrinsic mutual information [10], is known to not satisfy the consistency condition, Eq. (1) [11]; more recently, the relationship between a particular method of quantifying unique information and one-way secret key agreement has been considered [12]. Relatedly, the value S(X_1 ↔ Y || X_2) = 1/2 bit in Table III is known due to the convergence of the upper and lower bounds in this instance.

The elephant intuition, with its focus on the channels Y → X_1 and Y → X_2, is more naturally aligned with the one-way secret key agreement rates S(Y → X_1 || X_2) and S(Y → X_2 || X_1). This again accords with Section IV's quantification:

    I_∂[X_1 → Y \ X_2] = S(Y → X_1 || X_2) and
    I_∂[X_2 → Y \ X_1] = S(Y → X_2 || X_1) .

There are, however, difficulties with these approaches. The first difficulty concerns the camel intuition. If the one-way secret key agreement rates S(X_1 → Y || X_2) and S(X_2 → Y || X_1) are used to quantify the unique informations I_∂[X_1 → Y \ X_2] and I_∂[X_2 → Y \ X_1], respectively, the consistency relation given by Eq. (1) is not necessarily satisfied. Importantly, though, if S(Y → X_1 || X_2) and S(Y → X_2 || X_1) are used, the resulting PID is always consistent. One concludes that the elephant intuition is the more natural of the two when using one-way secret key agreement rates to quantify unique informations.

There is another difficulty. PID is defined to be agnostic to directionality.
Furthermore, only one of the myriad proposed PID axioms is contingent on any inherent directionality, the Blackwell Property [13], and it is an elephant. In this sense, neither the camel nor the elephant intuition is consistent with PID. Again relating to secret key agreement, this implies that unique informations should more closely align with either the pair S(X_1 : Y || X_2) and S(X_2 : Y || X_1) or with the pair S(X_1 ↔ Y || X_2) and S(X_2 ↔ Y || X_1), neither of which adopts any sort of directionality.

Both approaches bring their own further difficulties. On the one hand, the no-communication secret key agreement rate is not continuous in the space of distributions, whereas PID is generally considered to vary continuously. On the other hand, the two-way secret key agreement rate S(X_1 ↔ Y || X_2) has no known closed-form solution, only upper and lower bounds, and so it cannot be practically computed. Furthermore, and perhaps more fundamentally, whether or not the two-way secret key agreement rate results in a consistent decomposition is not known. That said, our extensive searches of examples for which the upper and lower bounds converge are encouraging: they have not resulted in any violations of Eq. (1).

VII. CONCLUSION
At present, a primary barrier to PID's general adoption as a useful and possibly central tool in analyzing how complex systems store and process information is the lack of an agreed-upon method to quantify its component informations. Here, we posited that one reason for disagreement stems from conflicting intuitions regarding the decomposition's operational behavior. This suggests several possibilities.

The first is that PID is inherently context-dependent and quantification depends on a notion of directionality. In this case, the elephant intuition is apparently more natural, as adopting closely related notions from cryptography results in a consistent PID. If context demands the camel intuition, though, either a noncryptographic method of quantifying unique information is needed or consistency must be enforced by augmenting the secret key agreement rate.

The second possibility suggested by our observations is that intuitions which project a directionality on the decomposition are inherently flawed and that any correct quantification must be independent of direction. Interestingly, cryptographic notions may still play a role here. Though, since there is as yet no known way to compute the two-way secret key agreement rate, its application remains open.

A final possibility is that associating secret key agreement rates with unique information is fundamentally flawed and that, ultimately, PID quantifies unique information as something distinct from the ability to agree upon a secret key.

Given that one of the main factors driving PID's creation was the need for interpretability, ensuring that the intuitions brought to bear are consistent with the quantitative values is of the utmost importance. We described three quantitative regimes, each corresponding to a specific directionality or the lack thereof.
While it is possible that each can play a distinct role in the understanding of complex systems, our hope is that a single method will emerge as the most useful and accepted approach to understanding the organization of information within a joint probability distribution.
ACKNOWLEDGMENTS
All calculations herein were performed using the dit Python package [14]. We thank P. Banerjee, E. Olbrich, and D. Feldspar for many helpful discussions. As a faculty member, JPC thanks the Santa Fe Institute and the Telluride Science Research Center for their hospitality during visits. This material is based upon work supported by, or in part by, Foundational Questions Institute grant FQXi-RFP-1609, the U.S. Army Research Laboratory and the U.S. Army Research Office under contracts W911NF-13-1-0390 and W911NF-13-1-0340 and grant W911NF-18-1-0028, and via Intel Corporation support of CSC as an Intel Parallel Computing Center.

[1] P. L. Williams and R. D. Beer. Nonnegative decomposition of multivariate information. arXiv:1004.2515.
[2] C. Finn and J. T. Lizier. Pointwise partial information decomposition using the specificity and ambiguity lattices. Entropy, 20(4):297, 2018.
[3] R. A. A. Ince. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19(7):318, 2017.
[4] N. Bertschinger, J. Rauh, E. Olbrich, J. Jost, and N. Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014.
[5] U. M. Maurer. Secret key agreement by public discussion from common information. IEEE Trans. Info. Th., 39(3):733–742, 1993.
[6] A. Gohari, O. Günlü, and G. Kramer. Coding for positive rate in the source model key agreement problem. arXiv:1709.05174.
[7] E. Chitambar, B. Fortescue, and M.-H. Hsieh. The conditional common information in classical and quantum secret key distillation. IEEE Trans. Info. Th., 2018.
[8] P. Gács and J. Körner. Common information is far less than mutual information. Problems of Control and Information Theory, 2(2):149–162, 1973.
[9] R. Ahlswede and I. Csiszár. Common randomness in information theory and cryptography. I. Secret sharing. IEEE Trans. Info. Th., 39(4):1121–1132, 1993.
[10] U. M. Maurer and S. Wolf. Unconditionally secure key agreement and the intrinsic conditional information. IEEE Trans. Info. Th., 45(2):499–514, 1999.
[11] N. Bertschinger, J. Rauh, E. Olbrich, and J. Jost. Shared information: new insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012, pages 251–269. Springer, 2013.
[12] P. K. Banerjee, E. Olbrich, J. Jost, and J. Rauh. Unique informations and deficiencies. arXiv:1807.05103.
[13] J. Rauh, P. Banerjee, E. Olbrich, J. Jost, and N. Bertschinger. On extractable shared information. Entropy, 19(7):328, 2017.
[14] R. G. James, C. J. Ellison, and J. P. Crutchfield. dit: a Python package for discrete information theory.