Approaches to causality and multi-agent paradoxes in non-classical theories

Abstract

This thesis reports progress in the analysis of causality and multi-agent logical paradoxes in quantum and post-quantum theories. These research areas are highly relevant for the foundations of physics as well as the development of quantum technologies. In the first part, focussing on causality, we develop techniques for using generalised entropies to analyse distinctions between classical and non-classical causal structures. We derive new properties of Tsallis entropies of systems that follow from the relevant causal structure, and apply these to obtain new necessary constraints for classicality in the Triangle causal structure. Supplementing the method with the post-selection technique, we provide evidence that Shannon and Tsallis entropic constraints are insufficient for detecting non-classicality in Bell scenarios with non-binary outcomes. This points to the need for better methods of characterising correlations in non-classical causal structures. Further, we investigate the relationships between causality and space-time by developing a framework for modelling cyclic and fine-tuned influences in non-classical theories. We derive necessary and sufficient conditions for such causal models to be compatible with a space-time structure and for ruling out operationally detectable causal loops. In particular, this provides an operational framework for analysing post-quantum theories admitting jamming non-local correlations. In the second part, we investigate multi-agent logical paradoxes such as the Frauchiger-Renner paradox and develop a framework for analysing such paradoxes in arbitrary physical theories. Applying this to box world, a post-quantum theory, we derive a stronger paradox that does not rely on post-selection. Our results reveal that reversible evolution of agents' memories is not necessary for deriving multi-agent paradoxes, and that certain forms of contextuality might be.

Full PDF

Approaches to causality and multi-agent paradoxes in non-classical theories

By Vilasini Venkatesh
Doctor of Philosophy
University of York
Mathematics
February 2021

Dedication

To my mother, Sujatha

Abstract

Causality and logic are both fundamental to our understanding of the universe, but our intuitions about these are challenged by quantum phenomena. This thesis reports progress in the analysis of causality and multi-agent logical paradoxes in quantum and post-quantum theories. Both these research areas are highly relevant for the development of quantum technologies such as quantum cryptography and computing.

Part I of this thesis focuses on causality. Firstly, we develop techniques for using generalised entropies to analyse distinctions between classical and non-classical causal structures. We derive new properties of classical and quantum Tsallis entropies of systems that follow from the relevant causal structure, and apply these to obtain new necessary constraints for classicality in the Triangle causal structure. Supplementing the method with the post-selection technique, we provide evidence that Shannon and Tsallis entropic constraints are insufficient for detecting non-classicality in Bell scenarios with non-binary outcomes. This points to the need for better methods of characterising correlations in non-classical causal structures. Secondly, we investigate the relationships between causality and space-time by developing a framework for modelling cyclic and fine-tuned influences in non-classical theories. We derive necessary and sufficient conditions for such causal models to be compatible with a space-time structure and for ruling out operationally detectable causal loops. In particular, this provides an operational framework for analysing post-quantum theories admitting jamming non-local correlations.

In Part II of this thesis, we investigate multi-agent logical paradoxes, of which the quantum Frauchiger-Renner paradox has been the only example. We develop a framework for analysing such paradoxes in arbitrary physical theories. Applying this to box world, a post-quantum theory, we derive a stronger paradox that does not rely on post-selection. Our results reveal that reversible, unitary evolution of agents' memories is not necessary for deriving multi-agent logical paradoxes, and suggest that certain forms of contextuality might be.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Declarations
1 Introduction


I Approaches to causality in non-classical theories
( , , , ) Bell scenario in entropy space
5.3 New results for the ( , , , ) Bell scenario in probability space
5.4 New results for the ( , , , ) Bell scenario in entropy space
5.5 Discussion
5.6 Appendix

II Multi-agent paradoxes

List of Tables

( , , , ) CHSH

List of Figures

q for the distribution of Equation (5.4).
5.2 Regions in the v − ε plane where Tsallis entropic BC inequalities can be violated.
6.1 Motivating examples for cyclic and fine-tuned causal models

Acknowledgements

First and foremost, I am grateful to my supervisor, Roger Colbeck, for his gentle guidance and deep insights without which this thesis would not be possible. I have greatly benefited both from his expertise and from his encouragement to develop my own research ideas and to pursue external collaborations. His ability to express complex concepts in the most effective verbal and mathematical form has inspired me to strive for perfection in my scientific communication.

I wish to thank the Department of Mathematics for my PhD fellowship and all the timely administrative support. I extend my gratitude to other members of our department, in particular Tony Sudbery and Matt Pusey, for valuable discussions that have enriched my knowledge. I am thankful to Mirjam for several stimulating discussions on causal structures, for sharing snippets of her Mathematica code, and to her and Vicky for their help during my initial days in York. I deeply appreciate the experience I have had with all my peers, in particular Peiyun, Giorgos, Rutvij, Vincenzo, Vasilis, Max and Ali: the enjoyable conversations, moral support and spontaneous music-making sessions have made my PhD experience very memorable. I express my deep regards to my collaborators at ETH Zürich. Firstly to Lídia del Rio for the numerous exciting exchanges about quantum foundations, career and life advice, and much more. Thanks are also due to Nuriya Nurgalieva, especially for the pretty diagrams on our joint paper. I am grateful to Renato Renner, whose profound ideas and clarity of thought have always inspired me, and to him and Christopher Portmann for truly insightful interactions that have undoubtedly benefited my work. Further, I thank my examiners, Matt Pusey and Stefan Wolf, for their useful comments that have enhanced the final version of this thesis.

The unwavering support of my family and friends has been vital to the completion of this thesis. My aunt, Indumathi, for several enlightening discussions on abstract mathematical concepts since high school days. Meghana and Avni for being ever-prepared to endure my excited monologues about quantum physics, and being the main contributors to my knowledge of fascinating scientific facts outside of physics. Nemanja for his invaluable support with all things tech and for being a patient audience to several practice talks. Lastly but importantly, my mother, Sujatha, for inspiring my academic pursuits at a young age by impressing upon me that knowledge should be sought not merely as a means to an end, but as an end in itself.

Declarations

I declare that this thesis is a presentation of original work and I am the sole author. This work has been carried out under the supervision of Prof. Roger Colbeck and has not previously been presented for an award at this, or any other University. Chapters 4 and 5 are primarily based on publications [1] and [2] listed below, which were carried out with Prof. Colbeck, while Chapter 8 is based on the publication [3], which is joint work with collaborators from ETH Zürich, Switzerland. Chapter 6 is based on as yet unpublished work [4] carried out with Prof. Colbeck. Further, the Mathematica package [5] was developed for use in publications [1] and [2]. Additional research work carried out or published during the doctoral studies in related topics, but not reported in this thesis, is given in [6] and [7]. All other sources are acknowledged and listed in the bibliography.

List of publications included in this thesis

[1] Vilasini, V. and Colbeck, R. Analyzing causal structures using Tsallis entropies. Physical Review A 100, 062108 (2019).
[2] Vilasini, V. and Colbeck, R. Limitations of entropic inequalities for detecting nonclassicality in the postselected Bell causal structure. Physical Review Research 2, 033096 (2020).
[3] Vilasini, V., Nurgalieva, N. and del Rio, L. Multi-agent paradoxes beyond quantum theory. New Journal of Physics 21, 113028 (2019).

In preparation and code

[4] Vilasini, V. and Colbeck, R. Cyclic and fine-tuned causal models and compatibility with relativistic principles. In preparation (2020).
[5] Colbeck, R. and Vilasini, V. LPAssumptions (A Mathematica package for solving linear programming problems with free parameters, subject to assumptions on these parameters) (2019). https://github.com/rogercolbeck/LPAssumptions

Additional research work not included in this thesis

[6] Vilasini, V., Portmann, C. and del Rio, L. Composable security in relativistic quantum cryptography. New Journal of Physics 21, 043057 (2019).
[7] Vilasini, V., del Rio, L. and Renner, R. Causality in definite and indefinite space-times.
In preparation (2020). Extended abstract at: https://wdi.centralesupelec.fr/users/valiron/qplmfps/papers/qs01t3.pdf

1 Introduction

1.1 Preface

Quantum theory makes several predictions that fly in the face of common intuition and have puzzled its very founders: energy appearing only in discrete quanta, particles exhibiting wave-like properties such as interference and superposition, entanglement of distant particles, the list goes on. Nevertheless, quantum theory has passed several decades of stringent experimental tests, establishing itself as one of the most successful theories of nature currently available. Why does nature seem to comply with the predictions of quantum theory? What are the physical principles that uniquely define quantum theory? These questions continue to be actively researched, even a century after the inception of the theory. The process has repeatedly revealed the inter-dependent and mutually reinforcing relationship between technological progress and advancements in fundamental scientific research: an understanding of fundamental principles enables us to harness their potential for technologies, which in turn allow us to probe nature in greater detail. Revolutionary technologies such as the laser, nuclear energy, positron emission tomography (i.e., PET medical imaging) and quantum computing would not exist if not for fundamental explorations into the microscopic regime, fed by human curiosity to comprehend the ways of nature. On the other hand, breakthroughs in high-precision measurements are enabling the observation of quantum effects in new physical regimes, making it imperative to model the implications of extending quantum theory to larger and more complex systems.

In the modern era of physics, the field of quantum information theory has led to great progress in understanding the foundational aspects of the quantum regime, while also bringing to attention the increased information-processing capabilities of quantum theory.
A lot of this progress traces back to a seminal theorem by John Bell in 1964 [22] proving the incompatibility of quantum predictions with certain classical models of the physical world. The remarkable fact is that this incompatibility can in principle be witnessed at the level of empirical data, such as the settings of knobs and values of pointers on measurement devices, by testing the strength of correlations between the data obtained by non-communicating parties. This has made possible cryptographic protocols such as key distribution whose security is guaranteed by the laws of physics [79, 145, 17], as opposed to weighing an adversary's computing power against the difficulty of solving a mathematical problem. It has also enabled the generation and certification of true randomness [169, 60, 5], leading to the development of quantum random number generators that have now become commercially available. In the field of computing, Shor's quantum algorithm [192] enables prime factorisation in polynomial time, offering a significant speedup over existing classical algorithms, which has major implications for the security of modern day cryptosystems. This has attracted huge investments from governments as well as companies such as Google, IBM and Microsoft, which are making steady progress towards the development of scalable quantum computers as well as quantum cryptographic primitives. Furthermore, quantum advantages in several communication tasks have also been reported [211, 40], and there are global efforts towards establishing quantum communication in space through satellite networks and also developing the quantum internet.

These technological feats are backed by theoretical research into quantum correlations, quantum networks, relativistic quantum information and quantum circuits, among others. Many of these areas fall in the ambit of the study of causality and causal structures in non-classical theories, of which there are several approaches.
For example, reformulating Bell's theorem in terms of the different constraints implied by causal structures on correlations in classical and non-classical theories has revealed possible ways to generalise it to more complex scenarios involving multiple parties and sources of correlations [225, 48]. Analysing causal structures beyond the standard CHSH Bell scenario [56] is important for building quantum networks for communication and distributed computing. Furthermore, in protocols involving multiple agents distributed over space-time, relativistic principles such as the finite speed of signalling also play a role in restricting the information processing possibilities and consequently, in the security of relativistic cryptographic protocols [59, 210]. The study of causation in quantum theory has also extended the analysis of quantum properties such as superposition and entanglement from the spatial to the temporal domain, and generalisations of quantum circuits have been introduced for modelling spatio-temporal correlations [53, 154, 172]. Such scenarios have not only been physically implemented [146, 174, 185], they have also been shown to offer additional information-theoretic advantages in numerous information-processing tasks [37, 9, 107, 51]. These examples can also be used to simulate thought experiments where the space-time structure itself exhibits quantum properties, arising from quantum gravitational effects [231].
Therefore a more complete understanding of causation and its relation to space-time structure in quantum theory is essential for harnessing its full information-processing potential and is likely to aid the long sought after unification of quantum theory with general relativity.

The ability to create and manipulate quantum superpositions of larger composite systems is an important consequence of the advancements in quantum networks and computing technologies. While we don't observe quantum effects at macroscopic scales, the possibility of such an observation under careful experimental conditions is not forbidden by any known physical principle.

Footnote: The RSA encryption scheme is based on the complexity of the prime factorisation problem.

Several theoretical and experimental research groups around the globe have been invested in probing the quantum to classical transition; whether there is a fundamental scale at which such a transition should occur remains a burning open question. Recent work has revealed that extending quantum theory to more complex systems can lead to a conflict with simple and commonly employed principles of logical reasoning [85, 152]. Continued research into this domain would hence enable a better characterisation of the structure of logic that applies to larger scale quantum systems, such as quantum computers. It would also generate insights into whether the principles of quantum theory are universally applicable, or whether they are merely a very good approximation for the microscopic realm offered by a more fundamental theory that we are yet to discover.

This thesis concerns itself with the two areas of quantum information research enunciated above, namely the study of causality and of multi-agent logical paradoxes that can arise when quantum theory is extended to more complex systems. Our analysis of these topics also extends to post-quantum theories, since it is often the case that stepping beyond quantum theory to a more general setting informs an understanding of what makes quantum theory special among the plethora of other physical theories. We summarise the main contributions of this thesis below, and their relevance for the broader open questions described here.

This thesis is divided into two parts and consists of a total of nine chapters. The first part focuses on causality while the second focuses on multi-agent paradoxes in non-classical theories. Chapters 4, 5, 6 and 8 contain the majority of the original contributions of this thesis, which are based on published [207, 208, 209] as well as unpublished work. Leaving out the present chapter, of which the current section is the last, we summarise the main contents and/or contributions of the remaining eight chapters below.

Chapter 2: Preliminaries

Here we introduce the background concepts and mathematical tools that will be important for the rest of this thesis. An expert reader may choose to skip over parts or the whole of this chapter. We first cover the basics of polyhedral geometry and computation, followed by an overview of information-theoretic entropic measures and their properties. Both these topics are central to the results of Chapters 4 and 5. We then discuss a class of post-quantum theories, namely generalised probabilistic theories, before moving on to the preliminary concepts about causal structures in classical, quantum and generalised probabilistic theories. The former will be relevant for both parts of the thesis while the latter will predominantly be employed in Part I. We conclude this chapter with a review of epistemic modal logic, a concept that will feature in Part II of the thesis.

Footnote (unpublished work): Vilasini, V. and Colbeck, R. Cyclic and fine-tuned causal models and compatibility with relativistic principles. In preparation (2020).

Part I: Approaches to causality in non-classical theories

Causality is a fundamental part of our perception of the universe. A precise understanding of cause-effect relationships is integral to the scientific method, having found applications in drug trials, economic predictions, machine learning as well as biological and physical systems. Quantum mechanics, a theory that has been immensely successful in explaining microscopic physics, strongly challenges our classical intuitions about causation, deeming classical models inadequate for describing quantum phenomena. Apart from this fundamental implication, a plethora of information-theoretic applications that benefit from a quantum over classical advantage arise from the study of causal structures. Extending this study to more general, post-quantum theories facilitates this understanding by highlighting the aspects of causality that might be special to quantum theory. Part I of this thesis is dedicated to the analysis of causal structures, both acyclic and cyclic, using both entropic and non-entropic techniques.

Chapter 3: Overview of techniques for analysing causal structures

Here, we provide a more specific overview of the techniques for certifying the non-classicality of causal structures that will be employed in Chapters 4 and 5. We first discuss the techniques employed when the observed correlations in a causal structure are represented in probability space, focusing on the geometry of the correlation sets in the bipartite Bell causal structure. Following this, we present an overview of methods to certify non-classicality in entropy space. This includes a summary of the entropy vector method and the post-selection technique which is often employed with this method. This chapter is based on the introductory sections of our published papers [207, 208].

Chapter 4: Entropic analysis of causal structures without post-selection

In this chapter, we develop new techniques for using generalised entropy measures to analyse causal structures, particularly Tsallis entropies. Employing the Shannon entropy for this purpose is known to have certain limitations [215, 217] and the use of generalised entropies has not been previously considered. Our first contribution is a new set of constraints on the Tsallis entropies that are implied by (conditional) independence between classical random variables, which we apply to causal structures. Finding that the standard entropic method becomes computationally intractable even for small causal structures, we propose a way to circumvent this issue and obtain a partial solution, namely a new set of Tsallis entropic necessary constraints for classicality in the Triangle causal structure. Tsallis entropies have found important applications in other areas of information theory and in the field of non-extensive statistical physics, hence the properties of Tsallis entropies derived here can be useful even beyond the analysis of causal structures. Further, we also find that Rényi entropies pose significant limitations for certifying the non-classicality of causal structures. This chapter is based on our published work [207].

Footnote: Additionally, we also generalise these constraints to quantum Tsallis entropies in certain cases, under a suitable notion of quantum conditional independence. But this is not central to the remaining results.
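As a rough illustration of the kind of object Chapter 4 works with, the sketch below computes the classical Tsallis entropy using its standard textbook definition, S_q(p) = (1 - Σ_x p(x)^q)/(q - 1), and checks the well-known pseudo-additivity property for independent variables. This is not one of the new constraints derived in the thesis; the function names and the example distributions are assumptions made purely for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats (natural logarithm); zero-probability terms are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_x p(x)^q) / (q - 1), for q > 0 and q != 1.

    At q == 1 the Shannon entropy (the q -> 1 limit) is returned instead.
    """
    if q == 1:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def product_distribution(p, r):
    """Joint distribution of two independent variables with marginals p and r."""
    return [a * b for a in p for b in r]


# Illustrative marginals (arbitrary values chosen for this sketch).
p_x = [0.5, 0.3, 0.2]
p_y = [0.6, 0.4]
q = 2.0

s_x = tsallis_entropy(p_x, q)
s_y = tsallis_entropy(p_y, q)
s_xy = tsallis_entropy(product_distribution(p_x, p_y), q)

# Pseudo-additivity for independent X and Y:
# S_q(XY) = S_q(X) + S_q(Y) + (1 - q) * S_q(X) * S_q(Y).
pseudo_additive = s_x + s_y + (1.0 - q) * s_x * s_y
print(math.isclose(s_xy, pseudo_additive, abs_tol=1e-12))
```

Unlike the Shannon entropy, the Tsallis entropy of independent variables is not simply additive, which is one reason independence constraints take a different form in the Tsallis case.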

Chapter 5: Entropic analysis of causal structures with post-selection

We investigate whether the entropic method combined with the post-selection technique allows non-classicality to be detected whenever it is present. We focus on the bipartite Bell causal structure, where the answer is known to be positive for the case of binary valued measurement outcomes [46]. Developing an analogue of the technique used in [46] for the scenario where there are three outcomes and two inputs per party, we identify two families of non-classical distributions here, namely those whose non-classicality can be detected through this technique and those whose non-classicality cannot. We then investigate extensions of the technique by allowing the observed correlations to be post-processed according to a natural class of non-classicality non-generating operations prior to testing them entropically. We provide numerical as well as analytical evidence indicating that even under a natural class of post-processings, entropic inequalities for either the Shannon or Tsallis entropies are generally not sufficient to detect non-classicality in the Bell causal structure. Our work provides insights into some of the advantages and the limitations of the entropic technique, and is the first to report drawbacks of the technique in post-selected causal structures. This chapter is based almost entirely on published material [208].

Chapter 6: Cyclic and fine-tuned causal models and compatibility with relativistic principles

In this chapter, we investigate the relationships between causation and space-time structure. We develop an operational framework for modelling cyclic and fine-tuned causal influences in non-classical theories, and characterise their compatibility with an underlying space-time structure. Distinguishing between causal loops based on the ability to operationally detect them, we derive necessary and sufficient conditions for ruling out operationally detectable loops in these causal models, and for the model to not signal superluminally with respect to a space-time structure. This provides a mathematical framework for analysing generalisations of the so-called "jamming" scenario [105, 116] whereby a party superluminally influences correlations between other space-like separated parties. We prove, through an explicit protocol, that these scenarios can lead to superluminal signalling (without additional assumptions), contrary to the original claim that they are compatible with relativistic principles. Furthermore, we analyse a claim made in [116] that a set of no-signalling conditions is necessary and sufficient for ruling out causal loops in multipartite Bell scenarios. Our results also identify missing assumptions in these claims and generalise them to arbitrary causal structures. Finally, we present ideas for building a more complete framework for causally modelling these general scenarios, which is likely to be of broader relevance. This work is as yet unpublished.

Part II: Multi-agent paradoxes in non-classical theories

Logical deduction is another rudimentary aspect of our understanding of the world around us, which we begin to employ in early childhood. In recent years, it has been shown that quantum theory, when extended to the level of the observer, challenges some of the commonly employed rules of logical reasoning. More precisely, when observers model each other's memories (where they store their measurement outcomes) as quantum systems and use simple logical principles to reason about each other's knowledge, they can arrive at a deterministic contradiction [85, 152]. This multi-agent logical paradox proposed by Frauchiger and Renner has generated a surge of scientific interest in understanding the role of observers as quantum systems, and the objectivity of their physical observations. An observer here need not necessarily be a conscious entity, but one capable of implementing quantum measurements and performing simple deductions based on the outcomes, something that a relatively small quantum computer can be programmed to do (in principle). Experimental efforts towards developing scalable quantum computers could make possible a physical implementation of such scenarios in the near future, making it imperative to develop a better theoretical understanding thereof. Part II of this thesis focuses on multi-agent paradoxes, in and beyond quantum theory, for we must step outside the quantum formalism to better comprehend its peculiarities.

Chapter 7: Multi-agent paradoxes in quantum theory

In this chapter, we present an entanglement-based version of the Frauchiger-Renner thought experiment [85] and explain its seemingly paradoxical result. We briefly analyse the relationship of this to other results with similar implications for the objectivity of agents' observations. We conclude with a discussion about the implications of this result for the interpretations of quantum theory.

Chapter 8: Multi-agent paradoxes beyond quantum theory

We generalise the assumptions involved in the Frauchiger-Renner result so that they can be applied to arbitrary physical theories. This provides an operational framework for modelling agents as systems of a general physical theory, which we apply for modelling how observers' memories may evolve in box world, a particular post-quantum, probabilistic theory. We use this to find a deterministic contradiction in the case where agents share a PR box, a bipartite box world system. Our version of the paradox in box world is stronger than the quantum one of Frauchiger and Renner, in the sense that it does not rely on post-selection. It also reveals that reversibility of the memory update, akin to quantum unitarity, is not necessary for witnessing such paradoxes, suggesting that certain forms of contextuality are to be held responsible. Obtaining an inconsistency in the framework of generalised probabilistic theories broadens the landscape of theories which are affected by the application of classical rules of reasoning to physical agents, and enables a deeper understanding of the features of quantum theory that lead to an incompatibility with classical logical structures. This chapter is based on our published paper [209].
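For readers unfamiliar with the PR box mentioned above, the following sketch encodes its standard definition from the literature (outputs a, b are uniformly random subject to a ⊕ b = x·y for inputs x, y) and checks two well-known properties: it is non-signalling, and it attains a CHSH value of 4. This is only an illustration of the resource shared by the agents, not the paradox construction of Chapter 8; all function names are hypothetical.

```python
from itertools import product

BITS = (0, 1)


def pr_box(a, b, x, y):
    """P(a, b | x, y) for the PR box: uniform over outputs with a XOR b == x AND y."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def correlator(box, x, y):
    """E(x, y) = P(a == b | x, y) - P(a != b | x, y)."""
    return sum((-1) ** (a ^ b) * box(a, b, x, y) for a, b in product(BITS, repeat=2))


def chsh_value(box):
    """CHSH expression E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    return (correlator(box, 0, 0) + correlator(box, 0, 1)
            + correlator(box, 1, 0) - correlator(box, 1, 1))


def is_non_signalling(box, tol=1e-12):
    """Check that each party's marginal is independent of the other party's input."""
    for a, x in product(BITS, repeat=2):
        alice = [sum(box(a, b, x, y) for b in BITS) for y in BITS]
        if abs(alice[0] - alice[1]) > tol:
            return False
    for b, y in product(BITS, repeat=2):
        bob = [sum(box(a, b, x, y) for a in BITS) for x in BITS]
        if abs(bob[0] - bob[1]) > tol:
            return False
    return True


print(chsh_value(pr_box), is_non_signalling(pr_box))
```

For comparison, classical correlations satisfy CHSH ≤ 2 and quantum correlations satisfy CHSH ≤ 2√2, so the PR box is a post-quantum resource even though it cannot be used to signal.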

Chapter 9: Conclusions and outlook

We present concluding remarks on the contributions made in both parts of this thesis, and their potential scope for addressing immediate as well as broader open problems in the foundations of physics and information theory.

2 Preliminaries

In this chapter, we review the necessary background and main mathematical tools that will be employed in this thesis. We begin with an overview of convex geometry, optimization and polyhedral computations in Section 2.2. In Section 2.3, we present the information-theoretic entropy measures that are relevant to this thesis, namely Shannon, von Neumann, Tsallis and Rényi entropies, and outline some of their useful properties. We then review the framework of generalised probabilistic theories (GPTs) in Section 2.4, mainly following the work of Barrett [16]. Section 2.5 summarises the basics of classical and non-classical causal models, which will be primarily based on the classical Bayesian networks approach of Judea Pearl [160] and the framework of generalised causal structures developed by Henson, Lal and Pusey [115] that applies to quantum and GPT causal structures. These concepts form the backbone of Part I of this thesis, which concerns causality in classical and non-classical theories. In Section 2.6 we summarise the main structures and axioms underlying the modal logic framework. This, along with the framework for GPTs mentioned above, forms the main prerequisites for Part II of this thesis, which concerns multi-agent logical paradoxes in quantum and GPT settings. We first present a brief overview of notational conventions used in this thesis.

Throughout this thesis (unless specified otherwise), we will employ the following notational conventions.

Random variables, probabilities, entropies:

We will use capital letters (typically from the second half of the English alphabet) to denote random variables, or RVs for short, e.g., $X$, $Y$, $Z$. We will also use the same label to denote the set of possible values taken by the random variable, the corresponding small letter to denote a particular value of the random variable, and $|\cdot|$ to denote the cardinality of sets. For example, the random variable $X$ takes values $x \in X$, and there are $|X|$ possible values that $X$ can take. We will only consider finite and discrete random variables throughout this thesis. For a probability distribution over a set of random variables $\{X_1, \ldots, X_n\}$, we will use $P_{X_1, \ldots, X_n} \in \mathcal{P}_n$, where $\mathcal{P}_n$ denotes the set of all probability distributions over $n$ discrete random variables. Whenever it is necessary to mention the specific values of the random variables, we will abbreviate $P_{X_1, \ldots, X_n}(X_1 = x_1, \ldots, X_n = x_n)$ to $P(X_1 = x_1, \ldots, X_n = x_n)$ or simply $P(x_1, \ldots, x_n)$. When an equation contains only distributions labelled by variables denoted in capital letters, it should be interpreted as being satisfied for every value of those variables (denoted by the corresponding small letter). For example, $P_{XY} = P_X P_Y$ should be read as $P(xy) = P(x) P(y)$ $\forall x \in X, y \in Y$, where $P(xy)$ is short for $P_{XY}(X = x, Y = y)$. We will use the short form $XY$ or sometimes $X, Y$ to denote the union of two variables/sets of variables. Further, we will use $H(XY\ldots)$ (or sometimes, $H(X, Y, \ldots)$) to denote joint entropies of sets of random variables $X$, $Y$, ....

Quantum formalism:

Given a Hilbert space $\mathcal{H}$, we will use $\mathcal{P}(\mathcal{H})$ to represent the set of positive semi-definite operators on $\mathcal{H}$, and $\mathcal{S}(\mathcal{H})$ to denote the set of positive semi-definite and trace one operators on $\mathcal{H}$. Quantum states will, in general, be represented by density operators, i.e., $\rho \in \mathcal{S}(\mathcal{H})$. The Hermitian conjugate will be denoted using the usual dagger notation, $\rho^\dagger$, and the matrix transpose by the superscript $T$, $\rho^T$. Rank 1 density operators, i.e., pure states, will be denoted as elements of the Hilbert space $\mathcal{H}$ in the conventional bra-ket notation, $|\psi\rangle \in \mathcal{H}$ and $|\psi\rangle^\dagger = \langle\psi| \in \mathcal{H}^*$, where $\mathcal{H}^*$ is the dual space of $\mathcal{H}$. Quantum channels or transformations on quantum states correspond to completely positive trace-preserving (CPTP) maps $\mathcal{E}: \mathcal{S}(\mathcal{H}_A) \to \mathcal{S}(\mathcal{H}_B)$ that map an input state $\rho_A \in \mathcal{S}(\mathcal{H}_A)$ to an output state $\rho_B = \mathcal{E}(\rho_A) \in \mathcal{S}(\mathcal{H}_B)$. Quantum measurements will be described by positive operator valued measures (POVMs): a POVM is a set $\{E_x\}_{x \in X}$ with $E_x \in \mathcal{P}(\mathcal{H})$, labelled by the classical values $x$, and summing to the identity, $\sum_{x \in X} E_x = \mathbb{1}$. We will often abbreviate $\{E_x\}_{x \in X}$ to $\{E_X\}_X$. For the purposes of this thesis, we will only consider a discrete and finite set of measurement outcomes, so $X$ corresponds to a discrete and finite random variable taking values $x \in X$.

Causal structures and space-time diagrams:

Everywhere in the thesis, except for Chapter 6, we will use the regular arrows $\longrightarrow$ to denote causal influence (from cause to effect). In Chapter 6, we will use a different arrow symbol for the same purpose, because we would like to categorise these influences into solid arrows $\longrightarrow$ and dashed arrows $\dashrightarrow$ based on how they are detected operationally. A vast majority of the causal structures appearing in the rest of the thesis correspond to the $\longrightarrow$ case, which justifies this notation. All space-time diagrams will have space along the x-axis and time along the y-axis.

Logical operators and others:

We will use the standard logical operators, i.e., $\neg$ for 'not', $\wedge$ for 'and', $\vee$ for 'or', $\Rightarrow$ for 'if...then', and $\Leftrightarrow$ for 'equivalent'. Further, $\oplus$ will be used to denote modulo-2 addition of classical bits, i.e., the bitwise XOR. Logarithms will use base $e$, i.e., we will consider natural logarithms and denote them by $\ln$.

The lecture notes [92] provide a thorough introduction to polyhedral computation, while [31, 104, 142] are good resources for convex geometry and its application in optimisation problems. In this section, we give an overview of the main aspects of these topics that will be relevant to the current thesis and will focus our attention on the vector spaces $V = \mathbb{R}^n$ over the ordered field of reals $\mathbb{R}$.

The following special types of linear combinations are central to the concepts of polyhedral theory and convex optimization.

Definition 2.2.1 (Affine, conic and convex combinations). Consider a set of points $x_1, ..., x_m \in V$ and their linear combination $\sum_{i=1}^m \alpha_i x_i$ with real coefficients $\alpha_i$. Then,
1. $\sum_{i=1}^m \alpha_i x_i$ is an affine combination if $\sum_{i=1}^m \alpha_i = 1$.
2. $\sum_{i=1}^m \alpha_i x_i$ is a conic combination if $\alpha_i \geq 0$ $\forall i$.
3. $\sum_{i=1}^m \alpha_i x_i$ is a convex combination if it is both affine and conic, i.e., $\sum_{i=1}^m \alpha_i = 1$ and $\alpha_i \geq 0$ $\forall i$.

The affine, conic and convex hull of a set of points are then naturally defined with respect to the corresponding type of linear combination.

Definition 2.2.2 (Affine, conic and convex hull). The affine hull of a set of points $S \subseteq V$ is the set of all affine combinations of points in $S$ and is denoted as $\mathrm{Aff}(S)$. Similarly, the conic and convex hull of $S$ correspond to the sets of all conic and convex combinations of points in $S$ and are denoted by $\mathrm{Cone}(S)$ and $\mathrm{Conv}(S)$ respectively.

Analogous to the concept of linear independence, a set of points $S$ is said to be affinely independent if and only if none of the points in $S$ can be expressed as an affine combination of the remaining points in $S$. A set $S \subseteq V$ that is closed under affine combinations is called an affine subspace of $V = \mathbb{R}^n$.

We now proceed to discuss the central objects of polyhedral theory, namely convex sets, convex cones, polyhedra and polytopes, based on the concepts defined above.

Definition 2.2.3 (Convex set). A subset $S \subseteq V$ is said to be convex if for any two points $x_1, x_2 \in S$, the line segment $[x_1, x_2] := \{\alpha x_1 + (1 - \alpha) x_2 : 0 \leq \alpha \leq 1\}$ is contained in $S$.

By applying Definition 2.2.3 inductively to the set $S$, it can be shown that a set is convex if and only if it contains every convex combination of its points. Convex sets of particular interest to us are convex cones, and a further subset of those, polyhedral cones.

Definition 2.2.4 (Convex cone). A set $C \subseteq V$ is called a cone if for every $x \in C$ and $\alpha \geq 0$, $\alpha x \in C$. $C$ is a convex cone if it is convex and a cone, i.e., for any $x_1, x_2 \in C$ and $\alpha_1, \alpha_2 \geq 0$,
$$\alpha_1 x_1 + \alpha_2 x_2 \in C. \tag{2.1}$$
Note that a set is a convex cone if and only if it contains every conic combination of its points.

Definition 2.2.5 (Polyhedral cone). A set $P \subseteq V$ is said to be a polyhedral cone if it is a polyhedron and a cone, where a polyhedron $P$ is a subset of $V$ that can be expressed as the solution set of a finite number of linear inequalities,
$$P = \{x \in V : Ax \leq b \text{ and } Cx = d\}, \quad A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m,\ C \in \mathbb{R}^{k \times n},\ d \in \mathbb{R}^k. \tag{2.2}$$
(Note that the equalities $Cx = d$ in Equation (2.2) can be equivalently written in terms of inequalities as $Cx \geq d$ and $Cx \leq d$, but we will often write these separately, as is common in the convex optimization literature.)

By construction, a polyhedron corresponds to an intersection of half-spaces $\{x \in V \mid Ax \leq b\}$ and hyperplanes $\{x \in V \mid Cx = d\}$, and is convex. This is known as the H-representation (where H stands for half-space) of a polyhedron, and bounded polyhedra are called polytopes. (We will adopt these definitions in the rest of the thesis, noting that in the literature, the opposite convention for defining polyhedra and polytopes is sometimes used.) Since a cone, by definition, contains the origin $0 \in V = \mathbb{R}^n$, polyhedral cones can be expressed in the more concise form $P = \{x \in V : Ax \leq 0\}$. Polyhedra (and hence polytopes) can also be represented through a V-representation (where V stands for vertex), which follows from an important result in polyhedral theory, the Minkowski-Weyl Theorem. (Strictly speaking, unbounded polyhedra have extremal rays rather than extremal points or vertices, which are defined later in the section. Nevertheless, the representations in terms of extremal rays in the unbounded case, as well as in terms of vertices in the bounded case, are both known as the V-representation.)

Theorem 2.2.1 (Minkowski-Weyl Theorem). For $P \subseteq V$, the following statements are equivalent:
1. $P$ is a polyhedron,
2. $P$ is finitely generated, i.e., there exist finite sets $S, T \subset V$ such that
$$P = \mathrm{Conv}(S) + \mathrm{Cone}(T), \tag{2.3}$$
where addition of sets is defined with respect to the Minkowski sum, i.e., $S_1 + S_2 := \{s_1 + s_2 : s_1 \in S_1, s_2 \in S_2\}$ for $S_1, S_2 \subseteq V$.

The second statement defines the V-representation of $P$. In the case of a polyhedral cone given by $P = \{x \in V : Ax \leq 0\}$ in the H-representation, $S \subset V$ can be taken to contain only the origin in the corresponding V-representation, so that $P = \mathrm{Cone}(T)$. On the other hand, a polytope (in its V-representation) can be written as the convex hull of a finite set of points, and $T$ can be taken to be the empty set in this case. A simplex is a polytope that can be written as the convex hull of a finite set of affinely independent points.

The geometry of a polyhedron $P \subseteq V$ is characterised by its faces, which are defined in terms of valid linear inequalities of $P$. A linear inequality is valid for $P$ if it holds for all $x \in P$.

Definition 2.2.6 (Faces of a polyhedron). For a polyhedron

$P \subseteq V$, a subset $F \subseteq P$ is called a face of $P$ if it is represented as
$$F = P \cap \{x : c^T x = d\}, \tag{2.4}$$
for some valid inequality $c^T x \leq d$.

All faces of a polyhedron are, by construction, polyhedra themselves. Faces of dimensions 0, 1 and $\dim(P) - 1$ are called vertices, edges and facets respectively. It can be shown that vertices of a polyhedron are equivalent to its extreme points, which are points in $P$ that cannot be written as a convex combination of other points in $P$. Hence we will use these terms interchangeably. Further, when an edge of a polyhedron is unbounded, it can either be a line (unbounded in both directions) or a half-line (starting from a vertex and unbounded in one direction). In the latter case, the edge is called an extremal ray. (Extremal rays can also be defined as a subset $R$ of $P$ that cannot be expressed as a non-trivial conic combination of points not belonging to $R$.) In the V-representation of a polyhedron given by statement 2 of Theorem 2.2.1, it is often convenient to take the sets $S$ and $T$ to correspond to the set of extremal points and extremal rays respectively.

Fourier-Motzkin Elimination is a method for projecting higher dimensional polyhedra to lower dimensional ones through variable elimination, and forms an important part of the computational methods employed in this thesis. Here, we describe the mathematical concepts underpinning this algorithm. For this, we start with linear transformations, of which the transformations of interest, projections, are a subset.

Definition 2.2.7 (Linear transformation). A linear transformation is a map $f: \mathbb{R}^n \to \mathbb{R}^m$ that acts as $f: x \mapsto Ax$, where $A \in \mathbb{R}^{m \times n}$.

It can be immediately shown that linear transformations preserve convexity (since convex combinations are particular cases of linear combinations). Additionally, linear transformations also map convex cones to convex cones and polyhedral cones to polyhedral cones. Projections $\pi$ are linear transformations that are idempotent, i.e., $\pi \circ \pi = \pi$. In particular, we will consider orthogonal projections, which are projections on a Hilbert space that satisfy $\langle \pi(u), v \rangle = \langle u, \pi(v) \rangle$ for the Hilbert space inner-product $\langle \cdot, \cdot \rangle$ between any vectors $u$ and $v$ in the space. In our case this Hilbert space is simply $\mathbb{R}^n$ and $\langle \cdot, \cdot \rangle$ is the corresponding dot product, and in the remainder of the thesis, projections must be taken to mean orthogonal projections.

For a polyhedron $P \subseteq V$ expressed as $P = \{(x, y) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} : Ax + By \leq b\}$, with $n_1 + n_2 = n$, the projection of $P$ into the subspace of the $y$ variables, under the map $\pi_y: V \to V$, is defined as
$$\pi_y(P) := \{(0, y) : y \in \mathbb{R}^{n_2}, \text{ and } \exists\, x \in \mathbb{R}^{n_1} \text{ such that } (x, y) \in P\}. \tag{2.5}$$
$\pi_y$ is a linear transformation that can be represented by an $n \times n$ matrix with an $n_2 \times n_2$ identity matrix as its second diagonal block, and zeroes everywhere else. In the V-representation, a projection $\pi$ transforms the convex/conic hull of a set of points $\{x_1, ..., x_m\}$ to the convex/conic hull of the points $\{\pi(x_1), ..., \pi(x_m)\}$. Given a polyhedron $P$ in the H-representation $\{x \in V : Ax \leq b\}$ as input, the Fourier-Motzkin Elimination (FME) procedure outputs the H-representation, i.e., the inequalities describing the projected polyhedron $\pi_y(P)$.
The projection $\pi_y$ is implemented through FME by eliminating the first $n_1$ variables in the system of inequalities defining $P$. (Here, the projection $\pi$ is defined as an endomorphism on $V$, i.e., a map from $V$ to itself. However, it can be equivalently seen as a map from $V$ to a lower dimensional space by dropping additional zeroes, i.e., replacing $(0, y) \in \mathbb{R}^n$ with $y \in \mathbb{R}^{n_2}$ in Equation (2.5).) Taking $x$ to represent the $n$-dimensional vector with components $x_i$, the following steps detail the Fourier-Motzkin Elimination procedure for eliminating the variable $x_1$. This procedure can then be iterated $n_1$ times to eliminate all required variables.

1. The inequalities $Ax \leq b$ are partitioned into three sets $I_+$, $I_-$ and $I_0$. $I_+$ denotes the set of all inequalities in $Ax \leq b$ where the variable $x_1$ has a strictly positive coefficient, i.e., all inequalities where $A_{i1} > 0$. Similarly, $I_-$ is the set of all inequalities where $x_1$ has a strictly negative coefficient, and $I_0$ is the set of all inequalities where $x_1$ does not appear, i.e., its coefficient is 0.

2. If $I_+$ is empty, then the inequalities in $I_-$ are ignored, and vice-versa. If both $I_+$ and $I_-$ are non-empty, then the following steps are undertaken. Every inequality $A_i x \leq b_i$ in $I_+$ is rearranged and expressed as
$$x_1 \leq \frac{1}{A_{i1}} \Big( b_i - \sum_{k \neq 1} A_{ik} x_k \Big) := f_i(x_2, ..., x_n). \tag{2.6}$$
There are $|I_+|$ such inequalities. Similarly, every inequality $A_j x \leq b_j$ in $I_-$ is expressed as (noting that $A_{j1} < 0$)
$$x_1 \geq \frac{1}{A_{j1}} \Big( b_j - \sum_{l \neq 1} A_{jl} x_l \Big) := g_j(x_2, ..., x_n), \tag{2.7}$$
and there are $|I_-|$ inequalities of this type. Equations (2.6) and (2.7) are then combined to give the following set of $|I_+| \cdot |I_-|$ inequalities that are independent of the variable $x_1$,
$$g_j(x_2, ..., x_n) \leq f_i(x_2, ..., x_n). \tag{2.8}$$

3. The union of the $|I_+| \cdot |I_-|$ inequalities of Equation (2.8) with those of $I_0$ characterises the final output set of inequalities corresponding to the projection into the space of the $n - 1$ variables $x_2, ..., x_n$.

Note that the final set of inequalities obtained in the FME procedure is often not minimal, in the sense that it can contain several redundancies. Redundancy can however be checked efficiently through a linear program, and a characterisation of the projected polyhedron in terms of fewer, non-redundant inequalities can be obtained (as described in Section 2.2.3.1 and Figure 2.2).

Figure 2.1: Projections of a polyhedral cone (Example 2.2.1): (a) illustrates a polyhedral cone ($P$), with 3 facets (H-representation) and 3 extremal rays (V-representation) as indicated. Projections of $P$ on to the $xy$, $yz$ and $xz$-planes are also polyhedra, as shown in (b), (c) and (d) respectively. In all four subfigures, the corresponding polyhedron (blue region) is unbounded, and extends to infinity along the direction of the extremal rays indicated.

Example 2.2.1.

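Consider the polyhedral cone $P \subseteq V$ of Figure 2.1a which, in the H-representation, is given by
$$P = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \leq \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\}. \tag{2.9}$$
The same polyhedron is expressed in the V-representation as
$$P = \mathrm{Cone}\left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} \right). \tag{2.10}$$
The projection matrices in the 3 coordinate planes are
$$\pi_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \pi_{yz} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \pi_{xz} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2.11}$$
Then the projections of $P$ onto each of the coordinate planes correspond to the polyhedra $P_{xy}$ (Figure 2.1b), $P_{yz}$ (Figure 2.1c) and $P_{xz}$ (Figure 2.1d), expressed in the V-representation as
$$P_{xy} = \mathrm{Cone}\left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \right), \quad P_{yz} = \mathrm{Cone}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right), \quad P_{xz} = \mathrm{Cone}\left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} \right). \tag{2.12}$$
The same projections can be obtained by performing Fourier-Motzkin Elimination on the H-representation of $P$ (Equation (2.9)) as shown below.

1. For the projections $\pi_{xy}$, $\pi_{yz}$, $\pi_{xz}$, we wish to eliminate the variables $z$, $x$ and $y$ respectively. Denoting the variable to be eliminated by a superscript, we have
$I^z_+ = \{-x + y + z \leq 0,\ -y + z \leq 0\}$, $I^z_- = \{-z \leq 0\}$, $I^z_0 = \emptyset$,
$I^x_+ = \emptyset$, $I^x_- = \{-x + y + z \leq 0\}$, $I^x_0 = \{-y + z \leq 0,\ -z \leq 0\}$,
$I^y_+ = \{-x + y + z \leq 0\}$, $I^y_- = \{-y + z \leq 0\}$, $I^y_0 = \{-z \leq 0\}$.

2. Since $I^x_+$ is empty, we can ignore $I^x_-$ and only need $I^x_0$ to describe $P_{yz}$. $I^y_+$, $I^y_-$, $I^z_+$ and $I^z_-$ are all non-empty, so we rearrange the corresponding inequalities as $I^z_+ = \{z \leq x - y,\ z \leq y\}$, $I^z_- = \{z \geq 0\}$ and $I^y_+ = \{y \leq x - z\}$, $I^y_- = \{y \geq z\}$. Combining the inequalities in each case, we have $x - y \geq 0$, $y \geq 0$ for the projection $\pi_{xy}$; $y - z \geq 0$, $z \geq 0$ for the projection $\pi_{yz}$; and $x - 2z \geq 0$, $z \geq 0$ for the projection $\pi_{xz}$.

3. We obtain the H-representation of the projected polyhedra as
$$P_{xy} = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} \leq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\},$$
$$P_{yz} = \left\{ \begin{pmatrix} y \\ z \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} y \\ z \end{pmatrix} \leq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\},$$
$$P_{xz} = \left\{ \begin{pmatrix} x \\ z \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; \begin{pmatrix} -1 & 2 \\ 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} x \\ z \end{pmatrix} \leq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}. \tag{2.13}$$
One can then see that the V-representations of these polyhedra are indeed those found in Equation (2.12), as illustrated in Figure 2.1.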
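The three elimination steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the porta implementation used for the thesis: the function name `fme_eliminate` and the `(coefficients, bound)` row format are hypothetical, exact arithmetic is done with `fractions.Fraction`, and no redundancy removal is attempted. It is applied to the cone of Equation (2.9).

```python
from fractions import Fraction

def fme_eliminate(rows, var):
    """One Fourier-Motzkin step (hypothetical helper, for illustration only).

    `rows` is a list of (a, b) pairs, each encoding sum_k a[k] * x[k] <= b.
    Returns the inequalities of the projection with column `var` dropped.
    """
    pos = [(a, b) for a, b in rows if a[var] > 0]    # I_+
    neg = [(a, b) for a, b in rows if a[var] < 0]    # I_-
    zero = [(a, b) for a, b in rows if a[var] == 0]  # I_0

    out = list(zero)
    # If I_+ or I_- is empty, the double loop adds nothing, so the other
    # set is simply ignored, as in step 2 of the procedure.
    for ap, bp in pos:
        for an, bn in neg:
            # Scale so the coefficients of x_var are +1 and -1, then add:
            # this is g_j <= f_i with all terms moved to one side.
            sp = Fraction(1) / ap[var]
            sn = Fraction(1) / -an[var]
            a = [sp * p + sn * q for p, q in zip(ap, an)]
            out.append((a, sp * bp + sn * bn))

    # Drop the eliminated column (its coefficient is now zero everywhere).
    return [([c for k, c in enumerate(a) if k != var], b) for a, b in out]

# H-representation of the cone in Equation (2.9), variables ordered (x, y, z).
cone = [([-1, 1, 1], 0), ([0, -1, 1], 0), ([0, 0, -1], 0)]

P_xy = fme_eliminate(cone, 2)  # eliminate z
P_yz = fme_eliminate(cone, 0)  # eliminate x
P_xz = fme_eliminate(cone, 1)  # eliminate y
```

Each result is a list of two inequalities in the two remaining variables, and can be compared directly with the rows of Equation (2.13). Iterating the function eliminates further variables, at the cost of the growth in the number of inequalities discussed next.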

Algorithmic complexity of FME:

The FME algorithm can be computationally costly due to its high algorithmic complexity, which is, in the worst case, double exponential. For an initial system of $n_i$ inequalities, a maximum of $(n_i/2)^2$ inequalities can be obtained after one elimination step. (This corresponds to the case where $|I_+| = |I_-| = n_i/2$ and $|I_0| = 0$.) After $k$ elimination steps, the algorithm can produce a maximum of $4(n_i/4)^{2^k}$ inequalities, which scales double-exponentially in the number of iterations $k$. This makes the algorithm quite inefficient in general. However, the method remains useful in several cases for which the scaling is far from the worst-case scaling. It can also be marginally improved by removing some of the redundancies using the so-called Chernikov rules [49, 50]. For the work presented in this thesis, the algorithm was mainly implemented on the porta software [55], which takes into account some of the Chernikov rules.

A linear program (LP) consists of optimising a linear function subject to linear equality/inequality constraints. Every linear program can be expressed as a primal and a dual problem. For $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ we have:

Primal program: Minimise $c^T x$ subject to $Ax \geq b$, $x \geq 0$.

Dual program: Maximise $b^T y$ subject to $A^T y \leq c$, $y \geq 0$.

A vector satisfying all the constraints of the primal/dual program is said to be a feasible solution of that program. Hence the sets of feasible solutions for the primal and the dual programs are $\{x \in \mathbb{R}^n : Ax \geq b, x \geq 0\}$ and $\{y \in \mathbb{R}^m : A^T y \leq c, y \geq 0\}$ respectively. An optimal solution for the primal (/dual) program is a feasible solution of that program that attains the smallest (/largest) value of the objective function $c^T x$ (/$b^T y$). A linear program is called feasible if the corresponding feasible set of solutions is non-empty, and it is said to be unbounded if the objective function is not bounded above in the case of maximization or not bounded below in the case of minimization problems. Note that the feasible region of a linear program is a polyhedron, and hence convex. In the rest of this thesis, we will refer to the primal form above as the standard form of the LP. (This might differ from other texts but we choose this convention for convenience, since this is also the format used by Mathematica, the main computational software used in this thesis.)

Duality:

Using the definitions of the primal and dual programs, we can see that $c^T x \geq (A^T y)^T x = y^T A x \geq y^T b = b^T y$. In other words, for any feasible solutions $x$ and $y$ of the primal (minimization) problem and the dual (maximization) problem, $c^T x \geq b^T y$ holds. This property is called weak duality. Another property, which is not immediately evident but nevertheless true, is that of strong duality. It states that if the primal and dual problems are feasible, then there exists a pair of feasible solutions $x^*$ and $y^*$ of these problems such that $c^T x^* = b^T y^*$. By weak duality, these solutions will be optimal.

Removing redundant constraints using LPs:

Linear programs can be efficiently used to check whether a given system of linear inequalities contains any redundant inequalities that can be dropped without affecting the solution set. In general, to check whether the $i$-th inequality in $Ax \geq b$ is redundant, one can minimise $A_i x$ subject to the remaining inequalities. If the optimal value is strictly smaller than $b_i$, then the inequality $A_i x \geq b_i$ is not redundant. Otherwise, the inequality is redundant and can be ignored. An example is illustrated in Figure 2.2. Iterating this procedure over all inequalities in the system, dropping each redundant inequality as it is found, we can obtain a minimal set of non-redundant inequalities that is equivalent to the original system. For the work presented in the current thesis, this method was used to remove redundant inequalities in the output of the Fourier-Motzkin elimination procedure. Reducing inequalities at the output of each iteration also helps in speeding up the computational procedure of FME. This was primarily implemented on Mathematica using the inbuilt LinearProgramming function.

The LinearProgramming function in Mathematica can only handle linear programs where the objective vector $c$, constraint matrix $A$ and the constraint vector $b$ do not contain unspecified variables. For example, we might wish to optimize $x_1 + a x_2$, where $a$ is an unspecified constant in the range $0 \leq a \leq 1$, returning the solution for all values of $a$ in the range. For solving linear programs involving such additional unknown variables (other than those being optimised over), subject to assumptions on them, we developed a Mathematica package, LPAssumptions [62], that was used to produce some of the main computational results of Chapters 4 and 5. Our package is based on the two-phase simplex method for solving linear programming problems.
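The redundancy test described above can be illustrated with a small self-contained Python sketch. It does not use Mathematica's LinearProgramming function or a simplex solver: as a stand-in, it solves the LP by brute-force enumeration of the vertices of a two-dimensional feasible region, and it assumes that the region defined by the remaining inequalities is a bounded, non-empty polygon so that the minimum is attained at a vertex. The helper names and the numerical system are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def vertices_2d(A, b):
    """Vertices of {v in R^2 : A v >= b}, found by intersecting every pair
    of boundary lines and keeping the feasible intersection points."""
    verts = set()
    for (a1, b1), (a2, b2) in combinations(list(zip(A, b)), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel boundary lines never meet in a vertex
        # Cramer's rule for a1.v = b1, a2.v = b2
        x = Fraction(b1 * a2[1] - a1[1] * b2, det)
        y = Fraction(a1[0] * b2 - b1 * a2[0], det)
        if all(r[0] * x + r[1] * y >= s for r, s in zip(A, b)):
            verts.add((x, y))
    return verts

def is_redundant(i, A, b):
    """Check whether A[i].v >= b[i] is implied by the other inequalities.

    Assumes the remaining inequalities define a bounded, non-empty polygon,
    so that min A[i].v is attained at one of its vertices.
    """
    rest_A = A[:i] + A[i + 1:]
    rest_b = b[:i] + b[i + 1:]
    minimum = min(A[i][0] * x + A[i][1] * y for x, y in vertices_2d(rest_A, rest_b))
    return minimum >= b[i]

# Illustrative system in the form A v >= b:
# x >= 0, y >= 0, 2x + y <= 6, x <= 2, y <= 3, and a sixth candidate inequality.
A = [[1, 0], [0, 1], [-2, -1], [-1, 0], [0, -1], [-2, -3]]
b = [0, 0, -6, -2, -3, Fraction(-27, 2)]

sixth_is_redundant = is_redundant(5, A, b)
```

Here the minimum of $-2x - 3y$ over the polygon defined by the first five inequalities is $-12$, which is not smaller than $-27/2$, so the sixth inequality can be dropped; tightening its right-hand side to $-10$ would make it non-redundant. For systems in more variables, or with unbounded feasible regions, a proper LP solver is needed in place of the vertex enumeration.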

The simplex algorithm is widely used for solving linear programming problems and was originally proposed by George Dantzig [69]. The algorithm is fairly involved, consisting of several steps. Our Mathematica package [62] (see Appendix 4.5.2 for details) implements this algorithm; however, the full details are not required for understanding the numerical results produced using this package. Hence, we provide only a brief overview of the main mathematical concepts underpinning this algorithm and refer the reader to [70, 142, 104] for further details of the algorithm, and other methods for solving linear programs.

As we have previously noted, the feasible region of a linear program is a polyhedron, and by Theorem 2.2.1, every polyhedron can be represented as a Minkowski sum of two sets: the former a convex hull of a finite set of vertices, and the latter a conic hull of a finite set of extremal rays. An extreme point or vertex of the polyhedron defining the feasible region of a linear program is called a basic feasible solution (BFS). The following theorem (Theorem 3.3 of [123]) is at the core of the simplex method.

Theorem 2.2.2 ([123] Theorem 3.3). Let $P \subset V$ be a polyhedron with at least one extreme point. Then every linear program with the feasible solution set $\{x \in P\}$ is either unbounded or attains its optimal value at an extreme point of $P$.

Further, if a given extreme point $x_e$ of $P$ is not an optimal solution, then it can be shown that there exists an edge of $P$ containing $x_e$ such that the objective function strictly improves (i.e., increases in case of a maximization LP and decreases in case of a minimization LP) as we move away from $x_e$ along that edge (Section 3.8 of [123]). Finite edges connect extreme points to extreme points, and moving along this direction will lead to a new extremal point $x'_e$ with a better value of the objective function. If the identified edge emanating from $x_e$ is unbounded, then the linear program has no finite solution. The simplex method is based on this intuition: starting with an initial vertex of the feasible set $P$, i.e., an initial basic feasible solution, one moves along edges of $P$ to other vertices that improve the objective function value, until an optimal vertex is reached or it is revealed that the problem is unbounded.

Example 2.2.2 (Intuition behind the Simplex algorithm). Consider the following simple linear program, together with its expression in the standard form.

Original form: Maximise $2x + 3y$ subject to $2x + y \leq 6$, $x \leq 2$, $y \leq 3$, $x, y \geq 0$.

Standard form: Minimise $(-2, -3)^T \cdot (x, y)$ subject to
$$\begin{pmatrix} -2 & -1 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} \geq \begin{pmatrix} -6 \\ -2 \\ -3 \end{pmatrix}, \quad x, y \geq 0.$$

As shown in Figure 2.2, the feasible region is a polytope (bounded polyhedron) in 2 dimensions with 5 vertices, $V = \{(0, 0), (0, 3), (1.5, 3), (2, 2), (2, 0)\}$, and 5 facets, $F = \{x = 0, y = 0, y = 3, 2x + y = 6, x = 2\}$. From the figure, it is evident that the maximum must be attained at a vertex since the objective function $f(x, y) = 2x + 3y$ is linear. Calculating the value of the objective function $f$ at all 5 vertices, we immediately see that the maximum value $f = 12$ is attained at the vertex $(1.5, 3)$. That this is indeed the maximum value of the constrained optimization problem, and that it is attained at the said point, can also be verified by using the method of Lagrange multipliers, for example.

Phases:

The simplex algorithm proceeds in two phases. Phase I concerns the problem of finding an initial basic feasible solution. This is done by formulating a new linear program $LP'$ based on the original $LP$, for which an initial BFS $x'_0$ is readily found. The construction of $LP'$ is such that applying the simplex method starting from $x'_0$ either yields an optimum of $LP'$ that serves as an initial BFS $x_0$ for the original problem $LP$, or reveals that $LP$ is infeasible. If $LP$ is feasible, Phase II uses the initial BFS $x_0$ obtained in Phase I to find an optimum solution (if a finite solution exists) of the original problem $LP$ through the simplex method.

Figure 2.2: Intuition behind the Simplex algorithm: This figure illustrates the simple linear program of Example 2.2.2, where the feasible region corresponds to a polytope (shaded in blue). The thick blue lines correspond to different level curves $f(x, y) = c$ of the objective function $f(x, y) = 2x + 3y$. The maximum value of $c$ for which $(x, y)$ belongs to the feasible region is our required solution. In the present example, the maximum value of $f = 12$ is attained at the vertex $(1.5, 3)$ (in accordance with Theorem 2.2.2). Starting at any other vertex, there is always an edge (in this 2D case, edges are also the facets) along which the value of $f$ increases, and following the path specified by such an edge at each subsequent vertex, we can always reach the optimal vertex. This figure also illustrates how redundant inequalities can be removed using linear programming (explained in Section 2.2.3.1). Suppose that we have the 5 inequalities of Example 2.2.2, expressed as in the standard form of an LP, along with a sixth inequality $-2x - 3y \geq -27/2$. From the figure, it is evident that the sixth inequality is not required to characterise the blue polytope, and can be dropped. This is checked by minimising $-2x - 3y$ subject to the other inequalities, which is the LP of Example 2.2.2. Since the minimum value satisfies $-12 \geq -27/2$, we conclude that this inequality is redundant. On the other hand, an inequality $-2x - 3y \geq -c$ with $c < 12$ would not be redundant, since the minimum value $-12$ is then strictly smaller than $-c$.

Column geometry and efficiency:

It can be shown that a linear programming problem with $m$ constraints and $n$ variables has at most $\binom{m}{n} = \frac{m!}{n!\,(m-n)!}$ basic feasible solutions when $m \geq n$. Hence a brute-force method for finding the optimum solution would be quite inefficient in general. The merit of the simplex method is derived from its ability to find the optimum much more efficiently (relative to a brute-force method) by exploiting the geometry of convex sets. This consists of first expressing the original LP involving inequality constraints in an equivalent form involving only equality constraints, by introducing certain additional slack variables. Once the LP is brought to the desired equivalent form, the geometry defined by the column vectors of the new constraint matrix provides a way to systematically compute a better BFS at each step or to decide whether the LP is unbounded. (Here, the columns of the constraint matrix provide a basis, an appropriate linear combination of which gives the required BFS. Geometrically, the basis vectors in each iteration correspond to the vertices of a simplex and the associated BFS lies in this simplex, i.e., belongs to the convex hull of the vertices represented by those basis vectors. The geometry of the simplex provides a minimal, affinely independent basis for representing the basic solution at each step.) The improved BFS in every iteration is found by computing its so-called reduced cost, which indicates how much the objective function value will be improved by moving to that BFS. Hence the method manages to reach the optimum solution (if it exists) in far fewer iterations on average, as compared to brute-force and other graph-traversal methods [70, 142]. It was originally found that for many linear programs admitting an optimal solution, the simplex method found it in $O(n)$ iterations [69, 70], which was the main reason for the success of the algorithm. However, over the years, a number of examples have surfaced where the number of iterations of the simplex method scales exponentially in the number of variables. In fact, determining the number of iterations needed for solving a given linear program is known to be an NP-hard problem. Nevertheless, the simplex method continues to be useful in several practical cases (which includes the work presented in this thesis).

The task of converting from the H to the V representation of a convex polyhedron is called vertex enumeration, while the reverse is called facet enumeration. Representation conversion problems are in general known to be computationally hard: there is no known vertex/facet enumeration algorithm that is polynomial in the input and output size and in the dimension of a polyhedron, and in the case of unbounded polyhedra, the problem is known to be NP-hard [124]. In general, it is difficult to characterise the size of the output of such problems given the size of the input. For example, a hypercube in $n$ dimensions has $2n$ facets and $2^n$ vertices. The output of the vertex enumeration in this case would scale exponentially in the input, while the output of the facet enumeration would be quite small relative to the input size. The computational complexity of vertex/facet enumeration problems continues to be a subject of current research and several algorithms have been proposed for the same.

The most basic algorithm for facet enumeration involves a direct application of Fourier-Motzkin Elimination, which is not always very efficient (as seen in Section 2.2.2). There are also better methods, such as the double description method, that use similar steps as FME, while additionally exploiting properties of polyhedral representations (such as the Minkowski-Weyl Theorem 2.2.1) to achieve better efficiency. Broadly, there are two main classes of polyhedral representation conversion algorithms, namely incremental and graph traversal algorithms. The double description method mentioned above corresponds to an incremental algorithm. In the case of polytopes, one method for implementing the latter is through reversing the procedure of the simplex method (discussed in Section 2.2.3.2), whereby starting at a vertex that optimises a suitably chosen objective function, all other vertices are systematically enumerated until every vertex has been visited at least once [92].
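The hypercube example mentioned above can be made concrete with a short Python sketch. It writes down both representations of the cube $[-1, 1]^n$ directly, without running any conversion algorithm, and simply compares their sizes; the function names are hypothetical.

```python
from itertools import product

def hypercube_h_rep(n):
    """H-representation of [-1, 1]^n as rows (a, 1) encoding a.x <= 1."""
    rows = []
    for k in range(n):
        for sign in (1, -1):
            a = [0] * n
            a[k] = sign  # encodes x_k <= 1 or -x_k <= 1
            rows.append((a, 1))
    return rows

def hypercube_v_rep(n):
    """V-representation of [-1, 1]^n: all sign vectors in {-1, +1}^n."""
    return list(product((-1, 1), repeat=n))

def tight_facets(vertex, rows):
    """Number of inequalities that hold with equality at `vertex`."""
    return sum(1 for a, b in rows
               if sum(c * v for c, v in zip(a, vertex)) == b)

# Size of each representation as the dimension grows.
sizes = {n: (len(hypercube_h_rep(n)), len(hypercube_v_rep(n)))
         for n in range(2, 8)}
```

The dictionary `sizes` maps each dimension $n$ to the pair (number of facets, number of vertices), which grows as $(2n, 2^n)$. The helper `tight_facets` confirms that every vertex lies on exactly $n$ of the $2n$ facets.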

Software such as porta [55], on which the main computational work of this thesis has been carried out, employs FME-based incremental algorithms. We present an overview of a simple FME-based approach for enumerating facets of polyhedral cones, without going into the details of more efficient incremental or graph-traversal algorithms. The concept of a polar dual of a polyhedron, which essentially interchanges the role of the vertices and facets of a polyhedron (see Figure 2.3), is then used such that we also get vertex enumeration for free, given an algorithm for facet enumeration.

Facet enumeration:

Consider a polyhedron $\mathcal{P} \subseteq V$ expressed as $\mathcal{P} = \mathrm{Conv}(x_1, ..., x_k) + \mathrm{Cone}(y_1, ..., y_l)$ in the V-representation. To convert to the H-representation of $\mathcal{P}$,

1. Express the convex and conic hulls in terms of inequalities as follows:
$$x = \alpha_1 x_1 + ... + \alpha_k x_k + \lambda_1 y_1 + ... + \lambda_l y_l, \qquad 1 = \alpha_1 + ... + \alpha_k, \qquad \alpha_1, ..., \alpha_k, \lambda_1, ..., \lambda_l \geq 0.$$
2. This is a system of linear constraints in the variables $x, \alpha_1, ..., \alpha_k, \lambda_1, ..., \lambda_l$. Project out the variables $\alpha_1, ..., \alpha_k, \lambda_1, ..., \lambda_l$ using Fourier-Motzkin Elimination (see Section 2.2.2) to obtain linear inequalities only involving components of $x$, which corresponds to the required H-representation.

In order to convert from the H- to the V-representation, we require the concept of the polar or polar dual of a polyhedron.

Definition 2.2.8 (Polar of a polyhedron). The polar $\mathcal{P}^{\Delta}$ of a polyhedron $\mathcal{P} \subseteq V$ is defined as
$$\mathcal{P}^{\Delta} = \{ c \in V : c^T x \leq 1, \ \forall x \in \mathcal{P} \}. \qquad (2.15)$$

The following theorem (Theorem 9.1 of [188]) elucidates some elegant properties of the polar $\mathcal{P}^{\Delta}$ and relates its H/V representations to those of the original polytope $\mathcal{P}$.

Theorem 2.2.3 (Properties of the polar). Let $\mathcal{P} \subseteq V$ be a polyhedron that contains the origin. Then,
1. $\mathcal{P}^{\Delta}$ is a polyhedron and $\mathcal{P}^{\Delta\Delta} = \mathcal{P}$,
2. If $\mathcal{P} = \mathrm{Conv}(x_1, ..., x_k) + \mathrm{Cone}(y_1, ..., y_l) \subseteq V$, then $\mathcal{P}^{\Delta} = \{ u \in V : x_i^T u \leq 1, \ \forall i = 1, ..., k \text{ and } y_j^T u \leq 0, \ \forall j = 1, ..., l \}$,
3. If $\mathcal{P} = \{ u \in V : x_i^T u \leq 1, \ \forall i = 1, ..., k, \text{ and } y_j^T u \leq 0, \ \forall j = 1, ..., l \}$, then $\mathcal{P}^{\Delta} = \mathrm{Conv}(x_1, ..., x_k) + \mathrm{Cone}(y_1, ..., y_l)$.
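The two facet-enumeration steps above can be sketched in a few lines of Python for a small polytope (no conic part). This is only an illustrative, unoptimised sketch: the function names (`eliminate`, `hull_system`, `facets_by_fme`, `contains`) and the triangle example are assumptions made here, not part of PORTA or of the thesis, and the output is not pruned of redundant inequalities.

```python
from itertools import product

def eliminate(rows, j):
    """One Fourier-Motzkin step: remove variable j from rows (a, b) encoding a.z <= b."""
    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    kept = [r for r in rows if r[0][j] == 0]
    for (ap, bp), (an, bn) in product(pos, neg):
        cp, cn = -an[j], ap[j]  # positive multipliers that cancel variable j
        a = tuple(cp * u + cn * v for u, v in zip(ap, an))
        kept.append((a, cp * bp + cn * bn))
    return kept

def hull_system(points):
    """Constraints x = sum_i alpha_i x_i, sum_i alpha_i = 1, alpha_i >= 0.
    Variables are ordered as (x_1, ..., x_d, alpha_1, ..., alpha_k)."""
    d, k = len(points[0]), len(points)
    rows = []
    for c in range(d):
        a = [0] * (d + k)
        a[c] = 1
        for i, p in enumerate(points):
            a[d + i] = -p[c]
        # An equality is encoded as two opposite inequalities.
        rows.append((tuple(a), 0))
        rows.append((tuple(-v for v in a), 0))
    s = [0] * d + [1] * k
    rows.append((tuple(s), 1))
    rows.append((tuple(-v for v in s), -1))
    for i in range(k):
        a = [0] * (d + k)
        a[d + i] = -1
        rows.append((tuple(a), 0))
    return rows

def facets_by_fme(points):
    """Project out all alpha variables; returns (possibly redundant) rows a.x <= b."""
    d, k = len(points[0]), len(points)
    rows = hull_system(points)
    for j in range(d, d + k):
        rows = eliminate(rows, j)
    return sorted({(a[:d], b) for a, b in rows})

def contains(h_rep, x):
    """Check whether x satisfies every inequality of the H-representation."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= b for a, b in h_rep)

# Hypothetical example: the triangle with vertices (0,0), (1,0), (0,1).
triangle = [(0, 0), (1, 0), (0, 1)]
h_rep = facets_by_fme(triangle)
```

For the triangle, the resulting system contains the three facet inequalities $-x_1 \leq 0$, $-x_2 \leq 0$ and $x_1 + x_2 \leq 1$ together with some redundant rows, which illustrates why practical tools add a redundancy-removal step after each elimination.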

Figure 2.3: A polytope (blue cube) and its dual (red octahedron): The dual of a polyhedron has the elegant property that there is a one-to-one correspondence between vertices of the original polyhedron and facets of the dual, and between facets of the original and vertices of the dual. Using this concept, any algorithm for facet enumeration also yields an algorithm for vertex enumeration. The notion of a polar dual that is required for this inter-conversion is defined formally in the main text. Note that this figure is only for the purpose of illustrating the concept of a dual, and does not exactly correspond to the polar dual.
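As a small numerical illustration of the vertex/facet correspondence, the following Python sketch applies the second statement of Theorem 2.2.3 to the cube $[-1,1]^3$, whose polar is the octahedron with vertices $\pm e_i$. The helper names (`polar_h_rep`, `satisfies`, `tight_rows`) are hypothetical and chosen only for this example.

```python
from itertools import product

def polar_h_rep(vertices):
    """H-representation of the polar of Conv(vertices): one row x_i^T u <= 1 per vertex."""
    return [(tuple(v), 1) for v in vertices]

def satisfies(h_rep, u):
    """True if u obeys every inequality a.u <= b."""
    return all(sum(a * c for a, c in zip(row, u)) <= b for row, b in h_rep)

def tight_rows(h_rep, u):
    """Number of inequalities that hold with equality at u."""
    return sum(1 for row, b in h_rep if sum(a * c for a, c in zip(row, u)) == b)

cube = list(product((-1, 1), repeat=3))  # 8 vertices of the cube
polar = polar_h_rep(cube)                # 8 inequalities: facets of the polar

# Candidate vertices of the polar: the 6 points +-e_i of the octahedron.
octahedron = [tuple(s if i == j else 0 for j in range(3))
              for i in range(3) for s in (1, -1)]

all_feasible = all(satisfies(polar, u) for u in octahedron)
tight_counts = [tight_rows(polar, u) for u in octahedron]
```

Each of the 6 octahedron vertices satisfies all 8 inequalities and saturates exactly 4 of them, matching the picture of 8 cube vertices becoming 8 facets of the polar and 6 cube facets becoming 6 vertices.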

Vertex enumeration:

Consider a polyhedron $\mathcal{P} \subseteq V$ that contains the origin in its interior. If a polyhedron does not contain the origin, then by an appropriate coordinate transformation, it can be transformed into one that does. This allows $\mathcal{P}$ to be expressed in the H-representation as $\mathcal{P} = \{ x \in V : Ax \leq b \}$, with $b_i \geq 0 \ \forall i$. To convert to the V-representation of $\mathcal{P}$,

1. For all inequalities $A_i x \leq b_i$ such that $b_i > 0$, divide throughout by $b_i$ to express them in the form $\tilde{A}_i x \leq 1$. Without loss of generality, let these be the first $k$ inequalities. Then $\mathcal{P} = \{ x \in V : \tilde{A}_i x \leq 1, \ \forall i = 1, ..., k \text{ and } A_i x \leq 0, \ \forall i = k+1, ..., m \}$, where $m$ is the total number of inequalities.
2. Use the third statement of Theorem 2.2.3 to find the V-representation of the polar $\mathcal{P}^{\Delta}$ as $\mathcal{P}^{\Delta} = \mathrm{Conv}(\tilde{A}_1^T, ..., \tilde{A}_k^T) + \mathrm{Cone}(A_{k+1}^T, ..., A_m^T)$.
3. Convert $\mathcal{P}^{\Delta}$ to its H-representation using facet enumeration.
4. Use the first and third statements of Theorem 2.2.3 to find the V-representation of $\mathcal{P} = (\mathcal{P}^{\Delta})^{\Delta}$ using the H-representation of $\mathcal{P}^{\Delta}$ obtained in the previous step.

Entropies have played a crucial role in the study of thermodynamic phenomena and statistical mechanics since the 19th century [58, 29, 97], and in 1932, John von Neumann generalised the concept of entropies to quantum systems [213]. An information-theoretic understanding of entropy, on the other hand, was first developed by Claude Shannon in his seminal 1948 paper [191], where he introduced the Shannon entropy as a measure of the information content of a message and used it to characterise the information capacity of communication channels. Interestingly, the von Neumann entropy, even though chronologically prior, turns out to be a quantum generalisation of the Shannon entropy. Since then, entropies have become an integral part of both classical and quantum information theory. Several other information-theoretic entropy measures have also been proposed (such as Rényi [180], Tsallis [201], min and max entropies [178]), and serve as useful mathematical tools for tackling a plethora of information-theoretic problems such as data compression [191, 173], randomness extraction [24, 128], key distribution [178, 12] as well as causal structures [46, 89, 214, 217]. Here, we provide a brief overview of some of the classical and quantum information-theoretic entropy measures and their properties that are relevant for the work presented in this thesis.

Definition 2.3.1 (Shannon entropy). Given a random variable $X$ distributed according to the discrete probability distribution $P_X$, the Shannon entropy of $X$ is given by
$$H(X) = -\sum_x P(x) \ln P(x). \qquad (2.16)$$

The Shannon entropy $H(X)$ is a continuous and strictly concave function of the probability distribution over $X$, i.e., for two probability distributions $P$ and $Q$ over $X$ and $\lambda \in [0, 1]$, denoting the dependence on the distribution by a subscript,
$$H(X)_{\lambda P + (1-\lambda) Q} \geq \lambda H(X)_P + (1-\lambda) H(X)_Q. \qquad (2.17)$$

Definition 2.3.2 (Conditional Shannon entropy). Given two random variables $X$ and $Y$, distributed according to $P_{XY}$, the conditional Shannon entropy is defined by
$$H(X|Y) = -\sum_{x,y} P(xy) \ln \frac{P(xy)}{P(y)}. \qquad (2.18)$$

It is also useful to define a quantity called the mutual information, which, as the name suggests, quantifies the amount of information that two random variables carry about each other. The conditional version of this quantity quantifies how much information two variables share about each other, given the value of a third variable.

Definition 2.3.3 (Shannon mutual information). Given 3 random variables, $X$, $Y$ and $Z$, the Shannon mutual information and Shannon conditional mutual information between $X$ and $Y$, and $X$ and $Y$ conditioned on $Z$ are respectively given by
$$I(X:Y) = H(X) - H(X|Y), \qquad I(X:Y|Z) = H(X|Z) - H(X|YZ). \qquad (2.19)$$

Footnote: In fact, Shannon had corresponded with von Neumann, who is said to have suggested that Shannon call his new uncertainty measure "entropy", due to its similarities with prior notions of thermodynamic and statistical entropies.

Footnote: Note that it is common to take logarithms in base 2 and measure entropy in bits; here we use base $e$, corresponding to measuring entropy in nats.
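The definitions in Equations (2.16), (2.18) and (2.19) can be evaluated directly on a small joint distribution. The sketch below is a minimal illustration in Python (natural logarithms, so entropies are in nats); the function names and the example distribution `pxy` are assumptions made for this example only. It also checks numerically the standard identity $H(X|Y) = H(XY) - H(Y)$.

```python
from math import log

def H(p):
    """Shannon entropy (nats) of a distribution given as {outcome: probability}."""
    return -sum(v * log(v) for v in p.values() if v > 0)

def marginal(pxy, axis):
    """Marginal of a joint distribution keyed by (x, y); axis 0 gives P_X, axis 1 gives P_Y."""
    out = {}
    for key, v in pxy.items():
        out[key[axis]] = out.get(key[axis], 0.0) + v
    return out

def H_cond(pxy):
    """Conditional entropy H(X|Y), following Eq. (2.18)."""
    py = marginal(pxy, 1)
    return -sum(v * log(v / py[y]) for (x, y), v in pxy.items() if v > 0)

def mutual_information(pxy):
    """I(X:Y) = H(X) - H(X|Y), following Eq. (2.19)."""
    return H(marginal(pxy, 0)) - H_cond(pxy)

# Hypothetical example: two correlated bits.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
chain_rule_gap = abs(H_cond(pxy) - (H(pxy) - H(marginal(pxy, 1))))

# Independent uniform bits carry no mutual information.
p_indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
```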

Properties of Shannon entropy:

Some useful information-theoretic properties satisfied by the Shannon entropy and associated mutual information are discussed below. Some of these properties can be expressed in different, but equivalent forms. We will distinguish between these because equivalences that apply in the case of the Shannon entropy may not necessarily apply for other entropies. Whenever we have an entropic equation/inequality where no conditional entropies are explicitly involved (such as the left most ones in all of the properties below), we will refer to it as the unconditional form of that property, otherwise it will be called the conditional form. According to the definition of mutual information (Equation (2.19)) adopted in this thesis, the forms of properties expressed in terms of the mutual informations qualify as conditional forms as well. In the following, $R$, $S$ and $T$ are three disjoint sets of random variables.

1. Non-negativity and upper bound (UB): The Shannon entropy is non-negative, i.e., $H(R) \geq 0$, and is bounded above as $H(R) \leq \ln |R|$, where $|\cdot|$ denotes cardinality. Equality for the upper bound is achieved if and only if $P_R(r) = 1/|R|$ for all possible values $r \in R$ (i.e., if the distribution on $R$ is uniform).

2. Chain rule (CR): For disjoint sets $R_1, ..., R_n, S$ of random variables,
$$H(R_1, R_2, \ldots, R_n | S) = \sum_{i=1}^{n} H(R_i | R_{i-1}, \ldots, R_1, S), \qquad (2.20)$$
which in particular implies the two useful identities $H(R|S) = H(RS) - H(S)$ and $H(R|ST) = H(RS|T) - H(S|T)$.

3. Additivity (A): If $R$ and $S$ are independent, i.e., if $P_{RS} = P_R P_S$, then
$$H(RS) = H(R) + H(S) \ \overset{\text{CR}}{\Longleftrightarrow} \ H(R|S) = H(R). \qquad (2.21)$$

4. Monotonicity (M): From the definition (2.18) of the conditional Shannon entropy, it follows that
$$H(S) \leq H(RS) \ \overset{\text{CR}}{\Longleftrightarrow} \ H(R|S) \geq 0. \qquad (2.22)$$

5. Subadditivity (SA):
$$H(R) + H(S) \geq H(RS) \ \overset{\text{CR}}{\Longleftrightarrow} \ H(R) \geq H(R|S) \ \overset{(2.19)}{\Longleftrightarrow} \ I(R:S) \geq 0. \qquad (2.23)$$

6. Strong subadditivity (SSA):
$$H(RT) + H(ST) \geq H(RST) + H(T) \ \overset{\text{CR}}{\Longleftrightarrow} \ H(R|T) \geq H(R|ST) \ \overset{(2.19)}{\Longleftrightarrow} \ I(R:S|T) \geq 0. \qquad (2.24)$$

Note that SSA reduces to SA when $T$ is the empty set, but SA does not imply SSA. The unconditional form of SSA is often called submodularity. The conditional forms of SA and SSA (often referred to as data-processing inequalities) tell us that entropy cannot increase when conditioning on more random variables, which is a crucial information-theoretic property of the entropy. The conditional and unconditional forms of the properties given above are equivalent for entropic measures that satisfy the chain rule (CR) in the form given in Equation (2.20). However, for entropies that do not satisfy this chain rule, the two forms will be inequivalent and we will explicitly specify which form is being referred to in such cases. In the rest of this thesis, when we say that an entropy satisfies the properties given by Equations (2.20)-(2.24), this should be understood as follows: the corresponding property holds when the Shannon entropy $H()$ in these equations is replaced by the entropy measure under consideration.

The von Neumann entropy generalises the Shannon entropy to quantum states (represented by density operators), and is defined as follows.

Definition 2.3.4 (von Neumann entropy). The von Neumann entropy of a density operator $\rho_A \in \mathcal{S}(\mathcal{H}_A)$ is defined as
$$H(A)_\rho := -\mathrm{tr}(\rho_A \ln \rho_A). \qquad (2.25)$$

It follows that the von Neumann entropy is zero for pure quantum states.

We will often drop the subscript $\rho$ in the von Neumann entropy even though this will lead to the same notation as the Shannon case. However, this is justified since classical probability distributions can be equivalently encoded using diagonal density matrices (with the distribution along the diagonal), in which case we indeed recover the Shannon entropy. We will often use $X$, $Y$, $Z$ etc. to denote classical random variables and $A$, $B$, $C$ etc. to denote labels of quantum subsystems, to distinguish between the two entropies. With this, the conditional von Neumann entropy is simply defined through the chain rule, i.e.,
$$H(A|B) = H(AB) - H(B). \qquad (2.26)$$

The von Neumann mutual information and its conditional version are defined exactly as in the Shannon case, i.e., through Equation (2.19) with $H()$ denoting the von Neumann entropy instead.

Properties of von Neumann entropy:

The von Neumann entropy is non-negative and satisfies additivity (Equation (2.21)), the chain rule (Equation (2.20)), subadditivity (Equation (2.23)) and strong subadditivity (Equation (2.24)). Further, it also admits a corresponding upper bound, $H(A) \leq \ln(d_A)$, where $d_A$ denotes the dimension of the Hilbert space $\mathcal{H}_A$, with equality if and only if $\rho_A$ is the maximally mixed state $\mathbb{1}/d_A$. Note however that the von Neumann entropy does not satisfy monotonicity (Equation (2.22)), i.e., the conditional entropy can be negative. This can be seen by taking $\rho_{AB}$ to be an entangled pure state in Equation (2.26), such that $H(A), H(B) > H(AB) = 0$. Instead, the von Neumann entropy satisfies a weaker property (weak monotonicity) that states that for any tripartite state $\rho_{ABC}$, the following holds.
$$H(A|B)_\rho \geq -H(A|C)_\rho \qquad (2.27)$$

Footnote: The strong subadditivity of the von Neumann entropy is an important theorem in quantum information theory, proven by Lieb and Ruskai [135]. The original proof (which is also the one presented in standard textbooks [151]) is particularly known for its complexity and a number of simpler proofs have been proposed since then (see for example [178]).

Definition 2.3.5 ((Classical) Tsallis entropies). Given a random variable $X$ distributed according to the discrete probability distribution $P_X$, the order $q$ Tsallis entropy of $X$ for a non-negative real parameter $q$ is defined as [201]
$$S_q(X) = \begin{cases} -\sum_{\{x \in X : P_X(x) > 0\}} P(x)^q \ln_q P(x) & \text{if } q \neq 1 \\ H(X) & \text{if } q = 1 \end{cases} \qquad (2.28)$$
where $\ln_q P(x) = \frac{P(x)^{1-q} - 1}{1 - q}$.

The $q$-logarithm function converges to the natural logarithm in the limit $q \to 1$, so that $\lim_{q \to 1} S_q(X) = H(X)$ and the function is continuous in $q$. For brevity, we will henceforth write $\sum_x$ instead of $\sum_{\{x : P(x) > 0\}}$, keeping it implicit that probability zero events do not contribute to the sum. An equivalent form of Equation (2.28) is the following.
$$S_q(X) = \begin{cases} \frac{1}{1-q} \left( \sum_{x \in X} P(x)^q - 1 \right) & \text{if } q \neq 1 \\ H(X) & \text{if } q = 1 \end{cases} \qquad (2.29)$$

Definition 2.3.6 (Conditional Tsallis entropies). Given two random variables $X$ and $Y$, distributed according to $P_{XY}$, the order $q$ conditional Tsallis entropy for $q \geq 0$ is defined as
$$S_q(X|Y) := \begin{cases} -\sum_{x,y} P(xy)^q \ln_q P(x|y) & \text{if } q \neq 1 \\ H(X|Y) & \text{if } q = 1 \end{cases} \qquad (2.30)$$

$S_q(X|Y)$ converges to the Shannon conditional entropy $H(X|Y)$ in the limit $q \to 1$. The unconditional and conditional Tsallis mutual informations are defined analogously to Equation (2.19) for the Shannon case, with $I$ replaced by $I_q$ and $H$ replaced by $S_q$.
$$I_q(X:Y) = S_q(X) - S_q(X|Y), \qquad I_q(X:Y|Z) = S_q(X|Z) - S_q(X|YZ). \qquad (2.31)$$

Footnote: Note that this means the Tsallis entropy for q <
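The two forms of the Tsallis entropy, Equations (2.28) and (2.29), and the limit $q \to 1$ can be checked numerically. The following is a minimal Python sketch; the function names and the example distribution are assumptions made for illustration.

```python
from math import log

def ln_q(x, q):
    """q-logarithm: (x^(1-q) - 1) / (1 - q), with the natural logarithm at q = 1."""
    return log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)

def tsallis(p, q):
    """Tsallis entropy in the form of Eq. (2.28); zero-probability events are skipped."""
    return -sum(v ** q * ln_q(v, q) for v in p if v > 0)

def tsallis_alt(p, q):
    """Equivalent form of Eq. (2.29), valid for q != 1."""
    return (sum(v ** q for v in p if v > 0) - 1) / (1 - q)

p = [0.5, 0.3, 0.2]                      # hypothetical example distribution
forms_gap = abs(tsallis(p, 2) - tsallis_alt(p, 2))
limit_gap = abs(tsallis(p, 1 + 1e-6) - tsallis(p, 1))
```

For $q = 2$ both forms reduce to $1 - \sum_x P(x)^2$, and evaluating just above $q = 1$ reproduces the Shannon entropy up to a small numerical difference.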

Properties of Tsallis entropies:

Tsallis entropies are non-negative and satisfy the unconditional and conditional forms of monotonicity (Equation (2.22)) and the chain rule (Equation (2.20)) for all $q \geq 0$. They also satisfy both the forms of subadditivity (Equation (2.23)) and strong subadditivity (Equation (2.24)) for all $q \geq 1$. For $q \geq 0$, they admit an upper bound analogous to the Shannon case, i.e., $S_q(X) \leq \ln_q |X|$, where for $q > 0$, equality is achieved if and only if $P_X(x) = 1/|X|$ for all $x$ (i.e., if the distribution on $X$ is uniform). However, Tsallis entropies are not additive (Equation (2.21)) in general and instead satisfy a weaker condition known as pseudo-additivity [68]: for two independent random variables $X$ and $Y$, i.e., $P_{XY} = P_X P_Y$, and for all $q$, the Tsallis entropies satisfy
$$S_q(XY) = S_q(X) + S_q(Y) + (1 - q) S_q(X) S_q(Y). \qquad (2.32)$$

Note that in the Shannon case ($q = 1$), this reduces to additivity. For $q < 1$, strong subadditivity (Equation (2.24)) does not hold in general [93]. The results presented in this thesis rely on this property and hence we will restrict to the $q \geq 1$ case.

Definition 2.3.7 (Quantum Tsallis entropies). The order $q$ quantum Tsallis entropy of a density operator $\rho_A \in \mathcal{S}(\mathcal{H}_A)$, for a non-negative real parameter $q$, is defined as [201]
$$S_q(A) = \begin{cases} \frac{1}{1-q} \left( \mathrm{tr}(\rho_A^q) - 1 \right) & \text{if } q \neq 1 \\ H(A) & \text{if } q = 1, \end{cases} \qquad (2.33)$$
where $H(A)$ denotes the von Neumann entropy (Definition 2.3.4) of $\rho_A$.

Quantum Tsallis entropies can be seen as a generalisation of the von Neumann entropy since they converge to $H(A)$ in the limit of $q$ going to 1. The conditional quantum Tsallis entropy for a density operator $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ can be simply defined as
$$S_q(A|B) = S_q(AB) - S_q(B) \qquad (2.34)$$
(analogous to the von Neumann case, Equation (2.26)).

Properties of quantum Tsallis entropies:

Quantum Tsallis entropies are non-negative and satisfy the chain rule (Equation (2.20)) by definition of the conditional entropy. They also satisfy pseudo-additivity (Equation (2.32)) $\forall q \geq 0$ and both forms of subadditivity (Equation (2.23)) $\forall q \geq 1$, and admit the upper bound $S_q(A) \leq \ln_q(d_A)$, where the bound is saturated for $q > 0$ if and only if $\rho_A$ is the maximally mixed state. However, they do not satisfy monotonicity and strong subadditivity in general. The former is evident since the von Neumann entropy, a special case of quantum Tsallis entropies, also does not satisfy this property. The latter was noted in [163], where sufficient conditions for the strong subadditivity of quantum Tsallis entropies, expressed in its unconditional form, were also analysed.
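Pseudo-additivity, Equation (2.32), and the upper bound $S_q(X) \leq \ln_q |X|$ for classical Tsallis entropies are easy to verify numerically for independent variables. The sketch below is illustrative only; the function names and example distributions are assumptions of this example.

```python
def tsallis(p, q):
    """Classical Tsallis entropy for q != 1: (sum_x P(x)^q - 1) / (1 - q)."""
    return (sum(v ** q for v in p if v > 0) - 1) / (1 - q)

def product_distribution(px, py):
    """Joint distribution P_XY = P_X P_Y of two independent variables, as a flat list."""
    return [a * b for a in px for b in py]

def ln_q(n, q):
    """q-logarithm of a positive number n, for q != 1."""
    return (n ** (1 - q) - 1) / (1 - q)

q = 2
px, py = [0.7, 0.3], [0.5, 0.25, 0.25]   # hypothetical independent distributions
lhs = tsallis(product_distribution(px, py), q)
rhs = tsallis(px, q) + tsallis(py, q) + (1 - q) * tsallis(px, q) * tsallis(py, q)

# The uniform distribution on 4 outcomes saturates the upper bound ln_q(4).
uniform = [0.25] * 4
bound_gap = abs(tsallis(uniform, q) - ln_q(4, q))
```

For $q = 2$ the correction term $(1-q) S_q(X) S_q(Y)$ is negative, so the joint Tsallis entropy of independent variables is strictly smaller than the sum of the individual entropies whenever both are non-zero.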

Entropy     | A, Eq. (2.21)           | CR, Eq. (2.20)               | M, Eq. (2.22)                | SA, Eq. (2.23)       | SSA, Eq. (2.24)
------------|-------------------------|------------------------------|------------------------------|----------------------|---------------------
Shannon     | ✓ [191]                 | ✓ [191]                      | ✓ [191]                      | ✓ [191]              | ✓ [191]
Von Neumann | ✓ [213]                 | ✓ (By definition)            | × [43] (Weak Mono. [135])    | ✓ [7]                | ✓ [135]
Tsallis     | × [201] (Pseudo-add.)   | ✓ [93]                       | ✓ [71]                       | ✓ [93] (q ≥ 1)       | ✓ [93] (q ≥ 1)
Q. Tsallis  | × [201] (Pseudo-add.)   | ✓ (By definition)            | × [43]                       | ✓ [13] (q ≥ 1)       | × [163]
Rényi       | ✓ [180]                 | × [120] (Alternate CR [148]) | ✓ [136]                      | ✓ (C. form [120])    | ✓ (C. form [120])
Min         | ✓ [178]                 | × [178]                      | × [178]                      | ✓ (C. form [178])    | ✓ (C. form [178])
Max         | ✓ [178]                 | × [178]                      | × [178]                      | ✓ (C. form [178])    | ✓ (C. form [178])

Table 2.1: Information-theoretic properties of generalised entropies: This table summarises the main properties of the entropy measures discussed in the main text along with those of the min and max entropies. The acronyms A, CR, M, SA, SSA and C. form denote additivity, chain rule, monotonicity, subadditivity, strong subadditivity and conditional form respectively. Rényi, Min and Max entropies satisfy SA and SSA only in the conditional form while the remaining entropies satisfy them (wherever indicated so) in both conditional and unconditional forms. For the von Neumann and quantum Tsallis entropies, the chain rule follows by definition (Equations (2.26) and (2.34)). Monotonicity fails for the von Neumann entropy and hence the quantum Tsallis entropy which is a generalisation of it for q ≠ 1. Weak Monotonicity (2.27) holds in the von Neumann case but an analogue for the more general q ≥ α ∈ [ , ] for the quantum Rényi entropy of [200]. The Rényi entropies do not satisfy the chain rule of Equation (2.20), but they satisfy a weaker, cardinality-dependent chain rule of Equation (2.38). The min and max entropies (which are originally defined as quantum entropies) are also included in this table for comparison; their definition and further properties can be found in [178].

Here we define only the classical versions of Rényi entropies, as the quantum version will not be used in the remainder of this thesis.

Definition 2.3.8 ((Classical) Rényi entropies). Given a random variable $X$ distributed according to the discrete probability distribution $P_X$, the order $\alpha$ Rényi entropy of $X$ for a non-negative real parameter $\alpha$ is defined as
$$R_\alpha(X) = \begin{cases} \frac{1}{1-\alpha} \ln \left( \sum_{x \in X} P(x)^\alpha \right) & \text{if } \alpha \neq 1 \\ H(X) & \text{if } \alpha = 1 \end{cases}$$

There are several inequivalent definitions of the conditional Rényi entropy proposed in the literature [120, 83]. In this thesis, we will consider the following definition.

Definition 2.3.9 (Conditional Rényi entropy). Given two random variables $X$ and $Y$, distributed according to $P_{XY}$, the order $\alpha$ conditional Rényi entropy for $\alpha \geq 0$ is defined as
$$R_\alpha(X|Y) = \begin{cases} \frac{1}{1-\alpha} \ln \left( \sum_{xy} P(xy)^\alpha P(y)^{1-\alpha} \right) & \text{if } \alpha \neq 1 \\ H(X|Y) & \text{if } \alpha = 1 \end{cases}$$
α tends to 1.

Properties of Rényi entropies:
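As a quick numerical sanity check of the definition of the conditional Rényi entropy used here, the following Python sketch evaluates it close to $\alpha = 1$ and compares with the conditional Shannon entropy. The function names and the example distribution are assumptions made for illustration, and natural logarithms are used throughout.

```python
from math import log

def renyi(p, alpha):
    """Order-alpha Renyi entropy (nats) of a list of probabilities."""
    if alpha == 1:
        return -sum(v * log(v) for v in p if v > 0)
    return log(sum(v ** alpha for v in p if v > 0)) / (1 - alpha)

def renyi_cond(pxy, alpha):
    """Conditional Renyi entropy of X given Y for a joint dict keyed by (x, y)."""
    py = {}
    for (x, y), v in pxy.items():
        py[y] = py.get(y, 0.0) + v
    if alpha == 1:
        return -sum(v * log(v / py[y]) for (x, y), v in pxy.items() if v > 0)
    s = sum(v ** alpha * py[y] ** (1 - alpha) for (x, y), v in pxy.items() if v > 0)
    return log(s) / (1 - alpha)

# Hypothetical example: two correlated bits.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
near_one_gap = abs(renyi_cond(pxy, 1 + 1e-6) - renyi_cond(pxy, 1))
```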

The Rényi entropies are non-negative and upper-bounded as $R_\alpha(X) \leq \ln |X|$ for all $\alpha \geq 0$, with equality (for $\alpha > 0$) if and only if $P_X$ is the uniform distribution over $X$ [180]. Rényi entropies do not satisfy the chain rule (Equation (2.20)) in general for $\alpha \neq 1$. However, an alternate, dimension-dependent chain rule has been proposed for quantum Rényi entropies in [148], which therefore also holds for classical Rényi entropies (where dimension would correspond to the cardinality of the variables). This alternate chain rule is given as
$$R_\alpha(X|YZ) \geq R_\alpha(XZ|Y) - \ln |Z|. \qquad (2.38)$$

Further, Rényi entropies also satisfy monotonicity (Equation (2.22)) in both the conditional and unconditional forms, i.e., $R_\alpha(X) \leq R_\alpha(XY)$ and $R_\alpha(X|Y) \geq 0$, even though these forms are not equivalent due to failure of the chain rule (in contrast to the Shannon case). Classical Rényi entropies have been shown to satisfy subadditivity (Equation (2.23)) as well as strong subadditivity (Equation (2.24)), both only in the conditional forms. They fail to satisfy the (inequivalent) unconditional form of these properties for α ≠

Quantum theory has worked and continues to work remarkably well to explain and predict empirical observations. However, there is no consensus on a set of natural physical principles that single out quantum theory from a plethora of other possible theories that are also compatible with relativistic principles (such as finite signalling speed) but produce stronger-than-quantum correlations [171]. Generalised probabilistic theories (GPTs), of which classical and quantum theories can be seen as particular members, were developed out of the motivation to derive the mathematical formalism of quantum theory from fundamental physical principles or axioms, and to probe deeper, theory-independent connections between information processing and physical principles [189, 139, 140, 141, 110, 16, 157]. Several properties such as (Bell) non-locality, contextuality, entanglement, non-unique decomposition of a mixed state into pure states, monogamy of correlations, information-theoretically secure cryptography and no-cloning have been shown to extend beyond quantum theory [171, 110, 17, 144, 16]. However, teleportation and entanglement swapping are not always possible in non-classical GPTs despite the existence of entangled states [193, 16, 194, 103, 219]. Further, from a foundational point of view, it is of interest to understand whether interpretational issues are peculiarities of quantum theory. The contextuality/non-locality of GPTs suggests that similar interpretational problems could arise here. We will discuss this aspect in more detail in Chapter 8, where we derive a Wigner's friend type paradox in box world (a particular GPT).

In the present chapter, we provide an overview of the framework for information processing in GPTs proposed by Barrett in [16]. In the remainder of this thesis, we will use the more colloquial term "box world" to denote the set of theories that Barrett originally calls Generalised no-signaling Theories. This theory allows arbitrary correlations between measurements on separated systems, as long as they are non-signaling, i.e., the choice of measurement on one subsystem does not affect the measurement outcome on the other. Here separation between systems is expressed in terms of a tensor product structure (that allows us to identify subsystems), and without reference to space-time locations/space-like separation. Not all GPTs need to have a tensor product structure for representing composite systems, but we will focus only on those that do, for the purposes of this thesis. The following is based on the review sections of our paper [209].

Individual states.

The so-called generalised bit or gbit is a system completely characterized by two binary measurements which can be performed on it [16]. Such sets of measurements that completely characterise the state of a system are known as fiducial measurements. The state of a gbit is thus fully specified by the vector
$$\vec{P}_{gbit} = \begin{pmatrix} P(a=0|X=0) \\ P(a=1|X=0) \\ P(a=0|X=1) \\ P(a=1|X=1) \end{pmatrix}, \qquad (2.39)$$
where $X = 0$ and $X = 1$ label the two fiducial measurements and $a \in \{0, 1\}$ are the possible outcomes (Figure 2.4a). Analogously, a classical bit is a system characterized by a single binary fiducial measurement,
$$\vec{P}_{bit} = \begin{pmatrix} P(a=0|X=0) \\ P(a=1|X=0) \end{pmatrix}, \qquad (2.40)$$
and, in quantum theory, a qubit is characterized by three fiducial measurements (corresponding, for example, to three directions $X$, $Y$ and $Z$ in the Bloch sphere),
$$\vec{P}_{qubit} = \begin{pmatrix} P(a=0|X=0) \\ P(a=1|X=0) \\ P(a=0|X=1) \\ P(a=1|X=1) \\ P(a=0|X=2) \\ P(a=1|X=2) \end{pmatrix}. \qquad (2.41)$$

Footnote: The intuition here is that there is a trade-off between the state space and the effect space (the dual of the former). A larger state space implies that the set of effects leading to valid probabilities would be smaller, leading to a smaller effects space. Hence, there exist GPTs with entangled states but no analogue of entangling measurements, both of which are needed for teleportation-like protocols.

Footnote: Of course this leads to systems and correlations that are non-signaling also with respect to a space-time structure once this is introduced into the picture.

[Figure 2.4: Boxes in Generalized Probabilistic Theories. Panel (a): a box with input X and output a. Panel (b): a box with inputs X, Y and outputs a, b.]

(a) G-bit. A gbit is a function with binary input and output, characterized by the probability vector $\vec{P}_{gbit}$, also called the state vector.

(b) PR box. The PR box has two binary inputs $X, Y$ and two binary outputs $a, b$, satisfying $XY = a \oplus b$, and otherwise uniformly random (state vector on the right). Usually it is applied in the context of two space-like separated agents, each providing one of the inputs and obtaining the respective output. The box is non-signaling, and maximally violates the CHSH inequality [171].

Figure 2.4: Boxes in Generalized Probabilistic Theories. The modular objects of GPTs are input/output functions depicted as boxes and characterized by probability vectors. Each function (or box) can be evaluated once, and it may or may not correspond to a physical system being probed; even if it does, nothing is assumed about the post-evaluation state of the system (unlike quantum theory, which specifies the post-measurement state of a system given its initial state and the measurement device).

Footnote: This figure is taken from our paper [209], which is joint work with Nuriya Nurgalieva and Lídia del Rio, and was made by Lídia.

For normalized states, we have $|\vec{P}| = \sum_i P(a=i|X=j) = 1, \ \forall j$. The set of possible states of a gbit is convex, with the extreme points (written as transposed rows, with bars separating the fiducial measurements)
$$\vec{P}_1 = (1, 0 \,|\, 1, 0)^T, \quad \vec{P}_2 = (1, 0 \,|\, 0, 1)^T, \quad \vec{P}_3 = (0, 1 \,|\, 1, 0)^T, \quad \vec{P}_4 = (0, 1 \,|\, 0, 1)^T. \qquad (2.42)$$
These correspond to pure states, and the state space of a gbit is a polytope (due to the finite number of extreme points). In the qubit case, the extreme points correspond to all the (infinitely many) pure states on the surface of the Bloch sphere; some of these are
$$\vec{P}_{|+\rangle} = (1, 0 \,|\, \tfrac{1}{2}, \tfrac{1}{2} \,|\, \tfrac{1}{2}, \tfrac{1}{2})^T, \quad \vec{P}_{|-\rangle} = (0, 1 \,|\, \tfrac{1}{2}, \tfrac{1}{2} \,|\, \tfrac{1}{2}, \tfrac{1}{2})^T, \quad \vec{P}_{|0\rangle} = (\tfrac{1}{2}, \tfrac{1}{2} \,|\, \tfrac{1}{2}, \tfrac{1}{2} \,|\, 1, 0)^T, \quad \vec{P}_{|1\rangle} = (\tfrac{1}{2}, \tfrac{1}{2} \,|\, \tfrac{1}{2}, \tfrac{1}{2} \,|\, 0, 1)^T. \qquad (2.43)$$
Note that in box world, pure gbits are deterministic for both alternative measurements, whereas in quantum theory at most one fiducial measurement can be deterministic for each pure qubit, as reflected by uncertainty relations. We denote the set of allowed states of a system $A$ by $\mathcal{S}^A$.

Composite states.

The state of a bipartite system $AB$, denoted by $\vec{P}^{AB} \in \mathcal{S}^{AB}$, can be written in the form $\vec{P}^{AB} = \sum_i r_i \vec{P}^A_i \otimes \vec{P}^B_i$, where $r_i$ are real coefficients and $\vec{P}^A_i \in \mathcal{S}^A$, $\vec{P}^B_i \in \mathcal{S}^B$ can be taken to be pure and normalised states of the individual systems $A$ and $B$ [16]. Thus, a general 2-gbit state $\vec{P}^{AB}$ can be written as in Figure 2.4b (left), where $X, Y \in \{0, 1\}$ are the two fiducial measurements on the first and second gbit and $a, b \in \{0, 1\}$ are the corresponding measurement outcomes. The PR box $\vec{P}_{PR}$, on the right, is an example of such a 2-gbit state that is valid in box world, which satisfies the condition $a \oplus b = XY$ [171].

Footnote: Note that it is not necessary that the coefficients $r_i$ be positive and sum to one. If this is the case, then the composite state would be separable and hence local, otherwise, the state is entangled [16].

State transformations.

Valid operations are represented as matrices that transform valid state vectors to valid state vectors. In addition, we only have access to the (single-shot) input/output behaviour of systems, so in practice all valid operations in box world take the form of classical wirings between boxes, which correspond to pre- and post-processing of input and output values, and convex combinations thereof [16]. For example, bipartite joint measurements on a 2-gbit system can be decomposed into convex combinations of classical "wirings", as shown in Figure 2.5. In contrast, quantum theory allows for a richer structure of bipartite measurements by allowing for entangling measurements (e.g. in the Bell basis), which cannot be decomposed into classical wirings. Bipartite transformations on multi-gbit systems turn out to be classical wirings as well [16]. Reversible operations in particular consist only of trivial wirings: local operations and permutations of systems [103]. One cannot perform entangling operations such as a coherent copy (the quantum CNOT gate) [16, 193, 103].
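The PR box described above can be written out explicitly to check that it is non-signaling and reaches the algebraic maximum of the CHSH expression. The following Python sketch is illustrative: the function names and the contrasting `signaling_box` example are assumptions of this example, and the CHSH expression is taken in the standard form $E_{00} + E_{01} + E_{10} - E_{11}$.

```python
from itertools import product

def pr_box(a, b, x, y):
    """P(a, b | X=x, Y=y) for the PR box: uniform over outcomes with a XOR b = x AND y."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def signaling_box(a, b, x, y):
    """A contrasting, hypothetical box in which Alice's output copies Bob's input."""
    return 1.0 if (a == y and b == 0) else 0.0

def is_no_signaling(box):
    """Check that each party's marginal is independent of the other party's input."""
    for a, x in product((0, 1), repeat=2):
        if sum(box(a, b, x, 0) for b in (0, 1)) != sum(box(a, b, x, 1) for b in (0, 1)):
            return False
    for b, y in product((0, 1), repeat=2):
        if sum(box(a, b, 0, y) for a in (0, 1)) != sum(box(a, b, 1, y) for a in (0, 1)):
            return False
    return True

def correlator(box, x, y):
    """E_xy = P(a = b | x, y) - P(a != b | x, y)."""
    return sum((-1) ** (a ^ b) * box(a, b, x, y) for a, b in product((0, 1), repeat=2))

def chsh(box):
    """CHSH expression E_00 + E_01 + E_10 - E_11."""
    return correlator(box, 0, 0) + correlator(box, 0, 1) + correlator(box, 1, 0) - correlator(box, 1, 1)

pr_value = chsh(pr_box)
```

The PR box gives a CHSH value of 4, compared with the local bound of 2 and the quantum (Tsirelson) bound of $2\sqrt{2}$.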

[Figure 2.5: wiring diagram with gbits A and B, inputs X and Y = f_1(a), outcomes a and b, and final outcome o = f_2(a, b).]

Figure 2.5: Bipartite measurements in box world. Any bipartite measurement on a 2-gbit box world system can be decomposed into a procedure (or convex combinations thereof) of the following form. Alice first performs a measurement $X$ on one of the gbits (labelled $A$), and forwards the outcome $a$ to Bob. Bob then performs a measurement $Y = f_1(a)$, which may depend on $a$, on the other gbit (labelled $B$), obtaining the outcome $b$. The final measurement outcome $o$ of the joint measurement can be computed by Bob as a function $f_2(a, b)$. All allowed bipartite measurements are convex combinations of this type of classical wirings [16].

In quantum theory, systems are described by states that live in a Hilbert space, measurements and transformations on these states are represented by CPTP maps, and the Born rule specifies how to obtain the probabilities of possible measurement outcomes given these states and measurements. In more general theories, there is no reason to assume Hilbert spaces or CPTP maps. In fact, such a description of the state space and operations may not even be available; systems may be described as black boxes taking in classical inputs (choice of measurements) and giving classical outputs (measurement outcomes). What we can demand is that the theory provides a way for agents to predict the probabilities of obtaining various outputs based on their input choice and some operational description of the box. We have briefly reviewed states and transformations in Barrett's framework for GPTs [16]; we now discuss how measurements and associated probabilities are characterised here.

Consider a GPT, $T$. Denoting the set of all allowed states of a system in $T$ by $\mathcal{S}$, any valid transformation on a normalised GPT state $\vec{P} \in \mathcal{S}$ maps it to another normalised GPT state in $\mathcal{S}$. The set of allowed normalized states is assumed to be closed and convex, and transformations on these states are assumed to be linear. Each valid transformation is then represented by a matrix $M$ such that $\vec{P} \to M.\vec{P}$ under this transformation and $M.\vec{P} \in \mathcal{S}$ [16]. Further, operations that result in different possible outcomes can be associated with a set of transformations, one for each outcome. These also give an operational meaning to unnormalised states where $|\vec{P}| = \sum_i P(a=i|X=j) = c \ \forall j$, $c \in [0, 1]$ (i.e., the norm is independent of the value of $j$). Such an operation $M$ on a normalised initial state $\vec{P}$ can be associated with a set of matrices $\{M_i\}$ such that the unnormalised state corresponding to the $i$th outcome is $M_i.\vec{P}$. Then the probability of obtaining this outcome is simply the norm of this unnormalised state, $|M_i.\vec{P}|$, and the corresponding normalized final state is $M_i.\vec{P}/|M_i.\vec{P}|$. A set $\{M_i\}$ represents a valid operation if the following hold [16].
$$0 \leq |M_i.\vec{P}| \leq 1 \quad \forall i, \ \vec{P} \in \mathcal{S} \qquad (2.44a)$$
$$\sum_i |M_i.\vec{P}| = 1 \quad \forall \vec{P} \in \mathcal{S} \qquad (2.44b)$$
$$M_i.\vec{P} \in \mathcal{S} \quad \forall i, \ \vec{P} \in \mathcal{S} \qquad (2.44c)$$
This is the analogue of the quantum Born rule for GPTs. Box world is a GPT where the state space $\mathcal{S}$ consists of all normalized states $\vec{P}$ whose entries are valid probabilities (i.e., $\in [0, 1]$) and satisfy the no-signaling constraints, i.e., for an $N$-partite state $\vec{P}$, the marginal term $\sum_{a_i} P(a_1, .., a_i, .., a_N | X_1, .., X_i, .., X_N)$ is independent of the setting $X_i$ for all $i \in \{1, ..., N\}$.

Footnote: The motivation for this assumption is that if it is possible to prepare two states $P_1$ and $P_2$, then it should be possible to prepare a probabilistic mixture of the two (for example, by tossing a coin).

Footnote: This is in the spirit of relativistic causality since one would certainly expect that the input of one party does not affect the output of others when they are all space-like separated from each other.

When the GPT $T$ is box world, the conditions of Equations (2.44a)-(2.44c) result in the characterization of measurements and transformations in the theory in terms of classical circuits or wirings, as shown in [16]. We summarise the results of [16] characterising allowed transformations and measurements in box world and will only consider normalization-preserving transformations.

• Transformations:
  – Single system: All transformations on single box world systems are relabellings of fiducial measurements or outcomes, or a convex combination thereof.
  – Bipartite system: Let $X$ and $Y$ be fiducial measurements performed on the transformed bipartite system with corresponding outcomes $a$ and $b$. Then all transformations of 2-gbit systems can be decomposed into convex combinations of classical circuits of the following form: a fiducial measurement $X' = f_1(X, Y)$ is performed on the initial state of the first gbit resulting in the outcome $a'$, followed by a fiducial measurement $Y' = f_2(X, Y, a')$ on the initial state of the second gbit resulting in the outcome $b'$. The final outcomes are given as $(a, b) = f_3(X, Y, a', b')$, where $f_1$, $f_2$ and $f_3$ are arbitrary functions.

• Measurements:
  – Single system: All measurements on single box world systems are either fiducial measurements with outcomes relabelled or convex combinations of such.
  – Bipartite system: All bipartite measurements on 2-gbit systems can be decomposed into convex combinations of classical circuits of the following form (Figure 2.5): a fiducial measurement $X$ is performed on the initial state of the first gbit resulting in the outcome $a'$, followed by a fiducial measurement $Y = f(a')$ on the second gbit resulting in the outcome $b'$. The final outcome is $a = f'(a', b')$, where $f$ and $f'$ are arbitrary functions.

Cause-effect relationships between systems constrain the possible correlations that can be observed between them. These relationships are naturally represented using causal structures or directed graphs. Given a causal structure, a causal model provides a set of mathematical rules to define the possible correlations that are compatible with the causal structure. Additionally, it can also provide rules for how the correlations compatible with a causal structure change under external manipulations of the systems involved. There are different approaches for mathematically formalising these concepts in general physical theories, depending on what the nodes and edges of the causal graph correspond to, the compatibility condition required by the causal model and how external manipulations are described. For classical causal models, a common approach is that of Bayesian networks, pioneered by Judea Pearl [160]. This has been generalised to quantum and more general, post-quantum causal structures in [115]. We will mainly adopt this approach in the thesis, and will briefly discuss other approaches for quantum causal modelling at the end of Section 2.5.2.3.

General definitions and terminology:

Throughout this thesis, we will only consider discrete and finite random variables, and corresponding causal structures with a finite number of systems/nodes. More formally, we have the following definitions.

Definition 2.5.1 (Causal structure). A causal structure is a directed graph $G$ over a set of nodes, some of which are labelled observed and the rest unobserved. Each observed node $X$ corresponds to a classical random variable of the same name, while unobserved nodes can correspond to classical, quantum or post-quantum systems. For two (observed or unobserved) nodes $X$ and $Y$ of $G$, $X$ is said to be a cause of $Y$ if and only if there exists a directed path from $X$ to $Y$ in $G$.

(Observed variables may represent classical inputs, such as measurement settings, or outputs, such as measurement outcomes, of an experiment.)

The majority of the literature on causal structures as well as Chapters 4 and 5 concern the case of acyclic causal structures represented by directed acyclic graphs (DAGs). In Chapter 6, we attempt to model cyclic causal structures as well, but an understanding of the acyclic case is a pre-requisite for this. Hence, we will focus on the acyclic case in this review.

Note that by the above definition of cause, $X$ is said to be a cause of itself if and only if there is a directed path from $X$ to itself in $G$, which is not possible if the graph is acyclic. We will use family relations to describe causal relationships between nodes of a directed graph: for $i \neq j$, if $X_i$ is a direct or indirect cause of $X_j$, then $X_i$ is said to be an ancestor of $X_j$ or equivalently, $X_j$ a descendant of $X_i$. We will use either $\mathrm{anc}(X_i)$ or, more compactly, $X_i^{\downarrow}$ to denote the set of all ancestors of $X_i$ in $G$, and similarly $\mathrm{desc}(X_i)$ or $X_i^{\uparrow}$ for the set of all its descendants. If $X_i$ is a direct cause of $X_j$, then $X_i$ is said to be a parent of $X_j$ or equivalently, $X_j$ a child of $X_i$. We will use $\mathrm{par}(X_i)$ or $X_i^{\downarrow_1}$ and $\mathrm{child}(X_i)$ or $X_i^{\uparrow_1}$ to denote the sets of all parents and children, respectively, of $X_i$ in $G$.
If a node $X_i$ has no parents in $G$, then it is said to be exogenous with respect to that graph, and is called endogenous otherwise.

Figure 2.6 illustrates two causal structures that will be considered extensively in Part I of this thesis, particularly in Chapters 4 and 5. These are the bipartite Bell and Triangle causal structures, which we will denote by $G^B$ and $G^T$ respectively. Both the Bell [22, 57, 225, 122, 63, 170, 143, 15, 66, 34, 215] and Triangle [88, 47, 48, 33, 216, 179, 129] causal structures have been studied extensively and continue to be a subject of ongoing research, by virtue of being relatively small causal structures that are known to support non-classical correlations. We first outline the preliminaries of classical causal models [160] required for this thesis, and then discuss the generalisation to quantum and post-quantum causal structures, mainly following the approach of [115]. We now enunciate the following broad definition of a causal model.

Definition 2.5.2 (General definition of a causal model). A causal model consists of a causal structure $G$, and a joint distribution $P_{X_1,...,X_n}$ over the set $\{X_1,...,X_n\}$ of observed nodes of $G$ that is compatible with the causal structure according to a compatibility condition.

This section is primarily based on the framework of classical causal models developed by Judea Pearl [160].

Definition 2.5.3 ((Classical) causal structures). A causal structure is called classical and denoted $G^C$ if all its unobserved nodes correspond to classical random variables (denoted by the same name as the associated node).

Hence for classical causal structures, all nodes (observed and unobserved) correspond to classical random variables and we will use the same label to denote the node of the DAG as well as the
random variable it represents. The framework of classical causal models [160] then specifies when a joint probability distribution over all nodes is said to be compatible with the causal structure.

Figure 2.6: The bipartite Bell and Triangle causal structures: Two widely studied causal structures that are known to support a gap between correlations realisable through classical vs non-classical unobserved causes. Here observed nodes are circled while the uncircled nodes are unobserved. (a) The bipartite Bell causal structure: This corresponds to an information processing protocol involving two parties, Alice and Bob, where the observed nodes $A$ and $B$ represent the random variables corresponding to their choice of input (measurement setting) while $X$ and $Y$ represent the random variables corresponding to their outputs (measurement outcome). The outcomes $X$ and $Y$ are obtained by measuring the respective halves of a shared system $\Lambda$, which may be classical, quantum or post-quantum. (b) The Triangle causal structure: Here, the three observed nodes $X$, $Y$ and $Z$ have unobserved, pairwise common causes $A$, $B$ and $C$, but no joint common cause. This corresponds to an information processing protocol involving three parties who produce correlated data ($X$, $Y$ and $Z$) by pre-sharing information pairwise, and never having interacted as a group.

Definition 2.5.4 (Compatibility of distribution with $G^C$). A distribution $P_{X_1,...,X_n} \in \mathcal{P}_n$ over the random variables $\{X_1,...,X_n\}$ is said to be compatible with a classical causal structure $G^C$ (with these variables as nodes) if it satisfies the causal Markov condition, i.e., the joint distribution decomposes as

$$P_{X_1,...,X_n} = \prod_{i=1}^{n} P_{X_i | X_i^{\downarrow_1}}, \tag{2.45}$$

where $X_i^{\downarrow_1}$ denotes the set of all parent nodes of the node $X_i$ in the DAG $G^C$.

A crucial concept relating to the joint distribution of a causal structure is conditional independence, defined as follows.

Definition 2.5.5 (Conditional independence). Given pairwise disjoint subsets of random variables $X$, $Y$ and $Z$ (where $X$ and $Y$ are non-empty) and a joint distribution $P_{XYZ}$ over them, $X$ and $Y$ are said to be conditionally independent given $Z$, denoted $X \perp\!\!\!\perp Y | Z$, if

$$P_{XY|Z} = P_{X|Z} P_{Y|Z}. \tag{2.46}$$

The relationship between the Markov compatibility condition of Definition 2.5.4 and conditional independence as defined above is given by the following Theorem (Theorem 1.2.7 in [160]).

Theorem 2.5.1 (Pearl 2009). A distribution $P \in \mathcal{P}_n$ is compatible with a classical causal structure $G^C$ over a node set $\{X_1,...,X_n\}$ if and only if every node $X_i$ is conditionally independent of its non-descendants, denoted $X_i^{\not\uparrow}$, given its parents $X_i^{\downarrow_1}$ in $G^C$, i.e., $\forall i \in \{1,...,n\}$, $X_i \perp\!\!\!\perp X_i^{\not\uparrow} | X_i^{\downarrow_1}$, or explicitly

$$P_{X_i X_i^{\not\uparrow} | X_i^{\downarrow_1}} = P_{X_i | X_i^{\downarrow_1}} P_{X_i^{\not\uparrow} | X_i^{\downarrow_1}}. \tag{2.47}$$

The equivalent compatibility condition in terms of conditional independence provided by Theorem 2.5.1 is often referred to as the local Markov or parental Markov condition in the literature. It is possible to read off the conditional independences satisfied by a Markov compatible distribution from the corresponding causal graph; rules for doing so were independently developed by Geiger [96] and Verma and Pearl [206]. The following graph theoretic notions are required to understand how this can be done.

Definition 2.5.6 (Blocked paths). Let $G$ be a DAG in which $X$ and $Y$ are distinct nodes and $Z$ be a set of nodes not containing $X$ or $Y$. A path from $X$ to $Y$ is said to be blocked by $Z$ if it contains either $A \to W \to B$ with $W \in Z$, $A \leftarrow W \to B$ with $W \in Z$, or $A \to W \leftarrow B$ such that neither $W$ nor any descendant of $W$ belongs to $Z$.

In the following, three node subgraphs of the form $A \to W \leftarrow B$ will be known as colliders, where $W$ is referred to as the collider node. Other three node subgraphs will be referred to as non-colliders; in particular, $A \leftarrow W \to B$ corresponds to the common cause subgraph where the node $W$ is a common cause of $A$ and $B$.

Definition 2.5.7 (d-separation). Let $G$ be a DAG in which $X$, $Y$ and $Z$ are disjoint sets of nodes. $X$ and $Y$ are d-separated by $Z$ in $G$, denoted as $(X \perp^d Y | Z)_G$ (or simply $X \perp^d Y | Z$ if $G$ is obvious from context), if every path from a variable in $X$ to a variable in $Y$ is blocked by $Z$; otherwise, $X$ is said to be d-connected with $Y$ given $Z$.

The concept of d-separation was developed independently in [96] and [206].
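Definitions 2.5.6 and 2.5.7 can be checked mechanically. The following is a minimal Python sketch, not taken from the thesis, that decides d-separation using the standard ancestral moral graph criterion (which is equivalent to the path-blocking definition) rather than enumerating paths. The function names, the dictionary `PARENTS` and the example DAG (a chain $X \to R \to S \to T$ meeting a second chain $Y \leftarrow V \to U \to T$ at a collider $T$ with child $W$) are illustrative assumptions.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated by zs (pairwise disjoint node sets).

    `parents` maps each node to the list of its parents.
    Method: restrict to the ancestral set of xs, ys and zs, moralise
    (link co-parents, drop directions), delete zs, test connectivity.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))  # parents of a kept node are kept
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):   # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    frontier = list(xs)
    reached = set(xs)
    while frontier:
        n = frontier.pop()
        for m in adj[n]:
            if m not in zs and m not in reached:
                reached.add(m)
                frontier.append(m)
    return reached.isdisjoint(ys)

# Illustrative DAG (an assumption): X -> R -> S -> T <- U <- V -> Y, T -> W
PARENTS = {"R": ["X"], "S": ["R"], "T": ["S", "U"],
           "U": ["V"], "Y": ["V"], "W": ["T"]}

print(d_separated(PARENTS, {"X"}, {"Y"}, set()))   # collider T blocks the path
print(d_separated(PARENTS, {"X"}, {"Y"}, {"W"}))   # conditioning on a descendant of T
```

In this sketch, conditioning on the collider $T$ or on its descendant $W$ d-connects $X$ and $Y$, whereas the empty conditioning set leaves them d-separated, in line with the third clause of Definition 2.5.6.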
It follows from Definition 2.5.7 that d-separation is a symmetric relation in the first two arguments (as is conditional independence), i.e., $X \perp^d Y | Z$ is equivalent to $Y \perp^d X | Z$. When $Z$ is the empty set $\emptyset$, we will simply denote this as $X \perp^d Y$, and similarly for conditional independences (which will be called independences when the conditioning set $Z$ is empty). For arbitrary disjoint subsets $X$, $Y$ and $Z$ of the nodes, d-separation can be used to determine whether $X$ and $Y$ are conditionally independent given $Z$ for a distribution compatible with the causal structure (according to Definition 2.5.4). The following result, which is Theorem 1.2.5 of [160] and was originally presented in [206], illustrates this.

Theorem 2.5.2 (Verma and Pearl 1988). Let $G^C$ be a classical causal structure and let $X$, $Y$ and $Z$ be pairwise disjoint subsets of nodes in $G^C$. If a probability distribution $P$ is compatible with $G^C$ (according to Definition 2.5.4), then the d-separation $X \perp^d Y | Z$ implies the conditional independence $X \perp\!\!\!\perp Y | Z$. Conversely, if all distributions $P$ compatible with $G^C$ satisfy the conditional independence $X \perp\!\!\!\perp Y | Z$, then the corresponding d-separation relation $X \perp^d Y | Z$ holds.

Figure 2.7: Example of fine-tuned correlations compatible with a causal structure: Even though $S$ is a cause of $H$, it is possible for the correlations $p_{SEH}$ to be fine-tuned such that the positive dependence $S \xrightarrow{+} E \xrightarrow{+} H$ exactly cancels out the negative dependence $S \xrightarrow{-} H$, resulting in $S$ and $H$ being uncorrelated (see Example 2.5.1).

The forward direction of Theorem 2.5.2 is referred to as the soundness of d-separation and the reverse direction as the completeness of d-separation for determining conditional independences. Note that for the latter, it is important that all compatible distributions satisfy the conditional independence. It is possible for a distribution $P$ that is compatible with a graph $G$ to satisfy a conditional independence $X \perp\!\!\!\perp Y | Z$ even when the d-separation relation $X \perp^d Y | Z$ does not hold in $G$. A causal model defined by such a distribution $P$ and causal structure $G$ is said to be unfaithful or fine-tuned. Therefore for a faithful causal model, the only conditional independence relations are those that can be read off from the d-separation relations in the associated causal graph. For a faithful causal model, it can be shown that all the conditional independences in the distribution are implied by the $n$ conditional independence constraints (one for each node) given by the parental/local Markov condition (Equation (2.47)), and can be derived from these constraints and standard probability calculus based on Bayes' rule. In general, not all of these $n$ constraints may be required. The following example (taken from [187]) illustrates a compatible distribution that is unfaithful with respect to its causal graph.

Example 2.5.1 (A fine-tuned causal model). Consider the causal structure of Figure 2.7, where $+$ or $-$ on an edge indicates whether the nodes connected by that edge are positively or negatively correlated as a result of that causal influence. Note that the Markov condition of Equation (2.45) does not impose any constraints on joint distributions $P_{SEH}$ compatible with this causal structure, and neither does d-separation. Any probability distribution $P_{SEH}$ can be expressed as $P_{SEH} = P_S P_{E|S} P_{H|ES}$ using Bayes' rule, which is the same factorization obtained by applying the Markov condition to this graph. Further, $S$ influences $H$ through two causal mechanisms: a direct, negative causal influence and an indirect, positive influence. It is therefore possible to have a compatible distribution $P_{SEH}$ in which these two causal mechanisms exactly cancel each other out such that $S$ and $H$ appear to be uncorrelated, i.e., $P_{SH} = P_S P_H$, even though $S$ and $H$ are clearly d-connected in the graph.

Note that such examples are fine-tuned in the sense that the causal influences will have to conspire in a very specific way in order to just cancel out, and even slight deviations in the strength of the correlations will destroy the apparent independence. We will consider such examples in further detail (including explicit distributions) in Chapter 6. In the rest of this section, we will focus only on faithful correlations and causal models. We now provide an illustrative example for the concepts of d-separation and conditional independence.
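Before turning to that example, the cancellation described in Example 2.5.1 can be made concrete with a small numerical sketch. The binary variables and all probabilities below are hypothetical values chosen only so that the positive indirect influence $S \to E \to H$ exactly offsets the negative direct influence $S \to H$; they are not taken from the thesis or from [187].

```python
from itertools import product

# Hypothetical mechanisms for binary S, E, H (assumed numbers).
P_S = {0: 0.5, 1: 0.5}
# S -> E is positive: E agrees with S with probability 0.8.
P_E_GIVEN_S = {s: {e: 0.8 if e == s else 0.2 for e in (0, 1)} for s in (0, 1)}
# P(H=1 | E=e, S=s): increases with e (positive), decreases with s (negative).
P_H1_GIVEN_ES = {(0, 0): 0.38, (0, 1): 0.02, (1, 0): 0.98, (1, 1): 0.62}

def joint():
    """P(s, e, h) = P(s) P(e|s) P(h|e,s), the Markov factorisation for this DAG."""
    table = {}
    for s, e, h in product((0, 1), repeat=3):
        ph1 = P_H1_GIVEN_ES[(e, s)]
        table[(s, e, h)] = P_S[s] * P_E_GIVEN_S[s][e] * (ph1 if h == 1 else 1 - ph1)
    return table

def p_h1_given_s(table, s):
    """P(H=1 | S=s), obtained by marginalising over E."""
    num = sum(p for (s_, _, h), p in table.items() if s_ == s and h == 1)
    den = sum(p for (s_, _, _), p in table.items() if s_ == s)
    return num / den

table = joint()
print(p_h1_given_s(table, 0), p_h1_given_s(table, 1))
```

With these numbers, $P(H=1|S=s)$ takes the same value for both values of $s$ up to floating-point rounding, so $S$ and $H$ are uncorrelated even though $P(H|E,S)$ depends on $S$. Changing any single entry of `P_H1_GIVEN_ES` slightly breaks the equality, which is the sense in which the model is fine-tuned.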

Example 2.5.2 (d-separation and conditional independence). Consider the causal structure of Figure 2.8, where all nodes are observed. Here, the node $T$ (and the corresponding sub-graph $S \to T \leftarrow U$) is a collider (and all remaining nodes are non-colliders), and $V$ is a common cause of $U$ and $Y$. We now consider d-separation relations involving different conditioning sets $Z$. If $Z$ is the empty set, $X \perp^d Y | Z$ (or simply $X \perp^d Y$) because the path between $X$ and $Y$ is blocked by the collider $T$ that does not belong to $Z = \emptyset$. If $\{T, W\}$ or a non-empty subset thereof is contained in $Z$, then $X$ and $Y$ are not d-separated (or equivalently, they are d-connected) by $Z$ since the path contains the node $T$ that either belongs to $Z$ or has a descendant $W$ that belongs to $Z$. Further, $X$ and $T$ as well as $T$ and $Y$ are d-connected by the empty set $Z = \emptyset$. The former become d-separated when $Z$ contains either one or both of $R$ and $S$ since these block the directed path from $X$ to $T$. Similarly, $T$ and $Y$ become d-separated by $Z$ if it contains one or both of $U$ and $V$. By the soundness of d-separation, this implies the conditional independence relations $X \perp\!\!\!\perp Y$, $X \perp\!\!\!\perp T | \{R, S\}$, $T \perp\!\!\!\perp Y | \{U, V\}$. By finding the set of non-descendants and parents for all 8 nodes, we have the following set of 8 conditional independences corresponding to Equation (2.47). For a faithful causal model over the causal structure 2.8, all conditional independence relations in every compatible distribution (including those mentioned previously) are implied by these 8 relations.

$$\begin{aligned}
X &\perp\!\!\!\perp \{U, V, Y\}, \\
R &\perp\!\!\!\perp \{U, V, Y\} \,|\, X, \\
S &\perp\!\!\!\perp \{X, U, V, Y\} \,|\, R, \\
T &\perp\!\!\!\perp \{X, R, V, Y\} \,|\, \{S, U\}, \\
W &\perp\!\!\!\perp \{X, R, S, U, V, Y\} \,|\, T, \\
U &\perp\!\!\!\perp \{X, R, S, Y\} \,|\, V, \\
V &\perp\!\!\!\perp \{X, R, S\}, \\
Y &\perp\!\!\!\perp \{X, R, S, T, W, U\} \,|\, V.
\end{aligned} \tag{2.48}$$

Figure 2.8: Causal structure of Example 2.5.2 (nodes $X$, $R$, $S$, $T$, $U$, $V$, $Y$, $W$).

d-separation and entropic causal constraints:

An equivalent statement of Theorem 2.5.2 can be obtained by encoding conditional independences in terms of the entropies (rather than the probability distribution) over the nodes. For a classical causal structure $G$ with pairwise disjoint subsets of nodes $X$, $Y$ and $Z$, $X \perp\!\!\!\perp Y | Z$ is equivalent to the entropic constraint $I(X : Y | Z) = 0$, where $I(X : Y | Z)$ is the Shannon conditional mutual information (Definition 2.3.3). Then, $X$ and $Y$ are d-separated by $Z$ in $G$ if and only if $I(X : Y | Z) = 0$ for all distributions compatible with $G$ [160]. The complete set of d-separation conditions gives all the conditional independence relations implied by the DAG (by soundness of d-separation). In the case of
Shannon entropy for a DAG with $n$ nodes, these are all implied by the $n$ constraints

$$I(X_i : X_i^{\not\uparrow} \,|\, X_i^{\downarrow_1}) = 0 \quad \forall i \in \{1,...,n\}. \tag{2.49}$$

In other words, a distribution over $n$ variables satisfies Equation (2.47) and hence Equation (2.45) (by Theorem 2.5.1) if and only if it satisfies Equation (2.49).

Figure 2.9: Two node causal structures with same set of compatible distributions: All of the above causal structures contain two observed nodes $X$ and $Y$. In all cases, neither the Markov compatibility condition (2.45) nor d-separation implies any constraints on the marginal distribution $P_{XY}$ over the observed nodes. Therefore any 2-variable distribution $P_{XY}$ is compatible with all 5 of these graphs, and the causal structure cannot be inferred directly from these correlations. However, if we were given, or could deduce, additional information regarding the results of active interventions on $X$ and $Y$, we would be able to rule out some of these explanations. For example, if forcibly setting $X$ to some value $x$ (without changing anything else) resulted in a change in the distribution over $Y$, we would say that $X$ is a cause of $Y$. In this case, the opposite direct cause explanations (b) and (e) as well as the purely common cause explanation (c) would be ruled out, leaving us with (a) and (d) as possible explanations.

Given a causal structure, we can apply Definition 2.5.4 to obtain the set of all joint distributions compatible with it. Causal inference concerns the reverse problem: given a compatible distribution (and possibly some additional information), can we infer the causal structure that it is compatible with? It is not possible to infer an underlying causal structure from correlations alone: correlations are symmetric while causal relationships are directional. For example, if two variables $X$ and $Y$ are correlated, Reichenbach's principle [177] asserts that either $X$ must be a cause of $Y$, $Y$ must be a cause of $X$, $X$ and $Y$ share a common cause, or any combination thereof. Any distribution $P_{XY}$ over these variables is compatible with all five of these causal structures, illustrated in Figure 2.9. Hence these causal explanations cannot be distinguished immediately using observed correlations alone, and additional information/deductions are required in order to infer the underlying causal structure. Intuitively, we can argue that if actively intervening or "doing" something only to $X$ changed something about $Y$, then we definitely know that $X$ is a cause of $Y$. This intuition is formalised in terms of interventions and do-conditionals [160], which in addition to the observed correlations allow for the inference of the underlying causal structure in several scenarios that cannot be identified from the correlations alone. We will follow the augmented graph approach (Section 3.2.2 of [160]) to define interventions and do-conditionals and only consider the case of perfect or ideal interventions in this thesis.

First, consider a causal model associated with a causal structure $G$ over a set $\{X_1,...,X_n\}$ of nodes, all of which are observed. External intervention on a node $X_i$ can be described using an augmented graph $G_{I_{X_i}}$, which is obtained from the original graph $G$ by adding a node $I_{X_i}$ and an edge $I_{X_i} \to X_i$ (with everything else unchanged). The intervention variable $I_{X_i}$ can take values in the set $\{idle, \{do(x_i)\}_{x_i \in X_i}\}$, where $I_{X_i} = idle$ corresponds to the case where no intervention is performed (i.e., the situation described by the original causal model) and $I_{X_i} = do(x_i)$ forces $X_i$ to take the value $x_i$ by cutting off its dependence on all other parents. Hence the parents of the node $X_i$ in the original and augmented graphs are related as $\mathrm{par}_{G_{I_{X_i}}}(X_i) = \mathrm{par}_G(X_i) \cup \{I_{X_i}\}$. We will denote $\mathrm{par}_{G_{I_{X_i}}}(X_i)$ simply as $\mathrm{par}_I(X_i)$ and $\mathrm{par}_G(X_i)$ as $\mathrm{par}(X_i)$, in short-hand. The conditional probability of $X_i = x_i$ conditioned on its parents $\mathrm{par}(X_i)$ / $\mathrm{par}_I(X_i)$ taking the value $p_{X_i}$ / $p^I_{X_i}$ is given as
$$P(x_i \,|\, p^I_{X_i}) := \begin{cases} P(x_i \,|\, p_{X_i}), & \text{if } I_{X_i} = idle \\ 0, & \text{if } I_{X_i} = do(x'_i) \text{ and } x'_i \neq x_i \\ 1, & \text{if } I_{X_i} = do(x'_i) \text{ and } x'_i = x_i \end{cases} \tag{2.50}$$

From this, we see that whenever $I_{X_i} \neq idle$, $X_i$ no longer depends on its original parents $\mathrm{par}(X_i)$. Therefore, conditioned on $I_{X_i} \neq idle$, it is illustrative to consider a new graph, which we denote by $G_{do(X_i)}$, that represents the post-intervention causal structure after a non-trivial intervention has been performed. The causal graph $G_{do(X_i)}$ is obtained by cutting off all incoming arrows to $X_i$ except the one from $I_{X_i}$ in the causal graph $G_{I_{X_i}}$, with everything else unchanged. An example of the graphs $G$, $G_{I_{X_i}}$ and $G_{do(X_i)}$ is given in Figure 2.10. Then the effect of an intervention setting $X_i = x'_i$, i.e., performing $do(x'_i)$, is to transform the original probability distribution $P(X_1 = x_1, ..., X_n = x_n)$ into a new probability distribution $P(X_1 = x_1, ..., X_n = x_n \,|\, do(X_i = x_i))$ (the do-conditional) given as

$$P(x_1, ..., x_n \,|\, do(x_i)) := P_{G_{do(X_i)}}(x_1, ..., x_n \,|\, I_{X_i} = do(x_i)), \tag{2.51}$$

where $P(x_1, ..., x_n, I_{X_i} = do(x_i))$ is compatible with the graph $G_{do(X_i)}$ and Equation (2.50), which implies that $P(x_1, ..., x'_i, ..., x_n \,|\, do(x_i)) = 0$ if $x'_i \neq x_i$. Applying the Markov compatibility condition (Definition 2.5.4) to the original, pre-intervention graph $G$ implies that all distributions compatible with this graph must satisfy $P_{X_1,...,X_n} = \prod_{j=1}^{n} P_{X_j | \mathrm{par}(X_j)}$. In the post-intervention graph $G_{do(X_i)}$, the only causal relations that change are those involving the intervened node $X_i$, where the set of parents of $X_i$ in $G_{do(X_i)}$ is the single element set $\{I_{X_i}\}$, and the parent sets of all other nodes remain the same as those of $G$.
Then any distribution $P_{X_1,...,X_n,I_{X_i}}$ with $I_{X_i}$ taking values in $\{do(x_i)\}_{x_i \in X_i}$ (i.e., not idle) that is compatible with $G_{do(X_i)}$ satisfies $P_{X_1,...,X_n | I_{X_i}} = \prod_{j=1, j \neq i}^{n} P_{X_j | \mathrm{par}(X_j)} \, P_{X_i | I_{X_i}}$, for any distribution over $I_{X_i}$. Noting that $P_{X_i | I_{X_i}}$ is the deterministic distribution which is 1 whenever $X_i$ takes the same value set by $I_{X_i}$ and 0 otherwise, this gives the following transformation between the do-conditional (Equation (2.51)) and the original distribution,

$$P(x_1, ..., x_n \,|\, do(x'_i)) = \begin{cases} \dfrac{P(x_1, ..., x_n)}{P(x_i \,|\, \mathrm{par}(x_i))}, & \text{if } x'_i = x_i \\ 0, & \text{if } x'_i \neq x_i. \end{cases} \tag{2.52}$$

Importantly, this allows us to decide, given two nodes $X$ and $Y$, whether or not $X$ is a cause of $Y$: if there exists a value $x \in X$ such that $P(y \,|\, do(x)) \neq P(y)$ for some $y \in Y$, then $X$ is a cause of $Y$ (or equivalently, there is a directed path from $X$ to $Y$). Otherwise, we can conclude that $X$ is not a cause of $Y$ or equivalently that there is no directed path from $X$ to $Y$ (if the causal model is faithful). This formalises the intuition explained in Figure 2.9, where checking whether an active intervention ($I_X \neq idle$) on $X$ changes the distribution over $Y$ allowed us to rule out 3 of 5 causal explanations. As we will see in the following paragraphs, checking whether the do-conditional $P(y \,|\, do(x))$ differs from the regular conditional $P(y \,|\, x)$ can allow us to pick out a unique causal structure in this case.

Interventions on subsets of nodes:

This procedure naturally extends to interventions on subsets of the nodes. If a simultaneous intervention is performed on a subset $X$ of the nodes $\{X_1,...,X_n\}$, an intervention variable $I_{X_i}$ will be introduced for each element $X_i$ of the subset $X$, along with the corresponding edge $I_{X_i} \to X_i$. Then conditioning on each of the $I_{X_i}$ for $X_i \in X$ being $idle$, the original graph is obtained, and conditioning on $I_{X'} \neq idle$ for $X' \subseteq X$ (i.e., a "do" operation has been performed on every node in $X'$), the corresponding graph $G_{do(X')}$ is defined by cutting off all incoming edges to the nodes in $X'$ (except $I_{X_i} \to X_i$ for all $X_i \in X'$) in $G$. Then the do-conditional $P(x_1, ..., x_n \,|\, do(x'))$ and its relation to the original distribution $P(x_1, ..., x_n)$ are defined analogously, using Equations (2.51) and (2.52) for the graph $G_{do(X')}$.

The importance of the transformation (2.52) is that it allows for counter-factual reasoning by mathematically relating the post-intervention distribution to the pre-intervention distribution. It tells us that if all the nodes in a causal structure are observed, then we can deduce the result of any intervention (possibly on a subset of nodes) just by passively observing the correlations over the variables and using mathematical manipulations, without physically performing any interventions. This ability to deduce the post-intervention distribution only from the observed distribution in a consistent manner is known as identifiability. Identifiability of interventional effects is of crucial practical use, especially in situations where an actual intervention may be physically (or ethically) forbidden, for example testing the effect of certain drugs and treatments on humans.
In causal structures that include unobserved nodes, this kind of counter-factual reasoning may not always be possible, i.e., the effect of interventions may not be identifiable from the observed distributions, as explained below.

Note that in unfaithful causal models, it is possible to have $P(y \,|\, do(x)) = P(y)$ for all $x \in X$ and $y \in Y$ even when $X$ is a cause of $Y$, and checking for the violation of this condition only provides a sufficient test of causation but not a necessary one. This point is crucial for the results of Chapter 6, which concern unfaithful causal models.
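When every node is observed, the transformation (2.52) can be applied using the joint distribution alone. The sketch below illustrates this for a hypothetical three-node structure $Z \to X$, $Z \to Y$, $X \to Y$ with all nodes observed and binary. The graph, the numerical mechanisms and the helper names are assumptions made for illustration, and the mechanisms are used only to generate the joint table.

```python
from itertools import product

def bern(p1, v):
    """Probability that a binary variable equals v when P(1) = p1."""
    return p1 if v == 1 else 1.0 - p1

# Hypothetical mechanisms (assumed numbers), used only to build the joint.
P_Z1 = 0.3
P_X1_GIVEN_Z = {0: 0.2, 1: 0.9}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}

# Observed joint P(z, x, y), keyed by (z, x, y).
JOINT = {
    (z, x, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)
    for z, x, y in product((0, 1), repeat=3)
}

def do_x(joint, x_set):
    """Post-intervention joint from Equation (2.52), with par(X) = {Z}.

    Only the observed joint is used: P(x|z) is computed from it.
    """
    post = {}
    for (z, x, y), p in joint.items():
        if x != x_set:
            post[(z, x, y)] = 0.0
            continue
        p_zx = sum(joint[(z, x, yy)] for yy in (0, 1))
        p_z = sum(joint[(z, xx, yy)] for xx in (0, 1) for yy in (0, 1))
        post[(z, x, y)] = p / (p_zx / p_z)   # divide by P(x | par(x))
    return post

def p_y_do_x(joint, y, x_set):
    """Do-conditional P(y | do(x)), marginalising the post-intervention joint."""
    return sum(p for (_, _, yy), p in do_x(joint, x_set).items() if yy == y)

def p_y_given_x(joint, y, x):
    """Ordinary conditional P(y | x) from the observed joint."""
    num = sum(joint[(z, x, y)] for z in (0, 1))
    den = sum(joint[(z, x, yy)] for z in (0, 1) for yy in (0, 1))
    return num / den

print(p_y_do_x(JOINT, 1, 1), p_y_given_x(JOINT, 1, 1))
```

In this sketch the two printed numbers differ, because $Z$ is a common cause of $X$ and $Y$: conditioning on $X$ carries information about $Z$, while intervening on $X$ does not.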

Interventions in the presence of unobserved nodes:

For a causal structure containing unobserved nodes, interventions can only be performed on the observed nodes. For a classical causal structure $G^C$, the unobserved nodes are also random variables. Let $\{X_1,...,X_n\}$ and $\{\Lambda_1,...,\Lambda_m\}$ denote the sets of observed and unobserved nodes of $G^C$ respectively. Any observed distribution $P_{X_1,...,X_n}$ compatible with $G^C$ can be obtained by applying the Markov condition (2.45) to the hypothetical distribution over all nodes and summing over all possible values of the unobserved nodes, i.e.,

$$P_{X_1,...,X_n} = \sum_{\Lambda_1,...,\Lambda_m} \prod_{i=1}^{n} P_{X_i | \mathrm{par}(X_i)} \prod_{j=1}^{m} P_{\Lambda_j | \mathrm{par}(\Lambda_j)}. \tag{2.53}$$

Considering interventions only on observed nodes, Equations (2.51) and (2.52) can be derived analogously using this compatibility condition. However, there are two important distinctions to be noted. Firstly, the observed distribution $P_{X_1,...,X_n}$ is in general not Markov (and hence not compatible) with respect to the sub-graph involving only the observed nodes and edges connecting them. However, the overall distribution over observed and unobserved nodes is Markov with respect to the complete DAG involving all these nodes if we assume that the graph accounts for all the unobserved common causes and corresponds to a classical causal structure. Secondly, the post-intervention distribution may not be identifiable from the observed, pre-intervention distribution alone. This is because an intervened node $X_i$ may have unobserved parents in the original causal structure, and Equation (2.52) would then require division by a quantity that cannot be obtained from the observed distribution alone. More explicitly, the following example describes a non-identifiable causal structure involving an unobserved node.

Example 2.5.3 (A non-identifiable causal structure). Consider the causal structure of Figure 2.9d with $X$, $Y$ observed and $\Lambda$ unobserved. The augmented and post-intervention causal structures corresponding to an intervention on $X$ are illustrated in Figure 2.10.
Observed distributions compatible with the original causal structure are given as $P_{XY} = \sum_{\Lambda} P_{\Lambda XY} = \sum_{\Lambda} P_{\Lambda} P_{X|\Lambda} P_{Y|X\Lambda}$. Note that any distribution $P_{XY}$ can be written in this form using Bayes' rule and marginalising over some variable $\Lambda$. The post-intervention distribution $P_{XY|I_X}$ (with $I_X \neq idle$) is compatible with the graph $G_{do(X)}$ and is given as $P_{XY|I_X} = \sum_{\Lambda} P_{\Lambda} P_{X|I_X} P_{Y|X\Lambda}$. Equation (2.52) in this case corresponds to

$$P(xy \,|\, do(x')) = \begin{cases} \sum_{\lambda \in \Lambda} \dfrac{P(\lambda x y)}{P(x|\lambda)} = \sum_{\lambda \in \Lambda} P(\lambda) P(y|x\lambda), & \text{if } x' = x \\ 0, & \text{if } x' \neq x. \end{cases} \tag{2.54}$$

Note that the transformation involves division by a value corresponding to the distribution $P_{X|\Lambda}$, which cannot be determined from the observed distribution $P_{XY}$ alone. Hence the effect of the intervention is not identifiable for this causal structure. We will discuss identifiability of causal structures in further detail in Chapter 6.

In the above example, the do-conditional $P(y \,|\, do(x))$ is simply the non-zero expression in the above equation, i.e., $P(y \,|\, do(x)) = \sum_{\lambda \in \Lambda} P(\lambda) P(y|x\lambda)$. The regular conditional $P(y|x)$ is
X Y Λ (a) X Y Λ I X (b) X Y Λ I X (c) Figure 2.10: Pre-intervention, augmented and post-intervention causal struc-tures:

obtained using the Markov compatibility condition as $P(y|x) = \frac{\sum_{\lambda \in \Lambda} P(\lambda) P(x|\lambda) P(y|x\lambda)}{P(x)}$. As long as $P(x|\lambda) \neq P(x)$ for some values $x \in X$, $\lambda \in \Lambda$, the two conditionals will in general be different. Note that this must indeed hold, i.e., $X$ and $\Lambda$ cannot be independent if the distribution is faithful, since they are d-connected in the causal structure. Therefore, for faithful causal models over the causal structure 2.9d, there exist values $x \in X$, $y \in Y$ such that $P(y|\mathrm{do}(x)) \neq P(y|x)$. On the other hand, one can easily check that for the causal structure 2.9a, $P(y|\mathrm{do}(x)) = P(y|x)$ always holds; the intuition being that, in this case, $X$ is a parentless (i.e., exogenous) node, hence the intervention does not change any of the causal relations that were present in the original graph. In general, an intervention on an exogenous node $X$ does not change the distribution, and whether or not $X$ is a cause of $Y$ in this case can simply be decided by checking whether or not $X$ and $Y$ are correlated, i.e., whether or not $P_{Y|X} = P_Y$. In conclusion, while the do-conditional $P(y|\mathrm{do}(x))$ for the causal structure 2.9d is not identifiable from the original observed distribution $P(xy)$, it can be determined experimentally by physically performing the intervention. Comparing this experimentally obtained estimate of the do-conditional with the original distribution will allow us to uniquely identify one among the 5 causal structures of Figure 2.9, if we assume that the causal model is faithful.

Figure caption: Taking the original, pre-intervention causal structure to be that of Figure 2.9d (repeated here in (a) for completeness), parts (b) and (c) of this figure illustrate the augmented and post-intervention causal structures corresponding to an intervention on $X$ in the causal structure of (a). In the former, the variable $I_X$ can take values in the set $\{\mathrm{idle}, \{\mathrm{do}(x)\}_{x \in X}\}$, while in the latter it can only take the values $\{\mathrm{do}(x)\}_{x \in X}$ corresponding to an active intervention. Conditioned on $I_X = \mathrm{idle}$, we effectively obtain the original causal structure (a), which corresponds to no intervention being performed, as specified by Equation (2.51).

Remark 2.5.1. Note that there is a slight but harmless abuse of notation in this section. The intervention variable $I_X$ takes different values in the augmented graph $\mathcal{G}_{I_X}$ and the post-intervention graph $\mathcal{G}_{\mathrm{do}(X)}$. It takes values in $\{\mathrm{idle}, \{\mathrm{do}(x)\}_{x \in X}\}$ and $\{\mathrm{do}(x)\}_{x \in X}$ in the two cases respectively. We nevertheless use the variable name to also denote the set of possible values it takes, but will be careful to specify which graph we are referring to whenever this happens.
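As an illustration of the comparison between the observed conditional $P(y|x)$ and the do-conditional $P(y|\mathrm{do}(x))$ discussed before the remark, the following is a minimal numerical sketch for a structure with a common cause $\Lambda$ of $X$ and $Y$ and a direct edge from $X$ to $Y$. It assumes binary variables and the standard expression $P(y|\mathrm{do}(x)) = \sum_{\lambda} P(\lambda) P(y|x\lambda)$ for the do-conditional; all probability values and function names are hypothetical choices made only for this example.

```python
# Hypothetical binary model for the structure Λ -> X, Λ -> Y, X -> Y.
# All numbers below are illustrative assumptions, not taken from the text.
P_LAM = {0: 0.5, 1: 0.5}                                    # P(λ)
P_X_GIVEN_LAM = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(x|λ), indexed [λ][x]


def p_y_given_x_lam(y, x, lam):
    """P(y|x,λ): the chance of y = 1 grows with both x and λ."""
    p_one = 0.1 + 0.3 * x + 0.5 * lam
    return p_one if y == 1 else 1.0 - p_one


def observed_conditional(y, x, p_x_given_lam=P_X_GIVEN_LAM):
    """P(y|x) = sum_λ P(λ) P(x|λ) P(y|x,λ) / P(x)."""
    numerator = sum(
        P_LAM[lam] * p_x_given_lam[lam][x] * p_y_given_x_lam(y, x, lam)
        for lam in P_LAM
    )
    p_x = sum(P_LAM[lam] * p_x_given_lam[lam][x] for lam in P_LAM)
    return numerator / p_x


def do_conditional(y, x):
    """P(y|do(x)) = sum_λ P(λ) P(y|x,λ): the edge Λ -> X is cut."""
    return sum(P_LAM[lam] * p_y_given_x_lam(y, x, lam) for lam in P_LAM)


print(observed_conditional(1, 1))  # conditioning on X = 1
print(do_conditional(1, 1))        # intervening to set X = 1

# If X is independent of Λ (mimicking an exogenous X), the two coincide.
UNCONFOUNDED = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
print(observed_conditional(1, 1, UNCONFOUNDED))
```

In this toy model the two conditionals differ because $P(x|\lambda) \neq P(x)$, while replacing $P(x|\lambda)$ by a distribution that does not depend on $\lambda$ makes them agree, in line with the discussion of exogenous nodes.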

A typical laboratory experiment uses devices that take classical inputs (settings of knobs or buttons) and produce classical outputs (reading of a pointer or numbers on a screen). This classical, observed data may however be produced through quantum mechanisms. Bell [22] famously showed that local measurements on a shared quantum system can lead to stronger correlations than those on any shared classical system. In recent years, Wood and Spekkens [225] have reformulated Bell's Theorem in the language of Bayesian networks [160], showing that quantum correlations cannot be explained by faithful classical causal models†, and therefore the framework of [160] must be suitably generalised for defining quantum causal models. More generally, one can also consider post-quantum systems such as non-local or non-signaling boxes [202, 171], that produce stronger-than-quantum correlations. These can be seen as elements of a more general class of theories, generalised probabilistic theories (GPTs, see Section 2.4), of which quantum and classical theories emerge as particular cases. Henson, Lal and Pusey (HLP) [115] were the first to define a generalised causal model that allows unobserved nodes to be either classical, quantum or, more generally, GPT. This enables a characterisation of the theory-independent aspects of causal models and allows us to derive bounds on the correlations realisable in causal structures depending on the nature of their unobserved nodes, which has foundational as well as practical applications.

Definition (Generalised causal structure). A generalised causal structure, denoted $\mathcal{G}^G$, is a causal structure in which every observed node is associated with a classical random variable and every unobserved node is associated with a system belonging to a generalised probabilistic theory.

Generalised causal structures include quantum causal structures $\mathcal{G}^Q$ and classical causal structures $\mathcal{G}^C$. In the former case, each unobserved node is associated with a quantum system, and in the latter case, each unobserved node is associated with a classical random variable.

For a classical causal structure $\mathcal{G}^C$, the Markov factorization condition (2.45) provides a compatibility condition for a joint distribution with the causal structure. In case there are unobserved nodes, the observed distribution is simply obtained by taking a joint distribution over all the nodes that factorises according to the Markov condition, and then marginalising over the unobserved variables. If the unobserved systems are non-classical, this does not work since a joint distribution over all the nodes may no longer exist‡, and a more general compatibility condition that takes into account quantum and GPT mechanisms is required.

† Unless one is willing to accept fine-tuned classical causal explanations that involve hidden influences propagating superluminally or retrocausally.

‡ Quantum and GPT systems do not co-exist with the outcomes of measurements performed on them, therefore one cannot in general assign a joint probability distribution over quantum/GPT nodes and their children.

Post-quantum/GPT causal structures:

The operational probabilistic framework of [52] provides a method of representing operations and transformations on GPT systems using circuit-like diagrams. Elements of this generalised circuit are called tests, which take as input a system and produce as output another system and a classical outcome variable. Inputs and outputs of different tests are connected through wires which carry the associated systems along them. If a test has a trivial input system, then it is called a preparation test, and if it has a trivial output system, it is called an observation test. Any test from the trivial system to itself is a probability distribution, hence probabilities can be generated by composing tests using wires.

In the HLP framework [115], every node of a generalised causal structure is associated with a test. In the case of observed nodes, the output system of the test is simply a classical random variable, and for unobserved nodes, this corresponds to a GPT system. Further, the systems described by the operational probabilistic framework [52] are causal in the sense that an outcome at an earlier time cannot depend on the choice of operation performed on the system at a later time. Then a distribution $P$ over the observed nodes of a generalised causal structure $\mathcal{G}^G$ is said to be compatible with $\mathcal{G}^G$ if there exist a causal operational probabilistic theory, a test corresponding to each node and a system corresponding to each edge of $\mathcal{G}^G$ in the theory that generate the observed distribution $P$. The formula for generating $P$ in this manner is analogous to the Markov condition (2.45), where, instead of a product of conditional distributions $P_{X_i | X_i^{\downarrow}}$ (of each node given its parents), we have a product of tests that map the unobserved parents of a node to its unobserved children, conditioned on the observed parents. This is called the generalised Markov condition, and the corresponding generalised causal structure is represented by a generalised directed acyclic graph, or GDAG. The set of all distributions compatible with a generalised causal structure $\mathcal{G}^G$ will be denoted by $\mathcal{P}(\mathcal{G}^G)$. Note that by construction these are distributions over observed nodes only.

Quantum causal structures:

A GPT of particular interest is quantum theory, and the causal structure framework of [115] includes quantum causal structures as a special case. There are however several frameworks for describing quantum causal structures that are compatible with each other, but differ essentially in how nodes and edges of the causal structure are identified with quantum systems and transformations. We will mainly stick to the description that follows from the generalised causal structures of [115], to provide a unified picture, but use a language and notation similar to that of [216] for convenience. We will discuss the relation to other approaches at the end of this section.

For a quantum causal structure $\mathcal{G}^Q$, every exogenous node $A$, i.e., one without any incoming edges, corresponds to the preparation of a density matrix $\rho_A \in S(\mathcal{H}_A)$. In the case of observed nodes, $\rho_A$ is classical. This can be seen as a preparation test, where the trivial input system is associated with the one-dimensional Hilbert space $\mathcal{H} = \mathbb{C}$. Each directed edge corresponds to a quantum system and hence to a Hilbert space, which we will label by the starting and ending node of the edge. For example, if a node $A$ has only two children $Y$ and $Z$, then the Hilbert space of $A$ will be factorised as $\mathcal{H}_A = \mathcal{H}_{A_Y} \otimes \mathcal{H}_{A_Z}$, where the subsystem $\mathcal{H}_{A_Y}$ corresponds to the edge from $A$ to $Y$, and $\mathcal{H}_{A_Z}$ to the edge from $A$ to $Z$. If $A$ is a classical node, then the two factors $A_Y$ and $A_Z$ are taken to be copies of $A$, since classical information can be copied.

As in the HLP framework, each node is labelled by its output state, which is classical for observed nodes and quantum for unobserved nodes in $\mathcal{G}^Q$. Every observed node $X$ represents the output statistics of a measurement performed on the incoming systems. If this node has unobserved parents, then $X$ can be seen as the result of a quantum measurement (implemented through a POVM) on those incoming quantum states, where the choice of measurement can possibly depend on the classical value of other observed parents of $X$. If $X$ only has observed parents, then it can be seen as the output of a stochastic map on the incoming classical random variables. More generally, tests correspond to CPTP maps that map the joint state of all incoming edges to the joint state of all outgoing edges. Preparations, measurements (POVMs) and arbitrary classical stochastic maps can be seen as special cases of CPTP maps. A distribution $P$ over the observed nodes of a causal structure $\mathcal{G}^Q$ is compatible with $\mathcal{G}^Q$ if there exist a quantum state labelling each unobserved node (with subsystems for each unobserved edge) and CPTP maps, i.e., preparations or quantum transformations for each unobserved node as well as POVMs or stochastic maps for each observed node, that allow for the generation of $P$ in accordance with the Born rule. We will denote the set of all observed distributions compatible with a quantum causal structure $\mathcal{G}^Q$ by $\mathcal{P}(\mathcal{G}^Q)$, and for a classical causal structure $\mathcal{G}^C$ by $\mathcal{P}(\mathcal{G}^C)$.

Theory-dependent and theory-independent aspects:

We already noted that in causal structures with non-classical nodes, a joint distribution over all nodes may no longer be defined. Further, in classical causal structures, it is implicitly assumed that the complete information about a node is transmitted to each of its children. Then each node receives complete information about all its parents and performs a classical stochastic map to obtain the output. The stochastic nature of this map is taken to arise from lack of knowledge about additional "error variables", and once the parents and the error variables are given, the map corresponds to a deterministic function†. These aspects are not problematic in the classical case since classical information can be copied. However, arbitrary states of quantum and GPT systems cannot be copied due to the no-cloning theorem. Hence, after a measurement on such a system, the measurement outcome and the initial state do not co-exist and one can no longer condition on unobserved systems in the same manner‡. Neither can complete information about the state of an unobserved node be transmitted to all its children. Further, arbitrary quantum or GPT maps cannot be reduced to deterministic functions by including additional information. These theory-dependent differences mean that the sets of compatible correlations $\mathcal{P}(\mathcal{G}^C)$, $\mathcal{P}(\mathcal{G}^Q)$ and $\mathcal{P}(\mathcal{G}^G)$ for a causal structure $\mathcal{G}$ need not coincide in general. Causal structures that support such a gap between the sets of classical and non-classical correlations are termed interesting causal structures, and [115] identify certain necessary conditions for a causal structure to be interesting. To do so, they prove the following theory-independent property of generalised causal structures, which is an analogue of Theorem 2.5.2, showing that the d-separation condition applies to classical, quantum and post-quantum causal structures alike. Identifying such theory-independent aspects is also useful because, for instance, there is no consensus on the representation of states and transformations in arbitrary GPTs, even though this exists for specific GPTs such as box world (see Section 2.4).

† Introducing suitable error variables and expressing the functional dependence of each node on its parents through structural equations is the basis of the structural causal model approach for classical causal structures. This is compatible with the Bayesian networks approach discussed in Section 2.5.1, and can be seen as providing underlying classical mechanisms to the probability assignments. Both approaches are covered in Pearl's book [160].

‡ The quantum causal models of [6, 132] tackle this issue by introducing the notion of a conditional quantum state of an output conditioned on an earlier input. These are interpreted not as quantum states in the usual sense (since they correspond to systems at different times that do not coexist), but as mathematical representations of the quantum channel transforming the input to the output system.

Theorem 2.5.3 (Henson, Lal, Pusey 2014). Let $\mathcal{G}^G$ be a generalised causal structure and let $X$, $Y$ and $Z$ be pairwise disjoint subsets of observed nodes in $\mathcal{G}^G$. If a probability distribution $P$ is compatible with $\mathcal{G}^G$, then the d-separation $X \perp^d Y | Z$ implies the conditional independence $X \perp\!\!\!\perp Y | Z$. Conversely, if for every distribution $P$ compatible with $\mathcal{G}^G$ the conditional independence $X \perp\!\!\!\perp Y | Z$ holds, then the d-separation relation $X \perp^d Y | Z$ also holds in $\mathcal{G}^G$.

Using this, necessary conditions for a causal structure $\mathcal{G}$ to be interesting are found in [115] by first identifying sufficient conditions under which the only restrictions on $\mathcal{P}(\mathcal{G}^C)$ are those implied by d-separation, and then using Theorem 2.5.3 to conclude that such causal structures must be uninteresting, i.e., $\mathcal{P}(\mathcal{G}^C) = \mathcal{P}(\mathcal{G}^Q) = \mathcal{P}(\mathcal{G}^G)$. Simple examples of causal structures that satisfy this equivalence, and are hence uninteresting, are those of Figures 2.9c-2.9e. In these causal structures, there are no restrictions implied by d-separation on the observed distribution $P_{XY}$, and hence arbitrary distributions are compatible with these causal structures irrespective of the nature of the unobserved node $\Lambda$. Examples of causal structures that do not satisfy this equivalence, and are hence interesting, are the bipartite Bell ($\mathcal{G}_B$) and Triangle ($\mathcal{G}_T$) causal structures illustrated in Figure 2.6. In the former case, d-separation implies the independence of the inputs $A$ and $B$ as well as the no-signaling conditions ($X \perp\!\!\!\perp B | A$ and $Y \perp\!\!\!\perp A | B$), which are satisfied by all observed distributions in $\mathcal{P}(\mathcal{G}^C_B)$, $\mathcal{P}(\mathcal{G}^Q_B)$ and $\mathcal{P}(\mathcal{G}^G_B)$. However, distributions in $\mathcal{P}(\mathcal{G}^C_B)$ satisfy additional constraints given by Bell inequalities, which are not satisfied by all distributions in $\mathcal{P}(\mathcal{G}^Q_B)$ and $\mathcal{P}(\mathcal{G}^G_B)$. In fact, there is also a gap between the sets $\mathcal{P}(\mathcal{G}^Q_B)$ and $\mathcal{P}(\mathcal{G}^G_B)$: the former satisfies Tsirelson's inequalities while the latter does not. Example 2.5.4 describes the sets $\mathcal{P}(\mathcal{G}^C_B)$ and $\mathcal{P}(\mathcal{G}^Q_B)$ for the bipartite Bell causal structure $\mathcal{G}_B$.
That $\mathcal{G}_T$ is also an interesting causal structure can be witnessed by suitably embedding the Bell scenario $\mathcal{G}_B$ in the Triangle $\mathcal{G}_T$ such that non-classicality of correlations in $\mathcal{G}_T$ follows from that of $\mathcal{G}_B$ [88] (this is explained in Section 4.3.1 of Chapter 4). The simplest causal structure that is interesting, i.e., supports a classical-quantum gap, is the instrumental causal structure of [159, 45, 204], which is essentially the causal structure of Figure 2.9d (which by itself is not interesting) but with an additional node and edge $Z \rightarrow X$.

Example 2.5.4 (Sets of compatible correlations in the bipartite Bell causal structure $\mathcal{G}_B$). In the classical causal structure $\mathcal{G}^C_B$, the set of compatible (observed) distributions is obtained by assuming a joint distribution $P_{\Lambda XYAB} \in \mathscr{P}$ over all nodes that satisfies the Markov condition (2.45), and marginalising over the unobserved node $\Lambda$,

$$\mathcal{P}(\mathcal{G}^C_B) := \Big\{ P_{XYAB} \in \mathscr{P} \;\Big|\; P_{XYAB} = \sum_{\Lambda} P_{\Lambda} P_A P_B P_{X|A\Lambda} P_{Y|B\Lambda} \Big\}. \qquad (2.55)$$

If $\Lambda$ is a continuous random variable, the sum is replaced by an integral over $\Lambda$. This compatibility condition for the classical causal structure $\mathcal{G}^C_B$ is in fact identical to the local-causality condition used in the derivation of Bell inequalities (see [39] for a comprehensive review).

In the quantum causal structure $\mathcal{G}^Q_B$, the unobserved node $\Lambda$ corresponds to a quantum state $\rho_\Lambda \in S(\mathcal{H}_\Lambda) = S(\mathcal{H}_{\Lambda_X} \otimes \mathcal{H}_{\Lambda_Y})$, and the observed nodes $X$ and $Y$ are associated with the POVMs $\{E^X_A\}_X$ and $\{E^Y_B\}_Y$, which act on the subsystems $\mathcal{H}_{\Lambda_X}$ and $\mathcal{H}_{\Lambda_Y}$, depending on the inputs $A$ and $B$ respectively, to generate the output distribution,

$$\mathcal{P}(\mathcal{G}^Q_B) := \Big\{ P_{XYAB} \in \mathscr{P} \;\Big|\; P_{XYAB} = \mathrm{tr}\big((E^X_A \otimes E^Y_B)\rho_\Lambda\big) P_A P_B \Big\}. \qquad (2.56)$$

Interventions in generalised causal structures can be modelled in the same manner as the classical case (Section 2.5.1.2), where a non-trivial intervention on an observed node $X$ (or a subset of nodes) introduces a new edge $I_X \rightarrow X$ and cuts off all other incoming arrows to $X$.
This means that we can only intervene upon classical variables and not quantum systems. Note however that we can model situations involving the manipulation of quantum/GPT systems by allowing the choice of preparation, transformation or measurement on the system to depend on a classical variable. Then an intervention on such a variable will influence the map performed on the quantum/GPT system. Equation (2.51) relates the conditional probabilities $P_{X_i | \mathrm{par}(X_i)}$ for the pre- and post-intervention causal structures, but this can involve conditioning on possibly unobserved parents, which is not possible in generalised causal structures. Equation (2.51) can however be generalised by specifying how the test (i.e., the causal mechanism) corresponding to each node transforms under an intervention. In [115], each node is associated with a test that maps the joint state of all incoming (unobserved) subsystems to the joint state of the outgoing (unobserved) subsystems, depending on the values of the observed parents. Under an intervention on the node $X$, the corresponding test would remain unchanged if $I_X = \mathrm{idle}$, and would effectively correspond to a preparation test† that prepares $X$ in a specific value $x \in X$ (deterministically) whenever $I_X = \mathrm{do}(x)$, and traces out the systems associated with all other incoming edges.

So far, the procedure is similar to the classical case; however, there is one crucial difference. Equation (2.52), which relates the pre- and post-intervention distributions and thereby provides a rule for counterfactual inference, no longer holds for non-classical causal structures, since this equation is derived using the Markov compatibility condition (2.45), which does not apply to this case. Since non-classical causal structures by construction include at least one unobserved node, one would expect that the observed post-intervention distribution may not always be determined by the observed pre-intervention distribution, as is the case also with classical causal structures involving unobserved nodes (see Example 2.5.3). [165] shows that it is impossible to derive any general inference rule relating the observed distributions in pre- and post-intervention quantum causal structures. This means that it may not always be possible to deduce the post-intervention distribution counterfactually using the pre-intervention distribution; instead, one would have to physically perform the intervention (i.e., an additional experiment, after the original experiment that collected the pre-intervention statistics) to characterise this distribution and all the do-conditionals. However, [164] and [165] provide conditions for identifying quantum causal structures where an inference rule analogous to (2.52) does exist and fully counterfactual inference is possible. This is still a field of ongoing research; we will discuss interventions in non-classical causal structures in further detail in Chapter 6.

† Note that we can view $\mathrm{do}(x)$ effectively as a preparation test even though the node $X$ has an input edge $I_X \rightarrow X$, because a) $I_X$ is just a hypothetical variable that is introduced to provide an agency to the intervention in the augmented graph approach, but interventions can be equivalently described by an alternative approach that does not introduce any additional variables [160], and b) the value of $X$ is deterministically correlated with $I_X$, such that if $\mathrm{do}(x)$ is seen as a preparation on the exogenous variable $I_X$, this deterministically induces the preparation $X = x$ on $X$.
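The gap between the classical and quantum sets of Example 2.5.4 can be checked numerically for one particular Bell inequality. The sketch below assumes binary inputs and outputs (outputs relabelled as $\pm 1$) and uses the CHSH expression as the witness; the choice of inequality, the maximally entangled state and the measurement observables are standard illustrative assumptions and are not taken from the example itself. Since the CHSH expression is linear in the conditional distribution and the classical set (2.55) is obtained by mixing deterministic response functions via $\Lambda$, it suffices to enumerate deterministic strategies on the classical side.

```python
import math
from itertools import product


def kron(a, b):
    """Kronecker product of two real matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def expectation(op, psi):
    """<psi| op |psi> for a real state vector psi and a real matrix op."""
    n = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(n) for j in range(n))


def chsh_classical_max():
    """Maximum of E00 + E01 + E10 - E11 over deterministic local strategies."""
    best = -4
    for x0, x1, y0, y1 in product((-1, 1), repeat=4):
        # x_a (y_b) is the output of the first (second) party on input a (b).
        best = max(best, x0 * y0 + x0 * y1 + x1 * y0 - x1 * y1)
    return best


def chsh_quantum():
    """CHSH value for an assumed state and set of projective measurements."""
    r = 1 / math.sqrt(2)
    pauli_z = [[1, 0], [0, -1]]
    pauli_x = [[0, 1], [1, 0]]
    a_obs = [pauli_z, pauli_x]                   # observables for inputs A = 0, 1
    b_obs = [[[r, r], [r, -r]],                  # (Z + X)/sqrt(2) for B = 0
             [[r, -r], [-r, -r]]]                # (Z - X)/sqrt(2) for B = 1
    psi = [r, 0, 0, r]                           # (|00> + |11>)/sqrt(2)
    e = [[expectation(kron(a_obs[a], b_obs[b]), psi) for b in (0, 1)]
         for a in (0, 1)]
    return e[0][0] + e[0][1] + e[1][0] - e[1][1]


print(chsh_classical_max())  # classical bound of the CHSH expression
print(chsh_quantum())        # value attained by the assumed quantum strategy
```

The classical maximum is 2, whereas the assumed quantum strategy reaches $2\sqrt{2}$, which is the bound given by Tsirelson's inequality for this expression.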

The main approach to quantum and post-quantum causal structures adopted in this thesis, which we have discussed so far, is based on the framework of [115]. This approach retains d-separation and conditional independence criteria from the classical Bayesian networks approach [160] while generalising the Markov condition (2.45). There are several other frameworks for quantum causal models that build on different aspects of Pearl's classical framework [160]. These frameworks are consistent with each other but can differ in scope due to how they formalise the notion of a causal model in quantum settings, i.e., how the nodes and edges are represented, what compatibility conditions are used, and the type of interventions/manipulations that are considered.

[166, 67, 6] have proposed generalisations of the Reichenbach principle [177] to the quantum setting. Among these, [166] is based on a new graph separation condition for quantum causal structures called q-separation, while [67, 6] are based on formulating quantum networks using quantum channels and their Choi states. [6] as well as an earlier work [132] construct quantum networks by representing quantum channels as conditional quantum states (analogous to conditional probabilities). The framework of [6] will be employed in one of our results; we briefly outline it here and present the relevant details in Chapter 4 where it will be used. In [6], nodes correspond to systems and edges to transformations, in contrast to the HLP framework [115] where transformations (tests) occur at the nodes and the edges propagate systems. All nodes are taken to be quantum systems and a distinction between observed and unobserved nodes is not made. Then a conditional quantum state $\rho_{A | \mathrm{par}(A)}$ for a node $A$ conditioned on its parents (analogous to conditional distributions $P_{X | \mathrm{par}(X)}$) is defined through the Choi-Jamiolkowski representation of the quantum channel that maps the quantum systems corresponding to $\mathrm{par}(A)$ to that corresponding to $A$.
With this, a quantum Markov condition, completely analogous to the classical case (2.45), is defined. This also allows for the definition of a joint state $\sigma_{A_1, ..., A_n}$ over all the nodes $\{A_1, ..., A_n\}$, even though the systems corresponding to these nodes may not coexist. This joint state $\sigma$ characterises the structure of the channels connecting the various nodes and not the quantum states of the systems. Interventions correspond to CPTP maps that act on quantum nodes, and the causal model provides a prescription for obtaining the post-intervention distribution given the joint state $\sigma$ and the CPTP maps corresponding to the interventions, through a formula analogous to the Born rule.

Footnote (to the citation of [165] in the preceding discussion of interventions): The causal model framework of [165] is slightly different from what we have presented here, but nevertheless compatible with this picture. This is explained further in the next subsection.

In another recent framework for quantum causal models [164, 165] (also noted in the previous subsection), all nodes are taken to be classical, while edges correspond to propagating quantum systems that are acted upon by transformations that depend on the values of the classical nodes†. Further, frameworks such as [182] have shown that quantum causal models can provide an information-theoretic advantage for causal inference by utilising quantum properties such as entanglement and superposition.

† For example, three sequential operations on a quantum state are represented by the causal structure $X \rightarrow Y \rightarrow Z$, where $X$ specifies the preparation of a quantum state $\rho_X$ that travels to the next node where a CPTP map $\mathcal{E}_Y$ labelled by $Y$ acts on $\rho_X$, and finally a POVM labelled by $Z$, $\{E_Z\}_Z$, acts on the output of the CPTP map, $\mathcal{E}_Y(\rho_X)$.

The discussion so far has focussed on the assumption that a fixed causal structure (between classical/quantum or post-quantum systems) exists, even if it may be unknown. More general approaches to non-classical notions of causality have considered scenarios where not only the systems in a causal structure, but the causal structure itself is of a quantum nature [111, 154, 231]. In such frameworks, quantum superpositions of the causal and temporal orders between events are typically considered by dropping the assumption of a fixed background space-time or causal structure. The study of such indefinite causal structures has garnered much research interest in recent years and has been shown to provide additional information-theoretic advantages [53, 9, 8, 10, 21, 107, 32, 155, 172, 51, 153, 18]. Here, analogous to Bell inequalities, which certify the non-classicality of correlations in a fixed causal structure, causal inequalities have been proposed for certifying the non-classicality of the causal structure itself [154]. Unlike the former, which have been violated in several experiments, violations of the latter are modelled in several theoretical frameworks [154, 106, 32, 18], but whether causal inequalities can be physically violated remains an open question. Further, scenarios where quantum systems are superposed/entangled over space as well as in time have been physically implemented. The so-called causal box framework [172] models information processing tasks where quantum states can be delocalised in space as well as time, in a manner compatible with relativistic causality. This framework has applications for relativistic quantum cryptography as well as for foundational questions regarding causal inequalities, as we have shown in a previous work [210] and an ongoing work‡ (neither of which is included in this thesis). Finally, there also exist general frameworks for quantum causality that take into account gravitational effects, with the aim of developing an operational understanding of quantum gravity [111, 112, 231, 42].

‡ Vilasini, V., del Rio, L. and Renner, R. Causality in definite and indefinite space-times. In preparation (2020).

In daily life and as part of the scientific method, we often use logic to reason about our observations and to predict the possibility of future observations. We do so by assigning truth values to different statements depending on what we observed and what we might already know. Modal logic is, broadly speaking, a family of formal systems of logic that are associated with different modalities for assigning truth values to statements, such as "it is necessary that" (alethic), "it is known that" (epistemic), "it ought to be the case that" (deontic), or "it has been the case that" (temporal). In this thesis, we will focus on one branch of modal logic, namely epistemic modal logic, which is applied for reasoning about knowledge.
This is particularly useful in scenarios (both classical and quantum) where multiple agents reason about each others' knowledge to make logical deductions. Examples include games like poker, logical hat puzzles†, extended Wigner's friend experiments in quantum theory [85, 152] (Chapter 7) and in post-quantum theories [209] (Chapter 8), among others [41, 82, 133, 134].

† For an analysis of classical hat puzzles in the epistemic modal logic language, see [152].

Here we provide a brief summary including the main features of epistemic modal logic that are relevant for this thesis. Modal logic applies to most classical multi-agent setups, and can be seen as a compact mathematical way to capture some of the intuitive laws commonly used for reasoning in classical settings. We will first review the basics of the standard framework [131], and then discuss the weaker versions of the axioms proposed recently in [152] for modelling scenarios involving multiple quantum agents. For further reading and a more in-depth understanding of the modal logic framework, we refer the reader to [41, 101, 82, 133, 134].

We will consider a finite set of agents $\{A_1, ..., A_n\}$ who describe the world by means of primitive propositions $\phi_1, \phi_2, ...$ belonging to some set $\Phi$. These primitive propositions correspond to simple statements about the world, for example, $\phi_1$ = "Alice is a person" or $\phi_2$ = "Alice has a secret key". The framework of modal logic allows for compactly representing the knowledge that agents possess about these primitive statements as well as more complex, chained statements about the knowledge of other agents, such as $\tilde{\phi}$ = "Alice knows that Bob knows that Eve doesn't know the secret key $k$, and Alice further knows that $k = 1$". […] (possible worlds‡) as introduced by [131] in the context of modal logic: for example, in a world $s \in \Sigma$ the key value is $k =$ […] $s \in \Sigma$ Eve could know that $k = 0$. The truth value of a proposition $\phi$ is then assigned depending on the possible world in $\Sigma$, and can differ from one possible world to another. The setup we will employ for modelling multiple reasoning agents is associated with the following structure.

‡ This is an entirely classical concept and is not to be confused with "many worlds" type interpretations of quantum theory. World here only represents potential alternative situations, i.e., a classical list of possibilities; it does not mean that all these situations are actually realised.

Definition 2.6.1 (Kripke structure). A Kripke structure $M$ for $n$ agents over a set of statements $\Phi$ is a tuple $\langle \Sigma, \pi, \mathcal{K}_1, ..., \mathcal{K}_n \rangle$ where $\Sigma$ is a non-empty set of states, or possible worlds, $\pi$ is an interpretation, and $\mathcal{K}_i$ is a binary relation on $\Sigma$.

The interpretation $\pi$ is a map $\pi : \Sigma \times \Phi \rightarrow \{\mathrm{true}, \mathrm{false}\}$, which defines the truth value of a statement $\phi \in \Phi$ in a possible world $s \in \Sigma$.

$\mathcal{K}_i$ is a binary equivalence relation on the set of states $\Sigma$, where $(s, t) \in \mathcal{K}_i$ if agent $i$ considers world $t$ possible given their information in the world $s$.

The truth assignment tells us if the proposition $\phi \in \Phi$ is true or false in a possible world $s \in \Sigma$; for example, if $\phi$ = "you are reading this PhD thesis" and $s$ is the present world in which you have indeed chosen to read this thesis, then $\pi(s, \phi) = \mathrm{true}$. On the other hand, if $\phi'$ = "There is a mammoth in the snow", then $\pi(s, \phi') = \mathrm{false}$ in the current world where mammoths are (unfortunately) extinct; however, in a world $s'$ corresponding to the ice age, $\phi'$ can indeed be a true statement. Therefore the truth value of a statement in a given structure $M$ might vary from one possible world to another; we will denote that $\phi$ is true in world $s$ of a structure $M$ by $(M, s) \models \phi$, and $\models \phi$ will mean that $\phi$ is true in every world $s$ of a structure $M$.

Naturally, agents need not possess complete information about the possible world they are in, and may consider other alternative worlds possible; for example, if Bob doesn't know whether Alice has a secret key, he can consider as possible both the world where Alice has such a key, and one where she doesn't. This situation is captured by the binary relations $\mathcal{K}_i$, as formalised in the following definition.

Definition 2.6.2 (Knowledge operators $K_i$). We say that agent $A_i$ "knows" $\phi$ in a world $s \in \Sigma$, i.e., $(M, s) \models K_i \phi$, if and only if for all possible worlds $t \in \Sigma$ such that $(s, t) \in \mathcal{K}_i$ (that is, all the worlds deemed possible by the agent, given their knowledge), it holds that $(M, t) \models \phi$.

Knowledge operators combined with the standard logical operators allow us to compactly express complex statements; for instance, our earlier example $\tilde{\phi}$ = "Alice knows that Bob knows that Eve doesn't know the secret key $k$, and Alice further knows that $k = 1$" becomes $K_A[(K_B \neg K_E k) \wedge k = 1]$.

The axioms of knowledge [95] provide certain rules for combining the statements produced by different agents and compressing them to deduce new statements. These axioms might appear to be common-sensical, but it is imperative to state them formally since not all of these axioms apply in the quantum world, which often defies our everyday intuitions. The distribution axiom allows agents to combine statements that contain inferences:

Axiom 1 (Distribution axiom). If an agent is aware of a proposition $\phi$ and that another proposition $\psi$ follows from $\phi$, then the agent can conclude that $\psi$ holds: $(M, s) \models (K_i \phi \wedge K_i(\phi \Rightarrow \psi)) \Rightarrow (M, s) \models K_i \psi$.

The knowledge generalization rule permits agents to use commonly shared knowledge:

Axiom 2 (Knowledge generalization rule). All agents know all the propositions that are always valid in a structure: $(M, s) \models \phi \;\; \forall s \in \Sigma \Rightarrow \models K_i \phi \;\; \forall i$.

At first sight, this might appear to be a strong assumption. However, this depends on what $\Sigma$ is chosen to be for a particular purpose, and characterises what statements can be assumed to be the basic, common knowledge of all agents. For example, we would find it reasonable to expect the propositions $\phi_1$ = "objects fall when we drop them", $\phi_2$ = "guitars have strings" and $\phi_3$ = "we cannot signal from future to past" to be common knowledge, such that they hold in every $s \in \Sigma$ that we wish to consider. However, if we wish to consider hypothetical worlds where guitars are wind instruments or where we can signal to the past (as physicists working in foundations, including ourselves, often do), we can simply include a world $s' \in \Sigma$ where the statements $\phi_2$ or $\phi_3$ are false, and hence do not correspond to the common knowledge of the agents. In fact, parts of Chapter 6 will indeed be set in such a hypothetical world where $\phi_3$ is not true. One can also treat a theory (such as quantum theory) or aspects of the theory (such as the Born rule) as the common knowledge of the agents that we wish to consider, as we will see in Chapter 8.

Axiom 3 (Truth axiom). If an agent knows that a proposition is true then the proposition is true: $(M, s) \models K_i \phi \Rightarrow (M, s) \models \phi$.

In philosophy, the truth axiom is often considered as a candidate for a property that distinguishes knowledge from belief.

Positive and negative introspection axioms highlight the ability of agents to reflect upon their knowledge:

Axiom 4 (Positive and negative introspection axioms). Agents can perform introspection regarding their knowledge: (M, s) ⊧ K_i φ ⇒ (M, s) ⊧ K_i K_i φ (Positive Introspection), (M, s) ⊧ ¬K_i φ ⇒ (M, s) ⊧ K_i ¬K_i φ (Negative Introspection).

Weaker version: The truth axiom assigns objective reality to all statements that agents know, which can be particularly problematic for quantum settings. Hence a weaker alternative to the truth axiom was proposed in [152].

The trust structure governs the way information is passed on between agents:

Definition 2.6.3 (Trust). We say that an agent i trusts an agent j (and denote it by j ↝ i) if and only if (M, s) ⊧ K_i K_j φ ⇒ K_i φ, ∀ φ ∈ Φ, s ∈ Σ.
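The trust relation can be made concrete on a toy model. The following is a minimal sketch, assuming the standard possible-worlds reading in which K_i φ holds at a world s when φ holds in every world agent i considers possible from s. The two worlds, the atom `p`, the accessibility sets and all function names are hypothetical choices made for illustration and are not taken from [95] or [152].

```python
# Toy possible-worlds model (all names and values are illustrative assumptions).
WORLDS = ["w0", "w1"]
VAL = {"w0": {"p"}, "w1": set()}  # the atom p is true only in w0

# ACCESS[agent][s] = worlds the agent considers possible when the actual world is s.
ACCESS = {
    "I": {"w0": {"w0", "w1"}, "w1": {"w0", "w1"}},  # I cannot tell the worlds apart
    "R": {"w0": {"w0"}, "w1": {"w0"}},              # R always takes the world to be w0
}

def atom(name):
    return lambda s: name in VAL[s]

def neg(phi):
    return lambda s: not phi(s)

def K(agent, phi):
    """K_agent(phi): phi holds in every world the agent considers possible."""
    return lambda s: all(phi(t) for t in ACCESS[agent][s])

def trusts(i, j, formulas):
    """Agent i trusts agent j (j ~> i): K_i K_j phi implies K_i phi everywhere."""
    return all((not K(i, K(j, phi))(s)) or K(i, phi)(s)
               for phi in formulas for s in WORLDS)

def truth_axiom_holds(agent, formulas):
    """Truth axiom: K_agent(phi) implies phi, for all listed formulas and worlds."""
    return all((not K(agent, phi)(s)) or phi(s)
               for phi in formulas for s in WORLDS)

p = atom("p")
PHI = [p, neg(p)]
print(trusts("I", "R", PHI), trusts("R", "I", PHI))
print(truth_axiom_holds("I", PHI), truth_axiom_holds("R", PHI))
```

In this toy model R's "knowledge" violates the truth axiom (R holds K_R p even in the world where p is false), so I does not trust R, while R trusts I only vacuously, which illustrates that trust need not be symmetric.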

Note that trust is neither a symmetric nor a transitive relation and hence does not define a pre-order on the agents. This is also in line with what we might expect common-sensically: I might trust my favourite news reporter (R) to deliver reliable news such that K_I K_R φ ⇒ K_I φ would hold in all worlds s relevant to me and for all statements φ relating to the news (which I could take to characterise the sets Σ and Φ), i.e., R ↝ I. However R, who does not know me, has no reason to trust what I say, even if it relates to the news; hence K_R K_I φ ⇏ K_R φ or equivalently I /↝ R. Similarly, R might have a trusted source S for acquiring the news, and S in turn may acquire that news through a source T whom she trusts, i.e., S ↝ R and T ↝ S. While R would trust the same information when relayed indirectly through the agent S, R may not know of T’s existence and hence not trust T if he approached R directly; hence T ↝ R does not necessarily hold.

While replacing truth with the notion of trust may suffice in standard quantum settings, as we will see in Chapters 7 and 8, it turns out to be incompatible with situations where reasoning agents are themselves modelled as physical systems of quantum or post-quantum theories [152].

Part I
Approaches to causality in non-classical theories

Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’
- Randall Munroe

Chapter 3
Overview of techniques for analysing causal structures

In Section 2.5, we provided an overview of the frameworks used for modelling causal structures in classical, quantum and post-quantum theory. We motivated the problem of certifying the non-classicality of causal structures, which has become a focal point of research interest in quantum information by virtue of having deep foundational as well as practical implications. In this chapter, we outline some of the techniques used for this purpose. Section 3.1 and Section 3.2 provide an overview of the main probabilistic and entropic techniques employed in Chapters 4 and 5, taking as example the bipartite Bell causal structure (Figure 2.6a), which is a causal structure of particular interest in this thesis. These sections are largely based on the review sections of our published papers [207] and [208]. The main results of these papers are presented in Chapters 4 and 5. All of the above focus on the case of faithful and acyclic causal structures, as does the majority of literature in the field, and justifiably so, since our observations suggest that causal influences propagate in one direction, i.e., from past to future. The present chapter will also focus only on the acyclic case. In Chapter 6, we develop a framework for modelling cyclic and fine-tuned causal structures in non-classical theories and will review the necessary preliminaries for cyclic causal structures there.

In the probability space characterisation of the bipartite Bell causal structure G_B (Figure 2.6a), the observed correlations are represented through the conditional distribution P_{XY|AB}. Each point in the probability space corresponds to such a conditional distribution and it will be convenient to denote the cardinalities |A|, |B|, |X| and |Y| as i_A, i_B, o_A and o_B (representing the inputs and outputs of the parties Alice and Bob) respectively. Each Bell scenario is defined by the 4-tuple (i_A, i_B, o_A, o_B). In a given scenario we will be interested in the sets of local and non-signaling distributions, which are the sets of conditional distributions P_{XY|AB} corresponding to the sets P(G^C_B) and P(G^G_B) (Section 2.5.2.1) of observed correlations P_{XYAB}. Following the notation of [203], we will express the distribution using a matrix, rather than a vector. For instance, in the case where all the variables take values in {0, 1} and using P(xy|ab) as an abbreviation for P_{XY|AB}(xy|ab), this is done as

\[
P_{XY|AB} = \left(\begin{array}{cc|cc}
P(00|00) & P(01|00) & P(00|01) & P(01|01)\\
P(10|00) & P(11|00) & P(10|01) & P(11|01)\\ \hline
P(00|10) & P(01|10) & P(00|11) & P(01|11)\\
P(10|10) & P(11|10) & P(10|11) & P(11|11)
\end{array}\right) \qquad (3.1)
\]

and the generalisation to larger alphabets is analogous (see, e.g., [203]). This format is convenient because it makes it easy to check whether a distribution is non-signaling, i.e., to check that P_{X|AB} is independent of B and that P_{Y|AB} is independent of A. Both the local and non-signaling sets form convex polytopes that are highly symmetric. In particular, such polytopes are invariant under local relabellings and/or relabelling parties. By local relabellings we mean combinations of relabelling the inputs (e.g., A ↦ A ⊕ 1) and outputs conditioned on the local input (e.g., X ↦ X ⊕ αA ⊕ β where α, β ∈ {0, 1} and ⊕ denotes modulo-2 addition). One might also think about more general global relabellings that depend on both inputs (for instance maps of the form X ↦ X ⊕ αA ⊕ βB ⊕ γ with α, β, γ ∈ {0, 1}), but these do not preserve the non-signaling set in general so will not be considered here. The only global relabelling we consider is exchange of the two parties, which corresponds to transposing the distribution in the matrix notation of Equation (3.1) and always preserves non-signaling.

More specifically, the non-signaling set is restricted only by the conditional independences X ⫫ B | A and Y ⫫ A | B (no-signaling constraints), which follow from the d-separation relations in G_B (Figure 2.6a), and naturally the positivity and normalisation conditions since these are valid probability distributions. These restrictions can be written out as linear inequality/equality constraints, and characterise the H-representation of the non-signaling polytope. The corresponding V-representation is characterised by the extremal non-signaling distributions. On the other hand, the local set is the set of all observed distributions P_{XY|AB} that can arise when Λ is classical, and corresponds to the set of correlations that admit a local hidden variable model, i.e., the set of distributions that can be expressed in the form

\[
P_{XY|AB} = \int_{\Lambda} \mathrm{d}\Lambda \, P_{\Lambda} \, P_{X|A\Lambda} \, P_{Y|B\Lambda}. \qquad (3.2)
\]

Here, Λ can be any (possibly continuous) random variable. In the rest of the thesis, we will refer to such correlations either as local or as classical and denote the set of all such distributions L. We also use L^{(i_A,i_B,o_A,o_B)} to denote the local set in the (i_A, i_B, o_A, o_B) scenario.

The local set forms a convex polytope, which can be specified in terms of a finite set of Bell inequalities, each a necessary condition for classicality of the correlation.
Along with the positivity and normalisation constraints, these define the H-representation of L^{(i_A,i_B,o_A,o_B)}. The V-representation of the local polytope is defined by the set of all local deterministic distributions, i.e., distributions P_{XY|AB} that are deterministic over X and Y for every value of A and B. There are o_A^{i_A} ⋅ o_B^{i_B} distinct local deterministic distributions in an (i_A, i_B, o_A, o_B) scenario. In the matrix notation (3.1), each block corresponds to a fixed value for A and B. The local deterministic distributions expressed in this notation have exactly one non-zero entry (= 1) in each block, in accordance with the no-signaling condition required by the causal structure. The set of all such distributions can be easily enumerated. Hence, the V-representation of the local polytope (local deterministic distributions) and the H-representation of the non-signaling polytope (no-signaling, positivity and normalisation constraints) can be easily found for a given scenario. The H-representation of the local polytope (Bell inequalities) and the V-representation of the non-signaling polytope (extremal non-signaling distributions) can in principle be found through facet and vertex enumeration respectively. However, this becomes computationally intractable for larger cardinalities of the observed variables. In the following, we describe the structure of these sets in more detail for the (2, 2, 2, 2) and (2, 2, 3, 3) bipartite Bell scenarios, which are of particular interest in this thesis.

Footnote: The independence A ⫫ B (measurement independence) also follows from the graph G_B, but this is not relevant when we are considering the conditional distributions P_{XY|AB}.

In the (2, 2, 2, 2) scenario, there are eight extremal Bell inequalities (facets of the local polytope), which are equivalent under local relabellings to the following inequality [57, 203]

\[
I_{\mathrm{CHSH}} := P(X = Y | A = 0, B = 0) + P(X = Y | A = 0, B = 1) + P(X = Y | A = 1, B = 0) + P(X \neq Y | A = 1, B = 1) \leq 3. \qquad (3.3)
\]

We denote these eight inequalities by I^k_CHSH for k ∈ [8], where I^1_CHSH := I_CHSH and [n] stands for the set {1, 2, ..., n} where n is a positive integer. This provides the facet description of the (2, 2, 2, 2) local polytope. One can also express I_CHSH in matrix form using

\[
M_{\mathrm{CHSH}} = \left(\begin{array}{cc|cc}
1 & 0 & 1 & 0\\
0 & 1 & 0 & 1\\ \hline
1 & 0 & 0 & 1\\
0 & 1 & 1 & 0
\end{array}\right), \qquad (3.4)
\]

so that the Bell inequality can be written tr(M^T_CHSH P) ≤ 3, where P is the distribution expressed in the matrix form (3.1) and T denotes the transpose.

In the vertex picture, the (2, 2, 2, 2) local polytope has 16 local deterministic vertices and the (2, 2, 2, 2) non-signaling polytope shares the vertices of the local polytope and has eight more: the Popescu-Rohrlich (PR) box and seven distinct local relabellings [203, 171]. The PR box distribution (discussed in Section 2.4) satisfies X ⊕ Y = A·B and takes the following form in the current notation

\[
P_{\mathrm{PR}} = \left(\begin{array}{cc|cc}
\tfrac{1}{2} & 0 & \tfrac{1}{2} & 0\\
0 & \tfrac{1}{2} & 0 & \tfrac{1}{2}\\ \hline
\tfrac{1}{2} & 0 & 0 & \tfrac{1}{2}\\
0 & \tfrac{1}{2} & \tfrac{1}{2} & 0
\end{array}\right). \qquad (3.5)
\]
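The vertex and facet statements for the (2, 2, 2, 2) scenario can be checked by brute force. The sketch below is only an illustration in plain Python: the dictionary encoding `p[(x, y, a, b)]` and all function names are our own assumptions, and the matrix convention (rows indexed by (a, x), columns by (b, y)) is the one described for Equation (3.1).

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)

def to_matrix(p):
    """Arrange p[(x, y, a, b)] = P(xy|ab) with rows (a, x) and columns (b, y)."""
    return [[p[(x, y, a, b)] for b, y in product(BITS, repeat=2)]
            for a, x in product(BITS, repeat=2)]

def local_deterministic():
    """Yield all local deterministic boxes: x = f(a) and y = g(b)."""
    for f0, f1, g0, g1 in product(BITS, repeat=4):
        f, g = (f0, f1), (g0, g1)
        yield {(x, y, a, b): Fraction(int(x == f[a] and y == g[b]))
               for x, y, a, b in product(BITS, repeat=4)}

# PR box: outputs are uniformly random subject to x XOR y = a * b.
PR = {(x, y, a, b): Fraction(1, 2) if (x ^ y) == a * b else Fraction(0)
      for x, y, a, b in product(BITS, repeat=4)}

M_CHSH = [[1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 0, 1],
          [0, 1, 1, 0]]

def chsh(p):
    """tr(M_CHSH^T P), i.e. the entrywise sum of M_CHSH[i][j] * P[i][j]."""
    P = to_matrix(p)
    return sum(M_CHSH[i][j] * P[i][j] for i in range(4) for j in range(4))

def is_non_signaling(p):
    """Check that Alice's marginal is independent of b and Bob's of a."""
    alice = all(sum(p[(x, y, a, 0)] for y in BITS) == sum(p[(x, y, a, 1)] for y in BITS)
                for x, a in product(BITS, repeat=2))
    bob = all(sum(p[(x, y, 0, b)] for x in BITS) == sum(p[(x, y, 1, b)] for x in BITS)
              for y, b in product(BITS, repeat=2))
    return alice and bob

vertices = list(local_deterministic())
print(len(vertices), max(chsh(v) for v in vertices), chsh(PR), is_non_signaling(PR))
```

Exact rational arithmetic is used so that the comparisons in `is_non_signaling` are not affected by rounding.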

We denote the eight extremal non-signaling vertices equivalent under local relabellings to P_PR by P^k_PR, k ∈ [8], where P^1_PR := P_PR. Note that the 8 CHSH inequalities {I^k_CHSH} are in one-to-one correspondence with these 8 extremal non-signaling points, i.e., each P^k_PR violates exactly one CHSH inequality and each CHSH inequality is violated by exactly one P^k_PR.

In the (2, 2, 3, 3) Bell scenario, there are two classes of Bell inequalities that completely characterize the local polytope: the CHSH inequalities and the I_2233 inequalities [122, 64]. A representative example of the latter is

I_2233 := [ P ( X = Y ∣ A = , B = ) + P ( X = Y − ∣ A = , B = ) + P ( X = Y ∣ A = , B = ) + P ( X = Y ∣ A = , B = )] − [ P ( X = Y − ∣ X = , B = ) + P ( X = Y ∣ A = , B = ) + P ( X = Y − ∣ A = , B = ) + P ( X = Y + ∣ A = , B = )] ≤ , (3.6)

where all the random variables take values in {0, 1, 2} and all additions and subtractions of the random variables are modulo 3. CHSH-type inequalities for the (2, 2, 3, 3) scenario can be obtained from those of the (2, 2, 2, 2) scenario (Equation (3.4)) through a procedure known as “lifting”. The inequalities of the larger scenario are known as the lifted CHSH inequalities [168]. Evaluating the value of a lifted CHSH inequality attained by a given distribution in the (2, 2, 3, 3) scenario is equivalent to first coarse-graining the distribution by combining two outcomes into a single outcome, and then evaluating the corresponding CHSH inequality of the (2, 2, 2, 2) scenario. For instance, the lifted CHSH inequality corresponding to M_CHSH (3.4) and the coarse-graining of always mapping outcomes 1 and 2 to 1 is given in matrix form in the following equation, along with a representative inequality of the I_2233 type.

\[
M^{(2,2,3,3)}_{\mathrm{CHSH}} = \left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 1 & 0 & 1 & 1\\
0 & 1 & 1 & 0 & 1 & 1\\ \hline
1 & 0 & 0 & 0 & 1 & 1\\
0 & 1 & 1 & 1 & 0 & 0\\
0 & 1 & 1 & 1 & 0 & 0
\end{array}\right)
\]

M_{I_2233} = − − − − − − − − − − − − .
(3.7)

The (2, 2, 3, 3) local polytope has a total of 1116 facets, 36 of which correspond to positivity constraints, 648 to CHSH facets (these are equivalent to first coarse-graining two of the outputs into one (for each party and each input) and then applying one of the eight (2, 2, 2, 2) CHSH inequalities), and the remaining 432 are of I_2233 type [63] (we label these I^i_2233 for i ∈ {1, 2, ..., 432} with I^1_2233 = I_2233).

The facets of the non-signaling polytope correspond to positivity constraints, since the no-signaling constraints, being equalities, only reduce the dimension of the polytope. Converting this facet description to the vertex description (e.g., using the Porta software [55]) one can obtain all the vertices of the (2, 2, 3, 3) non-signaling polytope. This comprises 81 local deterministic vertices, 648 PR-box type vertices and 432 extremal non-signaling vertices (for each of the I_2233 inequalities there is one of the latter that gives maximal violation). We call these new vertices the I_2233-vertices. The specific vertex that maximally violates (3.6) is denoted P_NL; in the matrix form (3.1) its entries are either 0 or 1/3. The I_2233 vertices of the (2, 2, 3, 3) non-signaling polytope are related to each other through local relabellings. The I_2233 inequalities are a special case of the CGLMP inequalities, which correspond to facets of the local polytope L^{(2,2,d,d)} (d ≥ 3) [64, 143]. For the d >

Footnote: In general, equivalent points of the non-signaling polytope may be related by local relabellings or exchange of the two parties (which is a global operation). In the (2, 2, 3, 3) scenario there are 2 × ( × ) = I_2233 type (those which maximally violate an I_2233 inequality) can be generated using only local relabellings of P_NL, and similarly all 648 extremal points of the CHSH type can be generated through local relabellings of P_PR embedded in the (2, 2, 3, 3) scenario (by adding zero probabilities to the third outcome).

In the bipartite Bell causal structure (Figure 2.6a) that we discussed above, the set of all joint conditional distributions P_{XY|AB} over the observed nodes X, Y, A, B that can arise when Λ is classical is relatively well understood. For fixed input and output sizes, it forms a convex polytope and hence membership can be checked using a linear program (although the size of the linear program scales exponentially with the number of inputs and the problem is NP-complete [170]). Because of this, the complete set of Bell inequalities characterizing these polytopes is unknown for |X|, |Y| > |A|, |B| > P(G^C) of a causal structure G [149], and involves constructing a new causal structure, known as the inflation of the original causal structure, such that Bell inequalities for the inflated causal structure also hold for the original causal structure. However, the method does not tell us how to construct a suitable inflation of the causal structure in order to achieve this, or how large this inflation needs to be. Thus, in general, using the inflation technique becomes intractable in practice.

One approach to overcoming the difficulties of non-convex correlation sets is to analyse the problem in entropy space [226].
This has proven to be useful in a number of cases (see, e.g., [89, 44], or [216] for a detailed review), since the problem is convex in entropy space and the entropic inequalities characterising the relevant sets are independent of the number of measurement outcomes. These advantages have motivated the entropic analysis of causal structures, and the entropy vector method is employed for this purpose. In causal structures with observed parentless nodes (such as the Bell scenario), this method can be supplemented with the post-selection technique [34], which can allow for the detection of non-classicality that was previously undetectable in entropy space. In the following, we outline the basics of the entropy vector method, with and without post-selection. For a more thorough overview of entropic techniques for analysing causal structures, we refer the reader to [216].

The results of this thesis are related to certifying the gap between the classical and quantum sets of correlations in entropy space. For this, one must find necessary conditions satisfied by the set of classical entropies over the observed nodes, and look for quantum violations of these conditions. Hence the entropy vector method in this case only involves classical entropies. In the following, we describe the entropy vector method for the Shannon entropy H(·), but the same concepts will later be applied to other measures such as the Tsallis or Rényi entropies.

Definition 3.2.1 (Entropy vector). Given a joint distribution P_{X_1,...,X_n} ∈ P_n over n random variables X_1, X_2, ..., X_n, the entropy vector of P_{X_1,...,X_n} is defined as a vector with 2^n − 1 components, whose components are the entropies of the elements of the powerset of {X_1, X_2, ..., X_n} (excluding the empty set), i.e., the entropy vector can be expressed as (H(X_1), ..., H(X_n), H(X_1X_2), H(X_1X_3), ..., H(X_1X_2...X_n)). The entropy vector of a distribution P ∈ P_n is denoted as H(P) ∈ R^{2^n−1}.

Definition 3.2.2 (Entropy cone). Let Γ*_n be the set of all vectors in R^{2^n−1} that are entropy vectors of a probability distribution P ∈ P_n, i.e., Γ*_n = {v ∈ R^{2^n−1} : ∃ P_{X_1,...,X_n} s.t. v = H(P_{X_1,...,X_n})}. Its closure Γ̄*_n is known as the entropy cone and it includes vectors v for which there exists a sequence P_k ∈ P_n such that H(P_k) tends to v as k → ∞.

Note that the conditional entropies and mutual informations can be encoded in the entropy vector description through the chain rule (2.20), which gives the relations H(X|Y) = H(XY) − H(Y), I(X : Y) = H(X) + H(Y) − H(XY) and I(X : Y|Z) = H(XZ) + H(YZ) − H(XYZ) − H(Z). The entropy cone Γ̄*_n is known to be a convex set for any n [228], but it is difficult to characterise and various approximations to it have been considered [216].
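As a small illustration of Definition 3.2.1, the following sketch computes the Shannon entropy vector of a finite joint distribution by marginalising over every non-empty subset of the variables. The dictionary encoding of the distribution and the function name `entropy_vector` are assumptions made for this example; entropies are in bits.

```python
from itertools import combinations
from math import log2

def entropy_vector(joint, n):
    """Return {subset: H(subset)} for every non-empty subset of n variables.

    joint maps n-tuples of outcomes to probabilities; subsets are tuples of
    0-based variable indices, so (0, 2) stands for the pair (X_1, X_3).
    """
    vec = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            marginal = {}
            for outcome, prob in joint.items():
                key = tuple(outcome[i] for i in subset)
                marginal[key] = marginal.get(key, 0.0) + prob
            vec[subset] = -sum(q * log2(q) for q in marginal.values() if q > 0)
    return vec

# Example: X_1 is a uniform bit, X_2 is a copy of X_1, X_3 is an independent uniform bit.
joint = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
vec = entropy_vector(joint, 3)

# Mutual informations follow from the components, as in the relations above.
mi_12 = vec[(0,)] + vec[(1,)] - vec[(0, 1)]
mi_13 = vec[(0,)] + vec[(2,)] - vec[(0, 2)]
print(len(vec), mi_12, mi_13)
```

The vector has 2^3 − 1 = 7 components here, and the copy X_2 shares one bit of mutual information with X_1 while the independent bit X_3 shares none.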

An outer approximation to the entropy cone, known as the Shannon cone, is obtained by noting that the information-theoretic properties of entropies (discussed in Section 2.3.1) imply constraints on valid entropy vectors. These include non-negativity of the entropies, monotonicity (2.22) (i.e., H(R) ≤ H(RS)) and submodularity (2.24) (also known as strong subadditivity; H(RT) + H(ST) ≥ H(RST) + H(T)). As noted in Section 2.3.1, in the Shannon case, monotonicity and submodularity are equivalent to the non-negativity of the conditional entropy H(S|R) and the conditional mutual information I(R : S|T) respectively and hold for any three disjoint subsets R, S and T of {X_1, ..., X_n}.

Definition 3.2.3 (Shannon constraints and the Shannon cone). The set of linear constraints comprising non-negativity, monotonicity and submodularity are together known as the Shannon constraints, and the set of vectors u ∈ R^{2^n−1} obeying all the Shannon constraints forms the convex cone known as the Shannon cone, Γ_n.

Following standard practice, we will include non-negativity implicitly, such that every entropy vector v belongs to the space R^{2^n−1}_{≥0}. Excluding non-negativity, there are a total of n + n(n − 1)2^{n−3} independent Shannon constraints for n variables [226]. By definition, the Shannon cone is an outer approximation to Γ*_n, i.e., Γ*_n ⊆ Γ_n. Hence all entropy vectors derived from a probability distribution P ∈ P_n obey the Shannon constraints, but not all vectors u ∈ R^{2^n−1} obeying the Shannon constraints are such that H(P) = u for some joint distribution P ∈ P_n.

Entropic causal constraints:

The constraints on the entropy vectors mentioned so far are independent of the causal structure. We know from Section 2.5.1 that a causal structure imposes further constraints on the entropies of the random variables associated with its nodes. In the case of Shannon entropies and a classical causal structure G^C over n nodes, all the constraints implied by the causal structure on the entropies can be derived from the following n constraints (of Equation (2.49), reproduced here for convenience), one for each node X_i:

\[
I(X_i : X_i^{\not\uparrow} \,|\, X_i^{\downarrow_1}) = 0, \qquad (3.9)
\]

i.e., each node is independent of its non-descendants given its parents. The set of all entropy vectors v ∈ Γ*_n that also satisfy the n entropic causal constraints (3.9) for a classical causal structure G^C is denoted Γ*_n(G^C). Similarly, the set of all entropy vectors in the outer approximation, v ∈ Γ_n, that satisfy the causal constraints for G^C is denoted Γ_n(G^C). Then Γ*_n(G^C) ⊆ Γ_n(G^C) by construction. Further, it can be shown (Lemma 9 of [216]) that Γ*_n(G^C) is indeed the set of all achievable entropy vectors in G^C, i.e.,

\[
\Gamma^*_n(G^C) = \Big\{ v \in \mathbb{R}^{2^n-1}_{\geq 0} \;\Big|\; \exists\, P = \prod_{i=1}^{n} P_{X_i | X_i^{\downarrow_1}} \in \mathcal{P}_n \text{ s.t. } H(P) = v \Big\}, \qquad (3.10)
\]

and that its closure is convex. Note that it need not be finitely generated. However, its outer approximation Γ_n(G^C) is by construction defined through a finite set of inequalities, i.e., the n + n(n − 1)2^{n−3} Shannon constraints and the n causal constraints (3.9).

Footnote: For n ≤ 3, the cones Γ̄*_n and Γ_n coincide, but for n ≥ 4 they do not.

Given a causal structure G, we wish to characterise the set of entropy vectors over the observed nodes that can arise in the classical causal structure G^C. Say the causal structure G has n nodes {X_1, ..., X_n} of which, without loss of generality, the first k are observed and the remaining n − k are unobserved. Then the marginalisation is performed by projecting the entropy cone in R^{2^n−1} (over all the nodes) to its marginal entropy cone in R^{2^k−1} (over the observed nodes only). The finitely generated outer approximation Γ_n(G^C) is often considered for this purpose since the cone Γ*_n(G^C) is difficult to characterise. The projection of the cone Γ_n(G^C) into the subspace of the k observed nodes defines the marginal cone Γ_k(G^C), and can be obtained through Fourier-Motzkin elimination [222] (Section 2.2.2). This provides an H-representation of the marginal cone Γ_k(G^C), and every marginal entropy vector over the observed nodes arising from a compatible distribution in the classical causal structure G^C necessarily belongs to this cone. Since non-classical causal structures do not satisfy the initial assumption of the existence of the joint distribution/entropies, they may give rise to correlations that do not satisfy the marginal constraints on the observed nodes obtained through this procedure. A violation of any of the inequalities defining Γ_k(G^C) by the entropy vector v = H(P_Q) of a distribution P_Q compatible with the quantum causal structure G^Q certifies the non-classicality of P_Q.

[Figure 3.1, panel (a): nodes A, B, X, Y, Λ; panel (b): nodes X_0, X_1, Y_0, Y_1, Λ]

Figure 3.1: The bipartite Bell causal structure and its post-selected version: (a) The bipartite Bell causal structure. The nodes A and B represent the random variables corresponding to independently chosen inputs, while X and Y represent the random variables corresponding to the outputs. Λ is an unobserved node representing the common cause of X and Y. (b) The post-selected bipartite Bell causal structure for the case of binary inputs. The observed nodes X_a represent the outputs when the input is a ∈ {0, 1} and likewise for Y_b. Note that X_0 and X_1 are never simultaneously observed and likewise Y_0 and Y_1.
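To give a feel for the projection step, here is a minimal sketch of a single Fourier-Motzkin elimination step in plain Python. It is a toy illustration under our own conventions (each inequality is stored as a pair `(coefficients, bound)` meaning coefficients · x ≤ bound, and the function name is hypothetical); it is not the software used in the thesis and it performs no redundancy removal, which practical implementations need.

```python
def fourier_motzkin(rows, j):
    """Eliminate variable j from inequalities given as (coeffs, bound): coeffs . x <= bound.

    Returns a sorted list of inequalities over the remaining variables that
    describes the projection of the feasible set (redundant rows are kept).
    """
    def drop(a):
        return tuple(a[:j]) + tuple(a[j + 1:])

    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    out = {(drop(a), b) for a, b in rows if a[j] == 0}
    for ap, bp in pos:
        for an, bn in neg:
            # Positive multipliers chosen so that the coefficient of x_j cancels.
            cp, cn = -an[j], ap[j]
            combined = [cp * u + cn * v for u, v in zip(ap, an)]
            out.add((drop(combined), cp * bp + cn * bn))
    return sorted(out)

# Variables ordered as (H(X), H(Y), H(XY)) for two random variables.
shannon_rows = [
    ((1, 0, -1), 0),   # monotonicity:  H(X) - H(XY) <= 0
    ((0, 1, -1), 0),   # monotonicity:  H(Y) - H(XY) <= 0
    ((-1, -1, 1), 0),  # subadditivity: H(XY) - H(X) - H(Y) <= 0
]
projected = fourier_motzkin(shannon_rows, 2)
print(projected)
```

Eliminating the joint entropy from the two-variable Shannon constraints leaves only −H(X) ≤ 0 and −H(Y) ≤ 0, i.e., non-negativity of the marginal entropies, which is the expected projection in this simple case.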
In causal structures with one or more observed parentless nodes, supplementing the entropy vector method with an additional technique can allow for tighter characterisations of the marginal scenario. This technique involves defining, for each given causal structure with at least one parentless node, a new causal structure called the post-selected causal structure. The general technique for doing this can be found in [216] (for example). Here, we will only consider the bipartite Bell causal structure (Figure 2.6a, reproduced in Figure 3.1a for convenience) with two inputs per party and the post-selected version thereof.

The post-selected causal structure is obtained by removing the parentless observed nodes A and B in the original causal structure of Figure 3.1a and replacing the descendants X and Y with two copies of each, i.e., X_{A=0}, X_{A=1}, Y_{B=0}, Y_{B=1}, such that the original causal relations are preserved and there is no mixing between the copies (this is shown in Figure 3.1b). It makes sense to do this in the classical case because classical information can be copied, so we can simultaneously consider the outcome X given A = 0 and given A = 1. By contrast, in the quantum case the values of A correspond to different measurements that are used to generate X, and the associated variables X_{A=0} and X_{A=1} may not co-exist. It hence does not make sense to consider a joint distribution over X_{A=0} and X_{A=1} in this case. We therefore only consider the subsets of the observed variables that co-exist,

S := {X_0, X_1, Y_0, Y_1, X_0Y_0, X_0Y_1, X_1Y_0, X_1Y_1}, (3.11)

where we use the short form X_0Y_0 for the set {X_0, Y_0} etc. Any non-trivial inequalities derived for the co-existing sets in the classical case can admit quantum or GPT violations.

In [34], Braunstein and Caves derived a set of constraints on the post-selected causal structure of Figure 3.1b and showed that these constraints can be violated by quantum correlations. In fact, it is now known that in the absence of post-selection, the sets of entropies over the observed nodes in the classical and quantum causal structures, G^C_B and G^Q_B, coincide and therefore the non-classicality of G_B cannot be detected entropically [215]. To discuss post-selected entropic inequalities, we first introduce the notion of entropic classicality. For every distribution P_{XY|AB} in the Bell causal structure (Figure 3.1a), we can associate an entropy vector v ∈ R^8 in the post-selected causal structure (Figure 3.1b) whose components are the entropies of each element of the set S (Equation (3.11)) distributed according to P_{X_aY_b} := P_{XY|A=a,B=b}. Let H be the map that takes the observed distribution to its corresponding entropy vector in the post-selected causal structure.

Definition 3.2.4 (Entropic classicality). An entropy vector v ∈ R^8 is classical with respect to the bipartite Bell causal structure (Figure 3.1a) if there exists a classical distribution P_{XY|AB} ∈ L such that H(P_{XY|AB}) = v, where L is the local or classical polytope (Equation (3.2)). Further, a distribution P_{XY|AB} is entropically classical if there exists a classical distribution with the same entropy vector, i.e., if there exists a classical entropy vector v such that H(P_{XY|AB}) = v.

The set of all classical entropy vectors forms a convex cone. We now review how the Braunstein-Caves (BC) inequalities are derived for the case when the observed parentless nodes A and B are binary. In this case, the post-selected causal structure of Figure 3.1b imposes no additional constraints on the distribution (or entropies) of the observed nodes X_0, X_1, Y_0 and Y_1, because they share a common parent and thus any joint distribution over X_0, X_1, Y_0 and Y_1 can be realised in this causal structure. By contrast, any correlations in the original causal structure of Figure 3.1a must obey the no-signaling constraints over the observed nodes A, B, X and Y, since A does not influence Y and B does not influence X in this causal structure. The inequalities derived by Braunstein and Caves follow by applying monotonicity (2.22), strong subadditivity in unconditional form (2.24) and the chain rule (2.20) to the entropies of the observed variables {X_0, X_1, Y_0, Y_1} in the post-selected causal structure. The derived relations hold for the classical causal structure (and not necessarily for the quantum and GPT cases) because only in the classical case does it make sense to consider a joint distribution over these four variables, which in the quantum and GPT cases do not co-exist (cf. Section 3.2.3). The BC inequalities are entropic Bell inequalities, i.e., they hold for every classical entropy vector in the post-selected causal structure of Figure 3.1b.
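The map from a distribution P_{XY|AB} to post-selected entropies can be explored numerically. The sketch below assumes the usual form of the Braunstein-Caves expressions, H(X_aY_b) + H(X_{1−a}) + H(Y_{1−b}) minus the three remaining joint entropies, with classical bound 0; the dictionary encoding, the function names and the particular mixture are illustrative choices of ours, not taken from the source.

```python
from itertools import product
from math import log2

def H(probs):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(q * log2(q) for q in probs if q > 0)

def bc_values(p):
    """Return the four BC expressions for p[(x, y, a, b)] = P(xy|ab), binary inputs.

    The value stored under (a, b) is
    H(X_a Y_b) + H(X_{1-a}) + H(Y_{1-b}) - (sum of the other three joint entropies).
    The marginals assume that p is non-signaling.
    """
    xs = sorted({k[0] for k in p})
    ys = sorted({k[1] for k in p})
    hxy = {(a, b): H([p[(x, y, a, b)] for x in xs for y in ys])
           for a, b in product((0, 1), repeat=2)}
    hx = {a: H([sum(p[(x, y, a, 0)] for y in ys) for x in xs]) for a in (0, 1)}
    hy = {b: H([sum(p[(x, y, 0, b)] for x in xs) for y in ys]) for b in (0, 1)}
    return {(a, b): hxy[(a, b)] + hx[1 - a] + hy[1 - b]
                    - sum(h for ab, h in hxy.items() if ab != (a, b))
            for a, b in product((0, 1), repeat=2)}

keys = list(product((0, 1), repeat=4))
PR = {(x, y, a, b): 0.5 if (x ^ y) == a * b else 0.0 for x, y, a, b in keys}
DET = {(x, y, a, b): 1.0 if (x, y) == (0, 0) else 0.0 for x, y, a, b in keys}
MIX = {k: 0.5 * PR[k] + 0.5 * DET[k] for k in keys}  # PR box mixed with a classical box

for name, box in (("PR", PR), ("DET", DET), ("MIX", MIX)):
    print(name, max(bc_values(box).values()))
```

In this example the PR box and the deterministic box individually give no positive BC value, whereas their equal mixture does, which is the kind of behaviour expected from inequalities that are non-linear in the probabilities.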
There are four BC inequalities,

\[
\begin{aligned}
I^1_{\mathrm{BC}} &:= H(X_0Y_0) + H(X_1) + H(Y_1) - H(X_0Y_1) - H(X_1Y_0) - H(X_1Y_1) \leq 0,\\
I^2_{\mathrm{BC}} &:= H(X_0Y_1) + H(X_1) + H(Y_0) - H(X_0Y_0) - H(X_1Y_0) - H(X_1Y_1) \leq 0,\\
I^3_{\mathrm{BC}} &:= H(X_1Y_0) + H(X_0) + H(Y_1) - H(X_0Y_0) - H(X_0Y_1) - H(X_1Y_1) \leq 0,\\
I^4_{\mathrm{BC}} &:= H(X_1Y_1) + H(X_0) + H(Y_0) - H(X_0Y_0) - H(X_0Y_1) - H(X_1Y_0) \leq 0.
\end{aligned} \qquad (3.12)
\]

Lemma 3.2.1. A distribution in the post-selected Bell scenario with binary A and B is entropically classical if and only if it satisfies the four BC inequalities (3.12).

Note that the BC inequalities (3.12) and this completeness result are independent of the cardinality of X and Y. However, a crucial point is that entropic classicality does not imply classicality of the associated distribution. For example, the entropy vector H(P_PR) associated with the maximally non-classical distribution P_PR (Equation (3.5)) satisfies all the BC inequalities and is hence entropically classical. Since entropic inequalities are non-linear in the probabilities, it is nevertheless possible to have a convex combination of two entropically classical distributions that is entropically non-classical. This allows for the non-classicality of a distribution such as P_PR to be “activated” in entropy space by mixing with a classical distribution. Hence a natural question that arises is whether post-selected entropic inequalities can identify all non-classical distributions in this manner, after a suitable and reasonable post-processing operation (such as mixing with a classical distribution). This is known to be the case for (d, d, 2, 2) Bell scenarios with d ≥ d outcome Bell scenarios or (2, 2, d, d) scenarios with d ≥ 3, and our results suggest that the answer is negative in this case for both Shannon and Tsallis entropic inequalities. The Triangle causal structure (Figure 2.6b), on the other hand, cannot be analysed using post-selection since it has no observed parentless nodes. We analyse this causal structure using Tsallis entropies, in the absence of post-selection, in Chapter 4 (based on [207]). We leave further details of the entropic technique and of these results to Chapters 4 and 5.

Chapter 4
Entropic analysis of causal structures without post-selection

As discussed in Section 3.2, the main motivation for the entropic technique for analysing causal structures stems from its ability to overcome certain problems that arise in probability space, such as the non-convexity of correlation sets and the drastic scaling of the number of Bell inequalities in the cardinality of the variables. The entropic method for analysing causal structures has proven to be useful in a number of cases (see, e.g., [90, 44], or [216] for a detailed review). However, it was shown in [215] that the entropy vector method with Shannon entropies cannot detect the classical-quantum gap for line-like causal structures in the absence of post-selection. This includes causal structures such as the Bell scenario, which are well known to support non-classical correlations. Although new Shannon entropic inequalities have been derived using this method (e.g., in the Triangle causal structure), no quantum violation of these has been found, even when non-classical correlations are known to exist in these causal structures [217, 47]. Due to these limitations of Shannon entropies, it is natural to ask whether other entropic quantities could do better.

This chapter is predominantly based on work carried out with Roger Colbeck [207], while Appendix 4.5.3 reports observations from a side-project with Mirjam Weilenmann.
The mainquestion we address here is whether generalised entropic inequalities can overcome these limi-tations of Shannon entropies in the absence of post-selection. In particular we employ Tsallisentropies and analyse the Bell 2.6a and Triangle causal structures 2.6b. A motivation for consid-ering Tsallis entropies for the task is that they are a family with an additional (real) parameter.The set of entropies for all possible values of this parameter conveys more information about theunderlying probability distribution than a single member of the family and hence the ability tovary a parameter may give advantages for analysing causal structures. Tsallis entropies appearto be a good candidate since they satisfy monotonicity, strong subadditivity (in both condi-tional and unconditional forms) and the chain rule which are desirable properties for their use68 HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION in the entropy vector method. Tsallis entropies have been considered in the context of causalstructures before [214] where they were shown to give an advantage over Shannon entropy indetecting the non-classicality of certain states in the post-selected Bell scenario 3.1b . In thischapter we analyse causal structures in the absence of post-selection and consider post-selectionin Chapter 5. Note however that the post-selection technique (c.f. Section 3.2.3) can only beapplied to causal structures with at least one observed parentless node i.e., it can be appliedto the Bell, but not to the Triangle causal structure. The entropy vector method requires as input the constraints implied by a causal structure onthe entropies of its nodes. Finding that these constraints in for Tsallis entropies cannot takethe same form as the Shannon ones (Equation (3.9)), we derive the constraints on the classicalTsallis entropies that are implied by a given causal structure in Section 4.2. 
As an additional result, we generalise these entropic causal constraints to quantum Tsallis entropies for certain cases in Appendix 4.5.1. In Section 4.2.2, we use these causal constraints in the entropy vector method with Tsallis entropies but find that the computational procedure becomes too time-consuming even for simple causal structures such as the bipartite Bell scenario. Despite this limitation, we derive new Tsallis entropic inequalities for the Triangle causal structure in Section 4.3, using known Shannon entropic inequalities of [47] and our Tsallis constraints of Section 4.2. In Section 4.4, we discuss the reasons for the computational difficulty of this method and the drawbacks of using Tsallis entropies for analysing causal structures, and identify potential future directions stemming from our work. In Appendix 4.5.2, we provide details of the Mathematica package LPAssumptions [62] that we developed and used for obtaining some of the computational results of this and the next chapter. Additionally, in Appendix 4.5.3, we summarise the results obtained together with Mirjam Weilenmann, where we also found limitations of using Rényi entropies for certifying non-classicality in causal structures. Below is a poetic summary of our paper [207] on which the current chapter is based, following which we present the results (in a more rigorous, non-poetic fashion).

About underlying physics we want to tell,
Using observed correlations, as did Bell
For a scenario that was blessed,
In probability space, with convex correlation sets.
Generally though,
That may not be so.
So we move to entropy space,
Putting convexity back in place,
But we can not be too smug,
For the Shannon cones aren’t too snug.
So here we try our luck with Tsallis entropies,
Hoping they give us tighter inequalities.
On the way we find some constraints that are new,
That follow from classical causal structures, and some quantum ones too
Then we use them in the method of entropy vectors.
Alas, that seems too much for our little computers.
But we have a cheeky way to put our constraints to use,
Using known Shannon results, new Tsallis inequalities we produce.
Though we found nice math results along the way,
Our Tsallis inequalities also seem hard to violate,
Computationally costly to find tighter inequalities,
A feature or a bug of the entropic technique?

Footnote: Other examples of more general entropy measures such as the Rényi entropy [180] do not satisfy one or more of these properties, making it more difficult to get entropic constraints on them using the entropy vector method (cf. Table 2.1).

Footnote: Note that non-classicality cannot be detected Shannon entropically in the Bell causal structure (Figure 2.6a) without post-selection [215].

In Section 2.3.2, we discussed some of the general properties of Tsallis entropy that hold irrespective of the underlying causal structure over the variables. These include non-negativity, monotonicity (2.22), and strong subadditivity (2.24), which characterise the set of Shannon constraints (cf. Definition 3.2.3). We summarise these properties here for convenience. For any joint distribution over the random variables involved, the following properties hold.

1. Pseudo-additivity [68]: For two independent random variables $X$ and $Y$, i.e., $P_{XY} = P_X P_Y$, and for all $q$, the Tsallis entropies satisfy
$$S_q(XY) = S_q(X) + S_q(Y) + (1-q)\, S_q(X)\, S_q(Y). \tag{4.1}$$
Note that in the Shannon case ($q = 1$) the last term vanishes and the entropy is additive.

2. Upper bound [94]: For $q \geq 0$, $S_q(X) \leq \ln_q d_X$. For $q > 0$, equality is achieved if and only if $P_X(x) = 1/d_X$ for all $x$ (i.e., if the distribution on $X$ is uniform).

3. Monotonicity [71]: For all $q$,
$$S_q(X) \leq S_q(XY). \tag{4.2}$$

4. Strong subadditivity [93]: For $q \geq 1$,
$$S_q(XYZ) + S_q(Z) \leq S_q(XZ) + S_q(YZ). \tag{4.3}$$

5. Chain rule [93]: For all $q$,
$$S_q(X_1, X_2, \ldots, X_n \mid Y) = \sum_{i=1}^{n} S_q(X_i \mid X_{i-1}, \ldots, X_1, Y). \tag{4.4}$$

The chain rules $S_q(XY) = S_q(X) + S_q(Y \mid X)$ and $S_q(XY \mid Z) = S_q(X \mid Z) + S_q(Y \mid XZ)$ emerge as particular cases and allow the Tsallis mutual informations (Equation (2.31)) to be written as follows.
$$\begin{aligned} I_q(X : Y) &= S_q(X) + S_q(Y) - S_q(XY), \\ I_q(X : Y \mid Z) &= S_q(XZ) + S_q(YZ) - S_q(Z) - S_q(XYZ). \end{aligned} \tag{4.5}$$

Footnote: Note that, due to the chain rules, the conditional versions of the properties are also satisfied.

The causal structure imposes the causal Markov constraints on the joint probability distribution as well as entropic causal constraints over the variables involved (Section 2.5.1). We have seen in Section 3.2 that the entropy vector method requires as an input the complete set of entropic causal constraints that encode all the conditional independences of the causal structure. In the case of Shannon entropies, for an $n$-node causal structure, the $n$ Shannon entropic causal constraints of Equation (3.9) imply all the conditional independences that follow from d-separation for that causal structure. To analyse causal structures using Tsallis entropies, we require analogous Tsallis entropic causal constraints.

A first observation is that Tsallis entropy vectors do not in general satisfy the causal constraints (Equation (3.9)) satisfied by their Shannon counterparts. For a concrete counterexample, consider the simple, three-variable causal structure where $Z$ is a common cause of $X$ and $Y$, and where there are no other causal relations. In terms of Shannon entropies, the only causal constraint in this case is $I(X : Y \mid Z) =$

0. Taking

X, Y and Z to be binary variables with possiblevalues 0 and 1, the distribution P ( xyz ) = / ∀ x ∈ X, y ∈ Y if z = P ( xyz ) = P ( xy ∣ z ) = P ( x ∣ z ) P ( y ∣ z ) ∀ x ∈ X, y ∈ Y and z ∈ Z but has a q = I ( X ∶ Y ∣ Z ) = . Hence when using Tsallis entropies (and conditionalTsallis entropy as deﬁned in Section 2.3.2), the causal constraint cannot be simply encoded by I q ( X ∶ Y ∣ Z ) = q > d X to denote the cardinality/alphabetsize ∣ X ∣ of a random variable X in order the make the notation consistent with the quantumcase (Section 4.5.1) where d X will denote the dimension of the Hilbert space associated with thesubsystem X . Then for classical states, d X coincides with the cardinality of the correspondingrandom variable, which justiﬁes this notation. Theorem 4.2.1.

If a joint probability distribution P XY over random variables X and Y withalphabet sizes d X and d Y factorises as P XY = P X P Y , then for all q ∈ [ , ∞) , the Tsallis mutual Using the chain rule, the monotonicity and strong subadditivity relations (Equations (4.2) and (4.3)) areequivalent to the non-negativity of the unconditional and conditional Tsallis mutual informations. For q < q ≥ HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION information I q ( X ∶ Y ) is upper bounded by I q ( X ∶ Y ) ≤ f ( q, d X , d Y ) , where the function f ( q, d X , d Y ) is given by f ( q, d X , d Y ) = ( q − ) ( − d q − X ) ( − d q − Y ) = ( q − ) ln q d X ln q d Y . For q ∈ ( , ∞) ∖ { } , the bound is saturated if and only if P XY is the uniform distribution over X and Y .Proof. The proof follows from the pseudo-additivity of Tsallis entropies (Property 1) and theupper bound (Property 2). Using these, for all q ≥ P XY = P X P Y , we have I q ( X ∶ Y ) = S q ( X )+ S q ( Y )− S q ( XY ) = ( q − ) S q ( X ) S q ( Y ) ≤ ( − d q − X ) ( − d q − Y ) q − = f ( q, d X , d Y ) . (4.6)Whenever q ∈ ( , ∞) ∖ { } , the bound is saturated if and only if P XY is uniform over X and Y since, for these values of q , S q ( X ) and S q ( Y ) both attain their maximum values if and only ifthis is the case. Theorem 4.2.2.

If a joint probability distribution $P_{XYZ}$ satisfies the conditional independence $P_{XY|Z} = P_{X|Z} P_{Y|Z}$, then for all $q \geq 1$ the Tsallis conditional mutual information $I_q(X : Y \mid Z)$ is upper bounded by
$$I_q(X : Y \mid Z) \leq f(q, d_X, d_Y).$$
For $q > 1$, the bound is saturated only by distributions in which for some fixed value $k$ the joint probabilities are given by
$$P(xyz) = \begin{cases} \dfrac{1}{d_X d_Y} & \text{if } z = k, \\ 0 & \text{otherwise,} \end{cases}$$
for all $x$, $y$ and $z$.

Footnote: These distributions have deterministic $Z$ and there is one such distribution for each value that $Z$ can take.

Proof. Writing out $I_q(X : Y \mid Z)$ in terms of probabilities we have
$$\begin{aligned} I_q(X : Y \mid Z) &= \frac{1}{q-1}\Big[\sum_{xyz} P^q(xyz) + \sum_{z} P^q(z) - \sum_{xz} P^q(xz) - \sum_{yz} P^q(yz)\Big] \\ &= \frac{1}{q-1}\sum_{z} P^q(z)\Big[\sum_{xy} P^q(xy \mid z) + 1 - \sum_{x} P^q(x \mid z) - \sum_{y} P^q(y \mid z)\Big] \\ &= \sum_{z} P^q(z)\, I_q(X : Y)_{P_{XY|Z=z}}. \end{aligned}$$
Using this and Theorem 4.2.1, we can bound $I_q(X : Y \mid Z)$ as
$$\begin{aligned} \max_{P_{XYZ} = P_Z P_{X|Z} P_{Y|Z}} I_q(X : Y \mid Z) &= \max_{P_{XYZ} = P_Z P_{X|Z} P_{Y|Z}} \sum_{z} P(z)^q\, I_q(X : Y)_{P_{XY|Z=z}} \\ &\leq \max_{P_Z} \sum_{z} P(z)^q \max_{P_{X|Z=z} P_{Y|Z=z}} I_q(X : Y)_{P_{XY|Z=z}} \\ &= \max_{P_Z} \sum_{z} P(z)^q\, f(q, d_X, d_Y) \\ &= f(q, d_X, d_Y). \end{aligned}$$
The last step holds because for all $q > 1$, $\sum_z P(z)^q$ is maximized by deterministic distributions over $Z$ with a maximum value of 1, i.e., only distributions $P_{XYZ}$ that are deterministic over $Z$ saturate the upper bound of $f(q, d_X, d_Y)$. This completes the proof.

Two corollaries of Theorem 4.2.2 naturally follow.

Corollary 4.2.1. Let $X$, $Y$ and $Z$ be random variables with fixed alphabet sizes. Then for all $q \geq 1$ we have
$$\max_{\substack{P_{XYZ} \\ P_{XY|Z} = P_{X|Z} P_{Y|Z}}} I_q(X : Y \mid Z) = \max_{\substack{P_{XY} \\ P_{XY} = P_X P_Y}} I_q(X : Y).$$
Furthermore, for $q > 1$, the maximum on the left hand side is achieved only by distributions in which for some fixed value $k$ the joint probabilities are given by
$$P(xyz) = \begin{cases} \dfrac{1}{d_X d_Y} & \text{if } z = k, \\ 0 & \text{otherwise,} \end{cases}$$
while the maximum on the right hand side occurs if and only if $P_{XY}$ is the uniform distribution.

The significance of these new relations for causal structures is then given by the following corollary.

Corollary 4.2.2. Let $P_{X_1 \ldots X_n}$ be a distribution compatible with the classical causal structure $G^C$ and $X$, $Y$ and $Z$ be disjoint subsets of $\{X_1, \ldots, X_n\}$ such that $X$ and $Y$ are d-separated given $Z$. Then for all $q \geq 1$ we have
$$I_q(X : Y \mid Z) \leq f(q, d_X, d_Y),$$
where $d_X$ is the product of $d_{X_i}$ for all $X_i \in X$, and likewise for $d_Y$.

Remark 4.2.1. The results of this section can be generalised to the quantum case under certain assumptions, i.e., as constraints on quantum Tsallis entropies implied by certain quantum causal structures (see Appendix 4.5.1 for details). Note that only constraints on the classical Tsallis entropy vectors derived in this section are required to detect the classical-quantum gap. Hence, Appendix 4.5.1 is not pertinent to the main results of this chapter but can be seen as additional results regarding the properties of quantum Tsallis entropies.
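The bound of Theorem 4.2.2 and its saturating distribution can be checked numerically. The following is a minimal sketch in plain Python, not the code used in the thesis: the helper names (`tsallis_entropy`, `cond_mutual_info`, `f_bound`, `random_cond_indep`) are hypothetical, and the alphabet sizes and values of $q$ are arbitrary illustrative choices. It evaluates $I_q(X : Y \mid Z)$ from Equation (4.5) for binary $X$, $Y$, $Z$ with deterministic $Z$ and uniform $X$, $Y$, and then samples random conditionally independent distributions to confirm that none exceeds $f(q, d_X, d_Y)$.

```python
import itertools
import math
import random


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a list of probabilities (natural-log units)."""
    probs = [p for p in probs if p > 0]  # zero-probability outcomes do not contribute
    if q == 1:
        return -sum(p * math.log(p) for p in probs)  # Shannon limit
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def marginal(joint, keep):
    """Marginalise a dict {(x, y, z): p} onto the coordinate positions in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_of(joint, keep, q):
    return tsallis_entropy(list(marginal(joint, keep).values()), q)


def cond_mutual_info(joint, q):
    """I_q(X:Y|Z) = S_q(XZ) + S_q(YZ) - S_q(Z) - S_q(XYZ), as in Equation (4.5)."""
    return (entropy_of(joint, (0, 2), q) + entropy_of(joint, (1, 2), q)
            - entropy_of(joint, (2,), q) - entropy_of(joint, (0, 1, 2), q))


def f_bound(q, d_x, d_y):
    """f(q, d_X, d_Y) = (1 - d_X^(1-q)) (1 - d_Y^(1-q)) / (q - 1), for q != 1."""
    return (1 - d_x ** (1 - q)) * (1 - d_y ** (1 - q)) / (q - 1)


def random_dist(n):
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_cond_indep(d_x, d_y, d_z):
    """Random P(xyz) = P(z) P(x|z) P(y|z), i.e. X and Y independent given Z."""
    p_z = random_dist(d_z)
    joint = {}
    for z in range(d_z):
        p_x, p_y = random_dist(d_x), random_dist(d_y)
        for x, y in itertools.product(range(d_x), range(d_y)):
            joint[(x, y, z)] = p_z[z] * p_x[x] * p_y[y]
    return joint


# Deterministic Z (always 0) with uniform X and Y: conditionally independent,
# yet the q = 2 conditional mutual information is non-zero.
extremal = {(x, y, 0): 0.25 for x in (0, 1) for y in (0, 1)}
shannon_value = cond_mutual_info(extremal, 1)
tsallis_value = cond_mutual_info(extremal, 2)

# Largest value of I_q - f over random conditionally independent distributions.
random.seed(0)
max_gap = max(
    cond_mutual_info(random_cond_indep(2, 3, 2), q) - f_bound(q, 2, 3)
    for q in (1.5, 2, 3) for _ in range(200)
)
print(shannon_value, tsallis_value, f_bound(2, 2, 2), max_gap)
```

For the extremal distribution the Shannon quantity vanishes while the $q = 2$ Tsallis quantity equals $f(2, 2, 2) = 1/4$, which illustrates why the Tsallis causal constraint has to be an inequality rather than an equality.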

We saw previously that in the Shannon case ( q = n conditions (3.9) of the form I ( X i ∶ X ↑̸ i ∣ X ↓ i ) = i = , . . . , n ) imply all the independence relations that follow from the causal struc-ture. In the Tsallis case however, the n conditions of the form I q ( X i ∶ X ↑̸ i ∣ X ↓ i ) ≤ f ( q, d X i , d X ↑̸ i ) (Corollary 4.2.2) do not do the same. In the bipartite Bell and Triangle causal structures, inthe case where the dimension (cardinality) of each individual node is taken to be d , we ﬁndthat there are 53 and 126 distinct Tsallis entropic inequalities that are implied by the respectivecausal structures. These are in one-to-one correspondence with the d-separation relations inthe corresponding DAGs, and there are no redundancies in these constraints i.e., all of them are HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION required for characterising the conditional independences in the DAGs. In more detail, we usedlinear programming to show that each implication of d-separation yields a non-trivial entropiccausal constraint for all q > d > I ( A ∶ BC ) = I ( A ∶ B ) = I ( A ∶ C ) =

0, whereas the analogous implication does not hold in the Tsallis case in general: although $I_q(A : BC) \leq f(q, d_A, d_{BC})$ implies $I_q(A : B) \leq f(q, d_A, d_{BC})$, it is not the case that $I_q(A : BC) \leq f(q, d_A, d_{BC})$ implies $I_q(A : B) \leq f(q, d_A, d_B)$.

The number of distinct conditional independences (and hence the number of independent Tsallis constraints that follow from d-separation) in a DAG depends on the specific graph. However, for any DAG $G_n$ with $n$ nodes, the number of such constraints can be upper bounded by that of the $n$-node DAG where all $n$ nodes are independent, i.e., the $n$-node DAG with no edges. The number of conditions in this DAG can be thought of as the number of ways of partitioning $n$ objects into four disjoint subsets such that the first two are non-empty and where the ordering of the first two does not matter. Therefore, there are at most $\frac{1}{2}\left(4^n - 2 \times 3^n + 2^n\right)$ such conditions.

We used the causal constraints of Corollary 4.2.2 in the entropy vector method (Section 3.2) with the aim of deriving new quantum-violatable entropic inequalities for the Triangle causal structure (Figure 2.6b). To do so, we started with the variables $A, B, C, X, Y, Z$ of the Triangle causal structure, the Shannon constraints and causal constraints satisfied by the Tsallis entropy vectors over these variables (Corollary 4.2.2) and used a Fourier-Motzkin (FM) elimination (Section 2.2.2) algorithm (from the porta software [55]) to eliminate the Tsallis entropy components involving the unobserved variables $A, B, C$ and obtain the constraints on the observed nodes $X, Y, Z$. The Tsallis entropy vector for the six nodes has $2^6 - 1 = 63$ components. The required marginal scenario with the observed nodes

X, Y, Z has Tsallis entropy vectors with2 − = For an explicit counterexample, consider P ABC = { , , , , , , , } over binary A , B and C forwhich I ( A ∶ BC ) = / < / = f ( , , ) but I ( A ∶ B ) = / > / = f ( , , ) . The four subsets correspond to the three arguments of the conditional mutual information and a set of‘leftovers’. HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION tried starting with a subset comprising 15 of the 126 Tsallis entropic causal constraints i.e., 261constraints on 63 dimensional vectors. We considered the case of q = n inequalities can result in up to n / d successive elimination steps can yield a doubleexponential complexity of 4 ( n / ) d [222]. This rate of increase can be kept under control whenthe resulting set of inequalities has many redundancies. This happens in the Shannon case wherethe causal constraints are simple equalities and the system of 246 Shannon constraints plus 6Shannon entropic causal constraints reduces to a system of just 91 independent inequalitiesbefore the FM elimination. In the Tsallis case, no reduction of the system of inequalitiesis possible in general due to the nature of the causal constraints. The fact that the Tsallisentropic causal constraints are inequality constraints rather than equalities also contributes tothe computational diﬃculty since each independent equality constraint in eﬀect reduces thedimension of the problem by 1.We also tried the same procedure on the bipartite Bell causal structure (Figure 2.6a), again for q = Despite the limitations encountered in applying the entropy vector method to Tsallis entropies(Section 4.2.2), here we ﬁnd new Tsallis entropic inequalities for the Triangle causal structure These included the 6 that follow from “each node N i is conditionally independent of its descendants givenits parents” (denoted as N i ⊥ d N ↑̸ i ∣ N ↓ i ) and 9 more chosen arbitrarily from the total of 126 independent Tsallisconstraints we found for the Triangle. 
The 6 former constraints for the Triangle (Figure 2.6b) are A ⊥ d CXB , B ⊥ d CY A , C ⊥ d BZA , X ⊥ d Y AZ ∣ CB , Y ⊥ d XBZ ∣ AC and Z ⊥ d Y CX ∣ AB . An example of 9 more constraintsfor which the procedure did not work are X ⊥ d Y ∣ CB , X ⊥ d A ∣ CB , X ⊥ d Z ∣ CB , Y ⊥ d X ∣ AC , Y ⊥ d B ∣ AC , Y ⊥ d Z ∣ AC , Z ⊥ d Y ∣ AC , Z ⊥ d C ∣ AB and Z ⊥ d X ∣ AB . We also tried some other choices and number ofconstraints but this did not lead to any improvement. For example, we were able to obtain I ( A ∶ BY ) ≤ and I ( B ∶ AX ) ≤ , while, in the case of binaryvariables and q =

2, the independences in the DAG together with Theorem 4.2.1 imply I ( A ∶ BY ) ≤ and I ( B ∶ AX ) ≤ , which are the Tsallis entropic equivalents of the two no-signaling constraints. HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION for all q ≥ Including all permutations of X , Y and Z , these yield 7 inequalities. − H ( X ) − H ( Y ) − H ( Z ) + H ( XY ) + H ( XZ ) ≥ , (4.7a) − H ( X ) − H ( Y ) − H ( Z ) + H ( XY ) + H ( XZ ) + H ( Y Z ) − H ( XY Z ) ≥ , (4.7b) − H ( X ) − H ( Y ) − H ( Z ) + H ( XY ) + H ( XZ ) + H ( Y Z ) − H ( XY Z ) ≥ . (4.7c)By replacing the Shannon entropy H () with the Tsallis entropy S q () on the left hand sideof these inequalities and minimizing the resultant expression over our outer approximation tothe classical Tsallis entropy cone for the Triangle causal structure, one can obtain valid Tsallisentropic inequalities for this causal structure. More precisely, the outer approximation to theclassical Tsallis entropy cone for the Triangle is characterised by the 6 + ( − ) − = LPAssumptions [62] (Appendix 4.5.2), a Mathematicapackage that we developed for solving linear programs involving unspeciﬁed variables (otherthan those being optimised over), by implementing the simplex method (Section 2.2.3.2). Inour case, we assumed that the dimensions of all the unobserved nodes ( A , B and C ) are equalto d u and those of all the observed nodes ( X , Y and Z ) is d o , and so the unspeciﬁed variablesare q ≥ d u ≥ d o ≥

2. We obtained the following Tsallis entropic inequalities for theTriangle. − S q ( X ) − S q ( Y ) − S q ( Z ) + S q ( XY ) + S q ( XZ ) ≥ B ( q, d o , d u ) , (4.8a) − S q ( X ) − S q ( Y ) − S q ( Z ) + S q ( XY ) + S q ( XZ ) + S q ( Y Z ) − S q ( XY Z )≥ B ( q, d o , d u ) ∶= max ( B ( q, d o , d u ) , B ( q, d o , d u )) , (4.8b) − S q ( X )− S q ( Y )− S q ( Z )+ S q ( XY )+ S q ( XZ )+ S q ( Y Z )− S q ( XY Z ) ≥ B ( q, d o , d u ) , (4.8c)where, B ( q, d o , d u ) = − q − ⎛⎝ − d − qo ⎞⎠⎛⎝ − d − qo − d − qu ⎞⎠ , (4.9a) Note that a tighter entropic characterization was found in [217] based on non-Shannon inequalities, andthat the techniques introduced here could also be applied to these. HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION B ( q, d o , d u ) = − q − ⎛⎝ + d − qu + d − qo + d − qo d − qu − d − qu − d − qo ⎞⎠ ,B ( q, d o , d u ) = − q − ⎛⎝ + d − qo d − qu + d − qo + d − qo d − qu − d − qu − d − qo ⎞⎠ , (4.9b) B ( q, d o , d u ) = − q − ⎛⎝ + d − qo d − qu + d − qo + d − qo d − qu − d − qu − d − qo ⎞⎠ . (4.9c)Note that lim q → B = lim q → B = lim q → B = ∀ d u , d o ≥

2, recovering the original inequalitiesfor Shannon entropies (Equations (4.7a)–(4.7c)) as a special case.In [183], an upper bound on the dimensions of classical unobserved systems needed to reproducea set of observed correlations is derived in terms of the dimensions of the observed systems.In the case of the Triangle causal structure with d X = d Y = d Z = d o and d A = d B = d C = d u asconsidered here, the result of [183] implies that all classical correlations P XY Z can be reproducedby using hidden systems of dimension at most d o − d o . Since the dimension of the unobservedsystems is unknown, it makes sense to take the minimum of the derived bounds over all d u between 2 and d o − d o . By taking their derivative, one can verify that for q > B , B , B and B is monotonically decreasing in d o and d u , and hence that theminimum is obtained for d u = d o − d o for any given d o ≥

2. It follows that for all q > d o ≥ B ∗ ( q, d o ) = B ( q, d o , d o − d o )= − q − ⎛⎝ + d − qo − d − qo + d o (− d o + d o ) − q − d o (− d o + d o ) − q − d − qo (− d o + d o ) − q + d − qo (− d o + d o ) − q ⎞⎠ (4.10a) B ∗ ( q, d o ) = B ( q, d o , d o − d o )= − q − ⎛⎝ + d − qo − d − qo + (− d o + d o ) − q − (− d o + d o ) − q + d − qo (− d o + d o ) − q ⎞⎠ ,B ∗ ( q, d o ) = B ( q, d o , d o − d o )= − q − ⎛⎝ + d − qo − d − qo + d − qo (− d o + d o ) − q − (− d o + d o ) − q + d − qo (− d o + d o ) − q ⎞⎠ , (4.10b) B ∗ ( q, d o ) = B ( q, d o , d o − d o )= − q − ⎛⎝ + d − qo − d − qo + d − qo (− d o + d o ) − q − (− d o + d o ) − q + d − qo (− d o + d o ) − q ⎞⎠ . (4.10c) HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION

A quantum violation of any of these bounds would imply that no unobserved classical systems of arbitrary dimension could reproduce those quantum correlations.

Remark 4.3.1. Because they are monotonically decreasing, the bounds for $d_u = d_o^3 - d_o$ are not as tight as the $d_u$-dependent bounds for general $q > 1$. Nevertheless, as $q \to 1$, all the bounds $B_i^*(q, d_o)$ tend to 0, reproducing the known result of [90] for the Shannon case.

Remark 4.3.2.

In some cases it may be interesting to show quantum violations of these inequalities for low values of $d_u$, hence ruling out classical explanations with hidden systems of low dimensions, while possibly leaving open the case of arbitrary classical explanations. This would be interesting if it could be established that using hidden quantum systems allows for much lower dimensions than for hidden classical systems, for example.

It is known that the Triangle causal structure (Figure 2.6b) admits non-classical correlations such as Fritz’s distribution [88]. The idea behind this distribution is to embed the CHSH game in the Triangle causal structure such that non-locality for the Triangle follows from the non-locality of the CHSH game. To do so, $C$ is replaced by the sharing of a maximally entangled pair of qubits, and $A$ and $B$ are taken to be uniformly random classical bits. The observed variables $X$, $Y$ and $Z$ in Figure 2.6b are taken to be pairs of the form $X := (\tilde{X}, B)$, $Y := (\tilde{Y}, A)$ and $Z := (A, B)$, where $\tilde{X}$ and $\tilde{Y}$ are generated by measurements on the halves of the entangled pair, with $B$ and $A$ used to choose the settings such that the joint distribution $P_{\tilde{X}\tilde{Y}|BA}$ maximally violates a CHSH inequality. By a similar post-processing of other non-local distributions in the bipartite Bell causal structure (Figure 2.6a), such as the Mermin-Peres magic square game [147, 162] and chained Bell inequalities [34], one can obtain other non-local distributions in the Triangle that cannot be reproduced using classical systems.
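The embedding described above can be sketched in a few lines of Python. This is an illustrative construction under stated assumptions rather than the thesis's own code: the function `chsh_quantum` is a hypothetical helper that hard-codes one standard table of correlations saturating Tsirelson's bound (outcomes agree with probability $(1 + 1/\sqrt{2})/2$ unless both settings equal 1), and the labelling of settings and outcomes is an arbitrary choice. The sketch builds $P_{XYZ}$ with $X = (\tilde{X}, B)$, $Y = (\tilde{Y}, A)$, $Z = (A, B)$ and recovers the CHSH value from it.

```python
import itertools
import math


def chsh_quantum(xt, yt, b, a):
    """P(xt, yt | b, a) for correlations saturating Tsirelson's bound.

    Assumed form: the outcomes satisfy xt XOR yt = a AND b with
    probability (1 + 1/sqrt(2))/2, split evenly over the two such outcome pairs.
    """
    sign = 1 if (xt ^ yt) == (a & b) else -1
    return (1 + sign / math.sqrt(2)) / 4


# Joint distribution over X = (xt, b), Y = (yt, a), Z = (a, b),
# with the setting bits a and b uniformly random.
fritz = {}
for xt, yt, a, b in itertools.product((0, 1), repeat=4):
    fritz[((xt, b), (yt, a), (a, b))] = 0.25 * chsh_quantum(xt, yt, b, a)


def correlator(a, b):
    """E(a, b) = P(xt = yt | a, b) - P(xt != yt | a, b), read off from P_XYZ."""
    return sum((-1) ** (xt ^ yt) * fritz[((xt, b), (yt, a), (a, b))] / 0.25
               for xt, yt in itertools.product((0, 1), repeat=2))


total = sum(fritz.values())
chsh = correlator(0, 0) + correlator(0, 1) + correlator(1, 0) - correlator(1, 1)
# Number of distinct values taken by each observed variable X, Y, Z.
observed_dims = tuple(len({outcome[i] for outcome in fritz}) for i in range(3))
print(total, chsh, observed_dims)
```

Because $Z$ reveals both setting bits and $X$, $Y$ each carry a copy of one of them, the CHSH correlators can be read off directly from the observed distribution, which is the mechanism by which non-locality in the Bell scenario carries over to the Triangle.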
We explorewhether any of these violate any of our new inequalities.Since the values of B i ( q, d o , d u ) are monotonically decreasing in d o and d u , if a distributionrealisable in a quantum causal structure does not violate the bounds (4.8a)–(4.8c) for all q ≥ d o and d u , then no violations are possible for d ′ o > d o , d ′ u > d u .We therefore take the smallest possible values of d o and d u when showing that a particulardistribution cannot violate any of the bounds.For Fritz’s distribution [88], C is a two-qubit maximally entangled state, A and B are binaryrandom variables while X , Y and Z are random variables of dimension 4, i.e., the actualobserved dimensions are ( d X , d Y , d Z ) = ( , , ) in this case. Here we see that taking d o = d u which is d u =

2, the left hand sides of Equations (4.8a)–(4.8c) evaluatedfor Fritz’s distribution do not violate the corresponding bounds B i ( q, d o = , d u = ) for any q ≥

1. This means that it is not possible to detect any quantum advantage of this distribution(even over the case where the unobserved systems are classical bits) using this method, andautomatically implies that it cannot violate the bounds B i ( q, d o = , d u ) for d u ≥ HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION d i smallestScenario Ineq. (4.8a) ( i =

1) Ineq. (4.8b) ( i =

2) Ineq. (4.8c) ( i =

3) observed dim. ( d min o ) N = N = N = N = N = N = N = N = N =

10 10 3 10 20Magic Sq. 4 2 4 9

Table 4.1: Values of d i for the chained Bell and magic square correlations embedded inthe Triangle causal structure. The values of N correspond to the number of inputs per party in thechained Bell inequality, which always has two outputs per party (the N = ( d X , d Y , d Z ) = ( N, N, N ) . The last column of the table gives the minimum of the observednode dimensions ( d X , d Y , d Z ) for each N , which is simply 2 N . For the magic square, the dimensions ( d X , d Y , d Z ) are ( , , ) . In all cases, the minimum value of d i such that the Inequalities (4.8a)–(4.8c) with bounds B i ( q, d o = d i , d u = ) are not violated for any q ≥ d min o , and hence no violations of (4.8a)–(4.8c) could be found for the relevant casewith d o = d min o . causal structure analogously to the case discussed above. For each of these, we deﬁne d i tobe the smallest value of d o for which the bound B i ( q, d o = d i , d u = ) cannot be violated forany q >

1. The values of d i are given in Table 4.1 for the diﬀerent cases of the chained Bellcorrelations and the magic square. Since the values of d i are always lower than the smallest ofthe observed dimensions in the problem, and due to the monotonicity of the bounds it followsthat none of these quantum distributions violate any of our inequalities when the observeddimension is set to d min o .We further checked for violations of Inequalities (4.8a)–(4.8c) by sampling random quantumstates for the systems A , B and C and random quantum measurements whose outcomes wouldcorrespond to the classical variables X , Y and Z . The value of q was also sampled randomlybetween 1 and 100. We considered the cases where the shared systems were pairs of qubitswith 4 outcome measurements ( d X = d Y = d Z =

4) and qutrits with 9 outcome measurements( d X = d Y = d Z =

9) but were unable to ﬁnd violations of any of the inequalities even for thebounds with the d o = , d u = d o = , d u = Remark 4.3.3.

In the derivation of Inequalities (4.8a)–(4.8c), we set the dimensions of the observed nodes $X$, $Y$ and $Z$ to all be equal and those of the unobserved nodes $A$, $B$ and $C$ to also all be equal. One could in principle repeat the same procedure taking different dimensions for all 6 variables, but we found the computational procedure too demanding. However, Table 4.1 shows that even when we consider the bounds $B_i(q, d_o, d_u)$ with $d_o$ and $d_u$ much smaller than the actual dimensions, the known non-local distributions in the Triangle considered in Table 4.1 do not violate the corresponding Inequalities (4.8a)–(4.8c) for any $q \geq$

1. Since the bounds aremonotonically decreasing in d u and d o , even if we obtained the general bounds for arbitrarydimensions of X , Y , Z , A , B and C , they would be strictly weaker than B i ( q, d i , d u = )∀ i ∈ { , , } , q ≥ We have investigated the use of Tsallis entropies within the entropy vector method to causalstructures, showing how causal constraints imply bounds on the Tsallis entropies of the variablesinvolved. Although Tsallis entropies for q ≥ α do not satisfy strong subadditivity for α ≠ ,

1, while the Rényi as well as the minand max entropies fail to obey the chain rules for conditional entropies. Thus, use of these inthe entropy vector method, would require an entropy vector with components for all possibleconditional entropies as well as unconditional ones, considerably increasing the dimensionalityof the problem, which we would expect to make the computations harder. In some cases, nothaving a chain rule has not been prohibitive [218], but our results of Appendix 4.5.3 revealsigniﬁcant limitations of Rényi entropies for analysing causal structure.Further, one could consider using algorithms other than Fourier-Motzkin elimination to obtainnon-trivial Tsallis entropic constraints over observed nodes starting from the Tsallis cone overall the nodes (see e.g., [99]). These could in principle yield solutions even in cases whereFM elimination becomes intractable. However, we found that the FM elimination procedurebecame intractable even when starting out with only a small subset of the Tsallis entropic causalconstraints for a simple causal structure such as the Bell one. This suggests that the diﬃcultyis not only with the number of constraints, but also with their nature (in particular, that theyare not equalities and depend non-trivially on the dimensions). Consequently, we bypassed FMelimination and used an alternative technique to obtain new Tsallis entropic inequalities for theTriangle causal structure (Section 4.3).It is also worth noting that the following alternative deﬁnition of the Tsallis conditional entropy HAPTER 4. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITHOUT POST-SELECTION was proposed in [3].˜ S q ( X ∣ Y ) = − q ∑ y P ( y ) q S q ( X ∣ Y = y )∑ y P ( y ) q = − q ( ∑ x,y P ( xy ) q ∑ y P ( y ) q − ) . (4.11)Using this deﬁnition, Tsallis entropies would satisfy the same causal constraints as the Shannonentropy (Equation (2.49)). 
However, the conditional entropies defined this way do not satisfy the chain rules of Equation (2.20) but instead obey a non-linear chain rule, $S_q(XY) = S_q(X) + S_q(Y|X) + (1-q)\,S_q(X)\,S_q(Y|X)$ [3]. This would again mean that conditional entropies would need to be included in the entropy vector. Furthermore, since Fourier-Motzkin elimination only works for linear constraints, an alternative algorithm would be required to use this chain rule in conjunction with the entropy vector method.

That the inequalities for Tsallis entropy derived in this work depend on the dimensions of the systems involved could be used to certify that particular observed correlations in a classical causal structure require a certain minimal dimension of unobserved systems to be realisable. To show this would require showing that classically-realisable correlations violate one of the inequalities for some $d_u$. Such bounds would then complement the upper bounds of [183]. However, in some cases we know our bounds are not tight enough to do this. As a simple example, within the Triangle causal structure we tried taking $X = (X_B, X_C)$, $Y = (Y_A, Y_C)$ and $Z = (Z_A, Z_B)$ with $X_B = Z_B$, $X_C = Y_C$ and $Y_A = Z_A$, where each is uniformly distributed with cardinality $D$, for D ∈ { , . . . , }. In this case it is clear that the correlations cannot be achieved with classical unobserved systems with $d_u = 2$. Taking the bound with $d_u = d_o = D$, no violations of (4.8a)–(4.8c) were seen by plotting the graphs for q ∈ [ , ], for the range of $D$ above. Hence, our bounds are too loose to certify lower bounds on $d_u$ in this case.

While our analysis highlights significant drawbacks of using Tsallis entropies for analysing causal structures, it does not rule out the possibility of Tsallis entropies being able to detect the classical-quantum gap in these causal structures, or others. To overcome the difficulties we encountered we would need increased computational power and/or the development of new, alternative techniques for analysing causal structures (with or without entropies).

Footnote: Proving that Tsallis entropies are unable to do this would also be difficult. For instance, the proof of [215] that Shannon entropies are unable to detect the gap in line-like causal structures involves first characterising the marginal polytope through Fourier-Motzkin elimination, which itself proved to be computationally infeasible with Tsallis entropies even for the simplest line-like causal structure, the bipartite Bell scenario.

The main results of this Chapter only required the Tsallis entropic causal constraints of Theorems 4.2.1 and 4.2.2, which were derived for classical causal structures. Here, we present additional results that generalise these theorems to certain quantum causal structures. For these, we recap some of the properties of quantum Tsallis entropies encountered in Section 2.3.2.

Tsallis entropies as defined for classical random variables in Section 2.3.2 are easily generalised to the quantum case by replacing the probability distribution by a density matrix [119]. For a quantum system described by the density matrix $\rho \in \mathcal{S}(\mathcal{H})$ on the Hilbert space $\mathcal{H}$ and $q > 0$, the quantum Tsallis entropy is defined by

$$S_q(\rho) = \begin{cases} -\mathrm{Tr}\,\rho^q \ln_q \rho, & q \neq 1, \\ H(\rho), & q = 1, \end{cases} \tag{4.12}$$

where $H(\rho) = -\mathrm{Tr}\,\rho \ln \rho$ is the von Neumann entropy of $\rho$ and $\ln_q(x) = \frac{x^{1-q} - 1}{1-q}$ as in Section 2.3.2. Given a density operator $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$, the conditional quantum Tsallis entropy of $A$ given $B$ can then be defined by $S_q(A|B)_\rho = S_q(AB) - S_q(B)$, the mutual information between $A$ and $B$ by $I_q(A:B)_\rho = S_q(A) + S_q(B) - S_q(AB)$, and for $\rho_{ABC} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C)$ the conditional Tsallis information between $A$ and $B$ given $C$ is defined by $I_q(A:B|C)_\rho = S_q(A|C) + S_q(B|C) - S_q(AB|C)$. In this section we use $d_S$ to represent the dimension of the Hilbert space $\mathcal{H}_S$.

The following properties of quantum Tsallis entropies will be useful for what follows.

1. Pseudo-additivity [201]: If $\rho_{AB} = \rho_A \otimes \rho_B$, then
$$S_q(AB) = S_q(A) + S_q(B) + (1-q)\,S_q(A)\,S_q(B). \tag{4.13}$$
2. Upper bound [13]: For all $q > 0$, we have $S_q(A) \leq \ln_q d_A$, and equality is achieved if and only if $\rho_A = \mathbb{1}_A / d_A$.
3. Subadditivity [13]: For any density matrix $\rho_{AB}$ with marginals $\rho_A$ and $\rho_B$, the following holds for all $q \geq 1$:
$$S_q(AB) \leq S_q(A) + S_q(B). \tag{4.14}$$

Using these we can generalise Theorem 4.2.1 to the quantum case. This corresponds to the causal structure with two independent quantum nodes and no edges in between them.

Theorem 4.5.1.

For all bipartite density operators in product form, i.e., $\rho_{AB} = \rho_A \otimes \rho_B$ with $\rho_A \in \mathcal{S}(\mathcal{H}_A)$ and $\rho_B \in \mathcal{S}(\mathcal{H}_B)$, the quantum Tsallis mutual information $I_q(A:B)_\rho$ is upper bounded as follows for all $q > 1$:

$$I_q(A:B)_\rho \leq f(q, d_A, d_B),$$

where the function $f(q, d_A, d_B)$ is given by

$$f(q, d_A, d_B) = \frac{1}{q-1}\left(1 - \frac{1}{d_A^{\,q-1}}\right)\left(1 - \frac{1}{d_B^{\,q-1}}\right) = (q-1)\,\ln_q d_A\,\ln_q d_B.$$

The bound is saturated if and only if $\rho_{AB} = \frac{\mathbb{1}_A}{d_A} \otimes \frac{\mathbb{1}_B}{d_B}$.

Proof. The proof goes through in the same way as the proof of Theorem 4.2.1 for the classical case (Properties 1 and 2 are analogous to those needed in the classical proof).

Footnote (to Equation (4.12)): Analogously to the classical case we keep it implicit that if $\rho$ has any 0 eigenvalues these do not contribute to the trace.

Next, we generalise Theorem 4.2.2 and Corollaries 4.2.1 and 4.2.2. This corresponds to the causal constraints on quantum Tsallis entropies implied by the common cause causal structure with $C$ being a complete common cause of $A$ and $B$ (which share no causal relations among themselves). Here, one must be careful in precisely defining the conditional mutual information and interpreting it physically. For example, if the common cause $C$ were quantum and the nodes $A$ and $B$ were classical outcomes of measurements on $C$, then $A$, $B$ and $C$ do not coexist and there is no joint state $\rho_{ABC}$ in such a case. This is a significant difference in quantum causal modelling compared to the classical case, and there have been several proposals for how to deal with it [132, 67, 6, 164]. In the following we consider two cases:

1. When $C$ is classical, all three systems coexist and $\rho_{ABC}$ can be described by a classical-quantum state (see Theorem 4.5.2).
2. When $C$ is quantum, one approach is to view $\rho_{ABC}$ not as the joint state of the three systems but as being related to the Choi-Jamiołkowski representations of the quantum channels from $C$ to $A$ and $B$ (see Section 4.5.1.1), as done in [6].

The following lemma, proven in [125], is required for our generalisation of Theorem 4.2.2 in the first case.

Lemma 4.5.1 ([125], Lemma 1). Let $\mathcal{H}_A$ and $\mathcal{H}_Z$ be two Hilbert spaces and $\{|z\rangle\}_z$ be an orthonormal basis of $\mathcal{H}_Z$. Let $\rho_{AZ}$ be classical on $\mathcal{H}_Z$ with respect to this basis, i.e., $\rho_{AZ} = \sum_z P(z)\,\rho_A^{(z)} \otimes |z\rangle\langle z|$, where $\sum_z P(z) = 1$ and $\rho_A^{(z)} \in \mathcal{S}(\mathcal{H}_A)$ for all $z$. Then for all $q > 0$,

$$S_q(AZ)_\rho = \sum_z P(z)^q\, S_q(\rho_A^{(z)}) + S_q(Z),$$

where $S_q(Z)$ is the classical Tsallis entropy of the variable $Z$ distributed according to $P_Z$.

Note that the above lemma immediately implies that

$$S_q(A|Z)_\rho = \sum_z P(z)^q\, S_q(\rho_A^{(z)}). \tag{4.15}$$
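The identity in Lemma 4.5.1 can be checked numerically with a minimal sketch. Because $\rho_{AZ}$ is block diagonal, its spectrum is the set of products $P(z)\lambda$, where $\lambda$ runs over the eigenvalues of $\rho_A^{(z)}$, so only eigenvalue lists are needed. The distribution `P_Z` and the eigenvalue lists `cond_eigs` below are hypothetical example data, and the helper names are illustrative rather than taken from the thesis.

```python
import math

def ln_q(x, q):
    """q-logarithm ln_q(x) = (x^(1-q) - 1)/(1-q); reduces to ln(x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis(eigs, q):
    """Tsallis entropy from a list of eigenvalues or probabilities (zeros are skipped)."""
    return -sum(p ** q * ln_q(p, q) for p in eigs if p > 0)

# Hypothetical example: Z takes 3 values, A is a qutrit, and each conditional
# state rho_A^(z) is specified only through its eigenvalues.
P_Z = [0.5, 0.3, 0.2]
cond_eigs = [[0.7, 0.2, 0.1], [1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]

def lemma_sides(q):
    """Return (S_q(AZ), sum_z P(z)^q S_q(rho_A^(z)) + S_q(Z))."""
    # Spectrum of the block-diagonal classical-quantum state rho_AZ.
    joint = [pz * lam for pz, eigs in zip(P_Z, cond_eigs) for lam in eigs]
    lhs = tsallis(joint, q)
    rhs = sum(pz ** q * tsallis(eigs, q) for pz, eigs in zip(P_Z, cond_eigs))
    rhs += tsallis(P_Z, q)
    return lhs, rhs

for q in (0.5, 1.0, 2.0, 3.5):
    lhs, rhs = lemma_sides(q)
    print(f"q = {q}: S_q(AZ) = {lhs:.6f}, right-hand side = {rhs:.6f}")
```

The two printed columns should agree for every $q$, which is the content of the lemma; subtracting $S_q(Z)$ from both sides gives Equation (4.15).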

Theorem 4.5.2. Let $\rho_{ABC} = \sum_c P(c)\,\rho_{AB}^{(c)} \otimes |c\rangle\langle c|$, where $\rho_{AB}^{(c)} = \rho_A^{(c)} \otimes \rho_B^{(c)}$ for all $c$. Then, for all $q \geq 1$,

$$I_q(A:B|C)_{\rho_{ABC}} \leq f(q, d_A, d_B).$$

For $q > 1$ the bound is saturated if and only if $\rho_{ABC} = \frac{\mathbb{1}_A}{d_A} \otimes \frac{\mathbb{1}_B}{d_B} \otimes |c\rangle\langle c|_C$.

Proof. Using (4.15) we have

$$\begin{aligned} I_q(A:B|C)_{\rho_{ABC}} &= S_q(A|C)_\rho + S_q(B|C)_\rho - S_q(AB|C)_\rho \\ &= \sum_c P(c)^q \left[ S_q(\rho_A^{(c)}) + S_q(\rho_B^{(c)}) - S_q(\rho_{AB}^{(c)}) \right] \\ &= \sum_c P(c)^q\, I_q(A:B)_{\rho_{AB}^{(c)}}. \end{aligned}$$

The rest of the proof is analogous to that of Theorem 4.2.2. Using the above, Theorem 4.5.1 and defining the set

$$R = \Big\{ \rho_{ABC} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C) : \rho_{ABC} = \sum_c P(c)\,\rho_A^{(c)} \otimes \rho_B^{(c)} \otimes |c\rangle\langle c| \Big\},$$

we have

$$\max_R I_q(A:B|C)_\rho = \max_R \sum_c P(c)^q\, I_q(A:B)_{\rho_{AB}^{(c)}} \leq \max_{\{P(c)\}_c} \sum_c P(c)^q \max_{\{\rho_A^{(c)}\}_c, \{\rho_B^{(c)}\}_c} I_q(A:B)_{\rho_{AB}^{(c)}} = f(q, d_A, d_B),$$

where the last step follows because, for all $q \geq 1$, $\sum_c P(c)^q$ is maximised by deterministic distributions over $C$ with a maximum value of 1, and $I_q(A:B)_{\rho_{AB}^{(c)}}$ for product states is maximised by the maximally mixed state over $A$ and $B$ for all $c$ (Theorem 4.5.1). Thus, for $q > 1$, the bound is saturated if and only if $\rho_{ABC} = \frac{\mathbb{1}_A}{d_A} \otimes \frac{\mathbb{1}_B}{d_B} \otimes |c\rangle\langle c|_C$ for some value $c$ of $C$.

There is a fundamental problem with naively generalising classical conditional independences such as $P_{XY|Z} = P_{X|Z} P_{Y|Z}$ to the quantum case by replacing joint distributions by density matrices: it is not clear what is meant by a conditional quantum state, e.g., $\rho_{A|C}$, since it is not clear what it means to condition on a quantum system, especially when the (joint state of the) system under consideration and the one being conditioned upon do not coexist. There are a number of approaches for tackling this problem, from describing quantum states in space and time on an equal footing [117] to quantum analogues of Bayesian inference [132] and causal modelling [67, 6, 164]. In the following, we will focus on one such approach that is motivated by the framework of [6]. Central to this approach is the Choi-Jamiołkowski isomorphism [121, 54], from which one can define conditional quantum states.

Figure 4.1: A circuit decomposition of the channel $\mathcal{E}: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ when $C$ is a complete common cause of $A$ and $B$. If the map $\mathcal{E}$ from the system $C$ to the systems $A$ and $B$ can be decomposed as shown here, then $C$ is a complete common cause of $A$ and $B$ ([6]). We build up our result step by step, considering the channels given by $\mathcal{E}_i$ (unitary), $\mathcal{E}_{ii}$ (unitary followed by local isometries) and $\mathcal{E}_{iii} = \mathcal{E}$.

Definition 4.5.1 (Choi state). Let $|\gamma\rangle = \sum_i |i\rangle_R |i\rangle_{R^*} \in \mathcal{H}_R \otimes \mathcal{H}_{R^*}$, where $\mathcal{H}_{R^*}$ is the dual space to $\mathcal{H}_R$ and $\{|i\rangle_R\}_i$, $\{|i\rangle_{R^*}\}_i$ are orthonormal bases of $\mathcal{H}_R$ and $\mathcal{H}_{R^*}$ respectively. Given a channel $\mathcal{E}_{R|S}: \mathcal{S}(\mathcal{H}_R) \to \mathcal{S}(\mathcal{H}_S)$, the Choi state of the channel is defined by

$$\rho_{S|R} = (\mathcal{E}_{R|S} \otimes \mathcal{I})(|\gamma\rangle\langle\gamma|) = \sum_{ij} \mathcal{E}(|i\rangle\langle j|_R) \otimes |i\rangle\langle j|_{R^*}.$$

Thus, $\rho_{S|R} \in \mathcal{P}(\mathcal{H}_S \otimes \mathcal{H}_{R^*})$.

Now, if a quantum system $C$ evolves through a unitary channel $\mathcal{E}_i(\cdot) = U'(\cdot)U'^\dagger$ to two systems $A'$ and $B'$, where $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$, it is reasonable to call the system $C$ a quantum common cause of the systems $A'$ and $B'$. Further, this would still be reasonable if one were to then perform local completely positive trace preserving (CPTP) maps on the $A'$ and $B'$ systems. By the Stinespring dilation theorem, these local CPTP maps can be seen as local isometries followed by partial traces, and the local isometries can be seen as the introduction of an ancilla in a pure state followed by a joint unitary on the system and ancilla. This is illustrated in Figure 4.1 and is compatible with the definition of quantum common causes presented in [6]. In other words, a system $C$ can be said to be a complete (quantum) common cause of systems $A$ and $B$ if the corresponding channel $\mathcal{E}: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ can be decomposed as in Figure 4.1 for some choice of unitaries $U'$, $U_A$, $U_B$ and pure states $|\phi\rangle_{E_A}$, $|\psi\rangle_{E_B}$. Note that a more general set of channels fits the definition of quantum common cause in Ref. [6] than we use here; whether the theorems here extend to this case we leave as an open question.

In [6] it is shown that whenever a system $C$ is a complete common cause of systems $A$ and $B$, the Shannon conditional mutual information evaluated on the state $\tau_{ABC^*} = \frac{1}{d_C}\rho_{AB|C}$ satisfies $I(A:B|C^*)_\tau = 0$, where $\rho_{AB|C}$ is the Choi state of the channel from $C$ to $A$ and $B$. We generalise this result to Tsallis entropies for $q \geq 1$, considering the cases where the channel from $C$ to its children $A$ and $B$ is (i) unitary ($\mathcal{E}_i = U'$); (ii) unitary followed by local isometries ($\mathcal{E}_{ii}$); (iii) unitary followed by local isometries followed by partial traces on local systems ($\mathcal{E}_{iii} = \mathcal{E}$).

Lemma 4.5.2.

Let $\mathcal{E}_i: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_{A'} \otimes \mathcal{H}_{B'})$ be a unitary quantum channel, i.e., $\mathcal{E}_i(\cdot) = U'(\cdot)U'^\dagger$, where $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$ is an arbitrary unitary operator. If $\rho_{A'B'|C}$ is the corresponding Choi state, then the Tsallis conditional mutual information evaluated on the state $\tau_{A'B'C^*} = \frac{1}{d_C}\rho_{A'B'|C} \in \mathcal{S}(\mathcal{H}_{A'} \otimes \mathcal{H}_{B'} \otimes \mathcal{H}_{C^*})$ satisfies

$$I_q(A':B'|C^*)_\tau = f(q, d_{A'}, d_{B'}) \quad \forall q > 0.$$

Proof. The conditional mutual information $I_q(A':B'|C^*)_\tau$ can be written as

$$I_q(A':B'|C^*)_\tau = \frac{1}{q-1}\left( \mathrm{Tr}_{A'B'C^*}\,\tau^q_{A'B'C^*} + \mathrm{Tr}_{C^*}\,\tau^q_{C^*} - \mathrm{Tr}_{A'C^*}\,\tau^q_{A'C^*} - \mathrm{Tr}_{B'C^*}\,\tau^q_{B'C^*} \right). \tag{4.16}$$

We will now evaluate every term in the above expression for the case where the channel that maps the $C$ system to the $A'$ and $B'$ systems is unitary. In this case, $\tau_{A'B'C^*}$ is a pure state and can be written as $\tau_{A'B'C^*} = |\tau\rangle\langle\tau|_{A'B'C^*}$, where

$$|\tau\rangle_{A'B'C^*} = \frac{1}{\sqrt{d_C}} \sum_i U'|i\rangle_C \otimes |i\rangle_{C^*}. \tag{4.17}$$

This means that $\mathrm{Tr}_{A'B'C^*}\,\tau^q_{A'B'C^*} = \mathrm{Tr}_{A'B'C^*}\,\tau_{A'B'C^*}$ for all $q > 0$. Since $\tau_{A'B'C^*}$ is a valid quantum state, it must be a trace one operator and we have

$$\mathrm{Tr}_{A'B'C^*}\,\tau^q_{A'B'C^*} = 1 \quad \forall q > 0. \tag{4.18}$$

Further, $\tau_{C^*} = \mathrm{Tr}_{A'B'}\,\tau_{A'B'C^*} = \frac{\mathbb{1}_{C^*}}{d_C}$ and hence

$$\mathrm{Tr}_{C^*}\,\tau^q_{C^*} = \frac{1}{d_C^{\,q-1}} = \frac{1}{d_{A'}^{\,q-1}\, d_{B'}^{\,q-1}}. \tag{4.19}$$

The second step follows from the fact that $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$ is unitary, so $d_C = d_{A'} d_{B'}$. Now, the marginals over $A'$ and $B'$ are $\tau_{A'} = \mathrm{Tr}_{B'C^*}\,\tau_{A'B'C^*} = \frac{\mathbb{1}_{A'}}{d_{A'}}$ and $\tau_{B'} = \mathrm{Tr}_{A'C^*}\,\tau_{A'B'C^*} = \frac{\mathbb{1}_{B'}}{d_{B'}}$. By the Schmidt decomposition of $\tau_{A'B'C^*}$, the non-zero eigenvalues of $\tau_{A'}$ are the same as those of $\tau_{B'C^*}$. Since the Tsallis entropy depends only on the non-zero eigenvalues, $S_q(A') = S_q(B'C^*)$ and hence

$$\mathrm{Tr}_{B'C^*}\,\tau^q_{B'C^*} = d_{A'}\left(\frac{1}{d_{A'}^{\,q}}\right) = \frac{1}{d_{A'}^{\,q-1}}. \tag{4.20}$$

By the same argument it follows that

$$\mathrm{Tr}_{A'C^*}\,\tau^q_{A'C^*} = d_{B'}\left(\frac{1}{d_{B'}^{\,q}}\right) = \frac{1}{d_{B'}^{\,q-1}}. \tag{4.21}$$

Combining Equations (4.16)-(4.21), we have

$$I_q(A':B'|C^*)_\tau = \frac{1}{q-1}\left( 1 + \frac{1}{d_{A'}^{\,q-1}\, d_{B'}^{\,q-1}} - \frac{1}{d_{A'}^{\,q-1}} - \frac{1}{d_{B'}^{\,q-1}} \right) = f(q, d_{A'}, d_{B'}) \quad \forall q > 0. \tag{4.22}$$

Lemma 4.5.3.

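Let $\mathcal{E}_{ii}: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_{\tilde{A}} \otimes \mathcal{H}_{\tilde{B}})$ be a quantum channel of the form

$$\mathcal{E}_{ii}(\cdot) = (U_A \otimes U_B)\left[ |\phi\rangle\langle\phi|_{E_A} \otimes U'(\cdot)U'^\dagger \otimes |\psi\rangle\langle\psi|_{E_B} \right](U_A \otimes U_B)^\dagger,$$

where $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$, $U_A: \mathcal{H}_{E_A} \otimes \mathcal{H}_{A'} \to \mathcal{H}_{\tilde{A}}$ and $U_B: \mathcal{H}_{B'} \otimes \mathcal{H}_{E_B} \to \mathcal{H}_{\tilde{B}}$ are arbitrary unitaries and $|\phi\rangle_{E_A}$ and $|\psi\rangle_{E_B}$ are arbitrary pure states. If $\rho_{\tilde{A}\tilde{B}|C}$ is the corresponding Choi state, then the Tsallis conditional mutual information evaluated on the state $\tau_{\tilde{A}\tilde{B}C^*} = \frac{1}{d_C}\rho_{\tilde{A}\tilde{B}|C} \in \mathcal{S}(\mathcal{H}_{\tilde{A}} \otimes \mathcal{H}_{\tilde{B}} \otimes \mathcal{H}_{C^*})$ satisfies

$$I_q(\tilde{A}:\tilde{B}|C^*)_\tau = f(q, d_{A'}, d_{B'}) \quad \forall q > 0.$$

Proof.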

Note that the map $\mathcal{E}_{ii}$ is the unitary map $\mathcal{E}_i(\cdot) = U'(\cdot)U'^\dagger$ followed by local isometries $V_A$ and $V_B$ on the $A'$ and $B'$ systems respectively. Since the expression for the conditional mutual information $I_q(\tilde{A}:\tilde{B}|C^*)_\tau$ can be written in terms of entropies, which are functions of the eigenvalues of the relevant reduced density operators, and since the eigenvalues are unchanged by local isometries, this conditional mutual information is invariant under local isometries. The rest of the proof is identical to that of Lemma 4.5.2, resulting in

$$I_q(\tilde{A}:\tilde{B}|C^*)_\tau = I_q(A':B'|C^*)_\tau = f(q, d_{A'}, d_{B'}) \quad \forall q > 0. \tag{4.23}$$

For the last case, where $\mathcal{E}_{iii}(\cdot) = \mathrm{Tr}_{A''B''}\left[ (U_A \otimes U_B)\left[ |\phi\rangle\langle\phi|_{E_A} \otimes U'(\cdot)U'^\dagger \otimes |\psi\rangle\langle\psi|_{E_B} \right](U_A \otimes U_B)^\dagger \right]$, one could intuitively argue that tracing out systems could not increase the mutual information, and one would expect that

$$I_q(AA'':BB''|C^*)_\tau \geq I_q(A:B|C^*)_\tau. \tag{4.24}$$

Since $I_q(AA'':BB''|C^*)_\tau = I_q(A:B|C^*)_\tau + I_q(AA'':B''|BC^*)_\tau + I_q(A'':B|AC^*)_\tau$, Equation (4.24) would follow from strong subadditivity used twice, i.e., $I_q(AA'':B''|BC^*)_\tau \geq 0$ and $I_q(A'':B|AC^*)_\tau \geq 0$. However, it is known that strong subadditivity does not hold in general for Tsallis entropies for $q > 1$. In the following we therefore consider $I_q(AA'':B|C^*)_\tau$ (or $I_q(A:BB''|C^*)_\tau$), corresponding to the map $\mathcal{E}_{iii}$ where only one of $A''$ or $B''$ is traced out but not both.
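Lemma 4.5.2 (and hence Lemma 4.5.3, by isometric invariance) can be sanity-checked numerically. The following is a minimal sketch that restricts to integer $q \geq 2$, so that $\mathrm{Tr}\,\tau^q$ can be computed by repeated matrix multiplication without an eigensolver. The dimensions, the random seed, and all function names are illustrative assumptions, not part of the original work.

```python
import math
import random
from itertools import product

def random_unitary(n, rng):
    """Random n x n unitary (list of rows) via Gram-Schmidt on Gaussian columns."""
    cols = []
    while len(cols) < n:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        for u in cols:
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        if norm > 1e-8:
            cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace_power(rho, q):
    """Tr(rho^q) for a positive integer q, by repeated multiplication."""
    out = rho
    for _ in range(q - 1):
        out = matmul(out, rho)
    return sum(out[i][i] for i in range(len(rho))).real

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems listed in keep."""
    labels = list(product(*[range(dims[k]) for k in keep]))
    pos = {lab: n for n, lab in enumerate(labels)}
    rho = [[0j] * len(labels) for _ in labels]
    traced = [k for k in range(len(dims)) if k not in keep]
    for i, amp_i in psi.items():
        for j, amp_j in psi.items():
            if all(i[k] == j[k] for k in traced):
                row = pos[tuple(i[k] for k in keep)]
                col = pos[tuple(j[k] for k in keep)]
                rho[row][col] += amp_i * amp_j.conjugate()
    return rho

def f_bound(q, dA, dB):
    """f(q, d_A, d_B) = (1 - d_A^(1-q)) (1 - d_B^(1-q)) / (q - 1)."""
    return (1 - dA ** (1 - q)) * (1 - dB ** (1 - q)) / (q - 1)

def cmi_choi_unitary(q, dA, dB, seed=0):
    """I_q(A':B'|C*) on the normalised Choi state of a random unitary, via Eq. (4.16)."""
    rng = random.Random(seed)
    dC = dA * dB
    U = random_unitary(dC, rng)
    dims = (dA, dB, dC)  # subsystem order: A', B', C*
    # Amplitudes of |tau> = (1/sqrt(d_C)) sum_i U'|i>_C (x) |i>_C*, as in Eq. (4.17).
    psi = {(a, b, c): U[a * dB + b][c] / math.sqrt(dC)
           for a in range(dA) for b in range(dB) for c in range(dC)}
    t_abc = 1.0  # tau is pure, so Tr tau^q = 1
    t_c = trace_power(reduced(psi, dims, (2,)), q)
    t_ac = trace_power(reduced(psi, dims, (0, 2)), q)
    t_bc = trace_power(reduced(psi, dims, (1, 2)), q)
    return (t_abc + t_c - t_ac - t_bc) / (q - 1)

for q in (2, 3):
    print(q, cmi_choi_unitary(q, 2, 3), f_bound(q, 2, 3))
```

For each $q$ the two printed values should agree up to floating-point error, independently of the seed, since the lemma holds for every unitary $U'$.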

Lemma 4.5.4 (Sufficiency condition for strong subadditivity of Tsallis entropies). If $\rho_{ABC}$ is a pure quantum state, then for all $q \geq 1$ we have $I_q(A:B|C)_\rho \geq 0$.

Proof. We have $I_q(A:B|C) = S_q(AC) + S_q(BC) - S_q(ABC) - S_q(C)$. Since $\rho_{ABC}$ is pure, we have $S_q(ABC) = 0$ for all $q > 0$, together with $S_q(AC) = S_q(B)$, $S_q(BC) = S_q(A)$ and $S_q(C) = S_q(AB)$. Thus,

$$I_q(A:B|C) = S_q(A) + S_q(B) - S_q(AB) = I_q(A:B) \geq 0,$$

which follows from subadditivity of quantum Tsallis entropies for $q \geq 1$. In other words, for pure $\rho_{ABC}$, strong subadditivity of Tsallis entropies is equivalent to their subadditivity, which holds whenever $q \geq 1$.

Corollary 4.5.1.

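Let $\mathcal{E}_{iii}: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_{\tilde{A}} \otimes \mathcal{H}_B)$ be a quantum channel of the form

$$\mathcal{E}_{iii}(\cdot) = \mathrm{Tr}_{B''}\left[ (U_A \otimes U_B)\left[ |\phi\rangle\langle\phi|_{E_A} \otimes U'(\cdot)U'^\dagger \otimes |\psi\rangle\langle\psi|_{E_B} \right](U_A \otimes U_B)^\dagger \right],$$

where $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$, $U_A: \mathcal{H}_{E_A} \otimes \mathcal{H}_{A'} \to \mathcal{H}_{\tilde{A}} \cong \mathcal{H}_A \otimes \mathcal{H}_{A''}$ and $U_B: \mathcal{H}_{B'} \otimes \mathcal{H}_{E_B} \to \mathcal{H}_{\tilde{B}} \cong \mathcal{H}_B \otimes \mathcal{H}_{B''}$ are arbitrary unitaries and $|\phi\rangle_{E_A}$ and $|\psi\rangle_{E_B}$ are arbitrary pure states. If $\rho_{\tilde{A}B|C}$ is the corresponding Choi state, then the Tsallis conditional mutual information evaluated on the state $\tau_{\tilde{A}BC^*} = \frac{1}{d_C}\rho_{\tilde{A}B|C} \in \mathcal{S}(\mathcal{H}_{\tilde{A}} \otimes \mathcal{H}_B \otimes \mathcal{H}_{C^*})$ satisfies

$$I_q(\tilde{A}:B|C^*)_\tau := I_q(AA'':B|C^*)_\tau \leq f(q, d_{A'}, d_{B'}) \quad \forall q \geq 1.$$

Proof.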

Since $I_q(AA'':BB''|C^*)_\tau = I_q(AA'':B|C^*)_\tau + I_q(AA'':B''|BC^*)_\tau$, the purity of $\tau_{\tilde{A}\tilde{B}C^*} = \tau_{AA''BB''C^*}$ and Lemma 4.5.4 imply that

$$I_q(AA'':BB''|C^*)_\tau \geq I_q(AA'':B|C^*)_\tau \quad \forall q \geq 1,$$

or (equivalently) in more concise notation,

$$I_q(\tilde{A}:\tilde{B}|C^*)_\tau \geq I_q(\tilde{A}:B|C^*)_\tau \quad \forall q \geq 1.$$

Finally, using Lemma 4.5.3 we obtain the required result.

Now, for Equation (4.24) to hold, we do not necessarily need strong subadditivity. Even if $I_q(A'':B|AC^*)_\tau \geq 0$ does not hold, Equation (4.24) still follows provided that $I_q(AA'':B''|BC^*)_\tau + I_q(A'':B|AC^*)_\tau \geq 0$. This motivates the following conjecture.

Conjecture 4.5.1.

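Let $\mathcal{E}_{iii}: \mathcal{S}(\mathcal{H}_C) \to \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ be a quantum channel of the form

$$\mathcal{E}_{iii}(\cdot) = \mathrm{Tr}_{A''B''}\left[ (U_A \otimes U_B)\left[ |\phi\rangle\langle\phi|_{E_A} \otimes U'(\cdot)U'^\dagger \otimes |\psi\rangle\langle\psi|_{E_B} \right](U_A \otimes U_B)^\dagger \right],$$

where $U': \mathcal{H}_C \to \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$, $U_A: \mathcal{H}_{E_A} \otimes \mathcal{H}_{A'} \to \mathcal{H}_A \otimes \mathcal{H}_{A''}$ and $U_B: \mathcal{H}_{B'} \otimes \mathcal{H}_{E_B} \to \mathcal{H}_B \otimes \mathcal{H}_{B''}$ are arbitrary unitaries and $|\phi\rangle_{E_A}$ and $|\psi\rangle_{E_B}$ are arbitrary pure states. If $\rho_{AB|C}$ is the corresponding Choi state, then the Tsallis conditional mutual information evaluated on the state $\tau_{ABC^*} = \frac{1}{d_C}\rho_{AB|C} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_{C^*})$ satisfies

$$I_q(A:B|C^*)_\tau \leq f(q, d_{A'}, d_{B'}) \quad \forall q \geq 1.$$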

Notice that in Corollary 4.5.1 and Conjecture 4.5.1, the bounds are functions of $d_{A'}$ and $d_{B'}$ and not of the dimensions of the systems $A$ and $B$ (those in the quantity on the left hand side). In the case that $d_A \geq d_{A'}$ and $d_B \geq d_{B'}$, the fact that $f(q, d_A, d_B)$ is an increasing function of $d_A$ and $d_B$ for all $q \geq 1$ implies $I_q(\tilde{A}:B|C^*)_\tau \leq f(q, d_{\tilde{A}}, d_B)$ and $I_q(A:B|C^*)_\tau \leq f(q, d_A, d_B)$ under the conditions of Corollary 4.5.1 and Conjecture 4.5.1 respectively. However, if $d_A \leq d_{A'}$ and/or $d_B \leq d_{B'}$, the bounds $f(q, d_{\tilde{A}}, d_B)$ and $f(q, d_A, d_B)$ are tighter than the bound $f(q, d_{A'}, d_{B'})$ and so are not implied. Nevertheless, based on the several examples that we have checked, we further conjecture the following.

Conjecture 4.5.2.

Under the same conditions as Conjecture 4.5.1,

$$I_q(A:B|C^*)_\tau \leq f(q, d_A, d_B) \quad \forall q \geq 1.$$

Further, it is shown in [6] that if $C$ is a complete common cause of $A$ and $B$ then the corresponding Choi state $\rho_{AB|C}$ decomposes as $\rho_{AB|C} = (\rho_{A|C} \otimes \mathbb{1}_B)(\mathbb{1}_A \otimes \rho_{B|C})$, or $\rho_{AB|C} = \rho_{A|C}\,\rho_{B|C}$ for short, in analogy with the classical case where, if a classical random variable $Z$ is a common cause of the random variables $X$ and $Y$, the joint distribution over these variables factorises as $p_{XY|Z} = p_{X|Z}\,p_{Y|Z}$. Then we have that $\tau_{ABC^*} = \frac{1}{d_C}\rho_{AB|C} = \frac{1}{d_C}\rho_{A|C}\,\rho_{B|C}$. By further analogy with the classical results of Section 4.2, one may also consider instead a state of the form $\hat{\sigma}_{ABCC^*} = \sigma_C \otimes \frac{1}{d_C}\rho_{A|C}\,\rho_{B|C} = \sigma_C \otimes \tau_{ABC^*}$, where $\sigma_C \in \mathcal{S}(\mathcal{H}_C)$. Note that $\hat{\sigma}_{ABCC^*}$ is a valid density operator on $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C \otimes \mathcal{H}_{C^*}$.

Lemma 4.5.5.

The state $\hat{\sigma}_{ABCC^*} = \sigma_C \otimes \tau_{ABC^*}$ defined above satisfies

$$I_q(A:B|CC^*)_{\hat{\sigma}} \leq f(q, d_A, d_B)$$

whenever $I_q(A:B|C^*)_\tau \leq f(q, d_A, d_B)$ holds for the state $\tau_{ABC^*} = \frac{1}{d_C}\rho_{AB|C}$, where $\rho_{AB|C}$ represents the quantum channel from $C$ to $A$ and $B$ and $\sigma_C$ is the input quantum state to this channel.

Proof. Since $\hat{\sigma}$ is a product state between the $C$ and $ABC^*$ subsystems, by the pseudo-additivity of quantum Tsallis entropies and the chain rule we have

$$\begin{aligned} I_q(A:B|CC^*)_{\hat{\sigma}} &= S_q(ACC^*) + S_q(BCC^*) - S_q(ABCC^*) - S_q(CC^*) \\ &= S_q(AC^*) + S_q(BC^*) - S_q(ABC^*) - S_q(C^*) \\ &\quad - (q-1)\,S_q(C)\left( S_q(AC^*) + S_q(BC^*) - S_q(ABC^*) - S_q(C^*) \right) \\ &= \left( 1 - (q-1)\,S_q(C) \right) I_q(A:B|C^*)_\tau \\ &= \mathrm{Tr}(\sigma_C^q)\, I_q(A:B|C^*)_\tau. \end{aligned}$$

Now let $p_c$ be the distribution whose entries are the eigenvalues of $\sigma_C$. We have $\mathrm{Tr}(\sigma_C^q) = \sum_c p_c^q$. Thus, if $q > 1$, $\sum_c p_c^q \leq 1$, with equality if and only if $p_c = 1$ for some $c$. It follows that $I_q(A:B|CC^*)_{\hat{\sigma}} \leq I_q(A:B|C^*)_\tau$. Therefore, if $I_q(A:B|C^*)_\tau \leq f(q, d_A, d_B)$, we also have $I_q(A:B|CC^*)_{\hat{\sigma}} \leq f(q, d_A, d_B)$.

Footnote (on the state $\hat{\sigma}_{ABCC^*}$): This is the analogue of the statement $P_{ABC} = P_C\,P_{A|C}\,P_{B|C}$ for probability distributions.
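The last step of the proof of Lemma 4.5.5 identifies the prefactor multiplying $I_q(A:B|C^*)_\tau$ with $\mathrm{Tr}(\sigma_C^q)$. A short worked derivation, starting from the definition in Equation (4.12) for $q \neq 1$ and restricting to the support of $\sigma_C$, may help to see why:

```latex
\begin{aligned}
S_q(\sigma_C)
  &= -\mathrm{Tr}\,\sigma_C^{q}\ln_q\sigma_C
   = -\mathrm{Tr}\!\left[\sigma_C^{q}\,\frac{\sigma_C^{1-q}-\mathbb{1}}{1-q}\right]
   = -\frac{\mathrm{Tr}\,\sigma_C-\mathrm{Tr}\,\sigma_C^{q}}{1-q}
   = \frac{1-\mathrm{Tr}\,\sigma_C^{q}}{q-1}, \\
\Rightarrow\quad
1-(q-1)\,S_q(\sigma_C) &= \mathrm{Tr}\,\sigma_C^{q}.
\end{aligned}
```

Here $\mathrm{Tr}\,\sigma_C = 1$ was used in the third equality.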

LPAssumptions: A new linear program solver for Mathematica

Our new Tsallis entropic inequalities for the Triangle causal structure, presented in Section 4.3, were obtained using the LPAssumptions Mathematica package that we developed for solving linear programs involving unspecified variables. The package is available as open source on GitHub [62]. Here we provide a brief overview of the main functionality of this package and how it was used for obtaining the results presented earlier in this chapter.

Linear programs are discussed in Section 2.2.3.1 of the preliminaries, but we restate the standard form of a linear program (according to the convention followed in this thesis) here for completeness. A generic linear programming problem can be specified by giving a vector $c$, a matrix $M$ and a vector $b$ such that the problem corresponds to

$$\text{Minimize } c \cdot x \quad \text{subject to } Mx \geq b,\ x \geq 0.$$

This is the form used by Mathematica's inbuilt LinearProgramming function. If $c$, $M$ and $b$ are completely specified (i.e., consist of numerical values), LinearProgramming can already be used to solve the problem. The aim of our package [62] is to also be able to cope with the case where there are unspecified constants in the vector $c$ representing the objective function. For example, we might wish to optimise $x_1 + a x_2$, where $a$ is an unspecified constant in the range $0 \leq a \leq 1$, returning the solution for all values of $a$ in the range. In our case, for the results of Section 4.3, we optimised the entropic expressions of Equations (4.7a)-(4.7c), with the Shannon entropies $H()$ replaced by Tsallis entropies $S_q()$, over our outer approximation to the classical Tsallis entropy cone for the Triangle causal structure. This outer approximation is characterised by the Shannon constraints and our Tsallis entropic causal constraints (Corollary 4.2.2). The latter involve unspecified variables other than those being optimised over, namely the dimensions/cardinalities of the variables. This optimisation cannot be performed using Mathematica's inbuilt LinearProgramming function and requires our LPAssumptions package.
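To illustrate what solving a linear program "for all values of $a$" means, here is a minimal Python sketch. It does not reproduce LPAssumptions or its symbolic case splitting: it simply enumerates the vertices of a tiny two-variable feasible region and evaluates the objective $x_1 + a x_2$ at sample values of $a$. The constraint set ($x_1 + x_2 \geq 1$, $x \geq 0$) and the sample values are assumptions chosen purely for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Each row (m1, m2, b) encodes m1*x1 + m2*x2 >= b.
# The last two rows are the non-negativity constraints x1 >= 0 and x2 >= 0.
CONSTRAINTS = [(1, 1, 1), (1, 0, 0), (0, 1, 0)]

def vertices(constraints):
    """Feasible intersection points of pairs of constraint boundaries (2D only)."""
    found = set()
    for (a1, a2, b1), (c1, c2, b2) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundaries never intersect in a single point
        # Cramer's rule for the 2x2 system of boundary equations.
        x1 = Fraction(b1 * c2 - a2 * b2, det)
        x2 = Fraction(a1 * b2 - b1 * c1, det)
        if all(m1 * x1 + m2 * x2 >= b for m1, m2, b in constraints):
            found.add((x1, x2))
    return sorted(found)

def minimise(a, verts):
    """Minimise x1 + a*x2 over the vertices.

    Checking vertices suffices here because, for a >= 0 and x >= 0, the
    objective is bounded below on this (unbounded) feasible region.
    """
    return min((x1 + a * x2, (x1, x2)) for x1, x2 in verts)

verts = vertices(CONSTRAINTS)
for a in (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)):
    value, point = minimise(a, verts)
    print(f"a = {a}: optimum {value} at x = ({point[0]}, {point[1]})")
```

In this toy problem the optimal vertex is $(0, 1)$ for $a < 1$ and $(1, 0)$ for $a > 1$, so a symbolic answer has to be reported as a list of (condition on $a$, optimal vertex) pairs. This is the kind of piecewise output that a parametric solver must produce.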

LPAssumptions works by using the two phase simplex algorithm [70] (Section 2.2.3.2). The algorithm involves visiting adjacent vertices of the feasible region (a polyhedron), starting from an initial vertex, until the vertex corresponding to the optimal solution is reached. The coefficients of the objective function and the constraints are encoded in a tableau and, at each step, the next vertex to be visited is decided by checking the sign of certain elements in this tableau, which indicates whether or not moving to that vertex would improve the value of the objective function. In our case, these elements of the tableau can depend on the additional unspecified variables in the problem. LPAssumptions relies on Mathematica's Simplify[expr, assump] command to decide on the sign of these elements, which then tells it how to proceed through the computation. Here, expr is an inequality that checks for the sign and assump are the constraints on the unspecified variables that can be input by the user, along with any additional assumptions on the expressions that are made during the computation. If Mathematica is unable to decide whether the inequality is true or false, the algorithm splits into two cases, one in which it assumes expr is true, and the other in which it assumes it is false. It then carries on adding additional assumptions as necessary until termination (or until the allowed number of iterations is exceeded). Problems can arise if expr is true (or false) but Mathematica's Simplify is unable to determine this (see the examples notebook mentioned below for such a case).

The main function of the package is LPAssumptions[c, M, b, assump, (Options)]. It takes as input c (a vector corresponding to the objective function), M (the constraint matrix), b (the constraint vector) and assump (any initial assumptions on the unspecified variables involved). The output is a set of pairs $\{(\mathrm{constr}_i, \mathrm{vec}_i)\}_i$, where $\mathrm{constr}_i$ specifies the set of constraints under which the optimum is achieved by the vector $\mathrm{vec}_i$. With no assumptions and no unspecified variables, provided the problem is feasible and bounded, the answer given by LPAssumptions should match that of LinearProgramming (up to the slightly different structure of the output).

Note that the current version of our package can only handle linear programs in which all the additional, unspecified variables occur in the objective function. Linear programs where all such variables occur in the constraint vector can also be solved using our program by the following trick: convert the original LP to the desired form (variables appearing only in the objective vector c) by taking the dual of the LP, solve the dual LP using LPAssumptions and use the duality theorems of Section 2.2.3.1 to obtain a solution for the original LP. We had to use this trick for obtaining our results of Section 4.3.

A detailed explanation of the various functionalities and commands can be found in the user manual we have made available along with the package [62], and a Mathematica notebook (LPAssumptions_examples.nb) with examples illustrating the use of this package is also available. In the examples, we also show how to compute the local weight (defined in Appendix 5.6.3, see also [229, 66]) of a non-signaling probability distribution where the distribution has unspecified parameters (specifically, we consider a noisy PR-box with inefficient detectors, see Section IV D F of [167]). This package was also used in some of the results of [208], which are presented in Chapter 5.
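The duality trick described above can be written out explicitly. The following is a sketch using the textbook primal-dual pair for the standard form stated earlier (minimise $c \cdot x$ subject to $Mx \geq b$, $x \geq 0$); the symbol $\theta$ for the unspecified parameters is introduced here for illustration, and the precise statement of the duality theorems in Section 2.2.3.1 is not reproduced.

```latex
\begin{aligned}
% Primal LP: the unspecified parameters \theta appear only in b.
\text{(P)}\quad  & \min_{x}\; c \cdot x
   && \text{s.t. } M x \ge b(\theta),\; x \ge 0, \\
% Dual LP: the parameters now appear only in the objective.
\text{(D)}\quad  & \max_{y}\; b(\theta) \cdot y
   && \text{s.t. } M^{T} y \le c,\; y \ge 0, \\
% The dual rewritten as a minimisation in the same standard form.
\text{(D')}\quad & \min_{y}\; \bigl(-b(\theta)\bigr) \cdot y
   && \text{s.t. } (-M^{T})\, y \ge -c,\; y \ge 0.
\end{aligned}
```

Problem (D') has the parameters only in its objective vector $-b(\theta)$, with constraint matrix $-M^T$ and constraint vector $-c$, so it is of the form the package accepts. When (P) is feasible and bounded, strong duality gives that the optimal value of (P) equals that of (D), i.e., minus the optimal value of (D').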

Given the limitations of Shannon as well as Tsallis entropies found in [215] and the work presented in this chapter [207], a natural question is: what about other generalised entropies? We already noted that other entropies like the Rényi, min or max entropies do not satisfy the chain rule (Table 2.1). The chain rule is important for the entropy vector method since it allows conditional entropies to be written in terms of unconditional ones (Equation (4.4)), and hence these need not be included as independent components of the entropy vector. In the absence of a chain rule, one would need to include the conditional entropies involving all possible subsets of nodes as additional components of the entropy vector, thereby significantly increasing the dimensionality of the problem, and one would expect this to make the computations harder. This was an important motivation behind the choice of Tsallis entropies for the work presented in this chapter. However, in some cases, it has been possible to obtain non-trivial results using the entropic technique even in the absence of a chain rule [218].
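To get a feel for how much the dimensionality grows without a chain rule, the following sketch counts entropy vector components under one plausible reading of "all possible subsets": unconditional entropies $H(S)$ for non-empty $S$, versus conditional entropies $H(S|T)$ for non-empty $S$ and any $T$ disjoint from $S$ (with $T$ empty giving the unconditional case). This counting convention is an assumption made for illustration and is not taken from the thesis.

```python
from itertools import product

def unconditional_count(n):
    """Number of non-empty subsets S of n nodes, i.e. components H(S)."""
    return 2 ** n - 1

def conditional_count(n):
    """Number of pairs (S, T) with S non-empty and T disjoint from S, i.e. components H(S|T).

    Each node is in S, in T, or in neither (3^n choices); the 2^n choices
    with S empty are excluded.
    """
    return 3 ** n - 2 ** n

def brute_force_conditional(n):
    """Same count obtained by explicit enumeration, as a cross-check."""
    count = 0
    for assignment in product("STN", repeat=n):  # S, T, or Neither for each node
        if "S" in assignment:
            count += 1
    return count

for n in (3, 4, 6):
    print(n, unconditional_count(n), conditional_count(n))
```

For a causal structure with six nodes in total, this convention gives 63 components with a chain rule and 665 without, which indicates why variable elimination would be expected to become considerably harder.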

In a side project with Mirjam Weilenmann, we investigated the possibility of using Rényi entropies for analysing causal structures. For this, we considered the conditional Rényi entropies of Definition 2.3.9. Some of the properties of Rényi entropies and this conditional version were discussed in Section 2.3.3 and summarised in Table 2.1. These include: the non-negativity of the entropy, i.e., $H_\alpha(X) \geq 0$ for all $\alpha \geq 0$; additivity, i.e., if $P_{XY} = P_X P_Y$, then $H_\alpha(XY) = H_\alpha(X) + H_\alpha(Y)$ for all $\alpha \geq 0$; monotonicity, $H_\alpha(X) \leq H_\alpha(XY)$ for all $\alpha \in (0,1) \cup (1,\infty)$ [136]; and the conditional form of strong subadditivity, also known as data processing, $H_\alpha(X|Y) \geq H_\alpha(X|YZ)$ for all $\alpha \geq 0$. We also use the following properties.

• Property 1: (Non-negativity of conditional entropy) $H_\alpha(X|Y) \geq 0$ for all $\alpha \geq 1$.

Proof.

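Since $\sum_x P(x|y)^\alpha \leq \sum_x P(x|y) = 1$ for all $y \in Y$ and $\alpha \geq 1$, we have that $\sum_y P(y) \sum_x P(x|y)^\alpha \leq \sum_y P(y) = 1$, and therefore $\log\left( \sum_y P(y) \sum_x P(x|y)^\alpha \right) \leq 0$ for all $\alpha \geq 1$. Thus $H_\alpha(X|Y) \geq 0$ for all $\alpha \geq 1$.

• Property 2: (Independence) If $P_{XY} = P_X P_Y$, then $H_\alpha(X|Y) = H_\alpha(X)$ for all $\alpha \geq 0$.

Proof.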

This directly follows from using $P(xy) = P(x)P(y)$ for all $x \in X$, $y \in Y$ in the definition of the conditional Rényi entropy (Equation (2.36)),

$$H_\alpha(X|Y) = \frac{1}{1-\alpha} \log\left( \sum_y P(y) \sum_x P(x)^\alpha \right) = \frac{1}{1-\alpha} \log\left( \sum_x P(x)^\alpha \right) = H_\alpha(X).$$

• Property 3: (Generalised chain rule) For all $\alpha \in (0,1) \cup (1,\infty)$ and all $P_{XYZ}$,

$$H_\alpha(X|YZ) \geq H_\alpha(XZ|Y) - \log d_Z.$$

This is derived for quantum Rényi entropies in [148] and follows as a special case for our classical definition.

• Property 4: (Conditional independence or entropic causal constraint) If $P_{XY|Z} = P_{X|Z} P_{Y|Z}$, then $H_\alpha(X|YZ) = H_\alpha(X|Z)$ or, equivalently, $I_\alpha(X:Y|Z) = 0$ for all $\alpha \geq 0$.

Proof.

Using the conditional independence $P_{XY|Z} = P_{X|Z} P_{Y|Z} \Rightarrow P(xyz) = P(x|z)P(y|z)P(z) = P(x|z)P(yz)$ for all $x \in X$, $y \in Y$, $z \in Z$ in the definition of the conditional Rényi entropy (Equation (2.36)), we have

$$\begin{aligned} H_\alpha(X|YZ) &= \frac{1}{1-\alpha} \log\left( \sum_{xyz} P(xyz)^\alpha P(yz)^{1-\alpha} \right) \\ &= \frac{1}{1-\alpha} \log\left( \sum_{xyz} P(x|z)^\alpha P(yz)^\alpha P(yz)^{1-\alpha} \right) \\ &= \frac{1}{1-\alpha} \log\left( \sum_{xz} P(x|z)^\alpha \sum_y P(yz) \right) \\ &= \frac{1}{1-\alpha} \log\left( \sum_{xz} P(x|z)^\alpha P(z) \right) = H_\alpha(X|Z). \end{aligned} \tag{4.25}$$

By the definition of the conditional mutual information, $I_\alpha(X:Y|Z) = H_\alpha(X|Z) - H_\alpha(X|YZ)$, it follows that $I_\alpha(X:Y|Z) = 0$.

• Property 5: (Monotonicity under conditioning) $H_\alpha(X|Z) \leq H_\alpha(XY|Z)$ for all $\alpha \geq 0$.

Proof.

Noting that $\sum_y P(xy|z)^\alpha \leq \left( \sum_y P(xy|z) \right)^\alpha = P(x|z)^\alpha$ for $\alpha > 1$ (for $\alpha < 1$ the inequality is reversed, but so is the sign of the prefactor $\frac{1}{1-\alpha}$, so the conclusion is unchanged), we have

$$H_\alpha(XY|Z) = \frac{1}{1-\alpha} \log\left( \sum_z P(z) \sum_x \sum_y P(xy|z)^\alpha \right) \geq \frac{1}{1-\alpha} \log\left( \sum_z P(z) \sum_x P(x|z)^\alpha \right) = H_\alpha(X|Z). \tag{4.26}$$

Using these constraints to define the outer approximation to the classical Rényi entropy cone for the bipartite Bell causal structure (Figure 3.1a), we applied the entropy vector method for this case. The range of $\alpha$ for which all these constraints hold is $\alpha \geq 1$, and the following applies to this range. The program terminated and no non-trivial entropic inequalities were obtained, indicating that this outer approximation to the classical marginal Rényi entropy cone coincides with the quantum one for this causal structure. This is the same outcome as the Shannon case ($\alpha = 1$) found in [215], while for the Tsallis case analysed in the main part of this chapter, we found the problem to be computationally too costly and were hence unable to find any non-trivial Tsallis entropic inequalities. In the Rényi case, the program ran for several days (5-7 days on a regular desktop PC), probably owing to the dimension-dependency of the generalised chain rule (Property 3), suggesting that this method would be computationally costly for larger causal structures such as the Triangle (Figure 2.6b).

In the presence of post-selection, we saw that the Shannon entropic Braunstein-Caves inequalities (3.12) can be derived for the post-selected Bell causal structure of Figure 3.1b. These are derived by using the monotonicity (2.22) and strong subadditivity (2.24) (unconditional form) on the entropies of the set $\{X_0, X_1, Y_0, Y_1\}$. In the Rényi case however, strong subadditivity does not hold in the unconditional form, and such inequalities cannot be derived analogously. Implementing the entropy vector method for Rényi entropies in the post-selected Bell causal structure, it was found by Mirjam that no non-trivial inequalities are obtained. We believe that this failure to detect non-classicality using Rényi entropies is because the generalised chain rule (Property 3) is not a tight enough constraint and, as a consequence, the initial outer approximation to the classical entropy cone is not a good one.

Chapter 5. Entropic analysis of causal structures with post-selection

The results of Chapter 4 along with those of [215] have revealed important drawbacks of the entropic technique for analysing causal structures in the absence of post-selection, using Shannon, Tsallis as well as Rényi entropies. In the present chapter, we analyse the post-selected Bell causal structure (Figure 3.1b) using entropies.
We have seen that post-selection allows for the derivation of non-trivial entropic inequalities (3.12) in causal structures such as the Bell scenario that do not support a classical-quantum gap in the absence of post-selection [215]. It is then natural to ask whether the non-classicality of a distribution can always be detected through post-selected entropic inequalities such as those of Equation (3.12). For the $(d, d, 2, 2)$ Bell scenarios with $d \geq 2$, this is known to be the case [44] in the following sense. For every non-classical distribution in the $(d, d, 2, 2)$ Bell scenario, there is a transformation that does not make any classical distribution non-classical, and such that the resulting distribution violates one of the BC entropic inequalities. The main purpose of this chapter is to investigate whether a similar result holds for non-binary outcomes. To do so, we need to specify a class of post-processing operations. The most general operations that we could consider are the non-classicality non-generating (NCNG) operations, i.e., those that do not map any classical distribution to a non-classical one. An interesting subset of these is the class of post-processings achievable through local operations and shared randomness (LOSR), which are physical in the sense that two separated parties with shared randomness could perform them. Because of the difficulty of dealing with arbitrary NCNG operations, for the majority of our analysis we consider LOSR supplemented with the additional (NCNG) operation where the parties are exchanged (and convex combinations). We use LOSR+E to refer to this supplemented set.

Footnote: In general, NCNG operations used on the correlations prior to evaluating an entropic inequality need not be physical in this sense.

We study the $(2, 2, 3, 3)$ Bell scenario with LOSR+E post-processing operations, to see whether, when applied to any non-classical distribution, the result violates an entropic Bell inequality. We investigate this using both Shannon and Tsallis entropies. Our motivation for considering Tsallis entropies is that they are known to provide an advantage over the Shannon entropy in detecting non-classicality in the absence of post-processing [214], in the sense that there are non-classical distributions that violate Tsallis entropic inequalities but not the analogous Shannon-entropic inequality. In the $(2, 2, 2, 2)$ case however, due to the result of [44], this advantage is less apparent when post-processings are considered. It is unclear whether or not this is also the case for the $(2, 2, 3, 3)$ scenario, and hence we consider Tsallis entropies in this work. The present chapter is based on the paper [208], which is joint work with Roger Colbeck.

The result of [44] that Shannon inequalities are always sufficient for detecting distributions that violate the CHSH inequality in the $(2, 2, 2, 2)$ scenario readily extends to the lifted-CHSH inequalities of the $(2, 2, 3, 3)$ scenario (c.f. Corollary 5.4.1). We summarise this result in Section 5.2. Due to Corollary 5.4.1, we are particularly interested in the region containing non-classical distributions that satisfy all the CHSH-type inequalities in the $(2, 2, 3, 3)$ scenario. In Section 5.3, we find the vertex description of the convex polytope containing non-signaling distributions in the $(2, 2, 3, 3)$ scenario that satisfy all the lifted-CHSH inequalities. We show that every non-classical distribution in this polytope violates exactly one $I_{2233}$-type inequality (Proposition 5.3.1) and identify non-classical distributions in this region that violate entropic inequalities. Further, in Section 5.4, we first consider a subset of the post-processing operations LOSR+E that correspond to mixing with a classical distribution, and identify a family of non-classical distributions in $\Pi^{\{2,2,3,3\}}_{\mathrm{CHSH}}$ for which we conjecture that the non-classicality cannot be detected through arbitrary Shannon entropic inequalities and a class of Tsallis entropic inequalities.
In Section 5.4.2, we show analytically that if these conjectures hold, then they also extend to the whole of LOSR+E. This suggests that post-selected entropic inequalities, both of Shannon and Tsallis types, are in general insufficient for detecting non-classicality in the bipartite Bell causal structure. Following the trend set in the previous chapter, a poetic summary of this chapter is provided before delving into the technical results.

Recall, we found it hard to certify,
That a causal structure must be described,
By systems that obey non-classical physics,
Based on the entropies of the observed statistics.
But for some causal structures, we realise,
That an additional technique does materialise.
Post-selection provides a trick,
To detect non-classicality that would otherwise escape our grip.
But does it allow us to identify,
All the non-classical correlations that can arise?
In some cases, there's indeed a technique [44],
But does it generalise? An answer we seek.
We consider a set of operations O you can do,
On correlations before their entropies are put to test,
For a subset M of O, our computations give a clue
That the final entropies fail to reveal non-classicality, even at their best.
If our computations for M are true,
We prove that they continue to be true for all of O,
Along the way, we get some related results too,
Through a minor geometric detour.
So it seems that post-selected entropic inequalities,
Be it of Shannon or the Tsallis kind,
Are insufficient for detecting non-classicality,
Even if we post-process our data before we try.

5.2 $(2, 2, 2, 2)$ Bell scenario in entropy space

The current section summarises the relevant results of [44] regarding the sufficiency of entropic inequalities in the $(2, 2, 2, 2)$ scenario. As previously mentioned, it is possible for a non-classical distribution to have the same entropy vector as a classical one and hence to be entropically classical. For example, the maximally non-classical distribution in probability space, $P_{\mathrm{PR}}$ (Equation (3.5)), is entropically classical since it has the same entropy vector as the classical distribution $P_{\mathrm{C}}$ of Equation (5.1), and hence cannot violate any of the BC inequalities. ($P_{\mathrm{PR}}$ and $P_{\mathrm{C}}$ are related by a permutation of the entries in the bottom-right $2 \times 2$ block.) However, the equal mixture $\tfrac{1}{2}P_{\mathrm{PR}} + \tfrac{1}{2}P_{\mathrm{C}}$ maximally violates one of the BC inequalities.

Footnote to the poem line "In some cases, there's indeed a technique":
Particularly, this refers to
Bell scenarios with output cardinality two,
Here we consider,
Cardinalities of three or larger.

In [44], it was shown that such a procedure is possible for every non-classical distribution in the $(d, d, 2, 2)$ Bell scenario with $d \geq 2$. In the $(2, 2, 2, 2)$ case this works as follows. First, one defines a special class of distributions, the isotropic distributions, where $k$ labels the PR box $P^k_{\mathrm{PR}}$ and $\epsilon \in [0, 1]$:

$$P^k_{\mathrm{iso}} = \epsilon P^k_{\mathrm{PR}} + (1 - \epsilon) P_{\mathrm{noise}}, \qquad (5.2)$$

where $P_{\mathrm{noise}}$ is white noise, i.e., the distribution with all entries equal to $1/4$. In the $(2, 2, 2, 2)$ Bell scenario the isotropic distribution $P^k_{\mathrm{iso}}$ is non-classical if and only if $\epsilon > 1/2$.

The LOSR transformation used in [44] involves first transforming the observed distribution into an isotropic distribution through a local depolarisation procedure that cannot generate non-classicality. Second, it is shown that for any non-classical isotropic distribution, i.e., a $P^k_{\mathrm{iso}}$ with $\epsilon > 1/2$, there exists a classical distribution $P^k_{\mathrm{C}}$ such that the distribution $P^k_{\mathrm{E},v} = v P^k_{\mathrm{iso}} + (1 - v) P^k_{\mathrm{C}}$ violates one of the BC entropic inequalities for sufficiently small $v > 0$. In particular, the value of $I^k_{\mathrm{BC}}$ for $P^k_{\mathrm{E},v}$ can be expanded for small $v$ as

$$I^k_{\mathrm{BC}} \approx \frac{v}{\ln 4}\left[ f(\epsilon) - (4\epsilon - 2) \ln v \right], \qquad (5.3)$$

where $f(\epsilon)$ is a function of $\epsilon$, independent of $v$ (see [44] for details). Thus for any $\epsilon > 1/2$, taking $v$ arbitrarily small can make $I^k_{\mathrm{BC}}$ positive, which is a violation of the entropic inequality. We summarise the main result of [44] for $(2, 2, 2, 2)$ Bell scenarios in the following Theorem (which is implicit in [44]).
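The mixing procedure described above can be checked numerically. The following is a minimal sketch, not the computation of [44]: it assumes the standard PR box (outcomes satisfying x XOR y = a AND b) in place of Equation (3.5), a perfectly correlated uniform distribution in place of $P_{\mathrm{C}}$ of Equation (5.1), and the BC expression whose positive joint term is the one for inputs (1, 1). The function names are illustrative only, and entropies are in nats.

```python
import numpy as np
from itertools import product


def shannon(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def pr_box():
    """Assumed standard PR box, stored as P[a, b, x, y]: x XOR y = a AND b."""
    P = np.zeros((2, 2, 2, 2))
    for a, b, x, y in product(range(2), repeat=4):
        if (x ^ y) == (a & b):
            P[a, b, x, y] = 0.5
    return P


def correlated_box():
    """Assumed classical distribution: x = y, uniformly random, for all inputs."""
    P = np.zeros((2, 2, 2, 2))
    for a, b in product(range(2), repeat=2):
        P[a, b, 0, 0] = P[a, b, 1, 1] = 0.5
    return P


def bc_value(P):
    """H(X1Y1) + H(X0) + H(Y0) - H(X0Y0) - H(X0Y1) - H(X1Y0) for P[a, b, x, y]."""
    h_x0 = shannon(P[0, 0].sum(axis=1))  # Alice's marginal for input 0
    h_y0 = shannon(P[0, 0].sum(axis=0))  # Bob's marginal for input 0
    return (shannon(P[1, 1]) + h_x0 + h_y0
            - shannon(P[0, 0]) - shannon(P[0, 1]) - shannon(P[1, 0]))


def mixed(eps, v):
    """v * P_iso + (1 - v) * P_C, with P_iso = eps * P_PR + (1 - eps) * noise."""
    noise = np.full((2, 2, 2, 2), 0.25)
    p_iso = eps * pr_box() + (1 - eps) * noise
    return v * p_iso + (1 - v) * correlated_box()
```

Under these assumptions, `bc_value(mixed(1.0, 0.5))` equals ln 2, `bc_value(mixed(0.6, 0.01))` is slightly positive (a violation obtained with small v), and `bc_value(mixed(0.5, 0.01))` is negative, as expected for a classical mixture.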

Theorem 5.2.1.

For every non-classical distribution $P_{XY|AB}$ in the $(2, 2, 2, 2)$ Bell scenario, there exists an LOSR transformation $T$ such that $T(P_{XY|AB})$ violates one of the BC entropic inequalities (3.12).

One of the aims of the present chapter is to study whether this result extends to the case where the number of outcomes per party is more than two. In general, the $(2, 2, d, d)$ Bell polytope for $d > 2$ [...] $d = 3$ [...] $(2, 2, 3, 3)$ scenario in probability space will be helpful.

5.3 $(2, 2, 3, 3)$ Bell scenario in probability space

In this section we compute the vertex description of the CHSH-classical polytope, $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$, i.e., the polytope whose facets are the 648 CHSH inequalities and the positivity constraints of the $(2, 2, 3, 3)$ Bell scenario. As previously mentioned, this will be the main region of interest in the remainder of this work, since the non-classicality of distributions not belonging to this region can always be certified using Shannon entropic inequalities (Corollary 5.4.1). The following result allows us to significantly speed up the vertex enumeration problem.

Proposition 5.3.1.

Every non-classical distribution in $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ violates only one $I_{2233}$ inequality.

Proof. Fix one $I_{2233}$ inequality and let $i$ label each of the remaining ones in turn. Consider the linear program that maximises the value of $\epsilon \geq 0$ subject to the constraints defining $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ and to the fixed inequality and the $i$-th inequality each being saturated or violated by at least $\epsilon$. We run this for every $i$ and check that in all cases either the output of this linear program is $\epsilon = 0$ (the two $I_{2233}$ inequalities can be jointly saturated but not violated) or the program is infeasible (the two $I_{2233}$ inequalities cannot even be jointly saturated). By symmetry it follows that no pair of $I_{2233}$ inequalities can be simultaneously violated when all the CHSH-type inequalities are satisfied.

Note that in the $(2, 2, 3, 3)$ scenario there exist extremal non-signaling distributions that violate multiple Bell inequalities. For example, the distributions $P_{\mathrm{NL}}$ (Equation (3.8)) and $P^*_{\mathrm{NL}}$ (Equation (5.7)) violate the same $I_{2233}$ inequality, which $P_{\mathrm{NL}}$ violates maximally. By symmetry, $P_{\mathrm{NL}}$ also violates another $I_{2233}$ inequality. In addition, $P_{\mathrm{NL}}$ violates the CHSH-type inequality whose evaluation is equivalent to applying the output coarse-graining $0 \mapsto 0$, $1 \mapsto$ [...]. This is in contrast to the $(2, 2, 2, 2)$ scenario, where there is a one-to-one correspondence between the extremal non-signaling vertices and the CHSH inequalities in the sense that each such vertex violates exactly one CHSH inequality. (Note that this correspondence breaks down in the $(2, 2, 3, 3)$ scenario, where it is possible for a CHSH-type vertex to violate multiple CHSH-type inequalities; these correspond to the same 2-outcome CHSH inequality after coarse-graining.)

Due to the symmetries, all the vertices of $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ can be enumerated by first finding all the vertices for which the $I_{2233}$ inequality of Equation (3.6) is saturated or violated [...]. We find that $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ has 7425 vertices (including the 81 local deterministic vertices).

5.4 $(2, 2, 3, 3)$ Bell scenario in entropy space

We now investigate whether entropic inequalities are necessary and sufficient for non-classicality in the $(2, 2, 3, 3)$ Bell scenario. In $(d, d, 2, 2)$ scenarios with $d \geq 2$, only $2d$ Shannon entropic inequalities are required for the Shannon entropic characterisation of the scenario [46, 90]; in the $(2, 2, 2, 2)$ scenario, these are the four inequalities of (3.12). It may at first seem surprising that these can always be used to decide whether a distribution is classical, because the number of extremal Bell inequalities grows very rapidly in $d$ in the $(d, d, 2, 2)$ scenario [66], and deciding whether a distribution is classical is NP-complete [14]. The reduction in the number of inequalities in entropy space is compensated by the need to identify a suitable post-processing operation (of which there are uncountably many possibilities) in order to detect violations.

The first observation is a corollary of Theorem 5.2.1.

Corollary 5.4.1.

Let $P_{XY|AB}$ be a distribution in the $(2, 2, 3, 3)$ Bell scenario that violates at least one CHSH-type inequality. Then there exists an LOSR transformation $T$ such that $T(P_{XY|AB})$ violates one of the BC entropic inequalities (3.12).

Proof. For each CHSH-type inequality in the $(2, 2, 3, 3)$ scenario, there exists a coarse-graining in which two of the outcomes are mapped to one (for each party and each input) such that, for any initial distribution in the $(2, 2, 3, 3)$ scenario that violates the CHSH-type inequality, the coarse-grained distribution violates one of the CHSH inequalities in the $(2, 2, 2, 2)$ scenario. Hence, for the given $P_{XY|AB}$, after applying the corresponding coarse-graining for the violated CHSH-type inequality, followed by the LOSR operation from Theorem 5.2.1, we violate one of the BC entropic inequalities.

This corollary means that we can limit our analysis to $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$, the polytope in which all the CHSH inequalities are satisfied, and, in particular, to the non-classical region of this polytope. This is the region in which one of the $I_{2233}$ inequalities is violated.

In going from the $(2, 2, 2, 2)$ to the $(2, 2, 3, 3)$ scenario, a new class of inequalities (the $I_{2233}$ inequalities) becomes relevant in probability space, but the entropic characterisation remains unchanged, since entropic inequalities do not depend on the number of measurement outcomes. It is natural to ask whether every non-classical distribution in the $(2, 2, 3, 3)$ scenario that satisfies all the CHSH inequalities is entropically undetectable. This is not the case, as shown by the following proposition.

Proposition 5.4.1.

The polytope $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ is not entropically classical.

Figure 5.1: Plot of the $I_{\mathrm{BC},q}$ value as a function of the Tsallis parameter $q$ for the distribution $P_e$ (Equation (5.4)). As seen from the plot, the distribution violates the BC inequality $I_{\mathrm{BC},q} \leq 0$ for $q$ values between 1 (Shannon case) and an upper limit below 2 [...].

Proof. Consider the distribution

$$P_e := \frac{1}{50}\left(\begin{array}{ccc|ccc} 21 & 0 & 0 & 21 & 0 & 0 \\ 0 & 2 & 0 & 1 & 1 & 0 \\ 11 & 0 & 16 & 0 & 1 & 26 \\ \hline 31 & 0 & 0 & 20 & 1 & 10 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 16 & 1 & 1 & 15 \end{array}\right). \qquad (5.4)$$

This is formed by mixing the non-local extremal point number 8 of $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ (see Table 5.1) with the three local deterministic points 18, 26 and 47, with respective weights $1/10$, $3/10$, $1/5$ and $2/5$, and hence is in $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$. It achieves an $I_{2233}$ value of [...], violating the corresponding $I_{2233}$ inequality, and it also violates one of the BC inequalities [...].

The distribution $P_e$ used in the previous proposition was chosen for its relative simplicity. Interestingly, we find that the Shannon entropic BC inequalities appear to give the largest violation among the Tsallis entropic inequalities with $q \geq 1$ for $P_e$. This can be seen in Figure 5.1.

In light of Proposition 5.4.1, it is natural to ask whether the non-classicality of all distributions in $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ can be detected through entropic inequalities. We find numerical evidence that suggests the contrary, i.e., that there are non-classical distributions in $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ whose non-classicality cannot be detected through entropic inequalities using a general class of post-processing operations, and hence that these entropic inequalities are not sufficient for detecting non-classicality in the $(2, 2, 3, 3)$ scenario. Before presenting these results, we briefly overview the post-processing operations considered in this work.

Post-processing operations

In this chapter we study whether entropic inequalities can always detect non-classicality in the $(2, 2, 3, 3)$ Bell scenario. In order to do so we could in principle consider applying any NCNG operation to the distribution prior to evaluating the entropic inequality. However, due to the difficulty in dealing with arbitrary NCNG operations, we consider the subset of these corresponding to LOSR+E instead. In [223] it was shown that all LOSR operations can be generated by convex combinations of local deterministic operations. These can be thought of in the following way. Each party first applies a deterministic function to their input, uses the result as the input to their device, then applies a deterministic function to their input and the output of their device to form the final output. All such operations correspond to local relabellings and local coarse-grainings. Note that deterministic classical distributions can be formed as a special case of coarse-graining (a local deterministic distribution is formed when each party coarse-grains all of their outputs to one output for each of their inputs). For the distributions we consider for our main conjectures, it turns out that all the coarse-grainings give rise to local distributions (cf. Proposition 5.4.5), so, by considering mixing with deterministic classical distributions, local relabelling and exchange of parties, we can cover all LOSR+E operations. We hence start by separately considering mixing with classical distributions, and then consider relabelling and exchange of parties.
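The local deterministic output operations described above (relabellings and coarse-grainings) can be sketched as follows. This is an illustrative sketch only: the array layout `P[a, b, x, y]` and the function name `apply_local_output_maps` are assumptions made here, and input relabellings and the exchange of parties are not covered.

```python
import numpy as np
from itertools import product


def apply_local_output_maps(P, maps_A, maps_B):
    """Apply deterministic output maps to a distribution P[a, b, x, y].

    maps_A[a][x] is Alice's new output when her input is a and her device
    outputs x; maps_B[b][y] is the analogue for Bob. A permutation gives a
    relabelling, a non-injective map gives a coarse-graining.
    """
    Q = np.zeros_like(P)
    n_a, n_b, n_x, n_y = P.shape
    for a, b, x, y in product(range(n_a), range(n_b), range(n_x), range(n_y)):
        Q[a, b, maps_A[a][x], maps_B[b][y]] += P[a, b, x, y]
    return Q


# Example with uniform noise in a two-input, three-outcome scenario.
noise = np.full((2, 2, 3, 3), 1 / 9)
merge = (0, 1, 1)      # coarse-graining: outcomes 1 and 2 are merged into 1
cycle = (1, 2, 0)      # relabelling: cyclic permutation of the outcomes
constant = (0, 0, 0)   # all outcomes mapped to 0: a local deterministic point

coarse = apply_local_output_maps(noise, [merge, merge], [merge, merge])
relabelled = apply_local_output_maps(noise, [cycle, cycle], [cycle, cycle])
deterministic = apply_local_output_maps(noise, [constant] * 2, [constant] * 2)
```

Convex mixtures of such images, for instance `v * P + (1 - v) * deterministic`, then give the kind of mixing with deterministic classical distributions discussed in the text.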

Analogously to the $(2, 2, 2, 2)$ case, we can define a family of distributions $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \epsilon P_{\mathrm{NL}} + (1 - \epsilon) P^{(2,2,3,3)}_{\mathrm{noise}}$, where $P^{(2,2,3,3)}_{\mathrm{noise}}$ is the uniform distribution with all entries equal to $1/9$ and $\epsilon \in [0, 1]$. This class of distributions is isotropic in the sense that the marginal distributions are uniform for each input of each party. In order to show the insufficiency of entropic inequalities, one needs to identify at least one non-classical distribution whose non-classicality cannot be detected through entropic inequalities. We will discuss this for the class $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and only consider distributions of this form in the rest of the chapter. Further, without loss of generality, we consider only a single BC inequality, $I_{\mathrm{BC}} \leq 0$ ([...] $P_{\mathrm{NL}}$ and the corresponding BC inequalities).

In the entropic picture of the $(2, 2, 3, 3)$ scenario, the 4 BC Inequalities (3.12) still hold (these are valid independently of the cardinality of the random variables). Again, analogously to the $(2, 2, 2, 2)$ case, the maximally non-local distribution $P_{\mathrm{NL}}$ has the same entropy vector as the classical distribution $P^{(2,2,3,3)}_{\mathrm{C}}$ of Equation (5.5) (amongst others). The distribution $P_{\mathrm{NL}}$ is hence entropically classical. However, in contrast to the $(2, 2, 2, 2)$ case, we have evidence suggesting that there are values of $\epsilon$ for which $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical, but such that the mixture $v P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P_{\mathrm{L}}$ is entropically classical for all classical distributions $P_{\mathrm{L}}$ and all $v \in [0, 1]$, i.e., there exist non-classical distributions in the $(2, 2, 3, 3)$ scenario for which mixing with classical distributions never gives rise to a non-classical entropy vector.

We begin by considering mixing $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with $P^{(2,2,3,3)}_{\mathrm{C}}$ in analogy with the treatment of the $(2, 2, 2, 2)$ case. Although we have not fully proven this, from our numerics this mixing appears to be optimal in the sense that when it does not allow for entropic violations, no other mixing can either. This allows us to identify a range of $\epsilon$ for which the distribution $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical, yet appears to remain entropically classical even when mixed with arbitrary classical distributions. We begin with two propositions whose proofs can be found in Appendix 5.6.3.

Proposition 5.4.2.

$P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical if and only if $\epsilon > 1/2$. Further, for $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ satisfies all the CHSH-type inequalities, while for $\epsilon > 4/7$ it violates at least one CHSH-type inequality.

By analogy with the $(2, 2, 2, 2)$ case, we consider the violation of $I_{\mathrm{BC}} \leq 0$ attainable by mixing $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with $P^{(2,2,3,3)}_{\mathrm{C}}$. We find that for $\epsilon \in (1/2, 4/7]$, $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical but the mixture does not violate any of the BC inequalities for any $v$. As shown in the above proposition, these distributions are in the CHSH-classical polytope $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ and hence lie in our region of interest.

Proposition 5.4.3.

Corollary 5.4.2.

For $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{\mathrm{E},\epsilon,v} = v P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P^{(2,2,3,3)}_{\mathrm{C}}$ is entropically classical for all $v \in [0, 1]$.

Proof. This follows from Proposition 5.4.3 and Lemma 3.2.1.

While Proposition 5.4.3 shows that the proof strategy of [44] does not directly generalize to all non-classical distributions in the $(2, 2, 3, 3)$ case, it does not rule out the possibility that there may exist other mixings with classical distributions that could transform $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ for $\epsilon \in (1/2, 4/7]$ into a distribution that violates one of the BC inequalities. To investigate this, we can consider the polytope formed by mixing $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with classical distributions for some $\epsilon \leq 4/7$, i.e., the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P_{\mathrm{L},k}\}_k)$, where $\{P_{\mathrm{L},k}\}_k$ denotes the set of all (81) local deterministic vertices of the $(2, 2, 3, 3)$ Bell-local polytope, $\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\}$ is a set with a single element and $\mathrm{Conv}()$ denotes the convex hull. We considered several values of $\epsilon \leq 4/7$, numerically optimised $I_{\mathrm{BC}}$ over the non-classical region of this polytope for each, and were unable to find violations. (To restrict to the non-classical region, it is sufficient to mix with a subset of these 81 local deterministic points; see Appendix 5.6.2 for more detail.) The optimization involves a non-linear objective function with linear constraints. Hence, it is possible that the numerical approach missed the global optimum. Nevertheless, this is evidence for the following conjecture and is presented in more detail in Appendix 5.6.2. Proposition 5.4.3, along with Figure 5.2 and the evidence in Appendix 5.6.2, also suggests this conjecture.

Conjecture 5.4.1.

Let $\epsilon \leq 4/7$. For all mixtures of the distribution $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with classical distributions in the $(2, 2, 3, 3)$ Bell scenario, the resulting distribution is entropically classical, i.e., all distributions in $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P_{\mathrm{L},k}\}_k)$ are entropically classical.

The interesting cases of Conjecture 5.4.1 are for non-classical distributions (i.e., for $\epsilon > 1/2$). (Note however that it is enough that these results hold for one value of $\epsilon \in (1/2, 4/7]$ in order to conclude that entropic inequalities are not sufficient for detecting non-classicality in this scenario.)

Remark 5.4.1.

There exist non-classical quantum distributions that lie in the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon=4/7}\} \cup \{P_{\mathrm{L},k}\}_k)$, and in this case our results suggest that the non-classicality of the corresponding distributions cannot be detected through entropic inequalities. Let $P_{\mathrm{QM}}$ be the quantum distribution from [64, Equation (14) with $d = 3$] with Bob's inputs relabelled. This violates [...] $P^{(2,2,3,3)}_{\mathrm{C}}$. Consider then mixing $P_{\mathrm{QM}}$ with uniform noise $P^{(2,2,3,3)}_{\mathrm{noise}}$ to obtain $P_{\mathrm{mix}}(u) := u P_{\mathrm{QM}} + (1 - u) P^{(2,2,3,3)}_{\mathrm{noise}}$ ($u \in [0, 1]$). We found that for some values of $u$ [...] $P_{\mathrm{mix}}(u)$ is non-classical. Further, $P_{\mathrm{mix}}(u)$ is quantum achievable since it can be obtained from the density operator $u |\psi'\rangle\langle\psi'| + (1 - u) I/9$ (where $|\psi'\rangle$ is the two-qutrit state producing $P_{\mathrm{QM}}$) and the same quantum measurements that produce $P_{\mathrm{QM}}$ from $|\psi'\rangle$.

Given the results (Proposition 5.4.3 and Conjecture 5.4.1) of the previous section for Shannon entropic inequalities, a natural question is whether other entropic measures can provide an advantage over the Shannon entropy in detecting non-classicality. Here, we look at Tsallis entropies and find that similar results hold in this case as well, suggesting that Tsallis entropies also do not allow us to completely solve the problem.

The properties of monotonicity, strong subadditivity and the chain rule are sufficient to derive the BC inequalities, which hence also hold for Tsallis entropy when $q \geq 1$. Other generalized entropies such as Rényi or min/max entropies do not satisfy one or more of these properties in general, and it is not clear whether the analogues of (3.12) hold for these. In the Rényi case, we saw in Appendix 4.5.3 that no analogous inequalities were obtained. Hence we focus on the case of Tsallis entropies in the remainder of this chapter, where for all $q \geq 1$,

$$\begin{aligned}
I^1_{\mathrm{BC},q} &= S_q(X_0Y_0) + S_q(X_1) + S_q(Y_1) - S_q(X_0Y_1) - S_q(X_1Y_0) - S_q(X_1Y_1) \leq 0, \\
I^2_{\mathrm{BC},q} &= S_q(X_0Y_1) + S_q(X_1) + S_q(Y_0) - S_q(X_0Y_0) - S_q(X_1Y_0) - S_q(X_1Y_1) \leq 0, \\
I^3_{\mathrm{BC},q} &= S_q(X_1Y_0) + S_q(X_0) + S_q(Y_1) - S_q(X_0Y_0) - S_q(X_0Y_1) - S_q(X_1Y_1) \leq 0, \\
I^4_{\mathrm{BC},q} &= S_q(X_1Y_1) + S_q(X_0) + S_q(Y_0) - S_q(X_0Y_0) - S_q(X_0Y_1) - S_q(X_1Y_0) \leq 0
\end{aligned} \qquad (5.6)$$

(cf. Equation (3.11)). We say that a distribution is $q$-entropically classical if its entropy vector written in terms of the Tsallis entropy of order $q$ is achievable using a classical distribution. In the case of the Shannon entropy, we used the fact (Lemma 3.2.1) that the BC Inequalities (3.12) are known to be necessary and sufficient for entropic classicality for 2-input Bell scenarios [90]. However, it is not clear if the result of [90] generalises to Tsallis entropies for $q > 1$. Thus our results in the Tsallis case are weaker than those for Shannon, being stated only for the BC inequalities. We leave the generalization to arbitrary Tsallis entropic inequalities as an open problem.
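To make the Tsallis BC expressions concrete, the sketch below evaluates one of them numerically. It assumes the standard Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q - 1)$ (taken here to agree with the definition used in the thesis, reducing to the Shannon entropy in nats as $q \to 1$), inputs labelled 0 and 1, and an array layout `P[a, b, x, y]`. The test distribution is an equal mixture of the standard PR box and a perfectly correlated distribution, which is an assumption standing in for Equations (3.5) and (5.1).

```python
import numpy as np
from itertools import product


def tsallis(p, q):
    """Tsallis entropy of order q; the Shannon entropy (nats) is used at q = 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if abs(q - 1.0) < 1e-12:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


def bc_tsallis(P, q, a=1, b=1):
    """BC expression whose positive joint term is S_q(X_a Y_b), for P[a, b, x, y].

    Works for any number of outcomes; the two inputs per party are 0 and 1.
    """
    na, nb = 1 - a, 1 - b
    s_x = tsallis(P[na, 0].sum(axis=1), q)  # marginal of X for Alice's other input
    s_y = tsallis(P[0, nb].sum(axis=0), q)  # marginal of Y for Bob's other input
    return (tsallis(P[a, b], q) + s_x + s_y
            - tsallis(P[a, nb], q) - tsallis(P[na, b], q) - tsallis(P[na, nb], q))


def example_box():
    """Assumed test case: equal mixture of a PR box and a correlated distribution."""
    P = np.zeros((2, 2, 2, 2))
    for a, b, x, y in product(range(2), repeat=4):
        if (x ^ y) == (a & b):
            P[a, b, x, y] += 0.25  # half of the PR box weight 1/2
        if x == y:
            P[a, b, x, y] += 0.25  # half of the correlated weight 1/2
    return P
```

For this assumed test distribution, `bc_tsallis(example_box(), 1.0)` equals ln 2 and `bc_tsallis(example_box(), 2.0)` equals 1/4, so the inequality is violated at both values of $q$.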

Proposition 5.4.4.

For $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{\mathrm{E},\epsilon,v} = v P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P^{(2,2,3,3)}_{\mathrm{C}}$ does not violate any of the Tsallis BC inequalities (5.6) for any $v \in [0, 1]$ and $q > 1$. However, for $\epsilon > 4/7$ and every $q > 1$, there always exists a $v \in [0, 1]$ such that the entropic inequality $I_{\mathrm{BC},q} \leq 0$ is violated by $P^{(2,2,3,3)}_{\mathrm{E},\epsilon,v}$.

We refer the reader to Appendix 5.6.3 for a proof of this proposition. To investigate the extension to other mixings, we tried the same computational procedure (see Appendix 5.6.2) as in the Shannon case. We found no violation of the Tsallis entropic BC inequalities for any mixings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with classical distributions, for several values of $q > 1$ and $\epsilon \in (1/2, 4/7]$, leading to the following conjecture, which is similar to Conjecture 5.4.1.

Conjecture 5.4.2.

Let $\epsilon \leq 4/7$. For all mixtures of the distribution $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with classical distributions in the $(2, 2, 3, 3)$ Bell scenario, the resulting distribution does not violate any of the Tsallis entropic BC inequalities for any $q > 1$, i.e., all distributions in $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P_{\mathrm{L},k}\}_k)$ satisfy the Tsallis entropic BC Inequalities (5.6) for all $q > 1$.

Figure 5.2a shows the values of $\epsilon$ and $v$ for which $I_{\mathrm{BC},q}$ (for several values of $q$, including $q = 8$) evaluated with $P^{(2,2,3,3)}_{\mathrm{E},\epsilon,v}$ is positive, which is also suggestive of this conjecture.

Remark 5.4.2.

Any impossibility result for the $(2, 2, 3, 3)$ scenario also holds in the $(2, 2, d, d)$ case for $d > 3$, since every distribution in the $(2, 2, 3, 3)$ scenario has a corresponding distribution in all the $(2, 2, d, d)$ scenarios with $d > 3$, which can be obtained by assigning a zero probability to the additional outcomes. Further, the entropic Inequalities (3.12) remain the same for all these scenarios as they do not depend on the cardinality of the random variables involved. Thus the existence of non-classical distributions for $d = 3$ [...] $d > 3$.

So far, we only considered mixing with classical distributions to obtain entropic violations and gave evidence that this does not work for some non-classical distributions in the $(2, 2, 3, 3)$ scenario. This motivates us to study whether using arbitrary LOSR+E operations allows us to detect this non-classicality through entropic violations. We show in this section that if Conjectures 5.4.1 and 5.4.2 hold then they also hold for all LOSR+E operations. First consider the following example.

The maximum possible violation of the BC inequalities in the $(2, 2, 2, 2)$ case is $I_{\mathrm{BC}} = \ln 2$ [44]. This is derived by considering only Shannon inequalities within the coexisting sets, and the bound that the maximum entropy of a binary variable is $\ln 2$. An analogous proof holds in the $(2, 2, 3, 3)$ case, except that the bound is then $\ln 3$. In the former case we have $P_{\mathrm{E},\epsilon=1,v=1/2} = \tfrac{1}{2} P_{\mathrm{PR}} + \tfrac{1}{2} P_{\mathrm{C}}$, which maximally violates $I_{\mathrm{BC}} \leq 0$, while in the latter case, one such distribution is formed by $(P_{\mathrm{NL}} + P^*_{\mathrm{NL}} + P^{(2,2,3,3)}_{\mathrm{C}})/3$, where $P^*_{\mathrm{NL}}$ is another extremal non-local distribution (Equation (5.7)).

Since the equal mixture $(P_{\mathrm{NL}} + P^{(2,2,3,3)}_{\mathrm{C}})/2$ [...], it is natural to consider using $\tilde{P}_{\mathrm{NL}} = (P_{\mathrm{NL}} + P^*_{\mathrm{NL}})/2$ in place of $P_{\mathrm{NL}}$ in the definition of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, i.e., to take $\tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \epsilon \tilde{P}_{\mathrm{NL}} + (1 - \epsilon) P^{(2,2,3,3)}_{\mathrm{noise}}$. One could then consider whether, for $\epsilon \in (1/2, 4/7]$, $\tilde{P}^{(2,2,3,3)}_{\mathrm{E},\epsilon,v} = v \tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P^{(2,2,3,3)}_{\mathrm{C}}$ violates $I_{\mathrm{BC}} \leq 0$. Interestingly, while $\tilde{P}^{(2,2,3,3)}_{\mathrm{E},\epsilon,v}$ violates $I_{\mathrm{BC}} \leq 0$ for a larger range of $v$ values whenever $\epsilon > 4/7$, it does not give any violation (for any value of $v$) when $\epsilon \leq 4/7$, and Propositions 5.4.2 and 5.4.3 also hold if $\tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ replaces $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ (see Figure 5.2b for an illustration). The corresponding results also hold for the Tsallis case with $q > 1$, i.e., Proposition 5.4.4 also holds with $\tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ replacing $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ (see also Figure 5.2). These suggest that mixing with relabellings in addition to mixing with classical distributions may also not help to violate entropic inequalities when $\epsilon \leq 4/7$.

Figure 5.2: The regions in the $v$-$\epsilon$ plane where the Shannon entropic inequality $I_{\mathrm{BC},1} := I_{\mathrm{BC}} \leq 0$ and the Tsallis entropic inequalities $I_{\mathrm{BC},q} \leq 0$ for $q =$ [...] are violated for (a) $P^{(2,2,3,3)}_{\mathrm{E},\epsilon,v} = v P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P^{(2,2,3,3)}_{\mathrm{C}}$ and (b) $\tilde{P}^{(2,2,3,3)}_{\mathrm{E},\epsilon,v} = v \tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1 - v) P^{(2,2,3,3)}_{\mathrm{C}}$. Here $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \epsilon P_{\mathrm{NL}} + (1 - \epsilon) P^{(2,2,3,3)}_{\mathrm{noise}}$ and $\tilde{P}^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \epsilon (\tfrac{1}{2} P_{\mathrm{NL}} + \tfrac{1}{2} P^*_{\mathrm{NL}}) + (1 - \epsilon) P^{(2,2,3,3)}_{\mathrm{noise}}$. For both (a) and (b), $I_{\mathrm{BC},q} \leq 0$ is not violated for $\epsilon \leq 4/7 \approx 0.571$. For $\epsilon > 4/7$, there is a violation of this inequality for a larger range of $v$ values in the latter case, and also for a larger range in the $q =$ [...] case.

In the remainder of this section we consider the full set of LOSR+E operations. We first note that all input coarse-grainings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ result in local distributions (there are no Bell inequalities if either of the parties has only one input). Similarly, considering output coarse-grainings, whenever three outcomes are mapped to one the resulting distribution is always classical, because there are no Bell inequalities if one party always produces a fixed outcome for one of their inputs. We henceforth only consider coarse-grainings that take two outcomes to one. We can choose two of the three outcomes to combine into one for each party and each local input. For the four input choices $\{A = 0, A = 1, B = 0, B = 1\}$, there are 81, 108, 54 and 12 distinct coarse-grainings of this type when the outcomes of 4, 3, 2 or 1 input choices, respectively, are coarse-grained. Thus there are a total of 255 coarse-grainings that remain.

If we apply all such coarse-grainings to $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, this generates 255 possible distributions that we denote $\{P^{\mathrm{CG},i}_{\epsilon}\}_i$, $i \in [255]$. There are also 432 distinct local relabellings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, which we denote by $\{P^{\mathrm{R},j}_{\epsilon}\}_j$, $j \in [432]$ (this set includes $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ itself). Due to symmetries of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, it turns out that exchanging parties can be achieved through local relabellings for these distributions, so we do not need to separately consider the exchange in our results pertaining to $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$. The set of all distributions that can be achieved through a convex mixture of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with its coarse-grainings, relabellings and classical distributions is a convex polytope $\Pi_{\epsilon}$ for each $\epsilon$ and is the convex hull of these $255 + 432 + 81 = 768$ points, i.e.,

$$\Pi_{\epsilon} := \mathrm{Conv}\left(\{P^{\mathrm{CG},i}_{\epsilon}\}_i \cup \{P^{\mathrm{R},j}_{\epsilon}\}_j \cup \{P_{\mathrm{L},k}\}_k\right).$$

We present the results for the remaining coarse-grainings and relabellings separately below. Firstly, we show that the coarse-grainings $\{P^{\mathrm{CG},i}_{\epsilon}\}_i$ of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ are classical if and only if $\epsilon \leq 4/7$.

Proposition 5.4.5.

The distribution $P^{\mathrm{CG},i}_{\epsilon}$ is classical for all $i$ if and only if $\epsilon \leq 4/7$.

This is intuitive because $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ satisfies all the CHSH-type inequalities if and only if $\epsilon \leq 4/7$ and, because $I_{2233}$ requires three outcomes, after coarse-graining the only relevant thing is whether there is a CHSH violation. A full proof is given in Appendix 5.6.3.

Proposition 5.4.5 implies that $\Pi_{\epsilon} = \mathrm{Conv}(\{P^{\mathrm{R},j}_{\epsilon}\}_j \cup \{P_{\mathrm{L},k}\}_k)$ for all $\epsilon \leq 4/7$, and that it is not necessary to consider coarse-grainings for such values of $\epsilon$. Our next results are that if Conjectures 5.4.1 and 5.4.2 hold for $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ for $\epsilon \leq 4/7$, then they continue to hold even when we consider arbitrary convex combinations with classical distributions and local relabellings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$.

Proposition 5.4.6.

Let $\epsilon \leq 4/7$. If Conjecture 5.4.1 holds, then every distribution in $\Pi_{\epsilon}$ is Shannon entropically classical.

Proposition 5.4.7.

Let $\epsilon \leq 4/7$. If Conjecture 5.4.2 holds, then every distribution in $\Pi_{\epsilon}$ satisfies the Tsallis entropic BC Inequalities (5.6) for all $q > 1$.

These are proven in Appendix 5.6.3 and give the following corollary.

Corollary 5.4.3.

Let (cid:15) ≤ / . If Conjectures 5.4.1 and 5.4.2 hold, then for any operation O inLOSR+E, O( P ( , , , ) iso ,(cid:15) ) does not violate a Shannon or Tsallis ( q > ) entropic BC inequality. We have provided evidence that there are distributions in the ( , , d, d ) ( d ≥

3) scenarios forwhich arbitrary LOSR+E operations do not enable detection of non-classicality with any Shan-non entropic inequalities or Tsallis entropic BC inequalities. This is in contrast to the ( , , , ) case [44], where non-classicality can always be certiﬁed using the Shannon BC inequalities andLOSR post-processings. The region where the BC inequalities can fail to detect non-classicalitywould contain non-classical distributions that satisfy all CHSH-type Bell inequalities. We foundall the vertices that characterize this region for the d = ( , , , ) scenario that cannot be certiﬁed through entropic inequalities under LOSR+E post-processings is not characterized by the CHSH-type inequalities. HAPTER 5. ENTROPIC ANALYSIS OF CAUSAL STRUCTURES WITH POST-SELECTION

Although we considered LOSR+E operations, a natural next question is to what extent the results can be generalized to more general NCNG operations. In particular, there could be a non-linear NCNG map that allows the entropic BC inequalities to detect a wider range of non-classical distributions. An interesting open question would be to see whether for any non-classical distribution of the form $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with $1/2 < \epsilon \leq 4/7$ [...] $(2,2,2,2)$ and $(2,2,3,3)$ Bell scenarios.

In the presence of LOSR+E operations, we did not find any advantage of Tsallis entropies over the Shannon entropy in the $(2,2,2,2)$ Bell scenario. In fact, for some non-classical distributions such as that of Equation (5.4), the Shannon entropic inequalities appear to give the largest violations among Tsallis entropies with $q \geq 1$. On the other hand, for the family of distributions $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, our results suggest that the range of $\epsilon$ for which post-processing via mixing with classical distributions enables non-classicality detection is the same for the Shannon as well as Tsallis entropic BC inequalities for $q > 1$. However, when entropic detection of non-classicality is possible, using Tsallis entropy can make this detection easier, in the sense that there is a wider range of mixings that achieve it (see Figure 5.2).

In conclusion, while the entropic approach for detecting non-classicality is useful in a number of scenarios, it is known to have disadvantages in others. In particular, in the absence of post-selection we are not aware of any cases where entropic inequalities can be violated [215, 217, 207]. We found that the entropic approach also suffers drawbacks in the presence of post-selection, as it may fail to detect non-classicality under a natural class of post-processing operations, both in the case of Shannon and Tsallis entropies. However, this method remains of use since in many cases non-classicality can be detected using it.

V-representation of the polytope $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$

Table 5.1 enumerates the 47 extremal points of the polytope $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ that saturate or violate the inequality $I_{2233} \leq 2$ in the $(2,2,3,3)$ Bell scenario. The first 17 of these are non-classical while the remaining 30 are local deterministic vertices. Due to Proposition 5.3.1 and the symmetries of the scenario, the remaining vertices of $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ can be generated by taking the orbit of these vertices under local relabellings and exchange of parties. In Table 5.1, each extremal point is given by a single 36 dimensional vector which corresponds to writing the point in the notation explained in Section 3.1 (a $6 \times 6$ matrix [...]).

Footnote: Which corresponds to those for which the BC inequalities can be derived in the classical case.

Footnote: As a specific example, Figure 5.2a indicates that there are distributions $P^{(2,2,3,3)}_{E,\epsilon,v}$ that violate the Tsallis entropic inequality $I^{1}_{BC,q} \leq 0$ but not the Shannon entropic inequality $I^{1}_{BC} \leq 0$. However, we can always further mix such a distribution with the classical distribution $P^{(2,2,3,3)}_{C}$ to obtain a mixture of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and $P^{(2,2,3,3)}_{C}$ which violates the Shannon entropic inequality $I^{1}_{BC} \leq 0$.

Footnote: This is also in agreement with the results of [214], since when mixing is not considered, we also find examples where it is advantageous to use Tsallis entropy in the $(2,2,2,2)$ scenario.

In order to check for violations of the Shannon and Tsallis entropic inequalities $I^{1}_{BC} \leq 0$ and $I^{1}_{BC,q} \leq 0$ when mixing $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ (Equation (3.8)) with arbitrary classical distributions, we maximized the left hand sides of these inequalities over the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P^{L,k}\}_k)$ for some $\epsilon$ values in $(1/2, 4/7]$ (such as $\epsilon =$ [...]) numerically using Mathematica. Note that this polytope contains the local polytope where, by definition, entropic inequalities cannot be violated. Thus we can simplify the optimization and increase its reliability by only optimizing over the non-classical part of the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P^{L,k}\}_k)$. We find this region as follows.

For $1/2 < \epsilon \leq 4/7$, we know from Proposition 5.4.2 that $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical but does not violate any of the CHSH inequalities, while it violates $I_{2233} \leq 2$, which is the only Bell inequality that $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ violates for this range of $\epsilon$. Thus, the non-classical part of the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P^{L,k}\}_k)$ is the convex hull of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and all the local deterministic points that satisfy $I_{2233} = 2$. These are the 30 local deterministic points of Table 5.1. Hence we only need to optimise over convex combinations of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with these 30 points and not all 81 local deterministic points, which reduces the size of the optimization (number of variables) and increases the chances of it being effective in detecting entropic violations if there are any.

Performing the optimization as outlined above, we found the maximum value to always be non-positive for both the Shannon case and the Tsallis case with several values of $q$ (up to $q = 50$). We obtained similar results when taking other values of $\epsilon \leq 4/7$ for $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and also when considering the inequalities $I^{i}_{BC,q}$ for $i \in \{2,3,4\}$. This suggests that no point in the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P^{L,k}\}_k)$ violates any of the (Shannon or Tsallis entropic) BC inequalities for $\epsilon \leq 4/7$.

For $\epsilon > 4/7$, some mixing of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ with $P^{(2,2,3,3)}_{C}$ gives a distribution that violates $I^{1}_{BC,q} \leq 0$ $\forall q \geq 1$ [...] In the Shannon ($q = 1$) case, the range of values of the mixing parameter $v$ for which a violation can be found becomes arbitrarily small as $\epsilon$ approaches $4/7$ [...] $q$ close to 1. For instance, in the Shannon case our program was not able to detect violations of $I^{1}_{BC} \leq 0$ for $\epsilon <$ [...]$/7$ but could for $\epsilon \geq$ [...]$/7$. Similarly, for the $q =$ [...] case, violations of the corresponding Tsallis inequality were detected for $\epsilon \geq$ [...]$/7$, but not below. The reason for this difference is in line with what one might expect by comparing the plots in Figure 5.2, where for $\epsilon > 4/7$ the range of $v$ for which a violation is possible is larger in the $q =$ [...] case [...].

Number  Vertex
1   (1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 2, 0, 1, 1, 0, 0, 0, 2, 0, 1, 1, 2, 0, 0)
2   (1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 0, 0, 1, 1, 0, 2, 0, 1, 0, 1, 0, 0, 2, 1, 1, 0)
3   (1, 1, 0, 2, 0, 0, 0, 1, 1, 0, 2, 0, 1, 0, 1, 0, 0, 2, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0)
4   (2, 0, 0, 1, 0, 1, 0, 2, 0, 1, 1, 0, 0, 0, 2, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0)
5   (1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 2, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0)
6   (1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 2, 1, 1, 0)
7   (1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 2, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0)
8   (1, 0, 0, 1, 0, 0, 0, 2, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0)
9   (1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0)
10  (1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 2, 0, 1, 1, 1, 1, 0)
11  (1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0)
12  (1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 2, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0)
13  (1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 0, 0)
14  (1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 2, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0)
15  (1, 1, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0)
16  (2, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0)
17  (2, 1, 0, 2, 0, 1, 0, 2, 1, 1, 2, 0, 1, 0, 2, 0, 1, 2, 2, 0, 1, 0, 2, 1, 1, 2, 0, 1, 0, 2, 0, 1, 2, 2, 1, 0)
18  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1)
19  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
20  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
21  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0)
22  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
23  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0)
24  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
25  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
26  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
27  (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
28  (0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0)
29  (0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
30  (0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0)
31  (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
32  (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0)
33  (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
34  (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
35  (0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0)
36  (0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
37  (0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
38  (0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0)
39  (0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
40  (0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0)
41  (0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
42  (1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
43  (1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
44  (1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
45  (1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0)
46  (1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
47  (1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Table 5.1: The vertices of $\Pi^{(2,2,3,3)}_{\mathrm{CHSH}}$ that saturate or violate the $I_{2233}$ Inequality (3.6). All the vertices of the polytope can be obtained from the vertices listed here through local relabellings or exchange of parties.


However, because of the form of our objective function, the optimisation methods available are not guaranteed to find the global maximum. Thus, our findings only constitute evidence for the conjectures and are not conclusive. In general, finding global optima for non-linear, non-convex/concave functions is an open problem. A potential avenue for proving these conjectures is DC (difference of convex) programming [118], since our objective function, being a linear combination of entropies, is a difference of convex functions.

For the proofs we need the concept of the local weight of a non-signaling distribution [229, 66].

Definition 5.6.1.

The local weight of a non-signaling distribution $P_{XY|AB}$ is the largest $\alpha \in [0,1]$ such that we can write

$$P_{XY|AB} = \alpha\, Q^{L}_{XY|AB} + (1-\alpha)\, Q^{NL}_{XY|AB},$$

where $Q^{L}_{XY|AB}$ is an arbitrary local distribution and $Q^{NL}_{XY|AB}$ is an arbitrary non-signaling distribution. We denote the local weight by $l(P_{XY|AB})$.

The local weight of a distribution can be found by linear programming.

Proposition 5.4.2. $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ is non-classical if and only if $\epsilon > 1/2$. Further, for $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ satisfies all the CHSH-type inequalities, while for $\epsilon > 4/7$ it violates at least one CHSH-type inequality.

Proof. The distribution $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \epsilon P_{\mathrm{NL}} + (1-\epsilon) P^{(2,2,3,3)}_{\mathrm{noise}}$ can be written as follows.

$$P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} = \begin{pmatrix} A & B & B & A & B & B \\ B & A & B & B & A & B \\ B & B & A & B & B & A \\ A & B & B & B & A & B \\ B & A & B & B & B & A \\ B & B & A & A & B & B \end{pmatrix} \qquad (5.8)$$

where $A = (2\epsilon + 1)/9$ and $B = (1 - \epsilon)/9$. We used the LPAssumptions linear program solver [62] (see Appendix 4.5.2 of Chapter 4 for details) to find the local weight of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ as a function of $\epsilon$ to be

$$l(P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}) = \begin{cases} 1, & 0 \leq \epsilon \leq 1/2 \\ 2(1-\epsilon), & 1/2 < \epsilon \leq 1 \end{cases}$$

[...] $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and determining that each has a saturating $\epsilon$ of at most $4/7$.
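As a quick consistency check on the entries $A$ and $B$ of the matrix in Equation (5.8), the following worked step is a sketch under an assumption that is not spelled out in the surrounding text: that $P_{\mathrm{NL}}$ assigns probability $1/3$ to each of its three correlated outcome pairs for every input pair, and that $P^{(2,2,3,3)}_{\mathrm{noise}}$ is uniform with all entries equal to $1/9$.

```latex
\begin{align*}
A &= \frac{\epsilon}{3} + \frac{1-\epsilon}{9} = \frac{2\epsilon + 1}{9}, \\
B &= \frac{1-\epsilon}{9}, \\
3A + 6B &= \frac{(2\epsilon + 1) + 2(1-\epsilon)}{3} = 1 .
\end{align*}
```

The last line confirms that each $3 \times 3$ block, which contains three entries $A$ and six entries $B$, is normalised for every $\epsilon \in [0,1]$.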

Proposition 5.4.3.

For $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ does not violate any of the BC entropic inequalities (3.12) for any $v \in [0,1]$. However, for all $\epsilon > 4/7$, there exists a $v \in [0,1]$ such that the entropic inequality $I^{1}_{BC} \leq 0$ is violated by $P^{(2,2,3,3)}_{E,\epsilon,v}$.

Proof. Consider the function $f : (0,1) \times (0,1) \to \mathbb{R}$ given by

$$f(\epsilon,v) := 3\big(3 - 2(1-\epsilon)v\big)\ln[3 - 2(1-\epsilon)v] + 5(1-\epsilon)v\ln[(1-\epsilon)v] - (1+2\epsilon)v\ln[(1+2\epsilon)v] - \big(3 - (2+\epsilon)v\big)\ln[3 - (2+\epsilon)v] - 6\ln[3],$$

which can be extended to $[0,1] \times [0,1]$ by taking the relevant limit. The Shannon entropic expression $I^{1}_{BC}(\epsilon,v)$ evaluated for the distribution $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ (seen as a function of $\epsilon$ and $v$) is then given as

$$I^{1}_{BC} = \frac{1}{3\ln[2]}\, f(\epsilon,v).$$

Thus all the following arguments for $f(\epsilon,v)$ also hold for $I^{1}_{BC}$.

We first use that for $c > 0$, $a \in \mathbb{R}$ and sufficiently small $v$ we have $\ln[c + av] = \ln[c] + \frac{av}{c} + O(v^2)$. Using this we can expand $f(\epsilon,v)$ for small $v$ as

$$f(\epsilon,v) = (4-7\epsilon)\,v\ln[v] - (4-7\epsilon)\,v\,(1 + \ln[3]) + v\big(5(1-\epsilon)\ln[1-\epsilon] - (1+2\epsilon)\ln[1+2\epsilon]\big) + O(v^2). \qquad (5.9)$$

Thus, since $\lim_{v \to 0} v\ln[v] = 0$, we have $\lim_{v \to 0} f(\epsilon,v) = 0$. Further,

$$\frac{\partial}{\partial v} f(\epsilon,v) = (4-7\epsilon)\ln[v] + 5(1-\epsilon)\ln[1-\epsilon] - (1+2\epsilon)\ln[1+2\epsilon] - 6(1-\epsilon)\ln[3 - 2(1-\epsilon)v] + (2+\epsilon)\ln[3 - (2+\epsilon)v]. \qquad (5.10)$$

Note that $5(1-\epsilon)\ln[1-\epsilon] \leq 0$ and $-(1+2\epsilon)\ln[1+2\epsilon] \leq 0$ $\forall \epsilon \in [0,1]$. Further, since $3 - 2(1-\epsilon)v \geq 3 - (2+\epsilon)v$ and both terms are positive, $6(1-\epsilon) \geq (2+\epsilon)$ $\forall \epsilon \leq 4/7$, and using the fact that $\ln[\cdot]$ is an increasing function, we have

$$-6(1-\epsilon)\ln[3 - 2(1-\epsilon)v] + (2+\epsilon)\ln[3 - (2+\epsilon)v] \leq 0 \quad \forall \epsilon \in [0, 4/7],\ v \in [0,1].$$

This in turn implies that $\frac{\partial}{\partial v} f(\epsilon,v) \leq (4-7\epsilon)\ln[v]$ $\forall\, 0 \leq \epsilon \leq 4/7$, $0 \leq v \leq 1$. Hence, for $\epsilon \leq 4/7$, $\frac{\partial}{\partial v} f(\epsilon,v) \leq 0$ for all $v \in [0,1]$. Thus, $f(\epsilon,v)$ is zero at $v = 0$ and, for $\epsilon \leq 4/7$, decreases with $v$, implying that $f(\epsilon,v) \leq 0$ $\forall \epsilon \in [0, 4/7]$, $v \in [0,1]$.

Note that $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ does not violate any of the analogous inequalities $I^{i}_{BC} \leq 0$ for $i \in \{2,3,4\}$ and $\epsilon, v \in [0,1]$. This is because for this distribution, we always have $H(X_0Y_0) = H(X_0Y_1) = H(X_1Y_0)$ and $H(X_0) = H(X_1) = H(Y_0) = H(Y_1)$. Thus all three inequalities $I^{2}_{BC} \leq 0$, $I^{3}_{BC} \leq 0$ and $I^{4}_{BC} \leq 0$ reduce to $H(X_0) + H(Y_1) - H(X_0Y_0) - H(X_1Y_1) \leq 0$, which always holds since $H(X_0) \leq H(X_0Y_0)$ and $H(Y_1) \leq H(X_1Y_1)$ by the monotonicity of Shannon entropy.

Further, using the expression for the derivative of $f(\epsilon,v)$ with respect to $v$ in Equation (5.10), we find that for $\epsilon > 4/7$, $\lim_{v \to 0} \frac{\partial}{\partial v} f(\epsilon,v) = \infty$. Thus, since $f(\epsilon,v) = 0$ at $v = 0$, sufficiently close to $v = 0$ there exists a $v$ such that $f(\epsilon,v) > 0$. This proves the claim.
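The sign behaviour used in this proof can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not stated explicitly here: that $P_C$ is the uniform, perfectly correlated distribution ($X = Y$ for every input pair), that $P_{\mathrm{iso},\epsilon}$ has the block structure of Equation (5.8) with $A = (2\epsilon+1)/9$ and $B = (1-\epsilon)/9$, and that the BC expression has the form $H(X_1Y_1) + H(X_0) + H(Y_0) - H(X_0Y_0) - H(X_0Y_1) - H(X_1Y_0)$. The function names are hypothetical, and the closed form in `f_closed` is the one obtained under these assumptions.

```python
import math

LN2 = math.log(2)

def shannon_bits(probs):
    """Shannon entropy in bits, using the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def outcome_block(eps, v, shifted):
    """Nine outcome probabilities of v*P_iso + (1-v)*P_C for one input pair.

    shifted=False: P_iso is correlated as y == x (three of the four input pairs).
    shifted=True:  P_iso is correlated as y == x + 1 mod 3 (the remaining pair).
    """
    A, B = (2 * eps + 1) / 9, (1 - eps) / 9
    shift = 1 if shifted else 0
    block = []
    for x in range(3):
        for y in range(3):
            p_iso = A if y == (x + shift) % 3 else B
            p_c = 1 / 3 if x == y else 0.0  # assumed form of P_C
            block.append(v * p_iso + (1 - v) * p_c)
    return block

def i_bc(eps, v):
    """Assumed BC expression evaluated on the mixed distribution, in bits."""
    h_same = shannon_bits(outcome_block(eps, v, shifted=False))
    h_shift = shannon_bits(outcome_block(eps, v, shifted=True))
    h_marginal = math.log2(3)  # every single-party marginal is uniform here
    return h_shift + 2 * h_marginal - 3 * h_same

def xlnx(t):
    """t * ln(t), extended by continuity to t = 0."""
    return t * math.log(t) if t > 0 else 0.0

def f_closed(eps, v):
    """Closed form for 3 * ln(2) * i_bc under the same assumptions."""
    return (3 * xlnx(3 - 2 * (1 - eps) * v) + 5 * xlnx((1 - eps) * v)
            - xlnx((1 + 2 * eps) * v) - xlnx(3 - (2 + eps) * v)
            - 6 * math.log(3))

if __name__ == "__main__":
    for eps in (0.5, 4 / 7, 0.6, 0.7):
        best = max(i_bc(eps, k / 1000) for k in range(1001))
        print(f"eps = {eps:.4f}: max over v of I_BC = {best:.3e}")
```

For $\epsilon$ only slightly above $4/7$ a coarse grid in $v$ can miss the violation entirely, because the violating values of $v$ lie very close to zero; this is the same numerical difficulty described in the discussion of the optimisation.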

Proposition 5.4.4.

For $\epsilon \leq 4/7$, $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ does not violate any of the Tsallis BC inequalities (5.6) for any $v \in [0,1]$ and $q > 1$. However, for $\epsilon > 4/7$ and every $q > 1$, there always exists a $v \in [0,1]$ such that the entropic inequality $I^{1}_{BC,q} \leq 0$ is violated by $P^{(2,2,3,3)}_{E,\epsilon,v}$.

Proof. The Tsallis entropic expression $I^{1}_{BC,q}(\epsilon,v)$ evaluated for the distribution $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ (seen as a function of $q$, $\epsilon$ and $v$) is given as

$$I^{1}_{BC,q} = \frac{1}{q-1}\left[ 9\left(\frac{3 - 2(1-\epsilon)v}{9}\right)^{q} + 15\left(\frac{(1-\epsilon)v}{9}\right)^{q} - \frac{2}{3^{q-1}} - 3\left(\frac{3 - (2+\epsilon)v}{9}\right)^{q} - 3\left(\frac{(1+2\epsilon)v}{9}\right)^{q} \right] =: \frac{g(q,\epsilon,v)}{q-1}.$$

Since $q > 1$, the following arguments for $g(q,\epsilon,v)$ also hold for $I^{1}_{BC,q}$. Note that

$$\frac{\partial}{\partial v} g(q,\epsilon,v) = \frac{q}{3^{2q-1}}\Big[ -6(1-\epsilon)\big(3 - 2(1-\epsilon)v\big)^{q-1} + 5(1-\epsilon)\big((1-\epsilon)v\big)^{q-1} + (2+\epsilon)\big(3 - (2+\epsilon)v\big)^{q-1} - (1+2\epsilon)\big((1+2\epsilon)v\big)^{q-1} \Big]. \qquad (5.11)$$

Then, since $6(1-\epsilon)\big(3 - 2(1-\epsilon)v\big)^{q-1} \geq (2+\epsilon)\big(3 - (2+\epsilon)v\big)^{q-1}$ and $(1+2\epsilon)\big((1+2\epsilon)v\big)^{q-1} \geq (1-\epsilon)\big((1-\epsilon)v\big)^{q-1}$ hold for all $\epsilon \leq 4/7$, $v \in [0,1]$ and $q > 1$, we have

$$\frac{\partial}{\partial v} g(q,\epsilon,v) \leq 0 \quad \forall \epsilon \leq 4/7,\ v \in [0,1],\ q > 1.$$

Since $g(q,\epsilon,v=0) = 0$, this implies that $g(q,\epsilon,v) \leq 0$ $\forall \epsilon \leq 4/7$, $v \in [0,1]$, $q > 1$. Hence, for $\epsilon \leq 4/7$, $I^{1}_{BC,q} \leq 0$ for all $v \in [0,1]$ and $q > 1$. Note that $P^{(2,2,3,3)}_{E,\epsilon,v} = vP^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + (1-v)P^{(2,2,3,3)}_{C}$ also does not violate any of the inequalities $I^{2}_{BC,q} \leq 0$, $I^{3}_{BC,q} \leq 0$ and $I^{4}_{BC,q} \leq 0$ for any $\epsilon, v \in [0,1]$ and $q > 1$.

Further, Equation (5.11) implies

$$\lim_{v \to 0} \frac{\partial}{\partial v} g(q,\epsilon,v) = \frac{q}{3^{q}}\,(7\epsilon - 4),$$

which is always positive for $\epsilon > 4/7$. Since $g(q,\epsilon,v) = 0$ at $v = 0$, this allows us to conclude that for $\epsilon > 4/7$, there exists a $v$ sufficiently close to $v = 0$ such that $g(q,\epsilon,v) > 0$. This establishes the claim.
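The Tsallis case can be checked in the same spirit. The following sketch is illustrative only and makes several assumptions: the Tsallis entropy is taken as $S_q(p) = (1 - \sum_i p_i^q)/(q-1)$, $P_C$ is the uniform, perfectly correlated distribution, $P_{\mathrm{iso},\epsilon}$ has entries $A = (2\epsilon+1)/9$ and $B = (1-\epsilon)/9$ arranged as in Equation (5.8), and the BC expression has the same form as in the Shannon case with $H$ replaced by $S_q$. All function names are hypothetical.

```python
def tsallis(probs, q):
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1), assumed convention, q != 1."""
    return (1 - sum(p ** q for p in probs if p > 0)) / (q - 1)

def outcome_block(eps, v, shifted):
    """Nine outcome probabilities of v*P_iso + (1-v)*P_C for one input pair."""
    A, B = (2 * eps + 1) / 9, (1 - eps) / 9
    shift = 1 if shifted else 0
    block = []
    for x in range(3):
        for y in range(3):
            p_iso = A if y == (x + shift) % 3 else B
            p_c = 1 / 3 if x == y else 0.0  # assumed form of P_C
            block.append(v * p_iso + (1 - v) * p_c)
    return block

def i_bc_q(q, eps, v):
    """Assumed Tsallis BC expression on the mixed distribution."""
    s_same = tsallis(outcome_block(eps, v, shifted=False), q)
    s_shift = tsallis(outcome_block(eps, v, shifted=True), q)
    s_marginal = tsallis([1 / 3] * 3, q)  # uniform single-party marginals
    return s_shift + 2 * s_marginal - 3 * s_same

def g_closed(q, eps, v):
    """Closed form for (q - 1) * i_bc_q under the same assumptions."""
    return (9 * ((3 - 2 * (1 - eps) * v) / 9) ** q
            + 15 * ((1 - eps) * v / 9) ** q
            - 2 * 3 ** (1 - q)
            - 3 * ((3 - (2 + eps) * v) / 9) ** q
            - 3 * ((1 + 2 * eps) * v / 9) ** q)

def initial_slope(q, eps):
    """Limit of dg/dv as v -> 0, i.e. (q / 3^q) * (7*eps - 4)."""
    return q / 3 ** q * (7 * eps - 4)

if __name__ == "__main__":
    for eps in (0.5, 4 / 7, 0.7):
        print(eps, initial_slope(2, eps), g_closed(2, eps, 1e-3))
```

The sign of `initial_slope` changes exactly at $\epsilon = 4/7$, which is where the proof locates the onset of violations for small $v$.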

Proposition 5.4.5.

The distribution $P^{\mathrm{CG},i}_{\epsilon}$ is classical for all $i$ if and only if $\epsilon \leq 4/7$.

Proof. For the "if" part of the proof, we calculated all the 255 coarse-grainings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon=4/7}$ and used a linear programming algorithm to find that all of these are local (their local weight equals 1). Since decreasing $\epsilon$ in $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ cannot increase the violation of any (probability space) Bell inequality and neither can coarse-grainings, it follows that if $\epsilon \leq 4/7$, all coarse-grainings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ are classical.

For the "only if" part we need to show that if all coarse-grainings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ are classical then $\epsilon \leq 4/7$, or equivalently that if $\epsilon > 4/7$, there exists at least one coarse-graining of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ that is non-classical. Consider the coarse-graining that involves combining the second output with the first for all 4 input choices. For $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ as in Equation (5.8), this coarse-graining gives

$$P^{(2,2,3,3)}_{\mathrm{CG},\epsilon} = \begin{pmatrix} 2(A+B) & 2B & 2(A+B) & 2B \\ 2B & A & 2B & A \\ 2(A+B) & 2B & 3B+A & A+B \\ 2B & A & A+B & B \end{pmatrix}, \qquad (5.12)$$

where $A = (2\epsilon + 1)/9$ and $B = (1 - \epsilon)/9$. The $I_{2233}$ value, or left hand side of Equation (3.6), for this distribution is $9A - 3B$. For this to be classical, we require that $9A - 3B \leq 2$, which gives $\epsilon \leq 4/7$. Again using the LPAssumptions linear program solver [62] we found the local weight of this distribution as a function of $\epsilon$, which gives the following.

$$l(P^{(2,2,3,3)}_{\mathrm{CG},\epsilon}) = \begin{cases} 1, & 0 \leq \epsilon \leq 4/7 \\ \frac{1}{9}(17 - 14\epsilon), & 4/7 < \epsilon \leq 1 \end{cases}$$

In other words, if $\epsilon > 4/7$, then the coarse-graining $P^{(2,2,3,3)}_{\mathrm{CG},\epsilon}$ of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ violates the $I_{2233}$ Inequality (3.6) and is hence non-classical. This concludes the proof.

We prove the following two propositions together as they only differ in one step.
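The threshold $\epsilon \leq 4/7$ in the "only if" part can be recovered by direct substitution. This is a sketch under two assumptions: that the left hand side of Inequality (3.6) evaluates to $9A - 3B$ on the coarse-grained distribution, and that the classical bound is 2. With $A = (2\epsilon+1)/9$ and $B = (1-\epsilon)/9$:

```latex
\begin{align*}
9A - 3B &= (2\epsilon + 1) - \frac{1-\epsilon}{3} = \frac{7\epsilon + 2}{3}, \\
\frac{7\epsilon + 2}{3} \leq 2 \quad &\Longleftrightarrow \quad \epsilon \leq \frac{4}{7}.
\end{align*}
```

Under the same assumptions the expression is strictly increasing in $\epsilon$, so every $\epsilon > 4/7$ gives a value above the classical bound.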

Proposition 5.4.6.

Let $\epsilon \leq 4/7$. If Conjecture 5.4.1 holds, then every distribution in $\Pi_{\epsilon}$ is Shannon entropically classical.

Proposition 5.4.7.

Let $\epsilon \leq 4/7$. If Conjecture 5.4.2 holds, then every distribution in $\Pi_{\epsilon}$ satisfies the Tsallis entropic BC Inequalities (5.6) $\forall q > 1$.

Proof. If Conjectures 5.4.1 and 5.4.2 hold, then for any $\epsilon \leq 4/7$, $I^{i}_{BC,q} \leq 0$ $\forall q \geq 1$ (where $q = 1$ corresponds to the Shannon case), $\forall i \in \{1,2,3,4\}$ and for all distributions in the polytope $\mathrm{Conv}(\{P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}\} \cup \{P^{L,k}\}_k)$. We want to show that this implies the same for the larger polytope that comprises the convex hull of not just $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ and the local deterministic distributions $\{P^{L,k}\}_k$, but also all the local relabellings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, i.e., the polytope $\Pi_{\epsilon} = \mathrm{Conv}(\{P^{R,j}_{\epsilon}\}_j \cup \{P^{L,k}\}_k)$.

While we considered $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ in Propositions 5.4.3 and 5.4.4 and Conjectures 5.4.1 and 5.4.2, due to symmetries, these also apply to every relabelling of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, i.e., $I^{i}_{BC,q} \leq 0$ ($\forall i \in \{1,2,3,4\}$) throughout every polytope in the set $\{\mathrm{Conv}(\{P^{R,j}_{\epsilon}\} \cup \{P^{L,k}\}_k)\}_j$ (where $j$ runs over all the local relabellings of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$). This is because for every input-output relabelling of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, we can correspondingly relabel the inequality expression $I^{1}_{BC,q}$ (for $q \geq 1$) and the same arguments as in Propositions 5.4.3 and 5.4.4 again hold, and similarly Conjectures 5.4.1 and 5.4.2 also follow for this case. From this argument, it follows that if Conjectures 5.4.1 and 5.4.2 hold, then $I^{i}_{BC,q} \leq 0$ $\forall q \geq 1$, $\forall i \in \{1,2,3,4\}$ everywhere in the union of the polytopes, i.e., everywhere in $\bigcup_j \mathrm{Conv}(\{P^{R,j}_{\epsilon}\} \cup \{P^{L,k}\}_k)$.

To conclude the proof it remains to show that $\bigcup_j \mathrm{Conv}(\{P^{R,j}_{\epsilon}\} \cup \{P^{L,k}\}_k) = \Pi_{\epsilon}$ $\forall \epsilon \leq 4/7$. This is established below. Then, Proposition 5.4.7 automatically follows while Proposition 5.4.6 follows from this and Lemma 3.2.1.

A note on notation: $\{P^{R,j}_{\epsilon}\}$ is a set comprising a single element, while $\{P^{R,j}_{\epsilon}\}_j$ is a set whose elements are the distributions for every $j$.

Note that output relabellings do not change the entropic expression, but input relabellings (4 in number) can give any one of $I^{i}_{BC,q} \leq 0$, $i \in \{1,2,3,4\}$. Thus for an output relabelling of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, one can continue using $I^{1}_{BC,q}$ in Propositions 5.4.3 and 5.4.4 and the following Conjectures, while for input relabellings, one simply needs to relabel the inequalities accordingly and run the same arguments.

Proposition 5.6.1. $P^{j}_{\mathrm{mix},\epsilon} := \frac{1}{2}P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + \frac{1}{2}P^{R,j}_{\epsilon}$ is local $\forall j \neq 1$ if and only if $\epsilon \leq 4/7$. (Here $j = 1$ corresponds to $P^{R,1}_{\epsilon} = P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, which is non-classical for $\epsilon > 1/2$.)

Proof. For the "if" part of the proof, we used a linear program to confirm that $P^{j}_{\mathrm{mix},\epsilon=4/7}$ is local $\forall j \neq 1$. Since reducing $\epsilon$ in $P^{j}_{\mathrm{mix},\epsilon}$ cannot decrease the local weight, this also holds for $\epsilon < 4/7$.

For the "only if" part, we show that $\forall \epsilon > 4/7$, $\exists j$ such that $P^{j}_{\mathrm{mix},\epsilon}$ is non-classical. Consider the particular local relabelling that corresponds to Alice swapping the outputs "1" and "2" only when her input is $A = 1$. Let the distribution obtained by applying this relabelling to $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ be $P^{R}_{\epsilon}$. Then $P_{\mathrm{mix},\epsilon} = \frac{1}{2}P^{(2,2,3,3)}_{\mathrm{iso},\epsilon} + \frac{1}{2}P^{R}_{\epsilon}$. More explicitly,

$$P^{R}_{\epsilon} = \begin{pmatrix} A & B & B & A & B & B \\ B & A & B & B & A & B \\ B & B & A & B & B & A \\ A & B & B & B & A & B \\ B & B & A & A & B & B \\ B & A & B & B & B & A \end{pmatrix} \quad \text{and} \quad P_{\mathrm{mix},\epsilon} = \begin{pmatrix} A & B & B & A & B & B \\ B & A & B & B & A & B \\ B & B & A & B & B & A \\ A & B & B & B & A & B \\ B & \ast & \ast & \ast & B & \ast \\ B & \ast & \ast & \ast & B & \ast \end{pmatrix}, \qquad (5.13)$$

where $\ast = (A+B)/2$, $A = (2\epsilon + 1)/9$ and $B = (1 - \epsilon)/9$. Now consider the Bell inequality $\mathrm{Tr}(M^{T}P) \geq$ [...], where $P$ is an arbitrary distribution in the $(2,2,3,3)$ scenario and $M$ is the following matrix: $M :=$ [...]. [...] $P_{\mathrm{mix},\epsilon}$ to be non-classical with respect to this Bell inequality, i.e., $\mathrm{Tr}(M^{T}P_{\mathrm{mix},\epsilon}) <$ [...] for $\epsilon > 4/7$. Since $P^{R}_{\epsilon}$ is a local relabelling of $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$, there exists a $j$ such that $P^{R}_{\epsilon} = P^{R,j}_{\epsilon}$ and hence $P_{\mathrm{mix},\epsilon} = P^{j}_{\mathrm{mix},\epsilon}$. Thus we have shown that whenever $\epsilon > 4/7$, $\exists j$ such that $P^{j}_{\mathrm{mix},\epsilon}$ is non-classical, and hence that $P^{j}_{\mathrm{mix},\epsilon}$ being local for all $j$ implies that $\epsilon \leq 4/7$.

The same argument applies with $P^{(2,2,3,3)}_{\mathrm{iso},\epsilon}$ replaced by $P^{R,i}_{\epsilon}$ for any $i$, so we have the following corollary.

Corollary 5.6.1. $\tilde{P}^{i,j}_{\mathrm{mix},\epsilon} := \frac{1}{2}P^{R,i}_{\epsilon} + \frac{1}{2}P^{R,j}_{\epsilon}$ is local $\forall j \neq i$ if and only if $\epsilon \leq 4/7$.

Theorem 5.6.1 (Bemporad et al. 2001 [23]). Let $P$ and $Q$ be polytopes with vertex sets $V$ and $W$ respectively, i.e., $P = \mathrm{Conv}(V)$ and $Q = \mathrm{Conv}(W)$. Then $P \cup Q$ is convex if and only if the line segment $[v,w]$ is contained in $P \cup Q$ $\forall v \in V$ and $w \in W$.

Let $\mathcal{P}_j = \mathrm{Conv}(\{P^{R,j}_{\epsilon}\} \cup \{P^{L,k}\}_k)$ and $\mathcal{P}$ be the set of polytopes $\mathcal{P} := \{\mathcal{P}_j\}_j$. We use the above theorem to prove the final result that establishes Propositions 5.4.6 and 5.4.7.

Lemma 5.6.1.

Let $V_j$ be the vertex set of the polytope $\mathcal{P}_j \in \mathcal{P}$ and $V := \bigcup_j V_j$. Then, $\bigcup_{\mathcal{P}_j \in \mathcal{P}} \mathcal{P}_j = \mathrm{Conv}(\bigcup_j V_j) = \mathrm{Conv}(V) = \Pi_{\epsilon}$ $\forall \epsilon \leq 4/7$.

Proof. By Corollary 5.6.1, for $i \neq j$ we have that $\frac{1}{2}(P^{R,i}_{\epsilon} + P^{R,j}_{\epsilon}) \in \mathcal{L}^{(2,2,3,3)} = \mathcal{P}_i \cap \mathcal{P}_j$ $\forall \epsilon \leq 4/7$. Since the midpoint lies in both polytopes, the half of the segment joining it to $P^{R,i}_{\epsilon}$ lies in $\mathcal{P}_i$ and the other half lies in $\mathcal{P}_j$, by convexity of each. This implies that $\alpha P^{R,i}_{\epsilon} + (1-\alpha) P^{R,j}_{\epsilon} \in \mathcal{P}_i \cup \mathcal{P}_j$ $\forall \epsilon \leq 4/7$, $\alpha \in [0,1]$, i.e., the line segment $[P^{R,i}_{\epsilon}, P^{R,j}_{\epsilon}]$ is completely contained in the union of the corresponding polytopes $\mathcal{P}_i \cup \mathcal{P}_j$. Note that all other line segments $[v_i, v_j]$ with $v_i \in V_i$ and $v_j \in V_j$ are contained in $\mathcal{P}_i \cup \mathcal{P}_j$ by construction, since at least one of $v_i$ or $v_j$ would be a local deterministic vertex. Therefore, by Theorem 5.6.1, $\mathcal{P}_i \cup \mathcal{P}_j$ is convex $\forall i, j$ and $\epsilon \leq 4/7$. We can then apply Proposition 5.6.1 and Theorem 5.6.1 to the convex polytopes $\mathcal{P}_i \cup \mathcal{P}_j$ and $\mathcal{P}_k$ and show that $\mathcal{P}_i \cup \mathcal{P}_j \cup \mathcal{P}_k$ is convex $\forall i, j, k$ and $\epsilon \leq 4/7$. Proceeding in this way, we conclude that $\bigcup_{\mathcal{P}_i \in \mathcal{P}} \mathcal{P}_i$ is convex $\forall \epsilon \leq 4/7$, and hence $\bigcup_{\mathcal{P}_j \in \mathcal{P}} \mathcal{P}_j = \mathrm{Conv}(\bigcup_j V_j) = \mathrm{Conv}(V) = \Pi_{\epsilon}$.

Chapter 6

Cyclic and fine-tuned causal models and compatibility with relativistic principles

The notion of causation is closely tied with that of space-time, but how are the two related? Causation can be modelled operationally, without any reference to space-time, based only on observed correlations and interventions on systems [160]. It can be represented through directed graphs that are in principle independent from the partial order imposed by space-time. That the arrows of this directed graph are compatible with the light cone structure of space-time is an empirical observation that supports the notion of relativistic causality. For example, if an external intervention on a variable $A$ produces an observable change in another variable $B$, then one would say that $A$ affects $B$ and call $A$ a cause of $B$. Assigning space-time locations to the variables and requiring the effect $B$ to always be in the future light cone of the cause $A$ makes this causal relationship compatible with the partial order of Minkowski space-time.

In general, a variable may jointly affect a set of variables without affecting individual variables in the set. Such scenarios correspond to causal models where the correlations are fine-tuned to hide certain causal influences. For certain embeddings of such causal models in space-time (such as the jamming scenario considered in [105, 116]), this can lead to superluminal influence without superluminal signalling, or closed timelike curves that do not lead to an observable signalling to the past. Some of these are strictly post-quantum features that do not arise in standard quantum theory. For a deeper understanding of causality in quantum and post-quantum theories, it is thus important to disentangle the two notions of causation, the operational one and the relativistic one, and to characterise the relationship(s) between them in these general scenarios.
A rigorous framework for causally modelling such scenarios and formalising their relationship with a space-time structure is not available. Such an analysis can deepen our understanding of the principles of causality, other than broad relativistic principles (such as no superluminal signalling), that apply in the quantum world. This chapter is based on joint work with Roger Colbeck that we hope to turn into a paper soon.

We develop an operational framework for analysing cyclic and fine-tuned causal models in non-classical theories, and their compatibility with a space-time structure. This in particular includes the jamming scenarios envisaged in [105, 116] where a party superluminally influences correlations between two other parties. We present the framework in two parts: the first concerns causal models (Section 6.3) and the second characterises the embedding of these causal models in a space-time structure (Section 6.4). In Sections 6.5.1 and 6.5.2, we derive necessary and sufficient conditions for the causal model to be compatibly embedded in a space-time structure and to rule out operationally detectable causal loops. Following this, in Section 6.6, we analyse the jamming scenario [105, 116] using causal modelling. We prove in Theorem 6.6.1 that if the shared joint system is classical and accessible to the parties, then the jamming scenario can indeed lead to superluminal signalling, contrary to what is claimed in [105, 116]. In Section 6.6.2, we analyse the results of [116] regarding such scenarios, providing counterexamples to some of their claims, and discussing how our results provide possible resolutions. Keeping in theme with the many cycles and paradoxes that the reader will encounter in this thesis, we present them with a mildly self-referential poem before looping back into the technicalities.

What is causation?
Can we always discover its presence?
If it is not mere correlation,
Then what is its essence?
What is it that orders the cause
Before that what we dub the effect?
What brings order to the chaos
Telling us what we can or cannot affect?
Time gives us a direction "forward".
To pose questions without already knowing the answer.
To drive causation through its invisible arrow,
A rule that seems doomed to be followed.
Causality and space-time,
At first, we disconnect.
By actions and observable changes,
We model cause and effect.
With space-time out of the way, we abstract away,
Allowing causal loops, and hidden influences
That from our observations may evade.
But we'd like to avoid paradoxes
That put our grandfathers' lives at stake.
Requiring a joint probability distribution
Keeps these inconsistencies at bay.
Bringing one foot back to reality,
We embed the causal model in space-time
Derive conditions for its compatibility.
Align the actions and changes
With the past and future,
But this doesn't suffice to exorcise
All the loops we set afoot earlier.
Some lurk, beyond the observable realm.
Finely tuned parameters that don't reveal them.
We can only rule out loops that manifest
In observations that we can operationally test.
So we critically analyse a previous claim [116]
That specifies how all the loops can be tamed.
A plethora of questions begging for answers,
Some to be addressed before publishing.
Therefore, much like the work presented here,
This poem too needs some fine-tuning.

Footnote:
What is time,
does it really "exist"?
That is a question for another time,
Another poem, and I must resist.

A ubiquitous assumption made in the majority of the literature on causal modelling is that of faithfulness, which we briefly discussed in Section 2.5.1 (in particular, Example 2.5.1). This corresponds to the requirement that all the (conditional) independences observed in the correlations arising from a causal structure must be a consequence of certain graph separation properties (i.e., d-separation relations, Definition 2.5.7) of that causal structure. In unfaithful causal models, the causal parameters are fine-tuned such that some causal influences exactly cancel out to yield additional independences beyond what is implied by d-separation. Fine-tuning creates problems for causal inference since it leads to correlations that do not faithfully reflect the causal relations. It is therefore considered an undesirable property of causal models and is often avoided in the literature, also on the grounds that fine-tuned causal models constitute a set of measure zero. In fact, quantum correlations in the Bell causal structure (Figure 3.1a) can be explained using classical causal models if we allow for additional, fine-tuned causal influences (e.g., from $A$ to $X$, see Figure 6.11) [225], but a faithful explanation using quantum causal models is often preferred.

However, there are a number of examples, as we will see below, which necessitate a fine-tuned explanation irrespective of whether the causal structure is classical or non-classical. These include certain everyday scenarios, cryptographic protocols, as well as scenarios arising in certain post-quantum theories. Moreover, another common assumption in the causality literature is that the causal structure is acyclic. This is justifiable since our observations suggest that causal influences only propagate in one direction.
Nevertheless, allowing fine-tuned causal influences makes possible cyclic causal structures that are compatible with minimal notions of relativistic causality, such as the impossibility of signalling superluminally at the observed level. Cyclic causal models have also been employed in the classical literature for describing systems with feedback loops [161, 84].
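As a small worked illustration of how causal influences can "exactly cancel out", consider a linear toy model. This is a sketch under assumptions that are not part of the thesis: the variables $X$, $Y$, $Z$, the coefficients $a$, $b$, $c$ and the noise terms $\varepsilon_Y$, $\varepsilon_Z$ are hypothetical, and the noises are taken to be zero-mean and independent of $X$ and of each other.

```latex
\begin{align*}
  Z &= cX + \varepsilon_Z, \qquad Y = aX + bZ + \varepsilon_Y, \\
  \mathrm{Cov}(X, Y) &= a\,\mathrm{Var}(X) + b\,\mathrm{Cov}(X, Z)
                      = (a + bc)\,\mathrm{Var}(X).
\end{align*}
```

Here $X$ influences $Y$ along two directed paths (directly, and through $Z$), yet the covariance vanishes exactly when $a = -bc$. If the variables are in addition jointly Gaussian, this makes $X$ and $Y$ independent although they are not d-separated. The condition $a = -bc$ singles out a lower-dimensional surface in the parameter space $(a, b, c)$, which is the sense in which such fine-tuned models form a set of measure zero.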

Consider a house with an ideal thermostat. Such a thermostat would maintain a constant inside temperature $T_I$ throughout the year by adjusting the energy consumption $E$ in accordance with the outside temperature $T_O$. An individual who does not know how a thermostat works might conclude that $T_O$ and $E$, which are correlated, have a causal relationship with each other, while the indoor temperature $T_I$ is causally independent of everything else. However, an engineer who is well-versed in the workings of a thermostat knows that both $T_O$ and $E$ exert a causal influence on $T_I$, and that these influences must perfectly cancel each other out for the thermostat to function ideally. The causal model in this case is fine-tuned since the independence of $T_I$ from $T_O$ and $E$ does not correspond to a d-separation relation in the causal structure (Figure 6.1a). This thermostat analogy, which is attributed to Milton Friedman [87], can be extended to a number of other scenarios such as the effect of fiscal and monetary policies on economic growth [184], or physical systems where several forces exactly balance out.

In cryptographic settings, examples that necessitate fine-tuning include the one-time pad and the “traitorous lieutenant problem” [38]. Consider a general who wishes to relay an important secret message $M$ to an ally and has two lieutenants available as messengers, but one of them is a traitor who might leak the message to enemies. Consider for simplicity that $M$ is a single bit. The general could then adopt the following strategy: depending on whether $M = 0$ or $M = 1$, two uniformly distributed bits $M_1$ and $M_2$ are generated such that $M_1 = M_2$ or $M_1 \neq M_2$, respectively. $M_1$ is given to the first and $M_2$ to the second lieutenant to relay to the ally. The ally would then receive $M_1$ and $M_2$ and can simply use modulo-2 addition $\oplus$ to obtain $M^*$, which would be identical to the original message: $M^* = M = M_1 \oplus M_2$ (Figure 6.1b). More importantly, the individual messages $M_1$ and $M_2$ contain no information about $M$, and hence neither lieutenant has any information about the secret message. A similar protocol underlies the one-time pad, where a message $M$ is encrypted using a secret key $K$ (both binary for this example) to produce an encrypted message $M_E = M \oplus K$, which can be sent through a public channel as it will carry no information about the original message $M$ if the key $K$ is uniformly distributed and is kept private. Only a receiver of $M_E$ who knows the key $K$ can decrypt the message, $M = M_E \oplus K$ (Figure 6.1c). Hence fine-tuning of causal influences, i.e., causation in the absence of correlation, is key to the working of such cryptographic protocols.

Figure 6.1: Causal structures for the motivating examples described in the main text: (a) Friedman’s thermostat, (b) Traitorous lieutenant, (c) One-time pad. In this chapter, we use squiggly arrows to denote causal influence, as this will later be classified into solid and dashed arrows. Note that there may be additional causal influences. For example, there can be a direct influence of the outside temperature $T_O$ and/or the inside temperature $T_I$ on the energy consumption $E$ in (a); the latter would make it a cyclic causal model. Further, in examples like (b), we will later see that an additional common cause between $M_1$ and $M_2$ will be required to fully explain the correlations (c.f. Figure 6.5a).

Further, cyclic causal models have been analysed in the classical literature [161, 84] for the purpose of describing complex systems involving feedback loops. Note that the cyclic dependences here do not correspond to closed time-like curves since the variables under question are considered over a period of time, e.g., demand at time $t_1$ influences the price at time $t_2 > t_1$, which in turn influences the demand at time $t_3 > t_2$. Therefore, in order to characterise genuine closed time-like curves one must consider not only the pattern of causal influences, but also how the relevant variables are embedded in a space-time structure.

Another example that involves fine-tuning, even though it has not been motivated or discussed in this context, is that of jamming non-local correlations introduced in [105]. Consider three space-like separated parties, Alice, Bob and Charlie, sharing a tripartite system $\Lambda$ which they measure using measurement settings $A$, $B$ and $C$, producing outcomes $X$, $Y$ and $Z$ respectively. Suppose that their space-time locations are such that Bob’s future light cone entirely contains the joint future of Alice and Charlie, as shown in Figure 6.2. The standard no-signalling conditions forbid the input of each party from being correlated with the outputs of any subset of the remaining parties; in particular, the joint distribution $P_{XYZ|ABC}$ satisfies $P_{XZ|ABC} = P_{XZ|AC}$.
In [105] it is argued that a violation of this requirement does not lead to superluminal signalling in the space-time configuration of Figure 6.2, as long as $X$ and $Z$ are individually independent of $B$, i.e., $P_{X|ABC} = P_{X|A}$ and $P_{Z|ABC} = P_{Z|C}$. This is because any influence that $B$ exerts jointly on $X$ and $Z$ can only be checked when $X$ and $Z$ are brought together to evaluate the correlations $P_{XZ|ABC}$. This is only possible in their joint future, which is by construction contained in the future of $B$. This scenario allows Bob to jam the correlations between Alice and Charlie non-locally.

In [116], where jamming is further analysed, the causal structure for such an experiment is represented by introducing a new random variable $C_{XZ}$ associated with the set $\{X, Z\}$ that encodes the correlations between its elements. Then $B$ is seen as a cause of $C_{XZ}$ but not as a cause of either $X$ or $Z$. In general scenarios, this representation would require adding a new variable for every non-empty subset of the observed nodes, which can become intractable. Further, this representation does not always correspond to what is physically going on. For instance, in the example of the traitorous lieutenant, this would introduce a new variable $C_{M_1 M_2}$ that is observably influenced by the general’s original message $M$, while $M$ would no longer be seen as a cause of $M_1$ or $M_2$. However, we know that we physically generated $M_1$ and $M_2$ using $M$, hence it is indeed a cause of at least one of them. Therefore, we aim to develop a new approach to causal modelling for a general class of fine-tuned and cyclic scenarios, using only the original variables/systems. The following proposition illustrates that the jamming scenario considered in [105, 116] necessarily corresponds to a fine-tuned causal model over the original variables.
Here, jamming is considered in the context of multipartite Bell scenarios where the jamming variable is a freely chosen input of one of the parties. In the causal model approach adopted here, we will take free choice of a variable to correspond to the exogeneity of that variable in the causal structure.

Proposition 6.2.1.

Consider a tripartite Bell experiment where three parties Alice, Bob and Charlie share a system $\Lambda$ which they measure using the setting choices $A$, $B$ and $C$, producing the measurement outcomes $X$, $Y$ and $Z$ respectively. Let $G$ be any causal structure with only $\{A, B, C, X, Y, Z\}$ as the observed nodes where $A$, $B$ and $C$ are exogenous. Then any joint distribution $P_{XYZ|ABC}$ corresponding to the jamming correlations of [105, 116] defines a fine-tuned causal model over $G$, irrespective of the nature (classical, quantum or GPT) of $\Lambda$.

Proof. Jamming allows Bob’s input $B$ to be correlated jointly with $X$ and $Z$ but not individually with $X$ or $Z$. Hence jamming correlations in the tripartite Bell experiment of [105, 116] are characterised by the conditions $B \perp\!\!\!\perp X$ and $B \perp\!\!\!\perp Z$ while $B \not\perp\!\!\!\perp \{X, Z\}$. Since $B$ is exogenous (i.e., has no incoming arrows), the only way to explain the correlation between $B$ and $\{X, Z\}$ is through an outgoing arrow or a directed path from $B$ to the set $\{X, Z\}$, i.e., either an arrow from $B$ to $X$, or from $B$ to $Z$, or both (if this were not the case, $B$ would be d-separated from $\{X, Z\}$ and therefore could not be correlated with it). Since we require both independences $B \perp\!\!\!\perp X$ and $B \perp\!\!\!\perp Z$ to hold, at least one of these will not be a consequence of d-separation, and hence the causal model must be fine-tuned in order to produce these correlations in the causal structure $G$.

Footnote: In general, this representation would include up to $2^n - n$ elements.

Footnote: And possibly some additional information to explain the distribution over the individual variables. As we will see later in Figure 6.5a, a common cause $\Lambda$ between $M_1$ and $M_2$ would also be required in such examples.

Footnote: Free choice of a variable $B$ is often defined through the condition that $B$ can only be correlated with variables in its future [61]. We will discuss the relation between this notion of free choice and ours, namely taking $B$ to be exogenous, later in the chapter (Section 6.6.2).

Figure 6.2: A tripartite Bell experiment: Three parties Alice, Bob and Charlie share a tripartite system $\Lambda$; they measure their subsystems using the freely chosen measurement settings $A$, $B$ and $C$, producing the outcomes $X$, $Y$ and $Z$ respectively. (a) Space-time configuration for the jamming scenario [105, 116]: The measurement events of the three parties are space-like separated such that the future of Bob’s input $B$ contains the entire joint future (in blue) of Alice and Charlie’s outputs $X$ and $Z$. Here, $B$ is allowed to signal to $X$ and $Z$ jointly (which can only be verified in the blue region) but not individually. In [116], a new variable $C_{XZ}$ is introduced, located at the earliest point in the joint future of $X$ and $Z$ and representing the correlations between $X$ and $Z$. (b) Causal structure for the usual tripartite Bell experiment. Note that in order to explain jamming correlations, we must either include additional causal arrows from $B$ to $X$ or $B$ to $Z$ or both (Proposition 6.2.1), or introduce a new node $C_{XZ}$ with an incoming arrow from $B$ [116]. We adopt the former approach in the rest of this chapter.

The simplest example of jamming is one where $B = X \oplus Z$ and all variables are binary and uniformly distributed (the remaining variables are irrelevant here). We will revisit this example several times in this chapter. These are in fact the same correlations as in the traitorous lieutenant example. However, in the jamming case, the three variables involved are pairwise space-like separated and since $B$ is exogenous, this corresponds to a situation where $B$ superluminally influences the correlations between $X$ and $Z$.

In Section 2.5 we have reviewed the standard literature on acyclic and faithful causal models, both in the classical as well as non-classical cases. Here, following the motivation set out in the previous sections, we wish to relax the assumptions of acyclicity and faithfulness and extend these methods to cyclic and fine-tuned causal structures.
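The independence pattern of the $B = X \oplus Z$ example mentioned above (no pairwise correlation of $B$ with $X$ or $Z$, but correlation with the pair) can be checked by brute-force enumeration. The following is a minimal sketch in Python, not part of the thesis; the helper names `marginal` and `independent` and the tuple encoding of outcomes are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over outcome tuples (x, z, b): X and Z are uniform bits
# and B = X xor Z (the assumed toy version of the jamming correlations).
joint = {(x, z, x ^ z): Fraction(1, 4) for x, z in product((0, 1), repeat=2)}
X, Z, B = 0, 1, 2  # positions of the variables inside an outcome tuple


def marginal(dist, keep):
    """Marginalise `dist` onto the tuple positions listed in `keep`."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out


def independent(dist, left, right):
    """True iff the variables at `left` are independent of those at `right`."""
    p_l, p_r = marginal(dist, left), marginal(dist, right)
    p_lr = marginal(dist, left + right)
    return all(p_lr.get(a + b, 0) == p_l[a] * p_r[b] for a in p_l for b in p_r)


b_indep_x = independent(joint, (B,), (X,))      # B is uncorrelated with X alone
b_indep_z = independent(joint, (B,), (Z,))      # B is uncorrelated with Z alone
b_indep_xz = independent(joint, (B,), (X, Z))   # but not with the pair {X, Z}
print(b_indep_x, b_indep_z, b_indep_xz)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality tests are not affected by floating-point rounding. Relabelling $(X, Z, B)$ as $(M_1, M_2, M)$ gives the traitorous lieutenant correlations.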
While quantum cyclic causal models have been previously studied [18], these have only been analysed in the faithful case and are based on the split-node causal modelling approach of [6] (this approach is briefly discussed in Section 2.5.2.3). This approach is not equivalent to the standard causal modelling approach, such as that of [115], in the cyclic case. For example, the former forbids faithful 2-node cyclic causal structures [18] but the latter does not, and the former admits a Markov factorisation (analogous to Equation (2.45)) while the latter does not in general (as we will see later in the chapter). To the best of our knowledge, there is no prior framework for causally modelling cyclic and unfaithful causal structures in the presence of quantum and post-quantum latent nodes, the lack of a Markov factorisation posing a particular difficulty. Here, we develop the bare bones of such a framework that will suffice for our main results. We note that there may be other, inequivalent ways to do the same. We will define causal models in terms of minimal conditions that they must satisfy at the level of the observed nodes, which are classical.

Causal structure: Causal structures will be represented using directed graphs, of which the directed acyclic graphs of Section 2.5 are special cases. Edges in these graphs will be denoted using $\rightsquigarrow$ (unlike $\longrightarrow$ in previous chapters), as it will be useful to later classify these edges into solid $\longrightarrow$ and dashed $\dashrightarrow$ ones based on certain operational conditions for detecting causation. These causal structures can have observed as well as unobserved nodes, where the former are classical random variables and the latter can be classical or non-classical systems. We will then assume the existence of such a causal structure (though it may be unknown) and use the following definition of cause that directly arises from this assumption.

Definition 6.3.1 (Cause). Given a causal structure represented by a directed graph $G$, possibly containing observed as well as unobserved nodes, we say that a node $N_i$ is a cause of another node $N_j$ if and only if there is a directed path $N_i \rightsquigarrow \dots \rightsquigarrow N_j$ from $N_i$ to $N_j$ in $G$.

Observed distribution:

In classical acyclic causal models, the causal Markov condition (2.45) is used for defining the compatibility of the observed distribution with the causal structure [160]. In the non-classical case, the generalised Markov condition of [115] provides a compatibility condition (Section 2.5.2.1). However, in cyclic causal models, demanding such a factorisation will be too restrictive even in the classical case. For example, consider the simplest cyclic causal structure, the 2-cycle where $X \rightsquigarrow Y$ and $Y \rightsquigarrow X$, with $X$ and $Y$ observed and $X = Y$. The Markov condition would imply that $P_{XY} = P_{X|Y} P_{Y|X}$. Since $X = Y$, the right hand side is a product of deterministic distributions, which forces $P_{XY}$ to also be deterministic in order to be a valid distribution. Therefore, we instead use a weaker compatibility condition in terms of d-separation between observed nodes (Definition 2.5.7), a concept that can also be applied to non-classical causal structures (Theorem 2.5.3), and define compatibility of the observed distribution with a cyclic causal structure as follows within our framework.

Definition 6.3.2 (Compatibility of observed distribution with a causal structure). Let $\{X_1, \dots, X_n\}$ be a set of random variables denoting the observed nodes of a directed graph $G$, and $P_{X_1, \dots, X_n}$ be a joint probability distribution over them. Then $P$ is said to be compatible with $G$ if for all disjoint subsets $X$, $Y$ and $Z$ of $\{X_1, \dots, X_n\}$,

$$X \perp^d Y \mid Z \;\Rightarrow\; X \perp\!\!\!\perp Y \mid Z, \quad \text{i.e.,} \quad P_{XY|Z} = P_{X|Z} P_{Y|Z},$$

for the marginal distribution $P_{XYZ}$ of $P_{X_1, \dots, X_n}$.

Footnote (on inequivalent ways of defining such a framework): Based on a different condition for compatibility of a distribution with a causal structure, for example.

Definition 6.3.2 is essentially the soundness of d-separation (c.f. Theorem 2.5.2) and is satisfied by classical as well as non-classical causal models in the acyclic case [160, 115]. It is sometimes referred to as the global directed Markov condition [181] and holds in several classical cyclic causal models [161, 84] as well as non-classical acyclic causal models [115]. In Appendix 6.8.1, we provide an example of a quantum cyclic causal model where this holds. However, there also exist cyclic causal models producing observed distributions that do not satisfy Definition 6.3.2; we discuss this further in the Appendix as well. There, we also present further motivation for the compatibility condition of Definition 6.3.2 in terms of the properties of the underlying causal mechanisms (e.g., functional dependences in the classical case, completely positive maps in the quantum case) and outline possible methods for identifying when this condition might hold for non-classical cyclic causal models. Even in the classical case, several inequivalent definitions of compatibility are possible (which become equivalent in the acyclic case), and [84] presents a detailed analysis of these conditions and the relationships between them. Such an analysis for the non-classical case is beyond the scope of the present thesis. For the rest of this chapter, we will only consider causal models that satisfy the compatibility condition of Definition 6.3.2.

We will work with the following minimal definition of a causal model in this chapter, which is in terms of the graph and observed distribution only. Further details such as the functional relationships between classical variables, choice of quantum states/transformations, or generalised tests (Section 2.5.2.1) can also be included in the full specification of the causal model. These constitute the causal mechanisms of the model.
Developing a complete and formal specification of these mechanisms and deriving the conditions for their compatibility with cyclic and fine-tuned causal models is a tricky problem; we outline possible ideas for this in Appendix 6.8.1 and leave the full problem for future work.
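For small classical examples, the compatibility condition of Definition 6.3.2 can be checked mechanically. The sketch below is illustrative only and rests on several assumptions that are not from the thesis: d-separation is implemented directly from its usual path-blocking definition (colliders and their descendants) by enumerating paths, which is only practical for tiny graphs; the traitorous lieutenant is modelled with a hypothetical latent common cause `L` via $M_1 = L$, $M_2 = M \oplus L$; and all function names are made up for this example.

```python
from fractions import Fraction
from itertools import product


def descendants(edges, node):
    """Nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def d_connected(edges, x, y, zs):
    """True iff some path between x and y is not blocked by the set zs."""
    def walk(c, into_c, visited):
        if c == y:
            return True
        for a, b in edges:
            if a == c:    # leave c along an outgoing edge: c is a non-collider
                nxt, collider, into_next = b, False, True
            elif b == c:  # leave c against an incoming edge: collider iff we arrived into c
                nxt, collider, into_next = a, into_c, False
            else:
                continue
            if nxt in visited:
                continue
            if c != x:  # blocking is only tested at intermediate nodes
                if collider:
                    is_open = c in zs or bool(descendants(edges, c) & zs)
                else:
                    is_open = c not in zs
                if not is_open:
                    continue
            if walk(nxt, into_next, visited | {nxt}):
                return True
        return False
    return walk(x, False, {x})


def d_separated(edges, xs, ys, zs):
    return not any(d_connected(edges, x, y, set(zs)) for x in xs for y in ys)


def marginal(dist, names, keep):
    idx = [names.index(v) for v in keep]
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out


def cond_independent(dist, names, xs, ys, zs):
    """Checks P(x,y,z) P(z) == P(x,z) P(y,z) for all values."""
    pxyz = marginal(dist, names, xs + ys + zs)
    pxz, pyz = marginal(dist, names, xs + zs), marginal(dist, names, ys + zs)
    px, py, pz = (marginal(dist, names, s) for s in (xs, ys, zs))
    return all(pxyz.get(x + y + z, 0) * pz[z] == pxz.get(x + z, 0) * pyz.get(y + z, 0)
               for x in px for y in py for z in pz)


def check(edges, dist, names):
    """Returns (violations of the compatibility condition, independences not implied by d-separation)."""
    violated, extra = [], []
    for labels in product("xyz-", repeat=len(names)):
        xs, ys, zs = (tuple(n for n, lab in zip(names, labels) if lab == t) for t in "xyz")
        if not xs or not ys:
            continue
        sep = d_separated(edges, xs, ys, zs)
        ind = cond_independent(dist, names, xs, ys, zs)
        if sep and not ind:
            violated.append((xs, ys, zs))
        if ind and not sep:
            extra.append((xs, ys, zs))
    return violated, extra


# Assumed toy model: latent L (unobserved), M1 = L, M2 = M xor L.
edges = [("L", "M1"), ("L", "M2"), ("M", "M2")]
names = ("M", "M1", "M2")  # observed variables, in the order used by outcome tuples
dist = {(m, l, m ^ l): Fraction(1, 4) for m, l in product((0, 1), repeat=2)}

violated, extra = check(edges, dist, names)
print("compatible:", not violated)
print("independences not implied by d-separation:", extra)
```

In this toy model no d-separation is violated, so the distribution is compatible in the sense of Definition 6.3.2, while the list `extra` records independences (such as $M \perp\!\!\!\perp M_2$) that hold without a corresponding d-separation, which is what makes the model fine-tuned. Only observed variables enter the tested sets, but paths through the latent node `L` are taken into account.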

Definition 6.3.3 (Causal model). A causal model over a set of observed random variables $\{X_1, \dots, X_n\}$ consists of a directed graph $G$ over them (possibly involving classical/quantum/GPT unobserved systems) and a joint distribution $P_{X_1, \dots, X_n}$ that is compatible with this graph according to Definition 6.3.2.

Definition 6.3.2 allows for fine-tuned distributions to be compatible with the causal structure since it only requires that d-separation implies conditional independence and not the converse. Fine-tuned causal models may in general have an arbitrary number of additional conditional independences that are not implied by the d-separation relations in the corresponding causal graph. The following lemma shows that some additional conditional independences that are not directly implied by d-separation can be derived using d-separation and other independences that may be known.

Footnote (to Definition 6.3.2): Note that we only need to consider d-separation between observed sets of variables in this definition; however, the paths being considered may involve unobserved nodes. For example, if the observed variables $X$ and $Y$ have an unobserved common cause $\Lambda$, then $X$ and $Y$ are not d-separated by the empty set since there is an unblocked path between $X$ and $Y$ through the unobserved common cause, and naturally we don’t expect $X$ and $Y$ to be independent in this case.

Lemma 6.3.1. Let $S_1$, $S_2$ and $S_3$ be three disjoint sets of RVs such that $S_1 \perp\!\!\!\perp S_2 \mid S_3$. If $S$ is a set of RVs that is d-separated from these sets in a directed graph $G$ containing $S$, $S_1$, $S_2$ and $S_3$ as nodes, i.e., $S \perp^d S_i \ \forall i \in \{1, 2, 3\}$, then any distribution $P$ that is compatible with $G$ also satisfies the following conditional independences: $(S_1 \cup S) \perp\!\!\!\perp S_2 \mid S_3$, $S_1 \perp\!\!\!\perp (S_2 \cup S) \mid S_3$ and $S_1 \perp\!\!\!\perp S_2 \mid (S_3 \cup S)$.

A proof can be found in Appendix 6.8.2. Note that this lemma is trivial in the case of faithful causal models. This is because the independence $S_1 \perp\!\!\!\perp S_2 \mid S_3$ implies the d-separation $S_1 \perp^d S_2 \mid S_3$ in this case and, combined with $S \perp^d S_i$, this would immediately imply the d-separations $(S_1 \cup S) \perp^d S_2 \mid S_3$, $S_1 \perp^d (S_2 \cup S) \mid S_3$ and $S_1 \perp^d S_2 \mid (S_3 \cup S)$, which in turn imply the corresponding independences. This property is not so evident in the case of fine-tuned causal models but, as we have shown, it nevertheless holds here. Specific examples of this property for fine-tuned causal models are discussed in Section 6.3.5.

We adopt the characterisation of interventions outlined in Section 2.5.2.2, following the augmented graph approach [160] presented in Section 2.5.1.2. The main intuition behind interventions in the acyclic case carries forth to the cyclic case as well, but we will not be allowed to use a Markov factorisation condition such as Equation (2.45) for classical cyclic causal structures or its non-classical analogue of [115], since this does not hold for cyclic causal structures in general. We summarise these concepts here for completeness.
Given a generalised, cyclic causal structure $G$, an intervention on an observed node $X$ of $G$ is described as follows.

• An intervention variable $I_X$ taking values in the set $\{\text{idle}, \{do(x)\}_{x \in X}\}$ is introduced, where $I_X = \text{idle}$ corresponds to the original pre-intervention scenario and $I_X = do(x)$ sets the variable $X$ to a particular value $x \in X$, cutting off its dependence on all other parents in $G$.

• Two new causal structures are introduced to keep track of what happens during such an intervention, the augmented graph $G_{I_X}$ and the do-graph $G_{do(X)}$. The former is obtained from the original graph $G$ simply by adding $I_X$ as a node and including an edge $I_X \rightsquigarrow X$ (with the rest remaining unchanged). The latter corresponds to the case when a non-trivial intervention is performed, i.e., when $I_X \neq \text{idle}$. Hence, $G_{do(X)}$ is obtained from $G_{I_X}$ by cutting off all incoming arrows to $X$, except the one from $I_X$, and $I_X$ only takes values in $\{do(x)\}_{x \in X}$ in $G_{do(X)}$.

• The causal mechanisms involved in the causal model are updated accordingly, i.e., they remain unchanged when $I_X = \text{idle}$, and when $I_X = do(x)$, all causal mechanisms except that of $X$ remain unchanged while $X$ is deterministically set to the value $x$ (irrespective of the states of the subsystems corresponding to incoming edges to $X$ in $G$). The do-conditional $P(Y = y \mid do(x))$ encodes the correlations between a variable $Y$ and the intervened variable $X$ (or equivalently $I_X$, since $X$ and $I_X$ are perfectly correlated) in the post-intervention graph, i.e., $P(Y = y \mid do(x)) := P_{G_{do(X)}}(Y = y \mid I_X = do(x))$ (c.f. Equation (2.51)), where $P_{G_{do(X)}}$ is compatible with the do-graph $G_{do(X)}$. We will use the graph as a subscript to the distribution where it is useful to explicitly indicate the causal structure that the distribution is compatible with.

Footnote: In general, the distribution over $I_X$ may be arbitrary, but given a specific value of $I_X$ in the post-intervention scenario, such as $I_X = do(x)$, $X$ is completely determined.

This method naturally extends to simultaneous interventions on subsets of nodes (as described in Section 2.5.1.2). Only interventions on the observed nodes (which are classical) need to be considered since unobserved nodes are by construction experimentally inaccessible.

Footnote: Manipulation of non-classical systems such as quantum states, which are by construction unobserved, can be modelled by introducing observed classical variables that specify the choice of preparation/transformation or measurement acting on the system.

At the level of the causal mechanisms (if these are also given), the causal mechanisms of $G_{do(X)}$ can be obtained from those of $G$ simply by updating the causal mechanisms for each node $X_i$ in $X$ as $X_i = x_i$ iff $I_{X_i} = do(x_i)$ (while leaving the causal mechanisms for all other nodes unchanged), i.e., $P_{G_{do(X)}}(X)$ is fully specified by $P_{G_{do(X)}}(I_X)$, which can be chosen arbitrarily for the exogenous set $I_X$. Physically, the post-intervention distribution (or the do-conditional) corresponds to additional empirical data that is collected in an experiment that can, in general, be different from the experiment generating the original, pre-intervention data. For example, when the original experiment involves passive observation of correlations between the smoking tendencies and presence of cancer in a group of individuals, an intervention model may involve forcing certain individuals to take up smoking and then studying their chances of developing cancer. In repeated trials, the proportion of individuals who are passively observed and those that are actively intervened upon may be chosen as desired. The latter type of experiment may not necessarily be ethical but is nevertheless a physical possibility. In certain cases, it may be possible to fully deduce the post-intervention statistics counterfactually from the pre-intervention data alone (e.g., using a relation such as (2.52)), and the intervention experiment need not actually be performed, sparing us some ethical dilemmas. In some cases, however, this may not be possible (c.f. Example 2.5.3).

Footnote: As discussed in Section 2.5, whenever this is possible, the causal model is called identifiable, and in Example 2.5.3 we visited an example of a simple non-identifiable scenario involving a classical latent common cause.

Therefore, a complete specification of the post-intervention distribution in terms of the pre-intervention distribution may not always be possible. However, the compatibility condition of Definition 6.3.2 along with the defining Equations (2.50) and (2.51) allow us to derive further useful relationships between these distributions, in particular the three rules of Pearl’s do-calculus [158, 160]. These rules were originally derived for classical causal models satisfying the causal Markov property (2.45), which does not hold in the general scenarios considered here. However, we note that the derivation of these rules does not require the Markov property but only the weaker d-separation condition of Definition 6.3.2 along with the defining Equations (2.50) and (2.51). This is captured in the following theorem, and we present a proof of the same in Appendix 6.8.2 for completeness, even though this is predominantly based on the original proof of [158]. In the following, we will use $G_{\overline{X}}$ to denote the graph obtained by deleting all incoming edges to $X$ in a graph $G$, where $X$ is some subset of the observed nodes. Similarly, $G_{\underline{X}}$ denotes the graph obtained by deleting all outgoing edges from a subset $X$ of nodes in a graph $G$.

Theorem 6.3.1.

Given a causal model over a set $S$ of observed nodes, associated causal graph $G$ and a distribution $P_S$ compatible with $G$ according to Definition 6.3.2, the following 3 rules of do-calculus [160] hold for interventions on this causal model.

• Rule 1: Ignoring observations
$$P(y \mid do(x), z, w) = P(y \mid do(x), w) \quad \text{if } (Y \perp^d Z \mid X, W)_{G_{\overline{X}}} \tag{6.1}$$

• Rule 2: Action/observation exchange
$$P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w) \quad \text{if } (Y \perp^d Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}} \tag{6.2}$$

• Rule 3: Ignoring actions/interventions
$$P(y \mid do(x), do(z), w) = P(y \mid do(x), w) \quad \text{if } (Y \perp^d Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}} \tag{6.3}$$

where $X$, $Y$, $Z$ and $W$ are disjoint subsets of the observed nodes, $Z(W)$ denotes the set of nodes in $Z$ which are not ancestors of $W$, and the above statements hold for all values $w$, $x$, $y$ and $z$ of the variables $W$, $X$, $Y$ and $Z$.

While the observed distribution in the post-intervention causal model may not be completely specified by the pre-intervention observed distribution alone, considering the underlying causal mechanisms, e.g., the states, transformations and measurements involved in the original causal model, should allow for the complete specification of the post-intervention distribution. To the best of our knowledge, this problem has not been studied in non-classical and cyclic causal models. We discuss this point in further detail in Appendix 6.8.1, providing examples of non-classical cyclic causal models where the post-intervention distribution can be calculated from the causal mechanisms. The full solution to this problem will not be relevant to the results of this thesis. Using these concepts, we now define the affects relation that is central to the results of this chapter.

Definition 6.3.4 (Affects relation). Consider a causal model over a set of random variables $S$ with causal graph $G$ and a joint distribution $P$ compatible with it. For $X, Y \subseteq S$, if there exists a value $x$ of $X$ such that $P(Y \mid do(x)) \neq P(Y)$, then we say that $X$ affects $Y$.

The distribution on the left hand side of the above inequality is compatible with the post-intervention graph $G_{do(X)}$ (as given by Equation (2.51)), while that on the right hand side is compatible with the original graph $G$, i.e., $P(Y)$ is short for $P_G(Y)$. Operationally, $X$ affects $Y$ is equivalent to saying that $X$ signals to $Y$. With this definition, we are ready to state two useful corollaries of Theorem 6.3.1.
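To illustrate Definition 6.3.4, the affects relation can be evaluated directly from the causal mechanisms in a small classical, acyclic toy model. The sketch below uses the one-time pad, $M_E = M \oplus K$ with $M$ and $K$ uniform exogenous bits. The data layout (`EXOGENOUS`, `MECHANISMS`, `ORDER`) and the functions `joint`, `marginal` and `affects` are hypothetical names introduced for this example, and the do-conditional is computed by replacing the mechanism of each intervened node with a constant.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: exogenous uniform bits M and K, and ME = M xor K.
EXOGENOUS = {"M": (0, 1), "K": (0, 1)}
MECHANISMS = {"ME": (("M", "K"), lambda m, k: m ^ k)}  # listed in topological order
ORDER = ("M", "K", "ME")


def joint(do=None):
    """Joint distribution over ORDER, optionally under do = {node: value}."""
    do = do or {}
    free = [n for n in EXOGENOUS if n not in do]
    weight = Fraction(1)
    for n in free:
        weight /= len(EXOGENOUS[n])  # exogenous nodes are uniform
    dist = {}
    for values in product(*(EXOGENOUS[n] for n in free)):
        state = dict(zip(free, values))
        state.update(do)  # intervened nodes ignore their usual mechanism
        for node, (parents, f) in MECHANISMS.items():
            if node not in do:
                state[node] = f(*(state[p] for p in parents))
        key = tuple(state[n] for n in ORDER)
        dist[key] = dist.get(key, 0) + weight
    return dist


def marginal(dist, keep):
    idx = [ORDER.index(n) for n in keep]
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out


def affects(xs, ys):
    """X affects Y iff P(Y | do(x)) differs from P(Y) for some value x (all variables are bits here)."""
    p_y = marginal(joint(), ys)
    return any(marginal(joint(dict(zip(xs, x))), ys) != p_y
               for x in product((0, 1), repeat=len(xs)))


m_affects_me = affects(("M",), ("ME",))          # M alone does not signal to ME
mk_affects_me = affects(("M", "K"), ("ME",))     # M and K jointly do
m_affects_me_k = affects(("M",), ("ME", "K"))    # M signals to the pair {ME, K}
print(m_affects_me, mk_affects_me, m_affects_me_k)
```

In this sketch $M$ is a direct cause of $M_E$ and yet does not affect it, while it does affect the pair $\{M_E, K\}$. This is the same pattern as in the jamming example, with the relation now evaluated through interventions rather than through correlations.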

Corollary 6.3.1. If $X$ is a subset of observed exogenous nodes of a causal graph $G$, then for any subset $Y$ of nodes disjoint from $X$, the do-conditional and the regular conditional with respect to $X$ coincide, i.e.,
$$P(y \mid do(x)) = P(y \mid x) \quad \forall x, y.$$
In other words, for any subset $X$ of the observed exogenous nodes, correlation between $X$ and a disjoint set of observed nodes $Y$ in $G$ guarantees that $X$ affects $Y$.

Proof. Since $X$ consists only of exogenous nodes, it can only be d-connected to other nodes through outgoing arrows. Then in the graph $G_{\underline{X}}$ (where all outgoing arrows from $X$ are cut off), $X$ becomes d-separated from all other nodes. This d-separation, $(Y \perp^d X)_{G_{\underline{X}}}$, implies by Rule 2 of Theorem 6.3.1 that $P(y \mid do(x)) = P(y \mid x)\ \forall x, y$. Further, if $X$ and $Y$ are correlated in $G$, i.e., $\exists x, y$ such that $P(y \mid x) \neq P(y)$, the equation previously established along with Definition 6.3.4 imply that $X$ affects $Y$.

Corollary 6.3.2. If $X$ and $Y$ are two disjoint subsets of the observed nodes such that $(X \perp^d Y)_{G_{do(X)}}$, then $X$ does not affect $Y$ and $P_{G_{do(X)}}(Y) = P_G(Y)$.

Proof. The d-separation $(X \perp^d Y)_{G_{do(X)}}$ trivially implies the d-separation $(X \perp^d Y)_{G_{\overline{X}}}$, since $G_{do(X)}$ and $G_{\overline{X}}$ only differ by the inclusion of the intervention nodes $I_{X_i}$ and the corresponding edges $I_{X_i} \longrightarrow X_i$ for each $X_i \in X$. Then by Rule 3 of Theorem 6.3.1 we have $P(y \mid do(x)) = P(y)\ \forall x, y$, which by Definition 6.3.4 means that $X$ does not affect $Y$. Further, the d-separation implies the independence $(X \perp\!\!\!\perp Y)_{G_{do(X)}}$, i.e., $P_{G_{do(X)}}(y \mid x) = P_{G_{do(X)}}(y)\ \forall x, y$, where the left hand side equals the do-conditional $P(y \mid do(x))$ by definition. Along with the result that $X$ does not affect $Y$, this yields the required equation $P_{G_{do(X)}}(y) = P_G(y)\ \forall y$.

Note that $X$ affects $Y$ implies that there must be a directed path from $X$ to $Y$ in $G$ (which is equivalent to $X$ being a cause of $Y$, c.f. Definition 6.3.1).
This follows from the contrapositive of Corollary 6.3.2: "$X$ affects $Y$" implies that $X$ and $Y$ are not d-separated in $G_{\mathrm{do}(X)}$, and since this graph has no incoming arrows to $X$ (except those from the intervention nodes in $I_X$), the only way for $X$ and $Y$ to be d-connected in $G_{\mathrm{do}(X)}$ is through a directed path from $X$ to $Y$. However, the converse is not true. In the presence of fine-tuning, a directed path from $X$ to $Y$ in $G$ does not imply that $X$ affects $Y$ (as illustrated by Example 6.3.2), even though it does imply d-connection between $X$ and $Y$ in $G_{\mathrm{do}(X)}$. This motivates the following classification of the causal arrows between observed nodes. The arrows emanating from or pointing to an unobserved node cannot be operationally probed and hence need not be classified.

Definition 6.3.5 (Solid and dashed arrows). Given a causal graph $G$, if two observed nodes $X$ and $Y$ in $G$ sharing a directed edge $X \rightsquigarrow Y$ are such that $X$ affects $Y$, then the causal arrow between those nodes is called a solid arrow, denoted $X \longrightarrow Y$. Further, all arrows between observed nodes in $G$ that are not solid arrows are called dashed arrows, denoted $\dashrightarrow$. In other words, $X \dashrightarrow Y$ for any two RVs $X$ and $Y$ in $G$ implies that $X$ does not affect $Y$.

Remark 6.3.1 (Exogenous nodes). Note that if $X$ is an exogenous node that is a direct cause of another node $Y$ in a causal graph $G$, i.e., $X \rightsquigarrow Y$, and $X$ and $Y$ are correlated in the corresponding causal model, then by Corollary 6.3.1 and Definition 6.3.5 the arrow from $X$ to $Y$ must be a solid one.
Applying this to the graphs $G_{I_X}$ and $G_{\mathrm{do}(X)}$, where $I_X$ is exogenous and correlated with $X$ by construction (Equation (2.50)), we can conclude that the arrow from every intervention variable to the corresponding intervened variable must be a solid arrow, i.e., $I_X \longrightarrow X$.

Finally, another noteworthy implication is encapsulated in the following lemma.

Lemma 6.3.2. Given a causal graph $G$ and two disjoint subsets $X$ and $Y$ of observed nodes therein,

$$(X \not\!\perp\!\!\!\perp Y)_{G_{\mathrm{do}(X)}} \Rightarrow X \text{ affects } Y.$$

Proof. Suppose that $X$ does not affect $Y$. By Definition 6.3.4, this implies that $P(y \mid \mathrm{do}(x)) = P(y)$ for all $x, y$. Further suppose also that $(X \not\!\perp\!\!\!\perp Y)_{G_{\mathrm{do}(X)}}$. This means that there exist two distinct values $x$ and $x'$ of $X$ and some value $y$ of $Y$ such that $P(y \mid \mathrm{do}(x)) \neq P(y \mid \mathrm{do}(x'))$, which contradicts $P(y \mid \mathrm{do}(x)) = P(y)$ for all $x, y$. Therefore $(X \not\!\perp\!\!\!\perp Y)_{G_{\mathrm{do}(X)}}$ must imply that $X$ affects $Y$.

We distinguish between two types of causal loops that can arise in our framework. The first will be called an affects causal loop, which is based on the affects relation of Definition 6.3.4 that corresponds to physically detectable causal relationships. The second will be called a functional causal loop and corresponds to a loop at the level of the causal mechanisms (which are functional dependences in the classical case). These function as causal loops at the underlying level even though they may not be detectable at the observable level.

Definition 6.3.6 (Causal loops). A causal model associated with a causal structure $G$ over a set $S$ of RVs is said to have an affects causal loop if there exist two distinct random variables $X$ and $Y$ in $S$ such that $X$ affects $Y$ and $Y$ affects $X$. Hence every affects causal loop corresponds to a directed cycle in $G$. Any directed cycle in $G$ that is not an affects causal loop is called a functional causal loop.

Note that the affects causal loop is defined in terms of single RVs $X$ and $Y$ and not sets of RVs. Replacing $X$ and $Y$ by sets of observed variables $S_1$ and $S_2$ in this definition does not imply the existence of a directed cycle. For example, consider a causal structure $G$ with 4 nodes $A$, $B$, $C$ and $D$, all of which are observed, such that the only edges in $G$ are the solid arrows $A \longrightarrow B$ and $C \longrightarrow D$, with $A$ affects $B$ and $C$ affects $D$. Then, if $S_1 = \{A, D\}$ and $S_2 = \{B, C\}$, we have $S_1$ affects $S_2$ and $S_2$ affects $S_1$ even though $G$ is clearly acyclic. On the other hand, it is also possible to have a cyclic causal model that has no affects causal loops but admits cyclic affects relations in terms of subsets of variables (Example 6.3.5.3). Such loops fall into the category of functional causal loops, even though it might be possible to operationally detect their existence through simultaneous interventions on multiple nodes and additional conditions. We will discuss these in further detail later in the chapter. In particular, we will see that such causal loops can be embedded in space-time without leading to superluminal signalling (Figure 6.10).
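The four-node example following Definition 6.3.6 can be checked by brute-force enumeration. The sketch below is only an illustration under assumptions that are not in the text: $A$ and $C$ are taken to be uniform bits with mechanisms $B := A$ and $D := C$, and the helper names `joint`, `marginal` and `affects` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def joint(do=None):
    """Joint distribution of (A, B, C, D), optionally under an intervention do={name: value}."""
    do = do or {}
    # Assumption: A and C are exogenous uniform bits unless an intervention fixes them.
    a_values = [do["A"]] if "A" in do else [0, 1]
    c_values = [do["C"]] if "C" in do else [0, 1]
    weight = Fraction(1, len(a_values) * len(c_values))
    dist = {}
    for a, c in product(a_values, c_values):
        b = do.get("B", a)  # assumed mechanism B := A (solid arrow A -> B)
        d = do.get("D", c)  # assumed mechanism D := C (solid arrow C -> D)
        outcome = (("A", a), ("B", b), ("C", c), ("D", d))
        dist[outcome] = dist.get(outcome, 0) + weight
    return dist


def marginal(dist, names):
    """Marginal distribution over the variables listed in `names`."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(value for name, value in outcome if name in names)
        out[key] = out.get(key, 0) + p
    return out


def affects(xs, ys):
    """True if some value x of xs gives P(ys | do(xs = x)) != P(ys)."""
    reference = marginal(joint(), ys)
    for values in product([0, 1], repeat=len(xs)):
        if marginal(joint(dict(zip(xs, values))), ys) != reference:
            return True
    return False


s1, s2 = ["A", "D"], ["B", "C"]
print(affects(s1, s2), affects(s2, s1))        # set-level relations in both directions
print(affects(["B"], ["A"]), affects(["D"], ["C"]))  # no single-variable loop
```

In this toy model the set-level affects relations hold in both directions while no single variable affects its own cause, so no affects causal loop in the sense of Definition 6.3.6 arises.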

Remark 6.3.2. A causal modelling approach allows us to define causal loops without reference to space-time. Such a loop would correspond to a closed time-like curve when the variables $X$ and $Y$ in the loop are each associated with single, but mutually distinct, space-time points. Cyclic causal models have also been used to describe variables such as demand and supply that are studied over a period of time. These cyclic scenarios do not correspond to closed time-like curves but to physical feedback mechanisms.

Due to the presence of fine-tuning and the introduction of the two types of causal arrows (solid and dashed), a number of concepts that are equivalent in faithful causal models are not equivalent for the causal models described in our framework. It will hence be illustrative to present some of the relationships between the concepts relating to our causal models, before discussing the relation to space-time structure. These are illustrated in Figure 6.3. The reason for every implication is explained in the figure and its caption, and for every implication that fails, we provide a counter-example below. There are 14 implications in Figure 6.3 that do not hold. Some of these can be explained by the same counter-example or are immediately evident from the definitions. Therefore we first group these 14 cases based on the corresponding counter-example or argument needed for explaining them; in the end we will only need a few distinct counter-examples to cover all these cases. Note that if we restrict to faithful and/or acyclic causal models, not all of these non-implications would hold. For instance, in the case of faithful and acyclic causal models commonly considered in the literature, non-implications 1, 2, 3, 4, 5, 9 and 12 will become implications.

[Figure 6.3: implication diagram over the statements $(X \not\!\perp\!\!\!\perp Y)_{G_{\mathrm{do}(X)}}$; $(X \not\perp^d Y)_{G_{\mathrm{do}(X)}}$; "$\exists$ a directed path from $X$ to $Y$ in $G$, i.e., $X$ is a cause of $Y$"; "$X$ affects $Y$"; "$X \rightsquigarrow Y$ in $G$, i.e., $X$ is a direct cause of $Y$"; "$X \longrightarrow Y$ in $G$"; "$X \dashrightarrow Y$ in $G$"; $(X \not\!\perp\!\!\!\perp Y)_G$; "$X$ does not affect $Y$"; and $(X \perp\!\!\!\perp Y)_G$. Implications are labelled by the definition, lemma or corollary that justifies them.]

Figure 6.3: Relationships between concepts relating to causal models. The black arrows denote implications while red (crossed out) arrows denote non-implications. The numbers label the counter-examples corresponding to each non-implication, which are explained in the main text. The equivalence between "$\exists$ a directed path from $X$ to $Y$ in $G$" and $(X \not\perp^d Y)_{G_{\mathrm{do}(X)}}$ is explained in the paragraph following Corollary 6.3.2. $X \longrightarrow Y$ and $X \dashrightarrow Y$ imply $X \rightsquigarrow Y$ since solid and dashed arrows are simply special instances of the more general, squiggly arrow by Definition 6.3.5.

1. Non-implication 1: In unfaithful causal models, $X$ and $Y$ can be independent even when they are d-connected, as we have seen in the examples of Figure 6.1.

2. Non-implications 2, 11, 18: These are covered by Example 6.3.1.

3. Non-implications 3, 6, 8, 13: These are covered by Example 6.3.2.

4. Non-implications 4, 5: "$X$ is a cause of $Y$" does not imply that it is a direct cause of $Y$; it can be an indirect cause. Further, $X$ can affect $Y$ even when it is an indirect cause, for example $X \longrightarrow Z \longrightarrow Y$.

5. Non-implication 7: This is covered by Example 6.3.3.

6. Non-implication 9: It is evident that "$X$ is a direct cause of $Y$" does not imply $X \dashrightarrow Y$, since it can also be a cause through a solid arrow.

7. Non-implications 10, 12: These are just a consequence of the fact that correlation does not imply causation. Correlation between $X$ and $Y$ can arise when they share a common cause, without being a cause (direct or indirect) of each other.

8. Non-implications 14, 17: In a simple common cause scenario, i.e., $Z \longrightarrow X$ and $Z \longrightarrow Y$ with $X = Y = Z$, $X$ does not affect $Y$; however, $X$ is correlated with $Y$ and there is no dashed arrow from $X$ to $Y$.

9. Non-implication 15: It is evident that independence of $X$ and $Y$ does not imply that there is a dashed arrow between them; they can also be d-separated.

10. Non-implication 16: We can consider scenarios where two variables connected by a dashed arrow are also connected by an alternate, correlating path such as a common cause, and hence the dashed arrow does not by itself rule out the possibility of the variables being correlated.
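The common-cause counter-example used for non-implications 14 and 17 can be made concrete with a few lines of enumeration. This is a minimal sketch, assuming $Z$ is a uniform bit (the text only specifies $X = Y = Z$); the function names are hypothetical.

```python
from fractions import Fraction


def joint(do_x=None):
    """Joint distribution of (X, Y) for Z -> X, Z -> Y with X := Z and Y := Z.

    Assumption: Z is a uniform bit. Passing do_x replaces the mechanism of X
    by the constant do_x, which models the intervention do(X = do_x).
    """
    dist = {}
    for z in (0, 1):
        x = z if do_x is None else do_x
        y = z
        dist[(x, y)] = dist.get((x, y), 0) + Fraction(1, 2)
    return dist


def marginal(dist, index):
    """Marginal of the joint over X (index 0) or Y (index 1)."""
    out = {}
    for outcome, p in dist.items():
        out[outcome[index]] = out.get(outcome[index], 0) + p
    return out


observed = joint()
p_x, p_y = marginal(observed, 0), marginal(observed, 1)

# Correlation under passive observation: the joint is not the product of its marginals.
correlated = any(
    observed.get((x, y), 0) != p_x[x] * p_y[y] for x in (0, 1) for y in (0, 1)
)

# Affects relation: compare P(Y | do(x)) with P(Y) for every value x.
x_affects_y = any(marginal(joint(do_x=x), 1) != p_y for x in (0, 1))

print(correlated, x_affects_y)
```

Here $X$ and $Y$ are perfectly correlated, yet intervening on $X$ leaves the distribution of $Y$ unchanged, which is the content of the counter-example.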

Example 6.3.1. Consider the causal structure of Figure 6.4. Let the three variables $S$, $E$ and $H$ be binary and correlated as $H = S \oplus E$ and $S = E$. These relations imply that $H = 0$ deterministically as long as $S = E$. Now, when we intervene on $E$, we can choose its value independently of $S$, and whenever we choose $E \neq S$, we will see that $H = 1$ occurs with non-zero probability. In other words, there exists a value $e$ of $E$ such that $P(H = 1 \mid \mathrm{do}(e)) \neq P(H = 1) = 0$, i.e., $E$ affects $H$. As $E$ is a direct cause of $H$ in $G$, this further implies that the causal arrow from $E$ to $H$ is a solid one, even though $E$ and $H$ are independent in both the pre- and post-intervention causal models, i.e., $(E \perp\!\!\!\perp H)_G$ and $(E \perp\!\!\!\perp H)_{G_{\mathrm{do}(E)}}$ both hold. The former holds since $H$ is deterministic in the original causal model, irrespective of the value of $E$, and the latter since $H$ is uniform in the post-intervention model, again irrespective of the value of $E$, due to the fine-tuned nature of the correlations. Therefore the existence of an affects relation between two sets of observed variables does not imply correlation between them in either the pre- or the post-intervention causal model. Further, $S$ does not affect $H$ since the exogeneity of $S$ implies that $P_{G_{\mathrm{do}(S)}}(H \mid S) = P(H \mid S)$ (Corollary 6.3.1), and the independence of $S$ and $H$ in $G$ gives $P(H \mid S) = P(H)$.

[Figure 6.4: causal graph over the nodes $S$, $E$ and $H$.]

Figure 6.4: Affects relation does not imply correlation. This is a causal structure for Example 6.3.1, which demonstrates a scenario where $E$ affects $H$ even though $P_{EH} = P_E P_H$, i.e., solid arrows can also be fine-tuned, and the ability to detect causation (through an active intervention) does not imply that we will see correlation upon passive observation.

Example 6.3.2 (Jamming). Consider the causal structure of Figure 6.5a where $B \dashrightarrow A$, $B \dashrightarrow C$ and the RVs $A$ and $C$ share an unobserved common cause $\Lambda$. By Definition 6.3.5 of the dashed arrows, $B$ does not affect $A$ and $B$ does not affect $C$. Suppose that $B$ affects the set $\{A, C\}$. When $A$, $B$ and $C$ are binary, a probability distribution compatible with this situation is one where all 3 RVs are uniformly distributed and correlated as $B = A \oplus C$, where $\oplus$ stands for modulo-2 addition. Then, $A$ and $C$ individually carry no information about $B$, but $A$ and $C$ jointly determine the exact value of $B$.
In this case, $B$ is a cause of $A$ and of $C$ but, due to fine-tuning, $B$ and $A$ as well as $B$ and $C$ are uncorrelated and there are no pairwise affects relations, so that the causal influence of $B$ on $A$ (or $B$ on $C$) can only be detected when all 3 variables are jointly accessed. The common cause is crucial to this example, as explained in Figure 6.5a, and the causal structure compatible with the distribution and affects relations of this example is not unique. An alternative causal structure would be one where one of the dashed arrows $B \dashrightarrow A$ or $B \dashrightarrow C$ is dropped, as we will see in Section 6.8.

[Figure 6.5: (a) causal graph over the nodes $A$, $B$, $C$ and $\Lambda$; (b) causal graph over the nodes $A$, $B$, $C$, $D$ and $\Lambda$.]

Figure 6.5: Some fine-tuned causal structures. (a) The jamming causal structure of Example 6.3.2. Note that the common cause $\Lambda$ is essential to this example, because without $\Lambda$, $A$ and $C$ would be d-separated given $B$, which would imply the conditional independence $P_{AC \mid B} = P_{A \mid B} P_{C \mid B}$. The dashed arrows would imply the independence of $A$ and $B$ as well as $C$ and $B$, and hence the observed distribution would factorise as $P_{ABC} = P_A P_B P_C$. Then no pairs of disjoint subsets of $\{A, B, C\}$ would affect each other, contrary to the original example. (b) Causal structure for Example 6.3.3 where $B$ affects $D$ even though there is no solid arrow path from $B$ to $D$.

This example by itself makes no reference to space-time or the tripartite Bell scenario. However, for the particular embedding of the variables $A$, $B$ and $C$ in space-time where they are pairwise space-like separated and taken to correspond to the output of Alice, input of Bob and output of Charlie respectively, this becomes a special case of the tripartite jamming scenario of [105, 116] (Figure 6.2). In the rest of the chapter, such examples, where an RV has dashed arrows to a set of RVs, will be referred to as instances of "jamming" in accordance with the terminology of [105], irrespective of the space-time configuration. We will further discuss the relation of such causal models to space-time structure later in the chapter, and will also revisit Example 6.3.2 several times in this process.
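The statistical properties claimed for the jamming distribution of Example 6.3.2 (uniform marginals, pairwise independence, and $B$ fixed by $A$ and $C$ jointly) can be verified directly. The sketch below only checks the observed distribution $B = A \oplus C$ and makes no claim about the underlying mechanisms; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Observed distribution of Example 6.3.2: A and C uniform, B = A xor C.
# Outcomes are stored as tuples (a, b, c).
p_abc = {(a, a ^ c, c): Fraction(1, 4) for a, c in product((0, 1), repeat=2)}

A, B, C = 0, 1, 2  # positions of the variables inside an outcome tuple


def marginal(dist, positions):
    """Marginal distribution over the given tuple positions."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in positions)
        out[key] = out.get(key, 0) + p
    return out


def independent(dist, i, j):
    """True if the variables at positions i and j are statistically independent."""
    p_i, p_j, p_ij = marginal(dist, [i]), marginal(dist, [j]), marginal(dist, [i, j])
    return all(
        p_ij.get((u, v), 0) == p_i[(u,)] * p_j[(v,)]
        for u, v in product((0, 1), repeat=2)
    )


pairwise_independent = all(independent(p_abc, i, j) for i, j in [(A, B), (B, C), (A, C)])

# B is a function of (A, C): each (a, c) pair occurs with exactly one value of b.
b_determined = all(
    len({b for (a2, b, c2) in p_abc if (a2, c2) == (a, c)}) == 1
    for a, c in product((0, 1), repeat=2)
)

print(pairwise_independent, b_determined, marginal(p_abc, [B]))
```

Pairwise independence together with joint determination is what makes the influence of $B$ invisible to any party holding only $A$ or only $C$.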

[Footnote to the comparison with Figure 6.2 in Example 6.3.2: barring the slight change of notation, in Figure 6.2 $A$ and $C$ correspond to the inputs of Alice and Charlie while $X$ and $Z$ correspond to the outputs that are jammed by $B$. We do not make a distinction between inputs and outputs in general, since we will also consider situations where, for example, the jamming variable is not exogenous.]

Example 6.3.3. Consider a causal model over observed variables $\{A, B, C, D\}$ associated with the causal graph $G$ given in Figure 6.5b. Here, there are no pairs of variables sharing an edge such that one of them affects the other. A correlation compatible with this graph is obtained by taking $B = A \oplus C = D$, where all variables are binary and uniformly distributed. Here, $B$ affects $D$ even though there are no solid arrow paths from $B$ to $D$.

Here we provide examples that better illustrate some of the definitions and rules of the framework laid out so far, in particular how one can deduce the conditional independences and affects relations in a given causal model. We briefly summarise these below.

• Conditional independences: Given a causal graph $G$ with the set of observed nodes $S$, some of the conditional independences satisfied by the joint distribution $P_S$ can be identified using Definition 6.3.2, i.e., by listing all the conditional independences implied by d-separation relations in $G$. Further independences may be found if there are dashed arrows emanating from exogenous nodes, since $X \dashrightarrow Y$ implies that $X$ does not affect $Y$ (by Definition 6.3.5), which implies $X \perp\!\!\!\perp Y$ if $X$ is exogenous (cf. Corollary 6.3.1). Lemma 6.3.1 can also be used to list further independences not directly implied by d-separation in $G$. There may still be more conditional independences in $P_S$ that cannot be listed using the methods mentioned above; for instance, in Example 6.3.2 we also have $A \perp\!\!\!\perp C$, but this follows neither from d-separation nor from the dashed arrow structure. Since we allow for fine-tuning, there could be arbitrarily many independences in $P$, but those mentioned above are sufficient for compatibility with the causal model.

• Affects relations: Some of the affects relations can be identified from the pre- and post-intervention causal models using the fact that correlation between two disjoint sets $X$ and $Y$ of observed nodes in the post-intervention graph $G_{\mathrm{do}(X)}$ implies that $X$ affects $Y$ (Lemma 6.3.2). Note however that an independence $(X \perp\!\!\!\perp Y)_{G_{\mathrm{do}(X)}}$ in the post-intervention model need not imply that $X$ does not affect $Y$ (i.e., a non-affects relation) unless $X$ is exogenous (cf. non-implication 2 of Figure 6.3 and Corollary 6.3.1), but the d-separation $(X \perp^d Y)_{G_{\mathrm{do}(X)}}$ in the post-intervention model does imply that $X$ does not affect $Y$ (Corollary 6.3.2). Therefore one can check for non-independences and d-separations in the post-intervention causal model to identify affects and non-affects relations. Again, due to fine-tuning, this identification may not be exhaustive.

In case some or all of the causal mechanisms are also given in addition to the observed distributions, it may be possible to identify further independences and affects relations in the model.

In the jamming causal structure $G^{\mathrm{jam}}$ of Figure 6.5a and Example 6.3.2, Definition 6.3.2 does not impose any conditional independences on the observed distribution $P_{ABC}$ since $\Lambda$ is unobserved. [Footnote: If $\Lambda$ in Figure 6.5a were observed, $A$ and $C$ would be d-separated given $\{B, \Lambda\}$ and we would have the conditional independence $P_{AC \mid B\Lambda} = P_{A \mid B\Lambda} P_{C \mid B\Lambda}$.] However, from Definition 6.3.5 of dashed arrows we know that $B$ affects neither $A$ nor $C$ individually, and we are given that $B$ affects $\{A, C\}$. Using the exogeneity of $B$ (cf. Corollary 6.3.1), this implies the independences $A \perp\!\!\!\perp B$ and $C \perp\!\!\!\perp B$ and the non-independence $B \not\!\perp\!\!\!\perp \{A, C\}$ in $G^{\mathrm{jam}}$. Now, consider an intervention on $A$. The post-intervention causal structure $G^{\mathrm{jam}}_{\mathrm{do}(A)}$ only has the edges $B \dashrightarrow C$ and $\Lambda \rightsquigarrow C$ (along with $I_A \longrightarrow A$ of course). The d-separation $(A \perp^d C)_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}$ implies the independence $(A \perp\!\!\!\perp C)_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}$ and also that $A$ does not affect $C$. Similarly, we can derive that $C$ does not affect $A$, $A$ does not affect $\{B, C\}$, $C$ does not affect $\{A, B\}$ and $\{A, C\}$ does not affect $B$.

Further, using Lemma 6.3.1 and the exogeneity of $B$, we can derive that $\{A, B\}$ does not affect $C$ as follows. In the causal structure $G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}$, $A$ is d-separated from $B$ and $C$, while $B$ and $C$ are independent of each other due to the exogeneity of $B$ and the dashed arrow connecting them. Using the lemma, this gives $(\{A, B\} \perp\!\!\!\perp C)_{G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}}$, which can be explicitly written as $P_{G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}}(c \mid a, b) = P_{G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}}(c)$ for all $a, b, c$. The left-hand side equals the do-conditional $P(c \mid \mathrm{do}(a), \mathrm{do}(b))$ by definition, and the right-hand side can be simplified in the following two steps. Firstly, $P_{G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}}(c) = P_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}(c)$, noting that $G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}$ and $G^{\mathrm{jam}}_{\mathrm{do}(A)}$ are effectively the same graph due to the exogeneity of $B$. Then the d-separation $(A \perp^d C)_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}$ implies the independence $P_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}(c \mid a) = P_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}(c)$ for all $a, c$, which along with "$A$ does not affect $C$" (as noted earlier) gives $P_{G^{\mathrm{jam}}_{\mathrm{do}(A)}}(c) = P_G(c) := P(c)$ for all $c$. Together, this gives $P_{G^{\mathrm{jam}}_{\mathrm{do}(\{A,B\})}}(c \mid a, b) = P(c)$ for all $a, b, c$, i.e., $\{A, B\}$ does not affect $C$. Similarly, one can obtain that $\{B, C\}$ does not affect $A$.

In the causal structure of Figure 6.6a, the independence $A \perp\!\!\!\perp C$ follows from Definition 6.3.2, while $A \perp\!\!\!\perp B$ and $C \perp\!\!\!\perp B$ follow from the dashed arrow structure. These are the same independences as in the previous example of jamming with unobserved $\Lambda$ (where $A \perp\!\!\!\perp C$ was an additional independence in the jamming example but follows from d-separation in this case). Thus the distribution $P_{ABC}$ from Example 6.3.2 is compatible with both the jamming (Figure 6.5a) and the fine-tuned collider (Figure 6.6a) causal structures.
However, interventions on the two causal structures yield different results. We have $\{A, C\}$ affects $B$ for the fine-tuned collider (since $\{A, C\}$ consists of exogenous nodes and is correlated with $B$) but not for the jamming case. We also have $A$ affects $\{B, C\}$ and $C$ affects $\{A, B\}$ for the fine-tuned collider even though $A$ and $C$ do not individually affect $B$ due to the dashed arrow structure. This follows from the exogeneity of $A$ and $C$ and the joint correlations $A = B \oplus C$. Further, $\{A, B\}$ does not affect $C$ since these sets become d-separated upon intervention on $\{A, B\}$, and by a similar reasoning, $\{B, C\}$ does not affect $A$ and $B$ does not affect $\{A, C\}$ (in contrast with the jamming case where $B$ affects $\{A, C\}$).

[Footnote: Note that this is essentially the one-time pad example from earlier.]

Consider the cyclic causal structure $G^{\mathrm{fl}}$ ("fl" stands for functional loop) of Figure 6.6b along with the following classical causal mechanisms, where all 4 variables are taken to be binary: $A = \Lambda$, $B = A \oplus C$, $C = B \oplus \Lambda$, where the exogenous variable $\Lambda$ is uniformly distributed. One can check that the distribution $P_{ABC}$ obtained through these mechanisms would be the same as that of the jamming as well as the fine-tuned collider examples above, but the affects relations differ. Firstly, in the causal model of $G^{\mathrm{fl}}_{\mathrm{do}(A)}$, $\Lambda$ is no longer a parent of $A$, but using the remaining causal mechanisms $B = A \oplus C$ and $C = B \oplus \Lambda$ (which remain the same), we can still obtain $A = \Lambda$. Therefore the intervention on $A$ does not change the observed distribution, and $A$ and $B$ continue to be independent in $G^{\mathrm{fl}}$ as well as $G^{\mathrm{fl}}_{\mathrm{do}(A)}$, and in both graphs the marginal distributions over $A$, $B$ and $C$ are uniform, which gives that $A$ does not affect $B$. On the other hand, "$B$ does not affect $A$" can be established simply from the d-separation $(B \perp^d A)_{G^{\mathrm{fl}}_{\mathrm{do}(B)}}$. In the causal model of $G^{\mathrm{fl}}_{\mathrm{do}(C)}$, neither $B$ nor $\Lambda$ is a parent of $C$, but the remaining mechanisms $A = \Lambda$ and $B = A \oplus C$ give $C = B \oplus \Lambda$. Again, the observed distribution here is the same as the pre-intervention distribution, which gives that $C$ does not affect $B$. By a similar argument, "$B$ does not affect $C$" can also be established. Further, we have both $B$ affects $\{A, C\}$ (as in the jamming case) and $\{A, C\}$ affects $B$ (as in the fine-tuned collider) since $P(b \mid \mathrm{do}(a), \mathrm{do}(c))$ and $P(a, c \mid \mathrm{do}(b))$ are deterministic while $P(b)$ and $P(a, c)$ are uniform. We also have $A$ affects $\{B, C\}$ and $C$ affects $\{A, B\}$ as in the fine-tuned collider, which can be verified using the causal mechanisms given. As in the jamming case, we also get that $\{A, B\}$ does not affect $C$ and $\{B, C\}$ does not affect $A$. Since we have no pair of variables in $\{A, B, C\}$ that affect each other, this does not correspond to an affects causal loop (Definition 6.3.6). Furthermore, even though we have directed cycles and the cyclic affects relations $B$ affects $\{A, C\}$ and $\{A, C\}$ affects $B$, the variables $A$, $B$ and $C$ can be embedded in Minkowski space-time such that this causal model does not lead to superluminal signalling, as we will see later (Figure 6.10).

[Footnote: Note that in the absence of the causal mechanisms, many of the affects/non-affects relations may not be identifiable. For example, to deduce that $\{A, B\}$ does not affect $C$ in the jamming case, we used Lemma 6.3.1 along with the fact that $B$ was exogenous in $G^{\mathrm{jam}}$. But the same argument cannot be applied here since $B$ is not exogenous in $G^{\mathrm{fl}}$.]

[Figure 6.6: (a) causal graph over the nodes $A$, $B$ and $C$; (b) causal graph over the nodes $A$, $B$, $C$ and $\Lambda$; (c) causal graph over the nodes $A$, $B$, $C$ and $D$.]

Figure 6.6: Some fine-tuned and/or cyclic causal structures. (a) A fine-tuned collider. (b) A functional causal loop. (c) An affects causal loop.

Consider the causal structure of Figure 6.6c. Applying Definition 6.3.5 of solid arrows to this causal structure, we have $A$ affects $B$, $B$ affects $C$, $C$ affects $D$ and $D$ affects $A$, which forms an affects causal loop. The conditional independences that follow from d-separation are $A \perp\!\!\!\perp C \mid \{B, D\}$ and $B \perp\!\!\!\perp D \mid \{A, C\}$, and any joint distribution satisfying these would be compatible with the causal structure. To further illustrate the kind of causal loops allowed in this framework, consider the pairwise correlations $A = B$, $B = C$, $C = D$ and $D \neq A$. Since this system of equations has no solutions, there exists no joint distribution $P_{ABCD}$ from which the pairwise marginals producing these correlations can be obtained. Such examples correspond to grandfather-type paradoxes and cannot be modelled in frameworks that demand the existence of a valid joint probability distribution over all variables involved in a causal loop. On the other hand, examples of solid arrow directed cycles where the functional dependences of the loop variables admit solutions, such as $A = B = C = D$ (with any probability), or the examples considered in [161] for other cyclic causal structures, can be modelled in our framework. Additionally, there can also be affects causal loops that do not involve any solid arrows, for example through a concatenation of structures such as that of Figure 6.5b. We discuss causal loops in more detail in Appendix 6.8.1, also in the case of quantum causal structures.

We model space-time simply by a partially ordered set $\mathcal{T}$ without assuming any further structure or symmetries. A particular example of $\mathcal{T}$ is Minkowski space-time, where the partial order corresponds to the light-cone structure and the elements of $\mathcal{T}$ can be seen as space-time coordinates in some frame of reference. Our results will only depend on the order relations of $\mathcal{T}$ and not on the representation of its particular elements. To make operational statements about $\mathcal{T}$, we must embed physical systems into it.
In our case, we can only do so for the observed systems in the causal model, which are random variables. We embed them in this space-time by assigning an element of $\mathcal{T}$ to specify the space-time location of each variable, and refer to such variables as ordered random variables or ORVs. Here, the order of an ORV corresponds to that of the space-time $\mathcal{T}$ (and not of the causal model).

[Footnote: This can be seen as an abstract version of space-time random variables.]

Definition 6.4.1 (Ordered random variable (ORV)). Each ORV $\mathcal{X}$ is defined by a pair $\mathcal{X} := (R(\mathcal{X}), O(\mathcal{X}))$, where $R(\mathcal{X})$ (equivalently denoted by the corresponding non-calligraphic letter $X$) is a random variable and $O(\mathcal{X}) \in \mathcal{T}$ specifies the location of $\mathcal{X}$ with respect to a partially ordered set $\mathcal{T}$.

Notation: We use $\prec$, $\succ$ and $\nprec\nsucc$ to denote the order relations for a given partially ordered set $\mathcal{T}$, where for $\alpha, \beta \in \mathcal{T}$, $\alpha \nprec\nsucc \beta$ corresponds to $\alpha$ and $\beta$ being unordered with respect to $\mathcal{T}$. This is not to be confused with $\alpha = \beta$, which corresponds to the two elements being equal. These relations carry forth in an obvious way to ORVs, and we say for example that 2 ORVs $\mathcal{X}$ and $\mathcal{Y}$ are ordered as $\mathcal{X} \prec \mathcal{Y}$ iff $O(\mathcal{X}) \prec O(\mathcal{Y})$. We will also use $\mathcal{X} = \mathcal{Y}$ as shorthand for $X = Y$ and $O(\mathcal{X}) = O(\mathcal{Y})$.

Definition 6.4.2. The future of an ORV $\mathcal{X}$ is the set

$$\mathcal{F}(\mathcal{X}) := \{\alpha \in \mathcal{T} : \alpha \succ O(\mathcal{X})\}.$$

Then, we say that an ORV $\mathcal{Y}$ lies in the future of an ORV $\mathcal{X}$ iff $O(\mathcal{Y}) \in \mathcal{F}(\mathcal{X})$. In a slight abuse of notation, we will simply write this as $\mathcal{Y} \in \mathcal{F}(\mathcal{X})$, which is equivalent to $\mathcal{X} \prec \mathcal{Y}$.

Definition 6.4.3. The inclusive future of an ORV $\mathcal{X}$ is the set

$$\overline{\mathcal{F}}(\mathcal{X}) := \{\alpha \in \mathcal{T} : \alpha \succeq O(\mathcal{X})\}.$$

Note that $\mathcal{X} \in \overline{\mathcal{F}}(\mathcal{X})$ but $\mathcal{X} \notin \mathcal{F}(\mathcal{X})$, hence the name "inclusive" future. Further, any probabilities written in terms of ORVs should be understood as being probability distributions over the corresponding random variables.
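The difference between the future and the inclusive future can be seen on a small concrete poset. The sketch below is an illustration under assumptions that are not in the text: it models $\mathcal{T}$ by a finite grid of points $(t, x)$ in 1+1-dimensional Minkowski space-time with $c = 1$, counts light-like separated points as ordered, and uses hypothetical function names.

```python
from itertools import product

# Assumption: a finite sample of space-time points (t, x) stands in for T.
GRID = [(t, x) for t, x in product(range(0, 5), range(-4, 5))]


def precedes(alpha, beta):
    """alpha < beta: beta is distinct from alpha and lies in or on its future light cone."""
    (t1, x1), (t2, x2) = alpha, beta
    return alpha != beta and t2 - t1 >= abs(x2 - x1)


def future(location):
    """Future of a location: all grid points strictly after it (Definition 6.4.2)."""
    return {p for p in GRID if precedes(location, p)}


def inclusive_future(location):
    """Inclusive future: the future together with the location itself (Definition 6.4.3)."""
    return future(location) | {location}


loc = (1, 0)
print(loc in future(loc), loc in inclusive_future(loc))
print((2, 1) in future(loc), (1, 2) in future(loc))  # light-like vs space-like point
```

The two sets differ only by the location itself, which is why the trivial copy of a variable belongs to the inclusive future but not to the future.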

We have encountered two types of order relations, one which is encoded in the arrows of the causal structure and the other specified by the order relation $\prec$ of the space-time $\mathcal{T}$. The former, which can include directed cycles, corresponds to a pre-order, while the latter is by construction a partial order. We now describe the compatibility condition that will tell us when the order provided by a causal model is compatible with the order provided by the space-time $\mathcal{T}$ at the operational level; here the classification of the arrows into solid and dashed arrows will play a role.

In an operational theory, we must be able to consider scenarios where a random variable may be copied, distributed or acted upon. For ordered random variables, we need to consider a set of locations in the embedding partial order at which copies of RVs can be accessed for further information processing. To this effect, we first propose the following definitions before defining compatibility.

Definition 6.4.4 (Copy of a RV). A random variable $X'$ is called a copy of a random variable $X$ if $X = X'$ and $X$ affects $X'$. Hence every RV $X$ is a copy of itself; we call this the trivial copy of $X$.

For ordered random variables $\mathcal{X}$ and $\mathcal{X}'$, we will simply say that $\mathcal{X}'$ is a copy of $\mathcal{X}$ whenever the corresponding RV $X' := R(\mathcal{X}')$ is a copy of the RV $X := R(\mathcal{X})$.

Definition 6.4.5 (Accessible region of an ORV). With each ordered random variable $\mathcal{X} := (R(\mathcal{X}), O(\mathcal{X}))$ ordered with respect to some partially ordered set $\mathcal{T}$, we can associate a subset $\mathcal{R}_{\mathcal{X}} \subseteq \mathcal{T}$ called its accessible region, such that any copy $\mathcal{X}'$ of $\mathcal{X}$ must belong to this set, i.e., $O(\mathcal{X}') \in \mathcal{R}_{\mathcal{X}}$ for all copies $\mathcal{X}'$ of $\mathcal{X}$, and $\mathcal{R}_{\mathcal{X}}$ is the smallest subset of $\mathcal{T}$ that has this property. We say that $\mathcal{X}$ is accessible in $\mathcal{R}_{\mathcal{X}}$.

The accessible region has the property that $\mathcal{R}_{\mathcal{X}'} \subseteq \mathcal{R}_{\mathcal{X}}$ for all copies $\mathcal{X}'$ of $\mathcal{X}$ (since the copy of a copy is a copy), and $\mathcal{X} \in \mathcal{R}_{\mathcal{X}}$ for all ORVs $\mathcal{X}$.

Remark 6.4.1. Definition 6.4.5 allows us to naturally associate accessible regions to sets of ORVs in terms of the accessible regions of their individual members. The accessible region of a set $\mathcal{S} = \{\mathcal{X}_1, ..., \mathcal{X}_k\}$ of ORVs is the smallest subset of $\mathcal{T}$ within which all the ORVs in the set can be jointly accessed, i.e., $\mathcal{R}_{\mathcal{S}} = \bigcap_{\mathcal{X}_i \in \mathcal{S}} \mathcal{R}_{\mathcal{X}_i}$. Each set $\mathcal{S}$ of ORVs can be thought of as being comprised of one copy of each of its members, and its accessible region is the region of space-time where a copy of each ORV in the set can be accessed.
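The intersection formula of Remark 6.4.1 can be illustrated on an abstract finite poset. This is a minimal sketch under two assumptions: the four-element poset below is hypothetical, and the accessible region of each individual ORV is taken to be the set of locations at or above its own location (as condition 1 of Definition 6.4.7 requires for compatible models).

```python
def up_set(order, start):
    """All elements at or above `start`, found by following the cover relations."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for successor in order.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


# Hypothetical poset: p and q are unordered, both precede r, and r precedes s.
ORDER = {"p": ["r"], "q": ["r"], "r": ["s"], "s": []}


def joint_accessible_region(locations):
    """Intersection of the individual accessible regions, as in Remark 6.4.1."""
    regions = [up_set(ORDER, location) for location in locations]
    return set.intersection(*regions)


print(sorted(joint_accessible_region(["p"])))
print(sorted(joint_accessible_region(["p", "q"])))
```

Two ORVs at the unordered locations `p` and `q` can only be accessed jointly from `r` onwards, even though each is individually accessible at its own location.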

Definition 6.4.6 (Embedding). An embedding of a set of RVs $S$ in a partially ordered set $\mathcal{T}$ produces a corresponding set of ORVs $\mathcal{S}$ by assigning a location $O(\mathcal{X}) \in \mathcal{T}$ and an accessible region $\mathcal{R}_{\mathcal{X}} \subset \mathcal{T}$ to each RV $X$, which defines the associated ORV $\mathcal{X} = (X, O(\mathcal{X}))$.

We now define what it means for a causal model to be compatible with the embedding partial order $\mathcal{T}$, which captures the idea of no superluminal signalling with respect to the space-time $\mathcal{T}$. Intuitively, the definition corresponds to the condition that "it is possible to signal everywhere in the future and nowhere else". This is not the same as "it is possible to signal only to the future (but not necessarily to every point in the future)". We will see that this subtle difference plays a role in our results of the following sections (cf. Remark 6.5.1).

Definition 6.4.7 (Compatibility of a causal model with the embedding partial order (compat)). Let $\mathcal{S}$ be a set of ORVs ordered with respect to the partially ordered set $\mathcal{T}$. Then a causal model over $\mathcal{S}$ is said to be compatible with the partial order $\mathcal{T}$ if the following conditions are satisfied:

1. For all $\mathcal{X} \in \mathcal{S}$, $\mathcal{R}_{\mathcal{X}} = \overline{\mathcal{F}}(\mathcal{X})$, and

2. For all subsets $\mathcal{S}_1, \mathcal{S}_2 \subseteq \mathcal{S}$ such that no proper subset of $\mathcal{S}_1$ affects $\mathcal{S}_2$, $\mathcal{S}_1$ affects $\mathcal{S}_2 \Rightarrow \mathcal{R}_{\mathcal{S}_2} \subseteq \mathcal{R}_{\mathcal{S}_1}$.

Remark 6.4.2. In the second statement above, the condition that no proper subset of $\mathcal{S}_1$ affects $\mathcal{S}_2$ is required because whenever a set $\mathcal{S}_1$ of ORVs affects another set $\mathcal{S}_2$, any $\mathcal{S}_3 \supset \mathcal{S}_1$ will also trivially affect $\mathcal{S}_2$. In such situations, one is only interested in the non-trivial affects relation "$\mathcal{S}_1$ affects $\mathcal{S}_2$" (given that no further proper subsets of $\mathcal{S}_1$ affect $\mathcal{S}_2$) and not "$\mathcal{S}_3$ affects $\mathcal{S}_2$", since $\mathcal{S}_3 \supset \mathcal{S}_1$ could be arbitrarily large in this case.

[Footnote: In the case of single variables, "$X$ affects $Y$" implies that $\mathcal{Y}$ must be in the future of $\mathcal{X}$ in order to be compatible with the space-time. We do not require $\mathcal{Y}$ to be in the future of every superset of $X$, which would trivially affect it.]

In this section, we derive necessary and sufficient conditions for a causal model to be compatible with a partial order $\mathcal{T}$ (Section 6.5.1) and for it to have no affects causal loops (Section 6.5.2). In [116], a set of necessary and sufficient conditions for no causal loops in the bipartite and tripartite Bell scenarios has been proposed. The results of this section identify implicit assumptions in these claims [116] and provide necessary and sufficient conditions for compatibility with the space-time and for no affects causal loops for arbitrary causal structures. The other type of causal loops, functional causal loops, cannot in general be detected operationally and therefore cannot be ruled out. We then discuss the relationships between the various conditions derived in our results in Section 6.5.3 and consider [116] in further detail in Section 6.6.2.

Theorem 6.5.1. [Necessary and sufficient conditions for compatibility with T] Given a causal model over a set of ORVs $S$, the following condition (cond) is necessary for the causal model to be compatible with the embedding partial order T according to compat (Definition 6.4.7).

$$
\forall \text{ subsets } S_1, S_2 \subseteq S \text{ such that no proper subset of } S_1 \text{ affects } S_2: \quad \bigcap_{s_2 \in S_2} F(s_2) \nsubseteq \bigcap_{s_1 \in S_1} F(s_1) \;\Rightarrow\; S_1 \text{ does not affect } S_2. \tag{6.4}
$$

While cond alone is not sufficient for compat, along with the following additional requirement it provides a sufficient condition for compat:

$$
\forall X \in S, \quad F(X) \subseteq R_X. \tag{6.5}
$$

A proof of this theorem can be found in Appendix 6.8.2.

Theorem 6.5.2. [Necessary and sufficient condition for no affects causal loops] A necessary condition for a causal model over a set S of RVs to have no affects causal loops is that there exists an embedding of S in Minkowski space-time T such that the corresponding set of ORVs satisfies the condition cond of Equation (6.4). cond, along with the additional assumption that any two distinct ORVs X and Y such that one affects the other cannot share the same location in T, is sufficient for having no affects causal loops in the causal model.

A proof of the above theorem can be found in Appendix 6.8.2. Note that a subset of the above conditions would already be sufficient for no affects causal loops, since these causal loops are defined in terms of a chain of affects relations involving single variables (and not sets of variables). Suppose that for every affects relation X affects Y in the causal model, where X and Y are single variables, we can ensure that the corresponding ordered random variables satisfy $Y \in F(X)$; this would already be sufficient to conclude that the causal model has no affects loops. However, this is not sufficient for compatibility with the space-time, since we may have X affects {W, Z} for some variables W and Z in the model that are not individually affected by X, such that $F(W) \cap F(Z) \nsubseteq F(X)$.
Based on this, a third class of causal loops can be defined: those that can be operationally detected only when we consider affects relations of the form $S_1$ affects $S_2$ where at least one of $S_1$ and $S_2$ is a set of cardinality greater than one. An example would be the cyclic causal structure of Figure 6.6b, where {A, C} affects B and B affects {A, C}. With our current Definition 6.3.6, this gets classified as a functional causal loop and not an affects loop. This does not, however, affect the results presented here, and the further classification and characterisation of such loops is left for future work.

Keeping this in mind and applying Theorem 6.5.2 to the bipartite and tripartite Bell experiments respectively, where the parties are space-like separated, we have Corollaries 6.5.1 and 6.5.2.

Corollary 6.5.1 (Bipartite Bell scenario). Consider a causal model over the observed variables {A, B, X, Y} where A and B are exogenous, and X and Y share an unobserved common cause Λ. Suppose that T corresponds to Minkowski space-time. Let the observed ORVs defined with respect to T be such that A ⊀⊁ B, A ⊀⊁ Y, B ⊀⊁ X, X ⊀⊁ Y, A ≺ X and B ≺ Y, and the accessible region of every ORV coincides with its inclusive future. In this scenario, a set of sufficient conditions for the causal model to have no affects causal loops is:

1. A does not affect any subset of ORVs not containing X.
2. B does not affect any subset of ORVs not containing Y.
3. X and Y do not affect any other ORVs.

Given the exogeneity of A and B, conditions 1. and 2. are equivalent to the bipartite no-signaling conditions (NS2):

$$
\begin{aligned}
P_{X|A}(x|a) &:= \sum_y P_{X,Y|A,B}(x,y|a,b) = \sum_y P_{X,Y|A,B}(x,y|a,b') \quad \forall x, a, b, b', \\
P_{Y|B}(y|b) &:= \sum_x P_{X,Y|A,B}(x,y|a,b) = \sum_x P_{X,Y|A,B}(x,y|a',b) \quad \forall y, a, a', b.
\end{aligned}
\tag{6.6}
$$

Hence NS2 along with 3. is sufficient for no affects causal loops in this scenario.
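The conditions of Equation (6.6) can be checked by direct enumeration for a given conditional distribution. The following is a minimal sketch, assuming binary inputs and outputs; the function names, the standard form of the PR box (which may differ by a relabelling from Equation (3.5)) and the deliberately signalling box are illustrative choices and are not taken from the text.

```python
from itertools import product

BITS = (0, 1)

def pr_box(x, y, a, b):
    """P(x, y | a, b) for a PR box: outputs satisfy x XOR y = a AND b."""
    return 0.5 if (x ^ y) == (a & b) else 0.0

def signalling_box(x, y, a, b):
    """Hypothetical box in which Bob's output simply copies Alice's input."""
    return 1.0 if (x == 0 and y == a) else 0.0

def satisfies_ns2(p, tol=1e-12):
    """Check both lines of Equation (6.6) for binary inputs and outputs."""
    for a, x in product(BITS, BITS):
        # The marginal of X given A = a must not depend on Bob's input b.
        marginals = [sum(p(x, y, a, b) for y in BITS) for b in BITS]
        if abs(marginals[0] - marginals[1]) > tol:
            return False
    for b, y in product(BITS, BITS):
        # The marginal of Y given B = b must not depend on Alice's input a.
        marginals = [sum(p(x, y, a, b) for x in BITS) for a in BITS]
        if abs(marginals[0] - marginals[1]) > tol:
            return False
    return True
```

Under these assumptions, `satisfies_ns2(pr_box)` holds while `satisfies_ns2(signalling_box)` fails. Note that such a check only concerns conditions 1. and 2. of the corollary; condition 3. is a statement about affects relations and is not captured by the correlations alone.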

Corollary 6.5.2 (Tripartite Bell scenario). Consider a causal model over the observed variables {A, B, C, X, Y, Z} where A, B and C are exogenous, and X, Y and Z share an unobserved common cause Λ. Suppose that T corresponds to Minkowski space-time. Let the observed ORVs defined with respect to T be such that A ≺ X, B ≺ Y, C ≺ Z, F(X) ∩ F(Z) ⊆ F(B), all other pairs of observed ORVs are unordered with respect to T, and the accessible region of every ORV coincides with its inclusive future. In this scenario, a set of sufficient conditions for the causal model to have no affects causal loops is:

1. A does not affect any subset of ORVs not containing X.
2. B does not affect any subset of ORVs containing neither Y nor the set {X, Z}.
3. C does not affect any subset of ORVs not containing Z.
4. X, Y and Z do not affect any other ORVs.

Given the exogeneity of A, B and C, conditions 1., 2. and 3. are equivalent to the relaxed tripartite no-signaling conditions (NS3') of [116]:

$$
\begin{aligned}
P_{X,Y|A,B}(x,y|a,b) &:= \sum_z P_{X,Y,Z|A,B,C}(x,y,z|a,b,c) = \sum_z P_{X,Y,Z|A,B,C}(x,y,z|a,b,c') \quad \forall x, y, a, b, c, c', \\
P_{Y,Z|B,C}(y,z|b,c) &:= \sum_x P_{X,Y,Z|A,B,C}(x,y,z|a,b,c) = \sum_x P_{X,Y,Z|A,B,C}(x,y,z|a',b,c) \quad \forall y, z, a, a', b, c, \\
P_{X|A}(x|a) &:= \sum_{y,z} P_{X,Y,Z|A,B,C}(x,y,z|a,b,c) = \sum_{y,z} P_{X,Y,Z|A,B,C}(x,y,z|a,b',c') \quad \forall x, a, b, b', c, c', \\
P_{Z|C}(z|c) &:= \sum_{x,y} P_{X,Y,Z|A,B,C}(x,y,z|a,b,c) = \sum_{x,y} P_{X,Y,Z|A,B,C}(x,y,z|a',b',c) \quad \forall z, a, a', b, b', c.
\end{aligned}
\tag{6.7}
$$

Hence NS3' along with 4. is sufficient for no affects causal loops in this scenario.
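To see what the relaxation in Equation (6.7) permits, one can enumerate a toy tripartite distribution in which Bob's input jams the outputs of Alice and Charlie. The sketch below is only illustrative: the distribution `jamming_box` (x uniform, z = x ⊕ b, y fixed to 0) and all helper names are assumptions made for this example, not objects defined in the text.

```python
from itertools import product

BITS = (0, 1)
INPUT_INDEX = {"a": 0, "b": 1, "c": 2}

def jamming_box(x, y, z, a, b, c):
    """Hypothetical P(x, y, z | a, b, c): x is uniform, z = x XOR b, y = 0."""
    return 0.5 if (y == 0 and z == (x ^ b)) else 0.0

def marginal(p, kept_outputs, inputs):
    """Sum p over all outputs, keeping those named in kept_outputs fixed."""
    total = 0.0
    for x, y, z in product(BITS, repeat=3):
        outputs = {"x": x, "y": y, "z": z}
        if all(outputs[name] == value for name, value in kept_outputs.items()):
            total += p(x, y, z, *inputs)
    return total

def independent_of(p, kept_names, ignored_inputs):
    """True if the marginal over kept_names does not depend on ignored_inputs."""
    fixed_inputs = [n for n in INPUT_INDEX if n not in ignored_inputs]
    for kept_values in product(BITS, repeat=len(kept_names)):
        kept = dict(zip(kept_names, kept_values))
        for fixed_values in product(BITS, repeat=len(fixed_inputs)):
            seen = set()
            for ignored_values in product(BITS, repeat=len(ignored_inputs)):
                inputs = [0, 0, 0]
                for name, value in zip(fixed_inputs, fixed_values):
                    inputs[INPUT_INDEX[name]] = value
                for name, value in zip(ignored_inputs, ignored_values):
                    inputs[INPUT_INDEX[name]] = value
                seen.add(round(marginal(p, kept, inputs), 12))
            if len(seen) > 1:
                return False
    return True

def satisfies_ns3_relaxed(p):
    """Check the four lines of Equation (6.7), in order."""
    return (independent_of(p, ("x", "y"), ("c",))
            and independent_of(p, ("y", "z"), ("a",))
            and independent_of(p, ("x",), ("b", "c"))
            and independent_of(p, ("z",), ("a", "b")))
```

For this toy distribution all four lines of (6.7) hold, while the joint marginal of X and Z does depend on b, which is exactly the dependence that (6.7) leaves unconstrained.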

An important point to note here is that the conditions are only sufficient and not necessary. This is because the Bell experiments considered in [116] correspond to particular embeddings of the causal model where the measurement events are space-like separated, while our Theorem 6.5.2 only requires the existence of a suitable embedding, which need not be this particular one. More explicitly, a violation of the no-signaling conditions (6.6) in a Bell experiment is not necessarily a problem when the parties are not space-like separated; and if they are, they would need to be able to signal superluminally in more than one frame to create a causal loop from this ability. But superluminal signalling in one frame need not imply the same for other frames. Hence the no-signaling conditions (6.6) and (6.7) are not necessary for ruling out causal loops in the bipartite and tripartite Bell scenarios, contrary to the claim of [116]. We further discuss [116] in the context of our framework in Section 6.6.2, providing explicit counter-examples to these claims.

Furthermore, note that these corollaries only rule out affects causal loops and not all causal loops; they also do not guarantee compatibility with the space-time. For example, a causal model whereby {A, X} affects Y and {B, Y} affects X, while neither A nor X affects Y and neither B nor Y affects X, would create a directed cycle (a functional causal loop) between X and Y and also lead to signalling outside the future for the space-time configuration of Corollary 6.5.1. This is not ruled out by the conditions of the above corollaries, but can nevertheless be obtained from Theorems 6.5.1 and 6.5.2. We now present another noteworthy corollary that can be derived from Theorems 6.5.1 and 6.5.2, a proof of which can be found in Appendix 6.8.2.

Corollary 6.5.3. [No affects causal loops and compatibility with space-time] A necessary condition for a causal model over a set S of RVs to have no affects causal loops is that there exists an embedding of S in Minkowski space-time T such that the causal model is compatible (Definition 6.4.7) with T. This condition, along with the additional assumption that any two distinct ORVs X and Y such that one affects the other cannot share the same location in T, is sufficient for having no affects causal loops in the causal model.

Remark 6.5.1.

We have seen two distinct necessary and sufficient conditions for no causal loops: a) there exists an embedding in T such that cond (Equation (6.4)) holds (Theorem 6.5.2), and b) there exists an embedding in T such that the causal model is compatible with T (Corollary 6.5.3). These two conditions, namely cond and compatibility with T, are not equivalent, which follows from Theorem 6.5.1 due to the additional assumption required in one direction of the proof. The difference has to do with whether the accessible region coincides with the future for each ORV or is only a subset of it. The embedding for which a) holds may in general be different from the embedding for which b) holds, since an embedding also involves a specification of the accessible region (Definition 6.4.6). In the latter case, the accessible region must coincide with the future by definition of compatibility (Definition 6.4.7), while in the former case, it need not. In the case that only a) holds, we can have a jamming scenario (Figure 6.5a) where the jammed variables can be jointly accessed only within a strict subset $R_{AC}$ of their joint future F(A) ∩ F(C). Then it is enough if this subset, and not the entire joint future, is contained in the future of the jamming variable B. Note that modifying Definition 6.4.7 of compatibility to require $R_X \subseteq F(X)$ instead of $R_X = F(X)$ would not alter the fact that Theorem 6.5.1 requires an additional assumption along with cond for one of the directions; it would only alter the direction for which this is required.

The results of Sections 6.5.1 and 6.5.2, along with related (non-)implications, are summarised in Figure 6.7. All the implications are covered by the results of the previous sections, as explained in the figure. Here, we describe counter-examples to establish the eight non-implications of the figure, which can be grouped as follows.

1. Non-implications I, III: While no affects causal loops implies cond by Theorem 6.5.1, it does not imply that O(X) ≠ O(Y) whenever X affects Y. This is because we can have an acyclic causal model with non-trivial affects relations that is embedded trivially in space-time, i.e., where all the variables are embedded at the same space-time location. This explains non-implication I. Further, this trivial embedding of any causal model is clearly compatible with the space-time, which explains non-implication III.

2. Non-implications IV, V: These are immediately established by embedding the 2-cycle affects loop (i.e., the two-variable causal model with X affects Y affects X) in space-time such that X and Y are assigned the same space-time location O(X) = O(Y). Such causal loops between variables located at the same space-time point do not violate the compatibility condition (Definition 6.4.7) or lead to superluminal signalling, but they are rather uninteresting.

3. Non-implications II, VI: These are covered by the example provided in Remark 6.5.1: compatibility requires that the accessible region coincides with the future, while cond only implies that the accessible region is a subset of the future.

4. Non-implications VII, VIII: This is because the accessible region (Definition 6.4.5) of an ordered random variable X is defined only in terms of copies of X. Therefore we can have an embedding for which X affects another ORV Y which is not a copy of itself and does not lie in its future, even if the accessible region of each variable is contained in the variable's future. Such an embedding would violate cond.

[Figure 6.7: diagram of implications (labelled by the theorems and corollary that establish them) and non-implications (labelled I to VIII) between the statements "cond ∧ (X affects Y ⇒ O(X) ≠ O(Y))", "No affects causal loops", "∃ a compatible embedding in space-time", "cond", "cond ∧ ($F(X) \subseteq R_X$ ∀X ∈ S)" and "$R_X \subseteq F(X)$, ∀X ∈ S".]

Figure 6.7: Illustrative summary of results pertaining to embeddings of causal models in a space-time.
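The condition cond of Equation (6.4) can be checked mechanically once a concrete partial order is fixed. The following is a minimal sketch on a toy model: a finite integer grid standing in for 1+1 dimensional Minkowski space-time with the light-cone order, with accessible regions implicitly taken to be inclusive futures. The grid, the coordinates of the variables and all function names are assumptions made for illustration only.

```python
from itertools import product

# Toy 1+1 dimensional space-time: integer grid points (t, x) with light-cone order.
GRID = [(t, x) for t, x in product(range(0, 9), range(-8, 9))]

def precedes_or_equal(p, q):
    """True if q lies in the inclusive future light cone of p."""
    return (q[0] - p[0]) >= abs(q[1] - p[1])

def future(p):
    """Inclusive future of the point p, restricted to the finite grid."""
    return {q for q in GRID if precedes_or_equal(p, q)}

def joint_future(locations, variables):
    """Intersection of the inclusive futures of the given variables."""
    return set.intersection(*(future(locations[v]) for v in variables))

def violations_of_cond(locations, affects):
    """Return the affects relations (S1, S2) that break Equation (6.4).

    `affects` lists pairs (S1, S2) meaning "S1 affects S2"; each relation is
    assumed irreducible (no proper subset of S1 affects S2).
    """
    return [(s1, s2) for s1, s2 in affects
            if not joint_future(locations, s2) <= joint_future(locations, s1)]

# A jamming-type embedding: A, B, C pairwise unordered, F(A) ∩ F(C) ⊆ F(B).
jamming = {"A": (0, -2), "B": (1, 0), "C": (0, 2)}
# A 2-cycle of affects relations embedded at a single space-time point.
two_cycle = {"X": (0, 0), "Y": (0, 0)}
```

With these toy coordinates, the relation "B affects {A, C}" passes the check, whereas "B affects A" on its own would be flagged, since A is not in the future of B. The 2-cycle embedded at a single point also passes, in line with non-implications IV and V.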

The jamming scenario analysed in the literature [105, 116] has been considered only in the context of multipartite Bell scenarios, in particular the tripartite case, as discussed in Section 6.2. There, the inputs of all the parties are considered to be freely chosen and the parties share a joint system Λ that provides the initial correlations for the Bell scenario. Additionally, jamming allows the input of one party to jointly signal to the outputs of a set of other parties, without signalling to them individually. In the causal modelling paradigm, we have formalised freely chosen variables as parentless nodes in the causal structure and the notion of signalling through the affects relation, and have modelled Λ as a common cause between the outputs of all the parties. In the following, we will consider a particular class of jamming scenarios arising in the tripartite Bell experiment of [105, 116]: those where Bob's input jams the outputs of Alice and Charlie, while the other inputs and outputs do not feature in the correlations. In this section, we simply denote these variables as B, A and C respectively (even though this naming convention is slightly different from that of the full tripartite Bell scenario of Figure 6.2). In Figure 6.5a, we have explained that a common cause Λ between A and C is necessary for producing jamming correlations in a causal structure where the input B is parentless and the outputs A and C have no outgoing arrows. In the remainder of this section, we explore the possibility of superluminal signalling using such a causal model.

It would be illustrative to sketch a simple protocol that can be used to obtain the jamming correlations of Example 6.3.2 and that can lead to superluminal signalling in the space-time embedding considered in [105, 116]. This protocol captures the main intuition behind Theorem 6.6.1 that will follow, and is illustrated in Figure 6.8. It also illustrates that Figure 6.5a is not the unique causal structure that is compatible with the correlations and affects relations of Example 6.3.2.

A protocol for jamming:

In Example 6.3.2, suppose that Λ is classical and all four variables, including Λ, are binary and uniformly distributed. The output A is generated from Λ by flipping its value whenever B = 1 and setting A = Λ otherwise, i.e., A = Λ ⊕ B. C is set to always be equal to Λ irrespective of B. This immediately gives the required correlation B = A ⊕ C and reproduces the affects relations of the example. Even though B does not affect A, it was physically sent to A, and even though B jointly affects A and C, it was not sent to C. This corresponds to the causal structure of Figure 6.5a but with the arrow B ⇢ C removed, as shown in Figure 6.8. Equivalently, the same correlations and affects relations can be generated by setting A = Λ and C = Λ ⊕ B, which would correspond to the causal structure of Figure 6.5a but with the arrow B ⇢ A removed instead. In this example, without additional information, we cannot distinguish between the three causal structures where B has a dashed arrow to A, to C or to both. All three causal structures (also illustrated in Figure 6.12) are compatible with the correlations and the affects relations of the example.

Note that A and Λ together are enough to determine B. Therefore, in a space-time configuration such as that of [105, 116], where the space-time random variables A, B and C are pairwise space-like separated and Λ is in the common past of A and C, information about the freely chosen variable B can already be available at the space-time location of A. This is outside the future of B, and can lead to superluminal signalling from the space-time location of B to that of A. More generally, note that {B, Λ} affects A but neither B nor Λ affects A (as denoted by the dashed arrows in Figure 6.8). By Theorem 6.5.1, A must be in the joint future of B and Λ for compatibility with the space-time.
However, the space-like separation between A and B leads to a violation of this condition and hence to superluminal signalling, irrespective of the space-time location assigned to Λ. The following theorem (a proof of which can be found in Appendix 6.8.2) establishes the possibility of superluminal signalling using an observed Λ in a general class of jamming scenarios.

[Figure 6.8, panels: (a) space-time diagram of A, B, C and Λ; (b) causal structure over B, A, C and Λ; (c) the variables A, B, C and Λ with the correlations $P_{ABC}$.]

Figure 6.8: A simple protocol that can lead to superluminal signalling in the jamming scenario of [105, 116]: (a) Space-time embedding of the variables A, B, C and Λ suggested in [105, 116], where the joint future of the space-time variables A and C (blue region) is contained in the future of B. Here, B corresponds to the input of a party, Bob, while A and C are the outputs of Alice and Charlie respectively. In this configuration, [105, 116] claim that there will be no superluminal signalling since A and C are jointly accessible only in the future of B and B = A ⊕ C cannot be computed from A or C individually. However, if Λ is observed, then our protocol explained in the main text allows for superluminal signalling in this space-time configuration, contrary to this claim. In the protocol, C = Λ and B = A ⊕ Λ, so the freely chosen variable B can already be determined at the location of A, which is outside the future of B, and hence can lead to superluminal signalling. (b) The causal structure corresponding to the protocol. Note that this causal structure is missing the dashed arrow from B to C that is present in the causal structure of Figure 6.5a, even though both explain the same correlations and affects relations. (c) The correlations obtained from the protocol.

Theorem 6.6.1. Consider a causal model over four nodes A, B, C and Λ, all of which are observed. Let B be exogenous, A and C have no outgoing arrows and Λ be a common cause of A and C. Further, suppose that B affects the set {A, C} without affecting its individual elements, i.e., B jams the correlations between A and C. This causal model is not compatible with Minkowski space-time T, i.e., it leads to superluminal signalling, when the variables A, B and C are embedded in T as proposed in [105, 116], irrespective of the space-time location of Λ. The proposed embedding is: A ⊀⊁ B, B ⊀⊁ C, A ⊀⊁ C, F(A) ∩ F(C) ⊆ F(B), and the accessible region of each observed ORV coincides with its inclusive future.

Footnote: When causally modelling Bell scenarios, it is standard to take input variables to be exogenous and output variables to have no outgoing arrows, and be related by a common cause, as in Figures 3.1a and 6.2. However, one can explain the correlations and affects relations of the Bell scenario using alternative causal models whereby the inputs have incoming fine-tuned arrows, and/or outputs have outgoing fine-tuned arrows. These can also be analysed in our framework, but we will not consider them here.
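The classical protocol described before Theorem 6.6.1 (A = Λ ⊕ B, C = Λ, with B and Λ uniform and independent) is small enough to verify by enumeration. The sketch below is illustrative only: the helper names are ours, and we use the fact that B is exogenous, so that conditioning on B can stand in for intervening on it when testing whether B affects a set of variables.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def protocol_distribution():
    """Joint distribution P(a, b, c, lam) for A = lam XOR b and C = lam."""
    dist = {}
    for b, lam in product((0, 1), repeat=2):  # B and Lambda uniform and independent
        a, c = lam ^ b, lam
        dist[(a, b, c, lam)] = Fraction(1, 4)
    return dist

def conditional(dist, target, given):
    """P(target | given) as {given_value: {target_value: probability}}.

    `target` and `given` are tuples of indices into the outcome (a, b, c, lam).
    """
    joint = defaultdict(lambda: defaultdict(Fraction))
    for outcome, prob in dist.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        joint[g][t] += prob
    return {g: {t: p / sum(ts.values()) for t, p in ts.items()}
            for g, ts in joint.items()}

A, B, C, LAM = 0, 1, 2, 3
dist = protocol_distribution()

p_a_given_b = conditional(dist, (A,), (B,))         # same for b = 0 and b = 1
p_c_given_b = conditional(dist, (C,), (B,))         # same for b = 0 and b = 1
p_ac_given_b = conditional(dist, (A, C), (B,))      # differs between b = 0 and b = 1
p_b_given_a_lam = conditional(dist, (B,), (A, LAM)) # deterministic: b = a XOR lam
```

In this sketch the marginals of A and of C are unchanged by B, the joint distribution of (A, C) is not, and B is fixed with certainty by the pair (A, Λ). The last point is what makes an observed Λ problematic in the embedding of the theorem.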

Remark 6.6.1.

The classical protocol described in this section can also be extended to jamming scenarios where Bob jams non-classical correlations between Alice and Charlie. For example, let Alice and Charlie share a bipartite state, measure it using the inputs A and C and obtain the outcomes X and Z, and let B be an input of Bob that may be sent to one of them. Suppose that, depending on B, we would like Alice and Charlie to share either correlations arising from measurements on the Bell state $|\psi_0\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ or those arising from the same measurements on a different Bell state, $|\psi_1\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$. For this, Alice and Charlie simply need to share the first Bell state, but one of the parties, say Alice, must perform a controlled NOT on her local subsystem, controlled on the value of B, prior to the measurement. Then the parties would measure $|\psi_0\rangle$ whenever B = 0 and $|\psi_1\rangle$ whenever B = 1. In the same way, B can be used to decide whether Alice and Charlie share one relabelling of a PR box (Equation (3.5)) or another. Note however that in a tripartite Bell scenario where all the parties are non-communicating, these correlations cannot be generated quantum mechanically or in boxworld, and therefore define a more general class of post-quantum theories [116, 186].

Here, we address the claim of [116] (Propositions 2 and 3) that the no-signaling conditions NS2 (Equation (6.6)) and NS3' (Equation (6.7)) are necessary and sufficient for no causal loops in the bipartite and tripartite Bell scenarios respectively. The definition of no causal loops used by [116] in the proof of this statement is the following, which we quote verbatim.

"No causal loops occur, where a causal loop is a sequence of events, in which one event is among the causes of another event, which in turn is among the causes of the first event."

Here "events" correspond to measurement events, i.e., an input and output pair of space-time random variables both of which are assigned the same space-time location, and "cause" does not seem to be defined explicitly. A rigorous formalisation of these definitions is offered by the present framework: events correspond to ordered (or space-time) random variables (Definition 6.4.1), causation and signalling (which are non-equivalent) are defined in terms of a causal model (Definitions 6.3.1 and 6.3.4) and different types of causal loops have been distinguished (Definition 6.3.6). Based on this, it appears that neither direction of these claims holds. In the following we provide concrete counter-examples to illustrate our points, and explain how our results provide possible resolutions. Since the definitions of [116] appear to be relatively ambiguous, we note that our framework may not be the only possible way to formalise them, but we are not aware of any other.

Footnote: Similarly, one can devise protocols to realise other types of jamming correlations by allowing the underlying mechanisms (like the choice of local operation prior to measurement) to depend on B in a way that is not reflected in the local statistics. Further, one can also consider jamming correlations where Bob's output Y is non-trivially involved in the process, which we do not analyse here as it is not directly relevant to the results of this thesis.

Definition of causal loops:

As we have seen, fine-tuning allows for two distinct types of causal loops (Definition 6.3.6): the affects loops, which can be detected at the observed level, and the functional loops, which may in principle be operationally undetectable (functional causal loops). An example of the latter in the bipartite Bell scenario is provided in Figure 6.9a, where the observed distribution $P_{XYAB}$ over the parties' inputs A and B and outputs X and Y exhibits no correlations, i.e., $P_{XYAB} = P_X P_Y P_A P_B$, and there are no affects relations, i.e., none of the causal influences can be operationally detected when Λ is unobserved. Nevertheless, the causal structure involves a (functional) causal loop, which would qualify as a causal loop also according to the definition of [116]. Clearly we could never rule out such causal loops based on any operational conditions, especially using conditions that only restrict the correlations (such as NS2 and NS3'). We can only rule out causal loops of the affects type, as done in Theorem 6.5.2, where we have taken into account correlations as well as an operational notion of causation defined through active interventions. For this reason, even if we take "no causal loops" of [116] to mean "no affects causal loops", the no-signaling conditions NS2 (6.6) and NS3' (6.7) are neither necessary nor sufficient, which we discuss below with explicit examples.

[Figure 6.9, panels (a), (b) and (c): three causal structures over the nodes A, B, X, Y and Λ.]

Figure 6.9: Counter-examples to the claims of [116]: All variables are assumed to be binary in these examples. (a) Causal loops do not imply signalling: Let the dependence of X and Y on their parents be X = Y ⊕ Λ and Y = X ⊕ Λ, with X and Y being uniformly distributed. This, along with the d-separation condition (Definition 6.3.2), yields $P_{XYAB} = P_X P_Y P_A P_B$, and the no-signaling conditions (6.6) trivially hold. Nevertheless, we have a causal loop between X and Y that cannot be operationally detected when Λ is unobserved. (b) Signalling does not imply causal loops: When A and Y are embedded in space-time such that they are space-like separated, the causal model is not compatible (Definition 6.4.7) with the space-time and leads to superluminal signalling. However, it has no causal loops. (c) Instantaneous measurements are problematic: Let all the variables be uniformly distributed and related as X = A ⊕ Y (X jams A and Y), A ⊕ Λ = X (A and Λ collide into X). This gives Y = Λ (hence the solid arrow). If A and X are assigned the same space-time location by idealising the local measurement to be instantaneous, then no superluminal signalling ensues despite the causal loop. Note that even if we had a solid arrow 2-cycle with X affects A affects X, this need not pose a problem (either to the free choice definition used in [116] or to superluminal signalling) when they are embedded at the same location.

Sufficient conditions for no causal loops:

The no-signaling conditions NS2 (6.6) and NS3' (6.7) are not sufficient for no causal loops (of either type) in the corresponding Bell scenarios, and there are a number of reasons for this. Firstly, in Theorem 6.6.1 we have shown that the jamming scenario in the space-time configuration of [116] can lead to superluminal signalling if the common cause Λ is an observed variable. The scenarios for which we have proven this result are a special case of the tripartite Bell experiment where the inputs of Alice and Charlie, and the output of Bob, are ignored, and they satisfy the tripartite no-signaling conditions NS3' of [116]. This shows that NS3' alone is not sufficient for ruling out superluminal signalling and the subsequent causal loops that can arise; we have also seen that there exists an explicit protocol (Figure 6.8) that achieves superluminal signalling when Λ is observed. Furthermore, adding the assumption that Λ is unobserved still does not suffice to rule out causal loops, even in the bipartite Bell experiment, as shown in Corollaries 6.5.1 and 6.5.2. To rule out affects causal loops, we also require that the output variables do not affect any other variables. For an example of an affects causal loop that is not ruled out by the no-signaling conditions NS2, consider outputs X and Y such that X → Y and Y → X with X = Y. Since the inputs A and B play no role here, no-signaling is trivially satisfied despite the causal loop that allows X to affect Y and Y in turn to affect X. In addition, the inclusion of these extra assumptions still falls short of ruling out all causal loops and superluminal signalling. For the latter, as mentioned in Section 6.5.2, we require further assumptions involving affects relations, such as {A, X} does not affect Y. We note that a previous work by Baumeler, Degorre and Wolf [20] also illustrates that non-classical, non-signaling correlations in the bipartite Bell scenario admit an alternative explanation using causal loops, pointing to the insufficiency of the no-signaling conditions for ruling out causal loops.

Necessary conditions for no causal loops:

We also noted in Section 6.6.1 that the no-signaling conditions are not necessary for no causal loops in the bipartite and tripartite Bell scenarios. Consider the causal structure of Figure 6.9b, where we have the usual bipartite Bell scenario and, in addition, Alice's input A affects Bob's output Y. This violates the no-signaling conditions. The particular embedding of this causal model in Minkowski space-time T where the space-time random variables A and Y are space-like separated is not compatible with the space-time and leads to superluminal signalling. However, the causal structure is clearly acyclic and hence the causal model has no causal loops (of either the affects or the functional kind). Further, there clearly exists a compatible embedding of the causal model in T: this would be one where Y is taken to be in the future of A (in accordance with Corollary 6.5.3). The fact that superluminal signalling does not imply causal loops has been pointed out in [116] itself: if superluminal signalling were allowed only in one special reference frame, it could not be used to create causal loops. The conclusion that the no-signaling conditions are necessary for no causal loops appears to be at odds with this observation, and the proof of [116] seems to be implicitly assuming that superluminal signalling in one frame would enable such signalling in all frames.
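The acyclicity claim for Figure 6.9b can be checked with a standard directed-cycle test on the causal structure. The sketch below uses Python's `graphlib` (Python 3.9 or later); the parent sets are our reading of the description in the text (the usual bipartite Bell structure plus an arrow from A to Y), with `L` standing for Λ, and are assumptions rather than a transcription of the figure.

```python
from graphlib import CycleError, TopologicalSorter

def has_directed_cycle(parents):
    """`parents` maps each node to the set of its direct causes."""
    try:
        # prepare() raises CycleError if the graph contains a directed cycle.
        TopologicalSorter(parents).prepare()
    except CycleError:
        return True
    return False

# Bipartite Bell structure plus the extra arrow A -> Y (assumed edge list).
signalling_but_acyclic = {
    "X": {"A", "L"},
    "Y": {"B", "L", "A"},
}

# Two outputs that are direct causes of each other: a directed 2-cycle.
two_cycle = {
    "X": {"Y", "L"},
    "Y": {"X", "L"},
}
```

Here `has_directed_cycle(signalling_but_acyclic)` is false even though A signals to Y, while `has_directed_cycle(two_cycle)` is true although the inputs play no role. This test only looks at the graph; whether a given cycle is an affects loop or a functional loop depends on the affects relations of the model.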

Footnote: Though we have not explicitly considered reference frames in this work, they could in principle be taken into account by introducing a partially ordered set for every choice of reference frame, such that the space-time is characterised by a collection of partially ordered sets. Compatibility of a causal model with the space-time (Definition 6.4.7) would then correspond to compatibility (or no signalling outside the future) with all these partially ordered sets, such that signalling outside the light cone even in one frame would be a violation of compatibility.

Instantaneous measurements:

As an idealisation, measurements are considered to be instantaneous in [116], such that the input and output variables of each party are embedded at the same space-time location. While this is an idealisation that is often made, in the present case it must be treated with caution, since causal loops (even of the affects type) between variables at the same space-time location do not necessarily lead to superluminal signalling (cf. Theorem 6.5.2, Figure 6.9c). In the bipartite Bell scenario, when Alice's input and output are at the same space-time location, this allows for an affects causal loop A → X and X → A, while still satisfying free choice in the sense that a free variable is not correlated with variables outside its (inclusive) future. This is yet another reason for the insufficiency of the no-signaling conditions for ruling out causal loops. From a physical standpoint, it would be natural to expect that in a causal theory, non-trivial operations cannot be performed instantaneously, and some frameworks for causality, such as [172], explicitly forbid this. Note that the sufficiency proof of Theorem 6.5.2 explicitly forbids such instantaneous embeddings.

[Figure 6.10: space-time diagram showing the ORVs A, B and C.]

Figure 6.10: A space-time embedding for the cyclic causal model of Figure 6.6b: In the causal model, we have B affects {A, C} and {A, C} affects B. In this space-time embedding, the future of the space-time random variable B exactly coincides with the joint future of A and C (blue region), i.e., the causal model is compatible with the space-time (Definition 6.4.7) for this special embedding and does not lead to superluminal signalling even though it contains functional causal loops. Note that for affects causal loops, the only compatible embedding in space-time is the one where all the loop variables are assigned the same space-time location (cf. Corollary 6.5.3).

Causal structure vs space-time structure:

It is important to point out that the directed graphs presented here differ from those of [116] in what they represent. In [116], arrows between space-time variables (which are always represented as solid), X → Y, represent the space-time structure, i.e., that Y is in the future of X. This does not necessarily mean that X is a cause of Y. In this thesis, arrows are used to represent the causal structure, such that X → Y stands for X is a cause of Y (Definition 6.3.1). If the causal model is compatible with the space-time, this would imply that Y is indeed in the future of X, but a priori this need not be the case in our framework. [116] assumes that for any two space-time random variables, X is a cause of Y implies that Y is in the future of X. Again the definition of "cause" is important here, since this need not hold for fine-tuned causes. For example, if X is a fine-tuned cause of Y (i.e., X ⇢ Y), Y being in the future of X is not necessary for ruling out superluminal signalling (Theorem 6.5.1). Formalising this notion in terms of the affects relation (introduced in Definition 6.3.4), we have that X affects Y implies that Y is in the (inclusive) future of X. Note that this follows from our compatibility condition (Definition 6.4.7).

Free choice:

Free choice of inputs in the Bell scenario is crucial to the arguments of [116], since without it, outputs of parties can be correlated with inputs of space-like separated parties through common causes, hence violating the no-signaling conditions without leading to causal loops. The commonly used definition of free choice proposed in [61] needs to be modified when considering jamming correlations in multipartite Bell scenarios, as noted in [116]. The former requires each party's input to be correlated only with variables in its future, while the latter allows Bob's input in the tripartite Bell experiment to be correlated with Alice's and Charlie's outputs jointly, even though each output is space-like separated from the input variable. Accordingly, the definition changes depending on the scenario considered, and does not generalise easily. Here we have modelled free choice of a random variable by taking it to be exogenous in the causal graph, which generalises readily to arbitrary causal structures, though admittedly this is not the only way to model free choice in causal structures. Modelling free choice is not a major concern in this work, since our main results, Theorems 6.5.1 and 6.5.2, do not assume any form of free choice and apply to arbitrary causal models.

Remark 6.6.2.

It can be shown that defining free choice through exogeneity would recover the free choice definitions of [61, 116] in the Bell scenarios when the causal model is embedded compatibly in the space-time (using the results of Section 6.5.2). However, the converse is not true, since the latter are defined in terms of correlations, and a lack of correlation does not imply a lack of causation, especially when we allow for fine-tuning. Therefore the definitions are not equivalent. Other inequivalent definitions may also be possible. We leave for future work a deeper exploration of the relationship(s) between the different notions of free choice that can arise in the presence of fine-tuning.

D-separation and aﬀects relations:

The affects relation (Definition 6.3.4) is based on the notion of interventions, which is crucial for distinguishing between correlation and causation. In acyclic causal structures [160, 115] and in classical cyclic causal structures [84], existing frameworks completely describe how the post-intervention distribution can be calculated from the observed distribution and/or the underlying causal mechanisms. In non-classical cyclic causal structures, such a characterisation is not available. In Section 6.3, we have used the d-separation condition (Definition 6.3.2) on the observed distribution to obtain a partial characterisation which suffices for the current purpose, but this does not fully specify the post-intervention distribution. In Appendix 6.8.1, we outline a possible method for doing so, given the underlying causal mechanisms. We note that this method may not always recover the d-separation condition of Definition 6.3.2. In the classical cyclic case, it has been shown that the d-separation condition is recovered whenever all the variables are discrete and the causal mechanisms of the model satisfy a certain property known as ancestrally unique solvability (discussed in Appendix 6.8.1). All the examples presented in the main chapter are either acyclic, or cyclic and satisfy these properties. Therefore our main results are not impacted by making d-separation a defining condition. Defining the framework without this assumption, using the causal mechanisms as primitives, would only generalise it, and could potentially be of independent interest as a general framework for causal modelling. Further, another observation made in Appendix 6.8.1 is that the presence of causal loops could allow us to distinguish between faithful, non-classical explanations and unfaithful classical explanations (e.g., using non-local hidden variables) of quantum correlations, which cannot be operationally distinguished otherwise. Formalising this intuition would provide another interesting line of investigation stemming from this work.

Causal loops and paradoxes:

A full formalisation of our causal model framework, in the manner described in the above paragraph, would allow us to make a precise comparison with the two main types of closed timelike curves (CTCs) that have been proposed in the literature, namely Deutsch's CTCs (DCTCs) [75] and post-selected CTCs (PCTCs) [25, 199, 137, 138]. These CTCs have provided insights into complexity classes in information theory, and are known to have different computing powers [2, 138] and to provide different resolutions to the grandfather and unproved theorem paradoxes [138]. In our framework, grandfather type paradoxes are forbidden by the assumption that a valid joint probability distribution over the observed variables indeed exists. An example of a paradoxical scenario is a 2-cycle between $X$ and $Y$ where the influence $X \to Y$ defines the functional dependence $Y = X$ and $Y \to X$ gives the dependence $X \neq Y$. These equations are mutually inconsistent and there is no joint distribution $P_{XY}$ compatible with these dependences. This consistency condition appears to be similar in spirit to that of PCTCs, where the paradoxical scenario is never compatible with the post-selection, i.e., occurs with zero probability. The unproved theorem paradox, on the other hand, can depend on how the framework is formalised. For example, in classical cyclic causal models, an assumption regarding the unique solvability of the underlying functional dependences is often made. In particular, this could be seen as the requirement that any information involved in a loop (such as the unproved theorem) must be fully determined by the mechanisms of the causal model, thereby eliminating the paradox of a proof that "came from nowhere".
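The inconsistency of the paradoxical 2-cycle can be checked by brute-force enumeration. The following is a minimal sketch, assuming binary $X$ and $Y$ and deterministic mechanisms; the helper name `solutions` and the comparison loop with both arrows imposing equality are illustrative choices, not part of the framework.

```python
from itertools import product


def solutions(mechanisms, values=(0, 1)):
    """Return all (x, y) assignments that satisfy every mechanism."""
    return [(x, y) for x, y in product(values, repeat=2)
            if all(m(x, y) for m in mechanisms)]


# Paradoxical 2-cycle: X -> Y imposes Y = X, while Y -> X imposes X != Y.
paradox = [lambda x, y: y == x, lambda x, y: x != y]

# Hypothetical comparison loop: both arrows impose equality.
consistent = [lambda x, y: y == x, lambda x, y: x == y]

paradox_solutions = solutions(paradox)
consistent_solutions = solutions(consistent)

print(paradox_solutions)     # [] : no assignment, hence no joint distribution P_XY
print(consistent_solutions)  # [(0, 0), (1, 1)] : consistent but not uniquely solvable
```

The comparison loop has more than one solution, so its mechanisms alone do not fix the value carried around the loop. This is the kind of situation that a unique solvability assumption would exclude.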

Causal inference in the presence of ﬁne-tuning:

The no fine-tuning assumption, in some form or other, is required by all causal discovery algorithms, i.e., algorithms that infer an unknown causal structure from observed correlations, interventions and any additional information that may be available. Relaxations of the assumption have been considered where certain forms of fine-tuning are allowed [227]. In the general case, where an arbitrary amount of fine-tuning is allowed, it can be fundamentally impossible to infer the causal structure, since there can be an arbitrary number of fine-tuned causal arrows that produce absolutely no observable effects. Take for instance the example of Figure 6.9a, where there is a fine-tuned causal loop even though no correlations can be seen between the observed variables, even when we intervene on them. Therefore, to address the problem of causal inference in our framework, i.e., in the presence of cycles, fine-tuning and non-classical systems, we would need to make certain assumptions on the kind of fine-tuning that is allowed. One reasonable assumption could be that whenever there is a fine-tuned arrow $X \dashrightarrow Y$ between two variables that cannot be detected when considering those variables alone, the causal influence must become detectable when we consider additional variables along with $X$ and $Y$, as is the case in the jamming and collider examples of Section 6.3.5. This is another interesting direction for future work.

Footnote (to the paradoxical 2-cycle above): This is similar in structure to the Liar's paradox, e.g., "this statement is false", or more generally a liar cycle. Interestingly, this will crop up again in our discussion of multi-agent paradoxes and contextuality in Chapter 8. The striking similarities between such paradoxes in philosophy, time-travel, quantum contextuality and multi-agent scenarios have also been pointed out in [77].
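To illustrate a fine-tuned influence that is undetectable from $X$ and $Y$ alone but becomes detectable once a third variable is included, here is a small sketch. It assumes a hypothetical mechanism $Y = X \oplus L$ with $L$ a uniform binary variable; this is a standard one-time-pad style example and not necessarily the one used in Section 6.3.5.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def dist_after_do_x(x):
    """Joint distribution of (Y, L) after do(X = x), with Y = x XOR L and L uniform."""
    return {(x ^ l, l): HALF for l in (0, 1)}


def marginal_y(joint):
    """Marginal distribution of Y obtained from a joint distribution over (Y, L)."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for (y, _l), p in joint.items():
        out[y] += p
    return out


# Looking at Y alone, the intervention on X leaves no trace.
pairwise_invisible = marginal_y(dist_after_do_x(0)) == marginal_y(dist_after_do_x(1))

# Looking at Y together with L, the intervened value of X is revealed.
jointly_visible = dist_after_do_x(0) != dist_after_do_x(1)

print(pairwise_invisible, jointly_visible)  # True True
```

Under these assumptions the marginal of $Y$ is uniform for either intervened value of $X$, while the joint distribution of $(Y, L)$ changes with it.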


Jamming theories and physical principles:

An important motivation for analysing post-quantum theories and jamming type scenarios constrained by relativistic principles is to gain deeper insights into the physical principles that single out quantum theory. One property of jamming theories suggested by our results is that they would involve some form of fine-tuning, as well as variables and influences that must remain fundamentally inaccessible to observation. Even though jamming correlations in the space-like separated configuration of [105, 116] are post-quantum phenomena, they are similar in spirit to non-local hidden variable explanations of quantum theory, such as the de Broglie-Bohm pilot wave theory [27], which also involve fine-tuned influences. There, the pilot wave, which is not operationally accessible, leads to the operational predictions of quantum theory, but can involve non-local or superluminal influences at the ontological level. An interesting future direction would be to formalise the concept of jamming theories in a similar manner to generalised probabilistic theories (GPTs) and consider other physical properties thereof. One difficulty in doing this is that the post-quantum jamming scenario requires a particular space-time configuration, while GPTs can be formalised independently of a space-time structure. Disentangling the notions of causality and space-time through operational considerations, as done here, could be seen as a first step towards such a more general characterisation of jamming theories.

Indeﬁnite causal orders:

Going beyond the notion of a fixed (but possibly unknown) causal structure, causal orders that may be fundamentally indefinite have been considered [154]. Here, processes are said to be causally separable if they admit an explanation in terms of a classical mixture of fixed causal order processes, and causally non-separable otherwise. Even though our framework assumes a fixed causal and space-time structure, we note the following point regarding causally separable processes. Consider a scenario where the causal order $X \to Y$ or $Y \to X$ between two classical random variables is determined probabilistically by a third, binary variable $\Lambda$. Let the first causal order correspond to the dependence $Y = X$ and the second correspond to the dependence $X = Y \oplus 1$, for binary $X$ and $Y$. This can be modelled in the causal structure of Figure 6.9a with the dashed arrows replaced with solid arrows, considering only the variables $X$, $Y$ and $\Lambda$. The causal mechanisms would be $X = \Lambda.(Y \oplus 1) \oplus (\Lambda \oplus 1).E_X$ and $Y = (\Lambda \oplus 1).X \oplus \Lambda.E_Y$, where $E_X$ and $E_Y$ are mutually independent, uniformly distributed binary variables. Conditioned on $\Lambda = 0$, $X$ affects $Y$ but $Y$ does not affect $X$, i.e., $P(X|\mathrm{do}(Y = y), \Lambda = 0) = P(X|\Lambda = 0)$ $\forall y$, and similarly $P(Y|\mathrm{do}(X = x), \Lambda = 1) = P(Y|\Lambda = 1)$ $\forall x$. However, when $\Lambda$ is not given, it can be checked that $X$ affects $Y$ and $Y$ affects $X$, which is an affects causal loop. At first sight this might seem to run counter to the idea that classical mixtures of causal orders should be implementable in a lab by flipping a coin to decide the order of two operations. Such a physical implementation corresponds to a situation where one operation (say, that which generates $X$) is performed at a time $t_1$ and the other (which generates $Y$) at a time $t_2 > t_1$ when $\Lambda = 0$, with the order of the two operations exchanged when $\Lambda = 1$, so that the influences between the space-time variables are $X_{t_1} \to Y_{t_2}$ and $Y_{t_1} \to X_{t_2}$. This suggests the importance of the space-time embeddings in understanding the physicality of such processes.
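The affects relations claimed for this example can be verified by exact enumeration. The sketch below assumes the mechanisms $X = \Lambda.(Y \oplus 1) \oplus (\Lambda \oplus 1).E_X$ and $Y = (\Lambda \oplus 1).X \oplus \Lambda.E_Y$, and additionally assumes a uniform $\Lambda$ for the unconditioned case (the text only requires $\Lambda$ to be probabilistic). An intervention replaces the structural equation of the intervened variable by a constant, and "depends on the intervened value" is used as a sufficient test for an affects relation.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def p_y_do_x(x, lam_dist):
    """P(Y | do(X = x)) for Y = (L XOR 1).x XOR L.E_Y with E_Y uniform."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for lam, e_y in product((0, 1), repeat=2):
        y = ((lam ^ 1) & x) ^ (lam & e_y)
        out[y] += lam_dist[lam] * HALF
    return out


def p_x_do_y(y, lam_dist):
    """P(X | do(Y = y)) for X = L.(y XOR 1) XOR (L XOR 1).E_X with E_X uniform."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for lam, e_x in product((0, 1), repeat=2):
        x = (lam & (y ^ 1)) ^ ((lam ^ 1) & e_x)
        out[x] += lam_dist[lam] * HALF
    return out


def depends_on_intervention(do_dist, lam_dist):
    """True if the post-intervention distribution changes with the intervened value."""
    return do_dist(0, lam_dist) != do_dist(1, lam_dist)


LAM_ZERO = {0: Fraction(1), 1: Fraction(0)}     # conditioning on Lambda = 0
LAM_ONE = {0: Fraction(0), 1: Fraction(1)}      # conditioning on Lambda = 1
LAM_UNIFORM = {0: HALF, 1: HALF}                # assumed prior when Lambda is not given

for name, lam in [("L=0", LAM_ZERO), ("L=1", LAM_ONE), ("L unknown", LAM_UNIFORM)]:
    print(name,
          "X affects Y:", depends_on_intervention(p_y_do_x, lam),
          "| Y affects X:", depends_on_intervention(p_x_do_y, lam))
```

Because $\Lambda$ is exogenous, conditioning on it commutes with the interventions here, which is why a point distribution on $\Lambda$ can stand in for the conditioned case.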
It would be interesting to analyse the quantum counterpart of this example, namely the quantum switch [53] (which implements a quantum controlled superposition of the orders), in a similar manner, through a suitable generalisation of our framework. Further, indefinite causal orders that violate so-called causal inequalities [154] have also been proposed in frameworks that do not assume a fixed background space-time structure (as we have assumed in the present work). However, such frameworks appear to disallow certain types of causal loops that are allowed by ours, and it would also be interesting to consider whether genuine causal inequality violations can be modelled in our framework.

Footnote: Note that the contradictory conditions $X = Y$ and $Y = X \oplus 1$ …

Indefinite space-time locations:

Physically, it is possible to implement scenarios where a quantum system is superposed not only between spatial locations, but also in time. These spatio-temporal superpositions are required in the physical implementation of the quantum switch mentioned above [174, 185, 172]. It would therefore be of interest to generalise our framework to also consider unobserved systems that may be space-time delocalised. The framework currently uses the standard quantum formalism and the Born rule, which only apply to quantum states and composite systems defined at a single instant of time. Generalisations of the standard formalism that define quantum states over space and time have been proposed, and a particularly relevant one for this purpose would be the causal box formalism of [172], which models quantum information processing mechanisms that act on such systems. In an ongoing collaboration with members of ETH Zurich, we explore the relations between the causal box formalism and indefinite causality, outlining a way to recover a probability rule analogous to the Born rule in certain cases. This suggests one possible way to generalise the results of the current chapter to situations where the systems are delocalised in space and time.

Footnote (on genuine causal inequality violations): When no assumptions are made, causal inequalities, like Bell inequalities, can be trivially violated. Identifying a genuine violation hence depends on the assumptions under which the causal inequality is derived, and these are not yet completely formalised for general scenarios.

Footnote (on the ongoing collaboration): Vilasini, V., del Rio, L. and Renner, R. Causality in definite and indefinite space-times. In preparation (2020). https://wdi.centralesupelec.fr/users/valiron/qplmfps/papers/qs01t3.pdf

In Section 6.3 we have outlined how interventions and do-conditionals (i.e., the post-intervention distribution) are defined in our framework, and Theorem 6.3.1 provides some conditions under which the post- and pre-intervention distributions can be related. Ideally though, one would expect that it should be possible to fully specify the post-intervention distribution if we are given all the underlying causal mechanisms involved in the causal structure. For example, in the classical case, the structural equations of the causal model [160] provide these causal mechanisms. Here, for each node $X$ in the causal structure, the dependence of $X$ on its parents $\mathrm{par}(X)$ corresponds to a stochastic map, which can be written in terms of a deterministic function $X = f_X(\mathrm{par}(X), E_X)$ by including an additional exogenous and unobserved error variable $E_X$ for each node $X$. This is called a structural equation. If the structural equations for all the nodes are known, then the complete post-intervention distribution can be calculated. This has been shown to be the case for classical cyclic causal models in [84]. An intervention $\mathrm{do}(x)$ on $X$ corresponds to updating the structural equation for $X$ to $X = x$ while keeping the remaining structural equations the same. Another important result for the classical case derived in [84] is that the d-separation property, or the global directed Markov condition of Definition 6.3.2, is recovered whenever all the random variables are discrete and the structural equations of the causal model satisfy a property known as ancestrally unique solvability (anSEP). Roughly, this property demands that the structural equation for each node must admit a unique solution given the values of the node's ancestors. We need not define this concept formally for our purposes here.

Our goal would be to extend these ideas to quantum and post-quantum cyclic causal structures, where the causal mechanisms involve measurements and transformations on non-classical systems, which cannot be expressed using deterministic structural equations. In the non-classical case, it is unclear what conditions allow for the d-separation condition to be recovered. Even to make this question precise in the non-classical case, one would need to specify the analogue of structural equations for such causal models, which is an open problem. Here, we present a possible method for achieving this, explaining it through the following example and sketching how it might generalise to a larger class of causal models.

Example 6.8.1 (A quantum cyclic causal model). Consider the cyclic variation of the bipartite Bell causal structure illustrated in Figure 6.11a. Let the common cause $\Lambda$ correspond to the Bell state $|\psi_\Lambda\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.
Suppose that $A$ and $B$ are the settings of local measurements on the two subsystems, such that the value 0 denotes a $\sigma_Z$ or computational basis ($\{|0\rangle, |1\rangle\}$) measurement on the associated subsystem, and the value 1 denotes a $\sigma_X$ or Hadamard basis ($\{|+\rangle, |-\rangle\}$) measurement. $X$ and $Y$ are the binary outcomes of these measurements, where 0 and 1 for the Hadamard basis measurements denote the outcomes $+$ and $-$ respectively. The additional constraints coming from the causal loop are that $B = X$ and $A = Y$. This specifies all the causal mechanisms. How do we calculate the observed distribution $P_{XYAB}$?

A method based on post-selection:

One method is to first calculate the observed correlations for the specified state and measurements in the original Bell scenario (Figure 3.1a), and then post-select on the observations that obey the loop conditions $B = X$ and $A = Y$. More formally, this corresponds to transforming the original cyclic causal structure of Figure 6.11a to the acyclic causal structure of Figure 6.11b by cutting off the edges $A \to X$ and $B \to Y$ and replacing them with the edges $A^* \to X$ and $B^* \to Y$, introducing two exogenous nodes $A^*$ and $B^*$. Then the inputs $A^*$ and $B^*$ and outputs $X$ and $Y$, along with the shared system $\Lambda$, define a Bell scenario, while the variables $A = Y$ and $B = X$ can simply be seen as local post-processings of the outcomes. We can then calculate the observed probabilities for this acyclic causal structure using the Born rule, and post-select on $A^* = A$ and $B^* = B$, which effectively achieves the post-selection $A = Y$ and $B = X$ in the original Bell scenario (Figure 3.1a). These probabilities will not be normalised, and one has to renormalise them to obtain the observed distribution $P_{XYAB}$. This is calculated in Figure 6.11 and can be used to find all the affects relations.

An intervention on $A$ would cut off the arrow from $Y$ to $A$. We can then deduce that $A$ does not affect $X$ as follows. Since $A$ is effectively exogenous in the post-intervention causal structure and $\Lambda$ is a Bell state, $A$ will be uncorrelated with $X$ here, i.e., $P_{G_{\mathrm{do}(A)}}(x|a) = P_{G_{\mathrm{do}(A)}}(x)$ $\forall a, x$, and both equal the uniform distribution. From the pre-intervention distribution calculated in the last column of Figure 6.11d, we have that $P(X = x)$ is also uniform, which gives $P_{G_{\mathrm{do}(A)}}(x|a) = P(x)$ $\forall a, x$. That $B$ does not affect $Y$ can be established similarly. Further, $\{A, B\}$ affects $\{X, Y\}$ because a joint intervention on $A$ and $B$ takes us back to the original Bell scenario, in which these sets are correlated (c.f. Lemma 6.3.2). Similarly, we can also establish that $X$ affects $B$ and $Y$ affects $A$ using the loop conditions $A = Y$ and $B = X$, which immediately yields $\{X, Y\}$ affects $\{A, B\}$.

In this example, $G_{\mathrm{do}(A,B)}$ corresponds to a quantum causal structure (the Bell scenario) while $G_{\mathrm{do}(X,Y)}$ is a simple classical causal structure, for the causal structure $G$ of Figure 6.11a, and the (observed) arrows of the figure can be classified into dashed and solid arrows as $A \dashrightarrow X$, $B \dashrightarrow Y$, $X \to B$ and $Y \to A$. Note that this example corresponds to a functional causal loop as per Definition 6.3.6, and the observed distribution satisfies the d-separation condition of Definition 6.3.2. The post-intervention distribution is fully specified here because all interventions (except the one on $\Lambda$ alone) are associated with acyclic post-intervention graphs, and for interventions on the exogenous $\Lambda$, the post- and pre-intervention distributions coincide.
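The post-selection calculation for Example 6.8.1 can be reproduced numerically. The following sketch assumes that the copied settings $A^*$ and $B^*$ are uniformly distributed (an assumption, chosen because it matches the sub-normalised values reported in Figure 6.11d); the function names are illustrative.

```python
import itertools

import numpy as np

# Bell state (|00> + |11>)/sqrt(2) shared as the common cause Lambda.
PSI = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

KET0 = np.array([1.0, 0.0])
KET1 = np.array([0.0, 1.0])
PLUS = (KET0 + KET1) / np.sqrt(2)
MINUS = (KET0 - KET1) / np.sqrt(2)

# Setting 0: sigma_Z basis (outcomes 0, 1); setting 1: sigma_X basis (outcomes +, -).
BASES = {0: (KET0, KET1), 1: (PLUS, MINUS)}


def born(x, y, a, b):
    """Born-rule probability of outcomes (x, y) given settings (a, b)."""
    projector_vector = np.kron(BASES[a][x], BASES[b][y])
    return float(abs(projector_vector @ PSI) ** 2)


def post_selected_distribution(setting_prob=0.25):
    """Post-select the acyclic Bell statistics on the loop conditions A = Y, B = X."""
    sub = {}
    for x, y, a, b in itertools.product((0, 1), repeat=4):
        if a == y and b == x:
            # setting_prob is the assumed uniform probability of (A*, B*) = (a, b).
            sub[(x, y, a, b)] = setting_prob * born(x, y, a, b)
    total = sum(sub.values())
    return sub, {key: value / total for key, value in sub.items()}


sub_normalised, observed = post_selected_distribution()
for key in sorted(observed):
    print(key, round(sub_normalised[key], 4), round(observed[key], 4))
```

Keys are ordered as $(x, y, a, b)$. Under the stated assumption, the marginal of $X$ in the renormalised distribution comes out uniform, consistent with the argument that $A$ does not affect $X$.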

Applying the method to ﬁne-tuned explanations of non-classical correlations:

It is known that certain non-classical correlations arising in the bipartite Bell causal structure cannot be obtained in the same causal structure if the common cause $\Lambda$ is classical. However, these correlations can be easily generated in the classical, fine-tuned causal structure of Figure 6.11c, which differs from the original causal structure by the inclusion of fine-tuned causal influences from each party's input to the other party's output. We now explain how this is achieved and then apply the post-selection method explained above to create a causal loop in Figure 6.11c by adding $X \to B$ and $Y \to A$. This will demonstrate that, even though the same non-classical correlations and affects relations can be obtained in the original Bell causal structure and its fine-tuned classical counterpart (Figure 6.11c), the two causal structures may behave differently in the presence of causal loops.

Footnote (on the d-separation condition in Example 6.8.1): The observed d-separations here are $A \perp^d B | \{X, Y\}$ and $X \perp^d Y | \{A, B\}$, and the observed distribution $P_{XYAB}$ satisfies the conditional independences $A \perp\!\!\!\perp B | \{X, Y\}$ and $X \perp\!\!\!\perp Y | \{A, B\}$.

First consider the PR box (Equation (3.5)), which is one of the maximally non-classical correlations of the Bell causal structure. It is defined by the condition $X \oplus Y = A.B$, where all the variables are binary. This is easily generated in the classical causal structure of Figure 6.11c by the structural equations $\Lambda = E$, $Y = E$ and $X = E \oplus A.B$ (where $E$ is binary and uniformly distributed). Other non-classical correlations can be obtained by adding some "noise" to this PR box example. Let $\Lambda = (E, F)$ correspond to two variables $E$ and $F$, both binary, with the former distributed uniformly. Then the structural equations $Y = E$ and $X = E \oplus F \oplus A.B$, for different distributions over the exogenous variable $F$, correspond to the PR box mixed with different levels of noise, namely $X = A.B \oplus E \oplus F$, $Y = E$.
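These structural equations can be explored by enumeration. The sketch below first checks that the noiseless case ($F = 0$) reproduces the PR box condition $X \oplus Y = A.B$, and then imposes the loop conditions $A = Y$ and $B = X$ on the same mechanisms and lists, for each value of $(E, F)$, the assignments $(X, Y)$ that remain consistent. Function and variable names are illustrative.

```python
from itertools import product


def outputs(a, b, e, f):
    """Mechanisms of the fine-tuned classical model: X = A.B XOR E XOR F, Y = E."""
    return (a & b) ^ e ^ f, e


# Acyclic case with F = 0: the PR box condition X XOR Y = A.B holds for all inputs.
pr_box_ok = all((x ^ y) == (a & b)
                for a, b, e in product((0, 1), repeat=3)
                for x, y in [outputs(a, b, e, 0)])


def loop_solutions(e, f):
    """All (X, Y) consistent with the mechanisms and the loop conditions A = Y, B = X."""
    return [(x, y) for x, y in product((0, 1), repeat=2)
            if (x, y) == outputs(a=y, b=x, e=e, f=f)]


loop_table = {(e, f): loop_solutions(e, f) for e, f in product((0, 1), repeat=2)}

print(pr_box_ok)  # True
for key, sols in loop_table.items():
    print(key, sols)
# (0, 0) [(0, 0)]
# (0, 1) [(1, 0)]
# (1, 0) []
# (1, 1) [(0, 1), (1, 1)]
```

An empty list means the loop conditions cannot be satisfied for that value of the exogenous variables, and a list with two entries means the solution is not unique.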
We refer to the structural equations $X = A.B \oplus E \oplus F$, $Y = E$ above as Equation (6.8). Therefore, the causal mechanisms that allow us to produce non-classical correlations $P_{XYAB}$ in the acyclic causal structure of Figure 6.11c are the functional dependences (6.8), along with a specification of the distributions over the exogenous variables $E$ and $F$ that constitute $\Lambda$: $E$ is uniform, while the distribution of $F$ can vary depending on the correlation to be generated. We now construct the causal loop by including the additional arrows $X \to B$ and $Y \to A$ and by effectively post-selecting on the loop conditions $A = Y$ and $B = X$. These, along with the causal mechanisms (6.8) of the acyclic case, define the mechanisms for the cyclic causal structure. We will now see that these causal mechanisms are incompatible with each other. We have $Y = E$, $X = E \oplus F \oplus A.B$, $A = Y$ and $B = X$, which gives $X = E.X \oplus E \oplus F$ and $Y = E$. Therefore for $(E, F) = (0, 0)$ we have $(X, Y) = (0, 0)$, and for $(E, F) = (0, 1)$ we have $(X, Y) = (1, 0)$. However, for $(E, F) = (1, 0)$ we get $X = X \oplus 1$, which has no solution, and for $(E, F) = (1, 1)$ we get $X = X$, which does not have a unique solution. Therefore if we demand unique solvability, we must require $E = 0$, which conflicts with the requirement that $E$ is uniform. Even if we don't require uniqueness, we cannot have $(E, F) = (1, 0)$, and forbidding this would make $E$ and $F$ correlated and non-uniform.

Therefore, in the classical, fine-tuned explanation of the Bell correlations, adding the loop is not consistent with the causal mechanisms that generate the non-classical correlations in the absence of the loop; in particular, the loop conditions are in conflict with the preparation of the exogenous variable $\Lambda$. If we have a consistent loop, then intervention on $A$ and $B$ will no longer recover the original non-classical correlations. This is in contrast to the faithful case analysed in Figure 6.11 (and explained previously in the text), where $\mathrm{do}(A, B)$ gives back the non-classical correlations of the Bell scenario. This suggests that certain (non-local) hidden variable explanations for quantum correlations (in a Bell experiment) can in principle be distinguished from the explanation provided by standard quantum mechanics in the presence of causal loops. We have only shown this for a particular set of functions or causal mechanisms for generating the former, and it would be interesting to consider whether this generalises, in particular to the causal mechanisms provided by Bohmian mechanics [27], a non-local hidden variable theory.

Generalising to other causal structures:

The idea behind the post-selection method employed for Example 6.8.1 above can in principle be generalised to other non-classical, cyclic causal structures where every directed cycle includes at least one edge $W \to Z$ connecting classical nodes $W$ and $Z$. The intuition is that cutting off such an edge in every directed cycle and replacing it with an edge $W^* \to Z$, by introducing an additional, exogenous variable $W^*$, would result in a directed acyclic graph (DAG). One can then apply the generalised causal model framework of [115] (reviewed in Section 2.5.2.1) to obtain the observed distribution in this DAG and then post-select on $W = W^*$ for all the edges that were cut off. Then a way to recover the d-separation condition (using the result of [84]) would be to check whether there exists a classical causal model for the same cyclic causal structure that produces identical observed correlations and satisfies the anSEP property. Note that this classical causal model need not necessarily yield the same post-intervention distributions. In the example of Figure 6.11a, an intervention on $A$ and $B$ gives the Bell scenario, which as we know produces non-classical correlations that cannot be obtained in the corresponding classical causal model [225]. Finally, it would be interesting to compare this method with the framework of post-selected closed time-like curves [138].

Footnote (on the structural equations (6.8)): Note that the model can be symmetrised by including an additional, uniformly distributed binary variable $G$ in the description of $\Lambda = (E, F, G)$ and using the structural equations $X = E \oplus (G \oplus 1)(A.B \oplus F)$ and $Y = E \oplus G(A.B \oplus F)$.

[Figure 6.11, panels (a) to (c): causal graphs. Panels (a) and (c) have nodes $A$, $X$, $Y$, $B$, $\Lambda$; panel (b) has nodes $A$, $A^*$, $X$, $Y$, $B$, $B^*$, $\Lambda$.]

Panel (d):

| $X$ | $Y$ | $A$ | $B$ | Measurements, outcomes | $P^{QM}_{XYAB}$ | $P_{XYAB}$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | $\sigma_Z \otimes \sigma_Z$, $(0, 0)$ | 1/8 | 1/3 |
| 0 | 1 | 1 | 0 | $\sigma_X \otimes \sigma_Z$, $(+, 1)$ | 1/16 | 1/6 |
| 1 | 0 | 0 | 1 | $\sigma_Z \otimes \sigma_X$, $(1, +)$ | 1/16 | 1/6 |
| 1 | 1 | 1 | 1 | $\sigma_X \otimes \sigma_X$, $(-, -)$ | 1/8 | 1/3 |

Figure 6.11: A cyclic quantum causal model. (a) A cyclic variation of the bipartite Bell causal structure (Figure 3.1a). (b) A method to calculate the observed distribution of (a) when $\Lambda$ is non-classical involves this intermediate causal structure. This is obtained from (a) by copying the nodes $A$ and $B$ and removing the directed cycle as shown. This gives an acyclic causal structure for which the distribution $P_{XYABA^*B^*}$ can be calculated using known methods. Then, post-selecting on $A = A^*$ and $B = B^*$ gives the distribution $P_{XYAB}$ for the original cyclic causal structure of (a). (c) A classical, fine-tuned causal structure that can generate all non-classical correlations of the bipartite Bell causal structure. Creating a causal loop in this case by adding the arrows $X \to B$ and $Y \to A$ does not lead to the same predictions as (a), which corresponds to adding these arrows to the original Bell causal structure. This is explained in the main text. (d) The table provides the observed distribution for Example 6.8.1 calculated using the proposed method. The only values of $A$, $B$, $X$ and $Y$ that are compatible with the loop conditions $A = Y$ and $B = X$ are those listed here, and the fifth column lists the measurements and outcomes that these values correspond to, according to Example 6.8.1. $P^{QM}_{XYAB}$ denotes the probabilities of the measurements and outcomes listed in the fifth column, calculated using the Born rule. These values are sub-normalised, and upon renormalisation, the observed distribution $P_{XYAB}$ for the cyclic causal structure (a) is obtained. Note that the d-separation condition of Definition 6.3.2 is satisfied in this case.

Remark 6.8.1.

We note that assuming the d-separation condition of Definition 6.3.2 as a primitive property of the framework rules out certain cyclic causal structures from being described in our current framework. In the classical case, these are precisely those cyclic causal models that do not satisfy anSEP or that involve continuous random variables (due to the result of [84]). A good example of such a causal model is that of [150], and [84] proposes a generalisation of d-separation called $\sigma$-separation, through which they derive a generalised global directed Markov condition that applies to classical causal models that involve continuous variables and/or do not satisfy anSEP. This reduces to d-separation in the acyclic case. Therefore, one option would be to replace d-separation with $\sigma$-separation in Definition 6.3.2 to generalise our framework for cyclic causal models.

Lemma 6.3.1.

The conditional independence $S_1 \perp\!\!\!\perp S_2 | S_3$ stands for $P_{S_1 S_2|S_3} = P_{S_1|S_3} P_{S_2|S_3}$, which implies

$$P_{S_2|S_1 S_3} = P_{S_2|S_3}. \tag{6.9}$$

The 3 d-separation relations $S \perp^d S_i$ for $i \in \{1, 2, 3\}$ imply that $S$ is d-separated from every subset of the union $S_1 \cup S_2 \cup S_3$. This implies the following independences by Definition 6.3.2 of compatibility of the distribution $P$ with the causal model represented by $G$,

$$P_{S|S'} = P_S \quad \forall S' \subseteq S_1 \cup S_2 \cup S_3. \tag{6.10}$$

Now consider the conditional distribution $P_{S_2|S S_1 S_3}$. We have

$$\begin{aligned} P_{S_2|S S_1 S_3} &= \frac{P_{S_2 S S_1 S_3}}{P_{S S_1 S_3}} \\ &= \frac{P_{S_3} P_{S_1|S_3} P_{S_2|S_1 S_3} P_{S|S_1 S_2 S_3}}{P_{S S_1 S_3}} \\ &= \frac{P_{S_3} P_{S_1|S_3} P_{S_2|S_3} P_S}{P_S P_{S_1 S_3}} \\ &= P_{S_2|S_3}, \end{aligned} \tag{6.11}$$

where we have used Equations (6.9) and (6.10) in the third line, noting that $P_{S|S_1 S_3} = P_S \Rightarrow P_{S S_1 S_3} = P_S P_{S_1 S_3}$. Equation (6.11) is equivalent to $P_{S S_1 S_2|S_3} = P_{S S_1|S_3} P_{S_2|S_3}$, which denotes the conditional independence $(S \cup S_1) \perp\!\!\!\perp S_2 | S_3$. The conditional independence $S_1 \perp\!\!\!\perp (S \cup S_2) | S_3$ can be derived analogously due to the symmetry between $S_1$ and $S_2$.

Finally, we have

$$P_{S_1|S S_3} = \frac{P_{S_1 S S_3}}{P_{S S_3}} = \frac{P_{S_3} P_{S_1|S_3} P_{S|S_1 S_3}}{P_S P_{S_3}} = P_{S_1|S_3}, \tag{6.12}$$

and similarly $P_{S_2|S S_3} = P_{S_2|S_3}$. Together with Equation (6.11), this implies $P_{S_2|S S_1 S_3} = P_{S_2|S S_3}$. This is equivalent to $P_{S_1 S_2|S S_3} = P_{S_1|S S_3} P_{S_2|S S_3}$, which denotes the final conditional independence $S_1 \perp\!\!\!\perp S_2 | (S \cup S_3)$.

Theorem 6.3.1.

Rule 1: We first note that the graph $G_{\mathrm{do}(X)}$ differs from $G_{\overline{X}}$ only by the inclusion of the additional nodes $I_{X_i}$ and corresponding edges $I_{X_i} \to X_i$ for each $X_i \in X$. Therefore, the d-separation condition $(Y \perp^d Z | X, W)_{G_{\overline{X}}}$ for the latter implies the same condition $(Y \perp^d Z | X, W)_{G_{\mathrm{do}(X)}}$ for the former. Using the d-separation condition of Definition 6.3.2, this implies the conditional independence of $Y$ and $Z$ given $\{X, W\}$ for the distribution $P_{G_{\mathrm{do}(X)}}$ compatible with $G_{\mathrm{do}(X)}$. Using the definition of the do-conditional (Equation (2.51)), $P_{G_{\mathrm{do}(X)}}(y, z | x, w) := P(y, z | \mathrm{do}(x), w) = P(y | \mathrm{do}(x), w) P(z | \mathrm{do}(x), w)$. This conditional independence automatically gives the required Equation (6.1).

Rule 2: $G_{\overline{X}, \underline{Z}}$ is the graph where all incoming arrows to $X$ and outgoing arrows from $Z$ are removed in $G$. Hence, the d-separation condition $(Y \perp^d Z | X, W)_{G_{\overline{X}, \underline{Z}}}$ implies that the only paths between $Y$ and $Z$ in the graph $G_{\overline{X}}$ that are not blocked by $X$ and $W$ are paths involving an outgoing arrow from $Z$. These are precisely the paths that get removed in going from $G_{\overline{X}}$ to $G_{\overline{X}, \underline{Z}}$, resulting in the required d-separation there. The same statement holds for the graph $G_{\mathrm{do}(X)}$ (by the argument used in the proof of Rule 1), and also for the graph $G_{\mathrm{do}(X), I_Z}$, which corresponds to adding the nodes $I_{Z_i}$ and edges $I_{Z_i} \to Z_i$ to $G_{\mathrm{do}(X)}$ for each $Z_i \in Z$. The latter holds true since the addition of the $I_{Z_i}$ nodes and $I_{Z_i} \to Z_i$ edges cannot create any additional paths between $Z$ and $Y$ that are left unblocked by $X$ and $W$. This implies that the only paths between $Y$ and the set $I_Z := \{I_{Z_i}\}_i$ not blocked by $X$ and $W$ in $G_{\mathrm{do}(X), I_Z}$ are paths from $I_Z$, going through $Z$ and involving an outgoing arrow from $Z$, i.e., paths involving the subgraph $I_Z \to Z \to \ldots$. All these paths would get blocked when conditioning additionally on $Z$. This gives $(Y \perp^d I_Z | X, W, Z)_{G_{\mathrm{do}(X), I_Z}}$, which through the compatibility condition (Definition 6.3.2) implies the conditional independence $(Y \perp\!\!\!\perp I_Z | X, W, Z)_{G_{\mathrm{do}(X), I_Z}}$, equivalently expressed as

$$P(y | \mathrm{do}(x), w, z, I_Z = \mathrm{idle}) = P(y | \mathrm{do}(x), w, z, I_Z = \mathrm{do}(z)) \quad \forall y, x, w, z. \tag{6.13}$$

By definition (see Section 6.3.2) we have $P(y | \mathrm{do}(x), w, z, I_Z = \mathrm{idle}) = P(y | \mathrm{do}(x), w, z)$ and $P(y | \mathrm{do}(x), w, z, I_Z = \mathrm{do}(z)) = P(y | \mathrm{do}(x), w, \mathrm{do}(z))$ $\forall y, x, w, z$. Along with Equation (6.13), this gives the required Equation (6.2). In other words, once $X$, $W$ and $Z$ are given, $Y$ does not depend on whether the given value $z$ of $Z$ was obtained through an intervention ($I_Z = \mathrm{do}(z)$) or passive observation (i.e., where $I_{Z_i} = \mathrm{idle}$ for all $i$, which is the causal model where no interventions are made on elements of $Z$).

Rule 3:

Consider the graph $G_{\mathrm{do}(X),I_Z}$, which is the post-intervention graph with respect to the nodes $X$ augmented with $I_{Z_i} \to Z_i$ for all $Z_i \in Z$. In this graph, suppose we had the d-separation relation $(Y \perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$. By Definition 6.3.2, this would result in the conditional independence $(Y \perp\!\!\!\perp I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$, which can be expressed as

$$P(y \mid w, \mathrm{do}(x), I_Z = \mathrm{idle}) = P(y \mid w, \mathrm{do}(x), I_Z = \mathrm{do}(z)) \quad \forall\, y, w, x, z.$$

Note that by definition of the augmented and post-intervention causal models, we have $P(y \mid w, \mathrm{do}(x), I_Z = \mathrm{idle}) = P(y \mid w, \mathrm{do}(x))$ and $P(y \mid w, \mathrm{do}(x), I_Z = \mathrm{do}(z)) = P(y \mid w, \mathrm{do}(x), \mathrm{do}(z))$ for all $y, w, x, z$ (cf. Equations (2.50) and (2.51)), and consequently $P(y \mid w, \mathrm{do}(x), \mathrm{do}(z)) = P(y \mid w, \mathrm{do}(x))$ for all $y, w, x, z$, which is the required Equation (6.3). Therefore, showing that the d-separation condition $(Y \perp^d Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$ implies the d-separation relation $(Y \perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$ would complete the proof. This is shown by contradiction. Suppose that $(Y \perp^d Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$ and $(Y \not\perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$. Then there must exist a path from a member $I_{Z_i}$ of $I_Z$ to a member $Y_j$ of $Y$ in $G_{\mathrm{do}(X),I_Z}$ that is unblocked by $X$ and $W$. There are two possibilities for such a path: either it contains the subgraph $I_{Z_i} \to Z_i \to \ldots Y_j$ or the subgraph $I_{Z_i} \to Z_i \leftarrow \ldots Y_j$. Denoting these possibilities as cases 1 and 2 respectively, let $P$ be the shortest such path. We will show that a contradiction arises in each case.

Case 1: Consider the first case, where $P$ contains the subgraph $I_{Z_i} \to Z_i \to \ldots Y_j$. In this case, $(Y \not\perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$ (which we have assumed) implies $(Y \not\perp^d Z_i \mid X, W)_{G_{\mathrm{do}(X)}}$. Along with the assumption that $(Y \perp^d Z_i \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$, this implies that there exists a path from $Z_i$ to $Y$ in $G_{\mathrm{do}(X)}$, unblocked by $X$ and $W$, that passes through some member $Z_k$ of $Z(W)$ and that would be blocked when the incoming arrows to $Z_k$ are removed. This leads to the following subcases, where the path from $Z_i$ to $Y_j$ in $G_{\mathrm{do}(X)}$ contains one of the following subgraphs:

• Case 1a: $Z_i \ldots \to Z_k \leftarrow \ldots Y_j$, or
• Case 1b: $Z_i \ldots \leftarrow Z_k \leftarrow \ldots Y_j$, or
• Case 1c: $Z_i \ldots \to Z_k \to \ldots Y_j$.

None of these cases is possible, for the following reasons. In Case 1a, some descendant of $Z_k$ must be in $W$ for the path to be unblocked in $G_{\mathrm{do}(X)}$, but by definition $Z(W)$ (which contains $Z_k$) is the set of all nodes in $Z$ that do not have descendants in $W$. In Case 1b, the path between $Z_i$ and $Z_k$ must contain a collider. For this path to be unblocked by $X$ and $W$ in $G_{\mathrm{do}(X)}$, the collider node must have a descendant in $W$, but the other requirement, that this path must be blocked in $G_{\overline{X},\overline{Z(W)}}$, implies that the same collider node must be a member of $Z(W)$, which by definition does not have any descendants in $W$, yielding a contradiction. In Case 1c, there is either a directed path from $Z_k \in Z$ to $Y_j \in Y$ in $G_{\mathrm{do}(X)}$ or a collider in the path between $Z_k$ and $Y_j$. The latter is ruled out by the same argument used in Case 1b. If there is a directed path from $Z_k$ to $Y_j$, then there is a directed path from $I_{Z_k}$ to $Y_j$ in $G_{\mathrm{do}(X),I_Z}$, i.e., there is a path from a member of $I_Z$ to $Y$ that is unblocked by $X$ and $W$ in $G_{\mathrm{do}(X),I_Z}$ and that is shorter than the shortest path $P$, which is not possible.

Finally, consider Case 2, where the path $P$ contains the subgraph $I_{Z_i} \to Z_i \leftarrow \ldots Y_j$. The initial assumption that $(Y \not\perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$ implies that the collider node $Z_i$ must have descendants in the conditioning set $W$, i.e., $Z_i \notin Z(W)$. However, in this case we would violate the assumption that $(Y \perp^d Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$. On the other hand, to satisfy this d-separation we would require $Z_i \in Z(W)$, but this would violate $(Y \not\perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$. Hence we have shown that $(Y \perp^d Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$ and $(Y \not\perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$ can never be simultaneously satisfied, and hence that $(Y \perp^d Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}$ implies $(Y \perp^d I_Z \mid X, W)_{G_{\mathrm{do}(X),I_Z}}$, which in turn implies the required Equation (6.3).

Theorem 6.5.1.
[Necessary and sufficient conditions for compatibility with $T$] Given a causal model over a set of ORVs $S$, the following condition (cond) is necessary for the causal model to be compatible with the embedding partial order $T$ according to compat (Definition 6.4.7).

$$\forall \text{ subsets } S_1, S_2 \subseteq S \text{ such that no proper subset of } S_1 \text{ affects } S_2, \quad \bigcap_{s_2 \in S_2} F(s_2) \nsubseteq \bigcap_{s_1 \in S_1} F(s_1) \;\Rightarrow\; S_1 \text{ does not affect } S_2. \tag{6.4}$$

While cond alone is not sufficient for compat, along with the following additional requirement it provides a sufficient condition for compat:

$$\forall X \in S, \quad F(X) \subseteq R_X. \tag{6.5}$$

Proof. Necessity of condition:

For any two subsets $S_1, S_2 \subseteq S$ such that no proper subset of $S_1$ affects $S_2$, compat implies that $R_{S_2} \subseteq R_{S_1}$ whenever $S_1$ affects $S_2$. Further, by compat, $R_X = F(X)$ for all ORVs $X$. Using these along with the fact that for any subset $S_i \subseteq S$, $R_{S_i} = \bigcap_{s_i \in S_i} R_{s_i}$ (see Remark 6.4.1), we have $R_{S_i} = \bigcap_{s_i \in S_i} F(s_i)$ for all subsets $S_i$ whenever compat holds. Hence we have the following whenever compat holds, which is equivalent to Equation (6.4):

$$S_1 \text{ affects } S_2 \;\Rightarrow\; \bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_1 \in S_1} F(s_1).$$

Sufficiency of condition + additional assumption:

Applying the condition cond of Equation (6.4) to the case where $S_1 = \{X\}$ and $S_2 = \{X'\}$, with $X'$ an arbitrary copy of $X$, we have $X$ affects $X'$ (by Definition 6.4.4 of a copy). This gives $F(X') \subseteq F(X)$ and hence $X' \in F(X)$ for all copies $X'$ of $X$. Since by Definition 6.4.5 the accessible region of $X$ is the smallest subset of $T$ containing all possible copies of $X$, we must have $R_X \subseteq F(X)$. Hence cond by itself is not sufficient for compat, as it does not require the accessible region to coincide with the future, only that it is a subset of the future. Along with the additional assumption that $R_X \supseteq F(X)$ for all $X \in S$, this gives

$$R_X = F(X), \quad \forall X \in S. \tag{6.14}$$

Again, the accessible region of any subset $S_i \subseteq S$ is $R_{S_i} = \bigcap_{s_i \in S_i} R_{s_i}$ (Remark 6.4.1). Equation (6.14) then implies that

$$R_{S_i} = \bigcap_{s_i \in S_i} F(s_i) \quad \forall S_i \subseteq S. \tag{6.15}$$

Hence the condition of Equation (6.4) along with the additional requirement of Equation (6.5) implies Equations (6.14) and (6.15), where Equation (6.14) is the first condition for compat (Definition 6.4.7). Further, Equation (6.15) along with the condition of Equation (6.4) then implies the second condition for compat, which is

$$\forall \text{ subsets } S_1, S_2 \subseteq S \text{ such that no proper subset of } S_1 \text{ affects } S_2, \quad S_1 \text{ affects } S_2 \;\Rightarrow\; R_{S_2} \subseteq R_{S_1}. \tag{6.16}$$

Theorem 6.5.2. [Necessary and sufficient condition for no affects causal loops] A necessary condition for a causal model over a set $S$ of RVs to have no affects causal loops is that there exists an embedding of $S$ in Minkowski space-time $T$ such that the corresponding set of ORVs satisfies the condition cond of Equation (6.4). cond, along with the additional assumption that any two distinct ORVs $X$ and $Y$ such that one affects the other cannot share the same location in $T$, is sufficient for having no affects causal loops in the causal model.

Proof. Necessity of condition: Consider a causal model over a set $S$ of RVs that has no affects causal loops. Taking $T$ to be Minkowski space-time, embed the RVs in $T$ such that whenever an RV $X$ affects another RV $Y$, the corresponding ORVs satisfy $Y \in F(X)$. Since $X$ affects $Y$ implies that $Y$ does not affect $X$ (by the assumption of no affects causal loops), this is always possible for any pair of RVs. For embedding the remaining RVs, consider the following conditions for two sets $S_1$ and $S_2$ of the RVs:

1. No proper subset of $S_1$ affects $S_2$,
2. No proper subset of $S_2$ is affected by $S_1$,
3. No proper subset of $S_2$ affects $S_1$, and
4. No proper subset of $S_1$ is affected by $S_2$.

Then embed the elements of $S_1$ and $S_2$ in $T$ as follows:

• If conditions 1. and 2. hold and $S_1$ affects $S_2$: $\bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_1 \in S_1} F(s_1)$.

This automatically gives (since the sets are arbitrary):

• If conditions 3. and 4. hold and $S_2$ affects $S_1$: $\bigcap_{s_2 \in S_2} F(s_2) \supseteq \bigcap_{s_1 \in S_1} F(s_1)$,
• If conditions 1.-4. hold, $S_1$ affects $S_2$ and $S_2$ affects $S_1$: $\bigcap_{s_2 \in S_2} F(s_2) = \bigcap_{s_1 \in S_1} F(s_1)$.

Such an embedding is always possible because the joint inclusive future of some set $\mathcal{T}$ of points in Minkowski space-time can be seen as the inclusive future of a single point $L_{\mathcal{T}}$, which would be the earliest point that is in the future of all points in $\mathcal{T}$, and the embedding imposes an order on the points $L_{\mathcal{T}}$ corresponding to a set. For example, for every pair of ORV sets $S_1$ and $S_2$ satisfying 1. and 2. with $S_1$ affects $S_2$, the embedding requires $L_{S_1} \preceq L_{S_2}$. For the third case, where 1.-4. are satisfied and the sets affect each other, we would have $L_{S_1} = L_{S_2}$. Note that this is possible even when the RVs are assigned distinct locations in $T$, since only the joint futures of the sets need to coincide (this was the case in Figure 6.6b, where $B$ and $\{A, C\}$ affect each other). In the case that $S_1$ and $S_2$ consist of single elements, this embedding clearly agrees with "$X$ affects $Y$ implies $Y \in F(X)$", since the absence of causal loops forbids the third case. In particular, for all subsets $S_1$ and $S_2$ satisfying the conditions 1. and 2., we have

$$S_1 \text{ affects } S_2 \;\Rightarrow\; \bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_1 \in S_1} F(s_1). \tag{6.17}$$

Now, suppose $S_1$ and $S_2$ are two sets of variables that satisfy 1. and $S_1$ affects $S_2$. Then either they also satisfy 2., or they do not, in which case there exists a proper subset $S_2' \subset S_2$ such that $S_1$ affects $S_2'$. Without loss of generality, we can assume that $S_1$ and $S_2'$ satisfy both 1. and 2. (if not, we can repeat the argument for this case by taking a proper subset of $S_2'$ that will satisfy these conditions). In other words, $S_1$ and $S_2' \subseteq S_2$ satisfy 1. and 2. and also the affects relation $S_1$ affects $S_2'$. Therefore, for the embedding described above, we have $\bigcap_{s_2' \in S_2'} F(s_2') \subseteq \bigcap_{s_1 \in S_1} F(s_1)$ due to Equation (6.17). Further, $S_2 \supseteq S_2'$ implies that $\bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_2' \in S_2'} F(s_2')$. Therefore, for any $S_1$ and $S_2$ which satisfy 1. and $S_1$ affects $S_2$, we have $\bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_1 \in S_1} F(s_1)$. In other words, for the embedding described above, we have the following, which is equivalent to cond (Equation (6.4)):

$$\forall \text{ subsets } S_1, S_2 \subseteq S \text{ such that no proper subset of } S_1 \text{ affects } S_2, \quad S_1 \text{ affects } S_2 \;\Rightarrow\; \bigcap_{s_2 \in S_2} F(s_2) \subseteq \bigcap_{s_1 \in S_1} F(s_1).$$

For all other RVs (if any remain), the embedding can be arbitrary. For example, these may be superfluous RVs that neither affect nor are affected by any other sets of RVs in the model. Hence we have shown that if a causal model has no causal loops, there exists an embedding in Minkowski space-time $T$ such that cond is satisfied.

Sufficiency of condition + additional assumption: cond (Equation (6.4)) implies in particular that when the sets correspond to single elements $X$ and $Y$, $X$ affects $Y$ implies that $F(Y) \subseteq F(X)$ (i.e.,

$Y \in F(X)$). Hence we can have a causal loop, with $X$ affects $Y$ and $Y$ affects $X$, only if $X$ and $Y$ share the exact same space-time location, i.e., $O(X) = O(Y)$. The additional assumption forbids this and hence forbids causal loops.

Corollary 6.5.3. [No affects causal loops and compatibility with space-time] A necessary condition for a causal model over a set $S$ of RVs to have no affects causal loops is that there exists an embedding of $S$ in Minkowski space-time $T$ such that the causal model is compatible (Definition 6.4.7) with $T$. This condition, along with the additional assumption that any two distinct ORVs $X$ and $Y$ such that one affects the other cannot share the same location in $T$, is sufficient for having no affects causal loops in the causal model.

Proof. Necessity of condition: The proof is identical to that of Theorem 6.5.2, with the additional requirement that $F(X) = R_X$. Note that this is a particular choice of embedding, since an embedding involves an assignment of locations in $T$ to each RV along with an assignment of accessible regions (see Definition 6.4.6). In Theorem 6.5.1, we proved the sufficiency of cond (Equation (6.4)) along with the additional assumption $F(X) \subseteq R_X$ for compatibility of a causal model with the embedding $T$. cond implies that $F(X) \supseteq R_X$, which along with $F(X) \subseteq R_X$ implies $F(X) = R_X$, and this was required for the proof. Here, we already have $F(X) = R_X$ by our choice of embedding, and Theorem 6.5.2 shows that the absence of causal loops implies the existence of an embedding for which cond holds. Hence, with the current choice of embedding, this also implies compatibility of the causal model with the embedding partial order $T$.

Sufficiency of condition + additional assumption: The proof is identical to that of the sufficiency part of Theorem 6.5.2.
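As an illustration of how cond can be checked for a concrete embedding, the following is a minimal sketch under simplifying assumptions that are not part of the text: space-time is taken to be 1+1-dimensional Minkowski space with $c = 1$, points are pairs `(t, x)`, and the helper names (`joint_future_apex`, `joint_future_contained`) and the example locations are hypothetical. In 1+1 dimensions the joint inclusive future of a set of points is exactly the inclusive future of a single apex point, so the inclusion of joint futures required when one set affects another reduces to a comparison of two apex points.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # (t, x) in 1+1 Minkowski space-time, units with c = 1


def light_cone_coords(p: Point) -> Tuple[float, float]:
    """Return the light-cone coordinates (u, v) = (t - x, t + x) of a point."""
    t, x = p
    return (t - x, t + x)


def in_inclusive_future(p: Point, q: Point) -> bool:
    """True if q lies in the inclusive future light cone of p."""
    up, vp = light_cone_coords(p)
    uq, vq = light_cone_coords(q)
    return uq >= up and vq >= vp


def joint_future_apex(points: Iterable[Point]) -> Point:
    """Earliest point lying in the inclusive future of every given point.

    In 1+1 dimensions the intersection of the future cones is itself the
    future cone of this single point.
    """
    coords = [light_cone_coords(p) for p in points]
    u = max(c[0] for c in coords)
    v = max(c[1] for c in coords)
    return ((u + v) / 2, (v - u) / 2)


def joint_future_contained(affected, affecting, loc: Dict[str, Point]) -> bool:
    """Check that the joint future of `affected` lies inside that of `affecting`."""
    apex_affecting = joint_future_apex(loc[name] for name in affecting)
    apex_affected = joint_future_apex(loc[name] for name in affected)
    return in_inclusive_future(apex_affecting, apex_affected)


# Hypothetical locations: A, B and C are pairwise space-like separated.
loc = {"A": (0.0, -2.0), "B": (1.0, 0.0), "C": (0.0, 2.0)}

# "B affects {A, C}" is allowed by the inclusion of joint futures ...
b_affects_ac_ok = joint_future_contained(["A", "C"], ["B"], loc)
# ... whereas "B affects A" alone would not be, since A is not in the future of B.
b_affects_a_ok = joint_future_contained(["A"], ["B"], loc)
```

With these locations the joint future of $A$ and $C$ has its apex at $(t, x) = (2, 0)$, which lies in the future of $B$, so an affects relation from $B$ to the set $\{A, C\}$ respects the inclusion even though $B$ is space-like separated from both $A$ and $C$ individually.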

Theorem 6.6.1.

Consider a causal model over four nodes $A$, $B$, $C$ and $\Lambda$, all of which are observed. Let $B$ be exogenous, $A$ and $C$ have no outgoing arrows and $\Lambda$ be a common cause of $A$ and $C$. Further, suppose that $B$ affects the set $\{A, C\}$ without affecting its individual


elements, i.e., $B$ jams the correlations between $A$ and $C$. This causal model is not compatible with Minkowski space-time $T$, i.e., it leads to superluminal signalling when the variables $A$, $B$ and $C$ are embedded in $T$ as proposed in [105, 116], irrespective of the space-time location of $\Lambda$. The proposed embedding is: $A \nprec\nsucc B$, $B \nprec\nsucc C$, $A \nprec\nsucc C$, $F(A) \cap F(C) \subseteq F(B)$, and the accessible region of each observed ORV coincides with its inclusive future.

Figure 6.12: Possible causal structures for Theorem 6.6.1, panels (a), (b) and (c).

Proof. $B$ affects $\{A, C\}$ implies the existence of a directed path from $B$ to at least one of $A$ and $C$. Given that $B$ is exogenous, $A$ and $C$ have no outgoing arrows and share a common cause $\Lambda$, this yields the three possible causal structures of Figure 6.12, where the arrows from $\Lambda$ may be solid or dashed. In all these causal structures, we have the d-separation relations $A \perp^d C \mid \{B, \Lambda\}$ and $B \perp^d \Lambda$. By Definition 6.3.2, these imply the conditional independences

$$A \perp\!\!\!\perp C \mid \{B, \Lambda\} \quad \text{and} \quad B \perp\!\!\!\perp \Lambda. \tag{6.18}$$

Further, the given affects relations along with the exogeneity of $B$ imply that

$$B \not\perp\!\!\!\perp \{A, C\} \quad \text{and} \quad B \perp\!\!\!\perp A \quad \text{and} \quad B \perp\!\!\!\perp C. \tag{6.19}$$

We now show that in all the causal structures, irrespective of the classification of the outgoing arrows from $\Lambda$ and the space-time location that it is assigned, the causal model is not compatible with Minkowski space-time with respect to the embedding given in the theorem statement, and hence leads to superluminal signalling. For this, our first step is to show, using Equations (6.18) and (6.19), that $B$ affects at least one of $\{A, \Lambda\}$ and $\{C, \Lambda\}$. Suppose that $B$ affects neither of the sets. By its exogeneity, this is equivalent to

$$B \perp\!\!\!\perp \{A, \Lambda\} \quad \text{and} \quad B \perp\!\!\!\perp \{C, \Lambda\}. \tag{6.20}$$

Then we have

$$\begin{aligned} P_{A|CB\Lambda} &= P_{A|B\Lambda} = \frac{P_{AB\Lambda}}{P_{B\Lambda}} = \frac{P_{A\Lambda|B}\, P_B}{P_{B\Lambda}} \\ &= \frac{P_{A\Lambda}\, P_B}{P_B\, P_\Lambda} = P_{A|\Lambda}, \end{aligned} \tag{6.21}$$

where we have used $A \perp\!\!\!\perp C \mid \{B, \Lambda\}$ in the first line and $B \perp\!\!\!\perp \Lambda$ and $B \perp\!\!\!\perp \{A, \Lambda\}$ in the second line. Further, $B \perp\!\!\!\perp \{C, \Lambda\}$ implies that $P_{BC\Lambda} = P_B P_{C\Lambda}$. Using this together with Equation (6.21),

$$P_{ABC\Lambda} = P_{A|\Lambda}\, P_{BC\Lambda} = P_B\, P_{A|\Lambda}\, P_{C\Lambda}. \tag{6.22}$$

Summing over $B$ and $\Lambda$, we obtain $P_{AC} = \sum_\Lambda P_{A|\Lambda} P_{C\Lambda}$. Using this in Equation (6.22), summed over $\Lambda$, gives

$$P_{ABC} = P_B \sum_\Lambda P_{A|\Lambda}\, P_{C\Lambda} = P_B\, P_{AC}. \tag{6.23}$$

The last line is equivalent to $B \perp\!\!\!\perp \{A, C\}$, which contradicts Equation (6.19). Therefore, given the conditions in the theorem statement, Equation (6.20) cannot hold, i.e., $B$ is correlated with at least one of $\{A, \Lambda\}$ and $\{C, \Lambda\}$. Using the exogeneity of $B$, this implies that $B$ affects at least one of these sets. For compatibility with the space-time, this affects relation implies the following constraint on the space-time embedding, where the accessible region of each ORV is taken to coincide with its inclusive future:

$$F(A) \cap F(\Lambda) \subseteq F(B) \;\lor\; F(C) \cap F(\Lambda) \subseteq F(B). \tag{6.24}$$

Now, note that Equation (6.19) implies that

$$A \not\perp\!\!\!\perp C \mid B. \tag{6.25}$$

This is because if we had the contrary, i.e., $A \perp\!\!\!\perp C \mid B$, then along with $B \perp\!\!\!\perp A$ this gives $P_{A|BC} = P_{A|B} = P_A$. Using this along with $B \perp\!\!\!\perp C$ gives $P_{ABC} = P_{A|BC} P_{BC} = P_A P_B P_C$. This in turn implies $B \perp\!\!\!\perp \{A, C\}$, which contradicts the first condition of Equation (6.19).

Next, we show that $\{B, \Lambda\}$ affects $A$ and $\{B, \Lambda\}$ affects $C$, again by contradiction. Suppose that $\{B, \Lambda\}$ does not affect $A$. This is equivalent to $\{B, \Lambda\} \perp\!\!\!\perp A$ (i.e., $P_{A|B\Lambda} = P_A$), since $B$ and $\Lambda$ are exogenous. Along with Equation (6.18), this implies that $P_{A|BC\Lambda} = P_{A|B\Lambda} = P_A$. Then $P_{ABC} = \sum_\Lambda P_A P_{BC\Lambda} = P_A P_{BC}$, which together with $B \perp\!\!\!\perp C$ gives $P_{ABC} = P_A P_B P_C$ and hence $B \perp\!\!\!\perp \{A, C\}$, contradicting Equation (6.19). Therefore we must have $\{B, \Lambda\}$ affects $A$. Similarly, one can show that $\{B, \Lambda\}$ affects $C$ must also hold under the assumptions given in the theorem statement. Now, we have four cases depending on whether or not $\Lambda$ affects $A$ or $C$, which we enumerate below.

• Case 1: $\Lambda$ affects neither $A$ nor $C$. In this case, both the outgoing arrows from $\Lambda$ would be dashed (by Definition 6.3.5), and any compatible embedding of the causal model in space-time must satisfy (cf. Theorem 6.5.1)

$$A \in F(B) \cap F(\Lambda) \;\land\; C \in F(B) \cap F(\Lambda). \tag{6.26}$$

• Case 2: $\Lambda$ affects $A$ but not $C$. In this case we have $\Lambda \to A$ and $\Lambda \dashrightarrow C$, and a necessary condition for compatibly embedding this causal model in space-time is

$$A \in F(\Lambda) \;\land\; C \in F(B) \cap F(\Lambda). \tag{6.27}$$

• Case 3: $\Lambda$ affects $C$ but not $A$. In this case we have $\Lambda \to C$ and $\Lambda \dashrightarrow A$, and compatibility with the space-time requires that

$$A \in F(B) \cap F(\Lambda) \;\land\; C \in F(\Lambda). \tag{6.28}$$

• Case 4: $\Lambda$ affects $A$ as well as $C$. Here we have $\Lambda \to A$ and $\Lambda \to C$, and compatibility with the space-time necessitates

$$A \in F(\Lambda) \;\land\; C \in F(\Lambda). \tag{6.29}$$

In Cases 1-3, we can immediately see that at least one of $A$ and $C$ must be in the future of $B$ to prevent the causal model from signalling superluminally. In Case 4, this follows from Equations (6.24) and (6.25): since $A$ and $C$ must be in the future of $\Lambda$, the joint future of each of these ORVs and $\Lambda$ coincides with the future of the ORV itself. Therefore, the space-time compatibility condition is always violated by the embedding of [105, 116], where $A$, $B$ and $C$ are pairwise space-like separated, and for this embedding the jamming causal model considered here, with observed $\Lambda$, leads to superluminal signalling.

Part II

Multi-agent paradoxes

The 'paradox' is only a conflict between reality and your feeling of what reality 'ought to be'.
- Richard Feynman

Chapter 7

Multi-agent paradoxes in quantum theory

Processing empirically acquired data and making inferences about the world form a crucial part of the scientific method. For a consistent description of the world, these inferences should be based on a sound system of logic: simple reasoning principles applicable to general situations, on which there is common agreement. For example, we would like to make inferences such as "if I know that $a$ holds, and I know that $a$ implies $b$, then I know that $b$ holds" independently of the nature of $a$ and $b$.
When considering scenarios with several rational agents, inferences may involve reasoning about each other's knowledge. In such cases, we often use logical primitives such as "If I know that she knows $a$, and I know that she arrived at $a$ using a set of rules that we commonly agreed upon, then I know $a$". Examples include games like poker, complex auctions, cryptographic scenarios, and logical hat puzzles, where we must process complex statements of the sort "I know that she knows that he does not know $a$" based on some logical primitives and the common knowledge of the agents. A system of logic that provides these simple and intuitive rules for multi-agent reasoning is modal logic, which we outlined in Section 2.6.

On the other hand, when agents (such as ourselves) describe the world through physical theories, we would like to be able to model the agents also as physical systems of the theory, in order to develop a more complete understanding of the physical world. In particular, we would like our theory to model at least those parts of the agent that are responsible for storing and processing empirical data, such as their memory. When that theory is quantum mechanics, it turns out that these two desiderata (applying standard rules of logic to reason about each other's knowledge, and modelling agents' memories as physical systems) are incompatible with agents' empirical observations. This incompatibility (or "paradox") was first formalised by Frauchiger and Renner, in a thought experiment where agents who can measure each other's memories (modelled as quantum systems) and reason about shared and individual knowledge may reach conclusions that contradict their own observations [85].

The FR paradox was originally presented in terms of a prepare-and-measure scenario where parties exchange quantum states and thereby generate quantum correlations between each other. However, it can be equivalently described by an entanglement-based scenario where the parties do not communicate but pre-share suitable quantum correlations, as illustrated in Figure 7.1. Here, we present the entanglement version of the paradox (Section 7.1), as this will be the most convenient form for generalising the paradox to post-quantum settings, as we will do in Chapter 8. The Frauchiger-Renner thought experiment has fuelled an enormous volume of discussion and debate in the scientific community that is still ongoing. In Section 7.2, we aim to provide a brief overview of this debate and an outlook on this matter, which will also serve to motivate the results of the next chapter, where we generalise the analysis beyond quantum theory.

The main result of Frauchiger and Renner establishes the impossibility of a physical theory $T$ simultaneously satisfying three assumptions, denoted by Q, C and S. In [152], it was pointed out that the FR argument involves an additional implicit assumption, U. We present these four assumptions below as stated in [85] and [152], before we proceed to demonstrate their incompatibility with the entanglement version of FR's argument.

The first assumption Q pertains to the validity of certain predictions of quantum mechanics. FR's argument does not need the full framework of quantum theory, but only a weaker version of the Born rule that is applicable for making deterministic predictions.

Assumption (Q). A theory $T$ that satisfies Q allows any agent Alice to reason as follows. Suppose Alice is in the vicinity of a system $S$ associated with the Hilbert space $\mathcal{H}_S$, and she knows that "The system $S$ is in a state $|\psi\rangle_S \in \mathcal{H}_S$ at time $t_0$". Furthermore, suppose that Alice also knows that "the value $x$ is obtained by a measurement of $S$ w.r.t. the family $\{\pi_x^{t_0}\}_{x \in X}$ of Heisenberg operators relative to time $t_0$, and the measurement is completed at time $t$". If $\langle\psi|\pi_\xi^{t_0}|\psi\rangle = 1$ for some $\xi \in X$, then Alice can conclude that "I am certain at time $t_0$ that $x = \xi$ at time $t$".

The second assumption C, originally referred to as "self-consistency", pertains to how agents reason about each other's knowledge.

Assumption (C). A theory $T$ that satisfies C allows any agent Alice to reason as follows. If Alice has established that "I am certain that an agent Bob, upon reasoning using theory $T$, is certain that $x = \xi$ at time $t$", then Alice can conclude "I am certain that $x = \xi$ at time $t$".

The third assumption, S (originally referred to as "single-world"), pertains to the intuition that an agent experiences a single outcome when they perform or witness a measurement.

Assumption (S). A theory $T$ that satisfies S disallows any agent Alice from making both the statements "I am certain at time $t_0$ that $x = \xi$ at time $t$" and "I am certain at time $t_0$ that $x \neq \xi$ at time $t$", where $x$ is a value that can be observed at time $t > t_0$.

The fourth assumption, U, requires a bit more explanation. Suppose an agent Alice has a system $S$ prepared in the state $|\psi\rangle_S = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Alice measures $S$ in the $Z$ basis, $\{|0\rangle, |1\rangle\}$, and records the outcome $x$ of the measurement in her memory $A$ (which was initialised to the state $|0\rangle_A$). From Alice's perspective, $x$ is a uniformly distributed classical bit. Suppose also (rather idealistically) that this measurement happens in Alice's lab, which is a closed system consisting of the subsystems $S$ and $A$ that does not leak any information to the environment. Now, consider how an outside agent Bob, who models $S$ and $A$ both as quantum systems, would describe the measurement process. Since Alice's lab is a closed system, Bob knows nothing about Alice's measurement outcome $x$ and would describe the evolution of systems $A$ and $S$ in the lab through a unitary map $U_{AS}$. Further, due to the perfect correlations between the measurement outcome and the memory state that records it, this unitary would be a coherent controlled NOT with $S$ as control and $A$ as target. That is, Bob describes Alice's measurement through the evolution

$$\frac{1}{\sqrt{2}}(|0\rangle_S + |1\rangle_S) \otimes |0\rangle_A \;\xrightarrow{\;U_{AS}\;}\; \frac{1}{\sqrt{2}}(|0\rangle_S |0\rangle_A + |1\rangle_S |1\rangle_A). \tag{7.1}$$

Hence, Bob would see Alice's memory $A$ to be entangled with her measured system $S$. The FR thought experiment assumes that outside agents such as Bob, who model an inside agent such as Alice as a quantum system, would model measurements performed by the latter through such reversible evolutions.
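The evolution of Equation (7.1) can be reproduced with a small sketch. It assumes a representation chosen here purely for illustration and not taken from the text: a pure state is a dictionary from bit strings (ordered as $S$, then $A$) to real amplitudes, and the helper `cnot` is a hypothetical name.

```python
import math


def cnot(state, control, target):
    """Apply a coherent CNOT to a state given as {bit string: amplitude}."""
    out = {}
    for bits, amp in state.items():
        flipped = list(bits)
        if flipped[control] == "1":
            # Flip the target bit whenever the control bit is 1.
            flipped[target] = "1" if flipped[target] == "0" else "0"
        key = "".join(flipped)
        out[key] = out.get(key, 0.0) + amp
    return out


r = 1 / math.sqrt(2)

# Outside view before the measurement: (|0> + |1>)/sqrt(2) on S, |0> on A.
before = {"00": r, "10": r}

# Alice's measurement, modelled from outside as a CNOT with S as control
# and A as target: the result is the entangled state of Equation (7.1).
after = cnot(before, control=0, target=1)

# The map is reversible: applying it again restores the initial state.
restored = cnot(after, control=0, target=1)
```

Here `after` has equal amplitudes on the strings `"00"` and `"11"` only, which is the entangled state on the right-hand side of Equation (7.1), while `restored` coincides with `before`, reflecting the reversibility of the outside description.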
This modelling of measurements as reversible evolutions was noted in [152], and is encapsulated in the assumption U proposed therein.

Assumption (U). A theory $T$ that satisfies U allows any agent Bob to model measurements performed by any other agent Alice as reversible evolutions in Alice's lab, for example, a unitary evolution $U_{AS}$ of the joint state of Alice's memory $A$ and the system $S$ measured by her.

One may have already spotted a tension between the assumptions S and U, which describe the "inside" and "outside" perspectives of a quantum measurement, i.e., the subjective experience of a single classical outcome for Alice versus the entangled superposition state of Alice and her system as seen by Bob. This is precisely the tension that Wigner originally illustrated in his thought experiment. In fact, Wigner's thought experiment essentially corresponds to the two-agent scenario described in the previous section (paragraph preceding assumption U), where the inside observer Alice plays the role of "Wigner's friend" and the outside observer Bob, who describes Alice as a quantum system, plays the role of Wigner. By including the assumptions Q and C, which introduce the element of reasoning about the knowledge of other agents, the FR experiment elevates this apparent tension to an explicit mathematical contradiction between the four assumptions. The following entanglement version of the thought experiment is entirely equivalent to the original prepare-and-measure version, but simplifies the description.

Figure 7.1: An entanglement-based version of the Frauchiger-Renner setting [85] from different perspectives: (a) inside perspective, (b) outside perspective. Alice and Bob (inside agents) share a Hardy state $|\Psi\rangle_{PR} = (|00\rangle + |10\rangle + |11\rangle)/\sqrt{3}$, each measure their qubit ($P$ and $R$ respectively) and update their memories $A$ and $B$ accordingly. Their labs are contained inside the labs of the outside observers Ursula and Wigner, who can measure the systems $AP$ and $RB$ respectively. The paradox arises when one tries to combine the inside and outside perspectives of quantum measurements on an entangled system into a single perspective. (a) From their viewpoints, Alice and Bob measure their halves of $|\Psi\rangle_{PR}$ in the $Z$ basis $\{|0\rangle, |1\rangle\}$ to obtain the outcomes $a$ and $b$. They then perform a classical CNOT (i.e., classical copy) to copy their classical outcome into their memories $A$ and $B$, both initialised to $|0\rangle$. (b) Ursula and Wigner perceive Alice's and Bob's memory updates as implementing quantum CNOTs on $A$ controlled by $P$ and on $B$ controlled by $R$ respectively. The resultant joint state is $|\Psi\rangle_{APRB} = (|0000\rangle + |1100\rangle + |1111\rangle)/\sqrt{3}$. Ursula and Wigner then measure $AP$ and $RB$ in the "$X$ basis" $\{|ok\rangle = (|00\rangle - |11\rangle)/\sqrt{2},\ |fail\rangle = (|00\rangle + |11\rangle)/\sqrt{2}\}$ to obtain the outcomes $u$ and $w$ respectively. If they obtain $u = w = ok$, the agents can reason about each other's knowledge to arrive at the paradoxical chain of statements $u = w = ok \Rightarrow b = 1 \Rightarrow a = 1 \Rightarrow w = fail$. In Chapter 8, we extend this scenario to box world, where Alice and Bob share a PR box instead of the Hardy state, and find a suitable memory update operation and measurements for the parties such that a stronger version of the paradox is recovered, independently of the outcomes obtained.

This figure is taken from our paper [209], which is joint work with Nuriya Nurgalieva and Lídia del Rio, and the credit for this pretty illustration goes to Nuriya.
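As a numerical sanity check of the setting summarised in Figure 7.1, the sketch below verifies the post-selection probability and the three implications of the paradoxical chain for the Hardy state. It rests on assumptions made here for illustration only: states are dictionaries from bit strings to real amplitudes with the systems ordered as $P$, $A$, $R$, $B$, and `contract` is a hypothetical helper that applies a bra to selected qubits and returns the unnormalised state of the remaining ones.

```python
import math


def contract(state, bra, positions):
    """Apply <bra| on the qubits at `positions`; return the rest (unnormalised).

    Amplitudes are real here, so no complex conjugation is needed.
    """
    out = {}
    for bits, amp in state.items():
        key = "".join(bits[i] for i in positions)
        rest = "".join(b for i, b in enumerate(bits) if i not in positions)
        coeff = bra.get(key, 0.0)
        if coeff != 0.0:
            out[rest] = out.get(rest, 0.0) + coeff * amp
    # Drop entries whose amplitudes cancelled.
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


s, r = 1 / math.sqrt(3), 1 / math.sqrt(2)

# Joint state after the memory updates, qubit order P, A, R, B.
psi = {"0000": s, "1100": s, "1111": s}
ok = {"00": r, "11": -r}   # |ok> = (|00> - |11>)/sqrt(2)
one = {"1": 1.0}           # |1> on a single qubit

# Ursula obtains u = ok on (P, A): remaining state of (R, B).
rb_given_u_ok = contract(psi, ok, (0, 1))

# Probability that both Ursula and Wigner obtain "ok".
p_ok_ok = contract(rb_given_u_ok, ok, (0, 1)).get("", 0.0) ** 2

# Bob's memory B records b = 1: remaining state of (P, A, R).
par_given_b_1 = contract(psi, one, (3,))

# Alice's memory A records a = 1: remaining state of (P, R, B),
# then the amplitude for Wigner obtaining "ok" on (R, B).
prb_given_a_1 = contract(psi, one, (1,))
w_ok_given_a_1 = contract(prb_given_a_1, ok, (1, 2))
```

In this sketch `rb_given_u_ok` is supported on the string `"11"` only (so $u = ok$ implies $b = 1$), `par_given_b_1` is supported only on strings whose $A$ bit is 1 (so $b = 1$ implies $a = 1$), and `w_ok_given_a_1` is empty (so $a = 1$ implies $w = fail$), while `p_ok_ok` evaluates to $1/12$ up to floating-point error.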


Experimental setup. The experiment involves four agents, whom we call Alice, Bob, Ursula and Wigner, in accordance with the naming convention of [152, 209]. The experiment starts with Alice and Bob sharing the following bipartite state

$$|\psi\rangle_{PR} = \frac{1}{\sqrt{3}}\left(|0\rangle_P |0\rangle_R + |1\rangle_P |0\rangle_R + |1\rangle_P |1\rangle_R\right), \tag{7.2}$$

where $P$ and $R$ denote the subsystems held by Alice and Bob respectively (for reasons that will become more apparent in the next chapter). This state is known as a Hardy state due to its particular relevance in Hardy's paradox [108, 109]. Let $A$ and $B$ denote the subsystems that correspond to Alice's and Bob's memories respectively, where they store the outcome of any measurement that they may perform. Suppose that Alice's lab is located inside the lab of another agent, Ursula, who can perform joint measurements on Alice's system $P$ and her memory $A$. Similarly, let Bob's lab be located inside Wigner's lab, such that Wigner can perform joint measurements on Bob's system $R$ and his memory $B$. Alice's and Bob's labs are isolated such that no information about their measurement outcomes leaks out.

The protocol. The protocol is the following:

t=1: Alice measures her half ($P$) of the joint state $|\psi\rangle_{PR}$ in the $\{|0\rangle, |1\rangle\}$ (i.e., $Z$ or computational) basis, and stores the outcome $a$ in her memory $A$.

t=2: Bob measures his half ($R$) of the joint state $|\psi\rangle_{PR}$, also in the $Z$ basis, and stores the outcome $b$ in his memory $B$. By the assumption U, the final joint state of Alice's and Bob's systems and memories after their measurements at times $t = 1$ and $t = 2$, as described from the outside, would be

$$|\psi\rangle_{PARB} = \frac{1}{\sqrt{3}}\left(|0\rangle_P |0\rangle_A |0\rangle_R |0\rangle_B + |1\rangle_P |1\rangle_A |0\rangle_R |0\rangle_B + |1\rangle_P |1\rangle_A |1\rangle_R |1\rangle_B\right). \tag{7.3}$$

t=3: Ursula jointly measures the systems $P$ and $A$ in Alice's lab, in the $\{|ok\rangle, |fail\rangle\}$ basis defined below, and obtains the outcome $u$.

$$|ok\rangle_{PA} := \frac{1}{\sqrt{2}}\left(|0\rangle_P |0\rangle_A - |1\rangle_P |1\rangle_A\right), \qquad |fail\rangle_{PA} := \frac{1}{\sqrt{2}}\left(|0\rangle_P |0\rangle_A + |1\rangle_P |1\rangle_A\right). \tag{7.4}$$

t=4: Wigner jointly measures the systems $R$ and $B$ in Bob's lab, also in the $\{|ok\rangle, |fail\rangle\}$ basis, and obtains the outcome $w$.

$$|ok\rangle_{RB} := \frac{1}{\sqrt{2}}\left(|0\rangle_R |0\rangle_B - |1\rangle_R |1\rangle_B\right), \qquad |fail\rangle_{RB} := \frac{1}{\sqrt{2}}\left(|0\rangle_R |0\rangle_B + |1\rangle_R |1\rangle_B\right). \tag{7.5}$$

t=5: Ursula and Wigner compare the outcomes of their measurements. If they were both "ok", they halt the experiment. Otherwise, they reset the timer and all systems to the initial conditions, and repeat the experiment. Note that there is a non-zero probability for both Ursula and Wigner to obtain the outcome "ok" when measuring the subsystems $PA$ and $RB$ in the state $|\psi\rangle_{PARB}$ of Equation (7.3), i.e.,

$$P(u = w = ok \mid \psi_{PARB}) = \left|\left(\langle ok|_{PA} \otimes \langle ok|_{RB}\right) \cdot |\psi\rangle_{PARB}\right|^2 = \frac{1}{12}. \tag{7.6}$$

The measurement bases for each agent mentioned above are agreed on beforehand and the agents do not communicate once the experiment begins, except for the communication between Ursula and Wigner at $t = 5$. We now discuss how the agents reason about each other's knowledge to arrive at the contradiction, given that the experiment halted, i.e., post-selecting on the condition that $u = w = ok$.

Agents' reasoning:

Ursula reasons about Bob:

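Rewriting the joint state $|\psi\rangle_{PARB}$ as follows,

$$\begin{aligned} |\psi\rangle_{PARB} &= \sqrt{\tfrac{2}{3}}\,|\mathrm{fail}\rangle_{PA}|0\rangle_R|0\rangle_B + \tfrac{1}{\sqrt{3}}\,|1\rangle_P|1\rangle_A|1\rangle_R|1\rangle_B \\ &= \sqrt{\tfrac{2}{3}}\,|\mathrm{fail}\rangle_{PA}|0\rangle_R|0\rangle_B + \tfrac{1}{\sqrt{6}}\,|\mathrm{fail}\rangle_{PA}|1\rangle_R|1\rangle_B - \tfrac{1}{\sqrt{6}}\,|\mathrm{ok}\rangle_{PA}|1\rangle_R|1\rangle_B, \end{aligned} \tag{7.7}$$

Ursula, at $t=3$, reasons (using Q) on obtaining the outcome $u = \mathrm{ok}$ that she knows with certainty that Bob must have obtained the outcome $b = 1$ at $t=2$. This is because Ursula knows that her (normalised) post-measurement state on obtaining the outcome $u = \mathrm{ok}$ would be $|\mathrm{ok}\rangle_{PA}|1\rangle_R|1\rangle_B$, and there is unit probability of finding $b = 1$ in $B$ given this outcome. Thus Ursula (at $t=3$) knows that $u = \mathrm{ok} \Rightarrow b = 1$.

Bob reasons about Alice: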

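From the original form of the joint state (7.3), Bob reasons using Q, on obtaining the outcome $b = 1$ at $t=2$, that Alice must have obtained the outcome $a = 1$ at $t=1$, since the only term of (7.3) with $|1\rangle_R|1\rangle_B$ also contains $|1\rangle_P|1\rangle_A$. Thus Bob (at $t=2$) knows that $b = 1 \Rightarrow a = 1$.

Alice reasons about Wigner: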

Rewriting the joint state as

$$|\psi\rangle_{PARB} = \frac{1}{\sqrt{3}}\,|0\rangle_P|0\rangle_A|0\rangle_R|0\rangle_B + \sqrt{\tfrac{2}{3}}\,|1\rangle_P|1\rangle_A|\mathrm{fail}\rangle_{RB}, \tag{7.8}$$

Alice, on obtaining the outcome $a = 1$ at $t=1$, uses Q to reason about what Wigner would observe at $t=4$ and concludes with certainty that $w = \mathrm{fail}$, i.e., Alice (at $t=1$) knows that $a = 1 \Rightarrow w = \mathrm{fail}$.

Note that in each of the steps above, S is implicitly used. These statements reflect the knowledge and reasoning of different agents, and are combined using the assumption C to give the following paradoxical chain of statements when $u = w = \mathrm{ok}$ is observed at $t=5$:

$$u = w = \mathrm{ok} \;\Rightarrow\; b = 1 \;\Rightarrow\; a = 1 \;\Rightarrow\; w = \mathrm{fail}. \tag{7.9}$$

In other words, whenever the experiment halts with $u = w = \mathrm{ok}$, the agents can make deterministic statements about each other's reasoning and measurement outcomes, and thereby conclude that Alice had predicted $w = \mathrm{fail}$ with certainty. This is in conflict with the assumption S, which forbids $w$ from taking both values with certainty.

In the modal logic language, the assumption C can be seen as encoding the distribution axiom (Axiom 1), i.e., “If an agent knows a statement $\phi$ and also that $\phi \Rightarrow \psi$, then they know $\psi$”, along with the concept of trust (Definition 2.6.3), i.e., an agent $A$ trusts another agent $B$ if and only if “$A$ knows that $B$ knows $\phi$ implies that $A$ knows $\phi$”. In [152], it is shown that the FR paradox can be reformulated as follows: the Kripke structure satisfying the knowledge axioms of modal logic described in Section 2.6 is incompatible with agents who apply quantum theory to model each other and reason using that logical system. The authors show that this is the case even if the truth axiom (Axiom 3) is replaced with the weaker notion of a trust structure (Definition 2.6.3). We refer the reader to [152] for a detailed analysis of this paradox in terms of the axioms of modal logic and the trust relations at different times. These ideas will also become clearer in Chapter 8, where we derive an analogous paradox in box world, a generalised probabilistic theory (see Section 2.4 for an overview of GPTs), by generalising the quantum-theory-dependent assumptions of FR to the general setting, in the language of modal logic. We now proceed to discuss some of the implications of FR's result and its relation to other no-go theorems.

Remark 7.1.1.

It is important to note, as also stressed in the original paper [85], that an “agent” need not necessarily correspond to a conscious observer in such thought experiments. An agent simply refers to any entity capable of performing measurements on other systems and applying simple deductive principles to process empirical information. For example, a small quantum computer would in principle fulfil the desiderata for agency in such settings. Hence, FR-type thought experiments could possibly be implementable in the near future, with progress in scalable quantum computing, even though they cannot be implemented with current technology.
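The probabilities entering Equation (7.6) and the chain (7.9) can be cross-checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation. It assumes that the joint state (7.3) consists of the three equally weighted terms $|0000\rangle$, $|1100\rangle$ and $|1111\rangle$ on the ordered systems $(P, A, R, B)$, which is the form implied by the rewritings (7.7) and (7.8). Each check projects this final state directly, which mirrors those rewritings rather than simulating the time steps. The names `PSI`, `OK`, `FAIL`, `ZERO`, `ONE` and `prob` are illustrative.

```python
from math import sqrt

# Joint state (7.3): amplitudes keyed by the bit values of (P, A, R, B).
# The three equally weighted terms are an assumption read off from the
# rewritings (7.7) and (7.8).
PSI = {
    (0, 0, 0, 0): 1 / sqrt(3),
    (1, 1, 0, 0): 1 / sqrt(3),
    (1, 1, 1, 1): 1 / sqrt(3),
}

# Vectors on one lab (system, memory), keyed by the two bit values.
OK = {(0, 0): 1 / sqrt(2), (1, 1): -1 / sqrt(2)}
FAIL = {(0, 0): 1 / sqrt(2), (1, 1): 1 / sqrt(2)}
ZERO = {(0, 0): 1.0}  # inside agent recorded outcome 0
ONE = {(1, 1): 1.0}   # inside agent recorded outcome 1


def prob(lab_pa, lab_rb, state=PSI):
    """Probability of projecting PA onto lab_pa and RB onto lab_rb."""
    amplitude = sum(
        lab_pa.get(key[:2], 0.0) * lab_rb.get(key[2:], 0.0) * amp
        for key, amp in state.items()
    )
    return amplitude ** 2  # all amplitudes here are real


p_halt = prob(OK, OK)          # halting probability of Equation (7.6)
p_ok_and_b0 = prob(OK, ZERO)   # vanishes, so u = ok implies b = 1
p_a0_and_b1 = prob(ZERO, ONE)  # vanishes, so b = 1 implies a = 1
p_a1_and_ok = prob(ONE, OK)    # vanishes, so a = 1 implies w = fail
print(p_halt, p_ok_and_b0, p_a0_and_b1, p_a1_and_ok)
```

Under these assumptions the halting probability evaluates to $1/12$ and the three joint probabilities vanish, which is what allows each implication in (7.9) to be stated with certainty.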


The intuition behind the FR paradox described above comes from Bell's theorem, combined with the unitary description of the measurement from the outside perspective (Equation (7.1)). The entanglement of the shared Hardy state $|\psi\rangle_{PR}$ implies the inability to assign simultaneous values to the outcomes corresponding to different measurement settings [109]. The FR experiment corresponds to a situation where all the choices of measurements are actually implemented in a single run of the experiment, since the unitarity of the measurement map (7.1) allows the outside agents to reverse the effect of the inside agents' measurement before performing their own measurement. This also means that the final joint state $|\psi\rangle_{PARB}$ that Ursula and Wigner measure contains the initial correlations of $|\psi\rangle_{PR}$, as is evident from Equations (7.2) and (7.3). This leads to the paradoxical situation that the outcomes of all four parties cannot admit simultaneous value assignments.

FR's paradox can be viewed as a manifestation of Hardy's proof of Bell's theorem [109], which is not based on the violation of Bell inequalities, but on deriving a logical contradiction between the prediction of quantum mechanics for certain entangled states (such as (7.2)) and the assumption of local hidden variables for the outcomes of possible measurements. Hardy derives such a contradiction for almost all bipartite entangled states in [109]. It can be shown that in the four-agent setting of FR (Figure 7.1), replacing the shared state $|\psi\rangle_{PR}$ by any one state from the class of Hardy states, and modifying the definition of the $\{|\mathrm{ok}\rangle, |\mathrm{fail}\rangle\}$ basis accordingly (also as proposed in [109]), one can elevate Hardy's argument for the outcomes of potentially unperformed measurements in the regular bipartite Bell scenario to the outcomes of the four measurements (one per agent) actually performed in an FR-type setup. Hence the FR paradox is closely related to Hardy's paradox [108], as was also pointed out in [175]. We leave the formal derivation of this and a more general characterisation of the states and measurements required for deriving FR-type paradoxes in $N$-agent experiments to future work.

Footnote: This class excludes the maximally entangled Bell states, which are too symmetric to derive a logical contradiction akin to the chain (7.9).

There are other no-go theorems in a similar vein to FR that are based on an extension of Wigner's thought experiment. All these theorems illustrate the incompatibility between an objective assignment of values to the measurement outcomes of all agents in scenarios where quantum predictions are extended to the level of agents. The most notable of these is Brukner's no-go theorem [36], which states that the following four assumptions are incompatible:

1. quantum theory is universally valid (also when applied to observers),
2. locality, i.e., the setting of one party does not affect the outcome of the other,
3. freedom of choice,
4. the ability to jointly assign truth values to observed outcomes of all the observers.

Brukner derives the CHSH inequalities from assumptions 2.-4., and the well-known violation of these Bell inequalities in quantum theory demonstrates the aforementioned incompatibility of assumptions 1.-4. Assumption 1 roughly corresponds to the assumptions Q and U of FR, with the difference being that the former would need to assume full validity of the Born rule rather than the weaker, deterministic version given by Q. Assumption 4 is similar in motivation to the assumption C of FR, with the difference being that C relates to combining deterministic statements made by agents to assign values to all their outcomes. In this sense, C is a stronger assumption than 4., which does not require the value assignments to be deterministic. Assumptions 2 and 3 are different from the FR setting, where the measurement choice for each party is fixed. One could say that FR's no-go theorem is to Brukner's what Hardy's is to Bell's. The former, in both cases, are based on logical arguments that can be made without reference to inequalities but require certain probabilities to be deterministic, i.e., to have the values 0 or 1. For example, in the FR case, we require $P(b = 0 \mid u = \mathrm{ok}) = P(a = 0 \mid b = 1) = P(w = \mathrm{ok} \mid a = 1) = 0$.

When treating agents as physical systems, the divide between the “observer” and the “observed” becomes subjective: we always put ourselves in the former category, but may be put in the latter category by another observer. Wigner [221] was among the first to point out through his

famous thought experiment that this leads to conceptual issues in the quantum case, and this problem is sometimes colloquially referred to as the shifty split. FR's extension of the Wigner's friend scenario formalises this conceptual argument as a no-go theorem, thereby providing a concrete footing for the longstanding debates regarding the measurement problem and, in turn, interpretations of quantum theory, which represent the different stances that one can take on the measurement problem.

We begin this discussion by noting the importance of differentiating between objections against the FR result and objections against specific assumptions used in deriving the result. The result itself demonstrates an incompatibility between a certain set of assumptions and subsumes the latter type of objections: it leaves open the choice of which assumption(s) among Q, C, S and U must be dropped to resolve the paradox, and is agnostic to the choice itself. This allows for a classification of the different interpretations of quantum theory based on the choice they make, and also indicates that any interpretation that satisfies all four assumptions should be ruled out. It appears that the main interpretation(s) under threat of being ruled out are those presented in standard textbooks which, albeit ambiguous, seem to be trying to have the cake and eat it too [175]. Possible resolutions to the apparent paradox involve dropping one or more of the assumptions.

Dropping Q and/or U:

Objections against FR such as “you cannot put agents in superpositions”, or “the outside description of the measurement (7.1) should lead to a mixed state and not a pure, entangled state” do not refute the validity of the theorem but that of the assumptions U and/or Q about extending quantum theory to agents. This is one possible resolution allowed by the theorem. Further, in [197], Anthony Sudbery analyses Bohmian mechanics [72, 27, 78] in the light of FR's result, arguing that the interpretation satisfies all assumptions, except possibly U regarding the particular evolution of the labs. This would imply that Bohmian mechanics does not, contrary to what is commonly believed, agree with the predictions of quantum theory at all scales. Veronika Baumann and Stefan Wolf [19] have provided an interesting analysis of the thought experiment in the relative state formalism [81, 220], considering different ways of describing the evolution of the labs and showing that these can lead to different predictions. They also compare other interpretations in this regard and show that the relative state formalism admitting a unitary and universal quantum description deviates from the standard Born rule.

Dropping C:

Certain versions of the Copenhagen interpretation [113, 28] also turn out to be problematic for similar reasons as the textbook approach, and Matthew Leifer argues (for example, in this talk) that only a perspectival version of the interpretation, which gives up the assumption C, would survive. QBism [91] and relational quantum mechanics are also perspectival in this sense and propose a similar resolution: measurement results are real only from the subjective perspective of specific agents, and are not objective properties of the world. Further, a theory that satisfies all assumptions except C would call for a revision of our understanding of causal and logical inference. The latter, since dropping C implies a breakdown of standard rules of classical modal logic [152]. The former, for the following reason. As we have seen in Part I of this thesis, standard approaches to causal inference depict causal structures as having certain observed and certain unobserved nodes. The former are classical variables corresponding to the observed measurement statistics. In theories where agents are modelled as quantum systems according to Q and U, whether measurement outcomes are seen as classical variables or as entangled quantum subsystems is subjective, which challenges existing models for causal inference (both in classical and quantum theory). This suggests that causal structure itself may be subjective in such scenarios. It would be interesting to study causal and logical inference in such scenarios and understand how these differ from situations where agents are modelled as classical systems.

Footnote: Similar misinterpretations can occur with Bell's theorem. For example, one may interpret Bell's theorem as telling us that there is fundamental uncertainty in quantum theory that cannot be explained away by conditioning on classical information. A realist would argue that this is not true, since we can have fine-tuned, classical explanations, i.e., non-local hidden variable theories. Stating Bell's theorem as a no-go result, this argument is subsumed: you can drop either “realism” or “locality”; you have a choice, and the theorem does not tell you which one to pick.

Dropping S:

This appears to be an intuitive assumption, but a violation of it is not necessarily in conflict with the quantum formalism. It is argued that interpretations such as many-worlds (and its numerous variations) [76, 74] as well as the relative state formalism mentioned earlier do not satisfy this assumption [85]. In the former, measurement is seen as leading to a branching into different possible worlds such that every possible outcome of the measurement occurs in a corresponding world. Whether many-worlds type interpretations satisfy the other assumptions may depend on the particular variation being considered, and how the branching is defined therein.

Other assumptions:

Other implicit assumptions of the FR experiment have been pointed out [198, 152]. Some of these assumptions relate to idealizations such as the ability to perfectly prepare pure states and perform perfect projective measurements. While it is true that a relaxation of such assumptions may no longer result in a deterministic, logical paradox, it would still result in a conceptual problem regarding the incompatibility of universal quantum theory and objectivity of measurement results, akin to those pointed out by [36, 30]. Further, one can argue that most interpretations would allow for such idealizations to be made, as they are commonplace in theoretical studies. It would be interesting to analyse whether some of the other implicit assumptions in the result pointed out in [198] are no longer needed in the entanglement version of the experiment presented here. This discussion by no means does full justice to the massive volume of discussions and literature generated by the FR result. We refer the reader to the original paper [85] as well as [152] for more in-depth discussions on interpretations.

Footnote: However, as pointed out in [152, 85], it may be difficult to fully analyse such multi-agent experiments in QBism, which tends to limit its focus to the experience and actions of single agents.


In conclusion, we draw yet another analogy with Bell's theorem [22]. The theorem has resulted in significant conceptual and practical advancement by ruling out certain classical descriptions of the world (i.e., local hidden variables) due to their incompatibility with our observations, which agree with quantum theory. It has also faced several objections and criticism over the years, and generated several alternate derivations/versions [57, 56, 109, 102, 225] which reveal different facets of quantum theory that deviate from our understanding of the classical world. In a similar manner, theorems such as those of Frauchiger and Renner have deep foundational implications for the nature of objectivity, agents' experience, logical reasoning and causal inference. Critically examining the assumptions behind such theorems and characterising general scenarios where such incompatibilities can arise are likely to reveal important connections between these fundamental concepts and the role that quantum theory plays in them. Finally, as previously noted, FR-type thought experiments could possibly be implemented in the near future depending on the advancements in scalable quantum computing, and do not necessarily require the agents to be conscious observers [85]. This would in principle allow for the interpretations of quantum theory to be subject to experimental tests. The caveat is that a clear-cut answer would require a loophole-free implementation. This would be significantly more difficult, as we have seen in the case of Bell's theorem, where the first experimental test [86] was realised within a decade but the first loophole-free tests [98, 114, 190] were only realised more than five decades after the original theorem [22] was proposed.

CHAPTER 8

Multi-agent paradoxes beyond quantum theory

The Frauchiger-Renner thought experiment [85] reveals an incompatibility between extending quantum theory to reasoning agents and certain simple rules of logical deduction [152], which we have discussed in Chapter 7. Our goal here is to understand whether this incompatibility is a peculiar feature of quantum theory, or whether modelling reasoning agents using other physical theories can also lead to such contradictions. Previous works have analysed such multi-agent paradoxes only in quantum theory, and a theory-independent analysis could possibly reveal more fundamental aspects of these examples. For example, is the reversibility (in the quantum case, unitarity) of the global measurement transformation a necessary property for such paradoxes, or the entanglement of the shared state (in the quantum case, the Hardy state), or both? Generalising this study beyond the quantum setting would not only help us identify the features of physical theories responsible for such paradoxes, but may also shed some light on the structure of logic in non-classical theories. Here, we investigate this question within the landscape of generalized probabilistic theories [110, 16]. This chapter is based on our paper [209], which is coauthored by Nuriya Nurgalieva and Lídia del Rio.

In Section 8.2, we generalize the Frauchiger-Renner conditions using the language of modal logic, so that they can be applied to any physical theory. In particular, in Section 8.2.4, we introduce a way to describe an agent's measurement from the perspective of other agents in generalised probabilistic theories (GPTs). Finally, in Section 8.4 we derive a logical inconsistency akin to the one found in [85], using a setup where agents share a PR box, a maximally non-local resource in box world (a particular GPT). The paradox found is stronger than the quantum one, in the sense that it does not rely on post-selection. Another important point brought to light by our version of the paradox is that the reversibility of the measurement map (akin to the assumption U in the quantum case) is not necessary for deriving such paradoxes in general. In Section 8.5, we provide a detailed discussion which includes a comparison of our results with the quantum case (Section 8.5.1), its implications for measurements in box world (Section 8.5.2), relationships between multi-agent paradoxes and contextuality (Section 8.5.4), and the plentiful scope for future work. Based on the knowledge of the summary sections of previous chapters, the reader might logically reason and thereby consider possible a world where the present chapter also contains a poetic summary. This is indeed that world.

We talk, so we reason.
We reason about what we know,
We reason about what others know
And if we trust them, make their knowledge our own.

We learn, so we store.
We store our knowledge in a part of our memory.
We model that memory by a physical theory.
But if that theory is quantum, this leads to an inconsistency [85].

We learn, so we wonder.
Which theories lead to such apparent inconsistencies,
Between reasoning agents and their memories modelled “physically”?
Proposing a mathematical map for update of memories,
An example we find in box world, a GPT,
where using a PR box, agents find a stronger paradox.

We answer questions, so we question more.
What properties of theories lead to these paradoxes galore?
Must be contextuality, which forms of it, we are not yet sure
That, in future work we will explore!

Here we generalize the assumptions underlying the Frauchiger-Renner result to general physical theories. The conditions can be instantiated by each specific theory. This includes but is not limited to theories framed in the approach of generalized probabilistic theories (GPTs) [110]. In some theories, like quantum mechanics and box world (a GPT), we will find these four conditions to be incompatible, by finding a direct contradiction in examples like the Frauchiger-Renner experiment or the PR-box experiment described in Section 8.4. In other theories (like classical mechanics and Spekkens' toy theory [196]) these four conditions may be compatible. A complete characterization of theories where one can find these paradoxes is the subject of future work.

Footnote: The joint state and the probability distributions of the original Frauchiger-Renner paradox are akin to those of Hardy's paradox [109]. For a comparison of Hardy's paradox and the PR box, and why the latter allows for a contradiction without post-selection, see [4]. An entanglement version of the Frauchiger-Renner experiment and its relation to our extension are explained in Figure 7.1.

This condition is theory-independent. It tells us that rational agents can reason about each other's knowledge in the usual way. This is formalized by a weaker version of epistemic modal logic, which we explain in the following. For the full derivation of the form used here see [152], and for an overview of the modal logic framework, see Section 2.6.

We start with a simple example. The goal of modal logic is to allow us to operate with chained statements like “Alice knows that Bob knows that Eve doesn't know the secret key $k$, and Alice further knows that $k = 1$,” which is written as

$$K_A\big[(K_B \neg K_E k) \wedge k = 1\big],$$

where the operators $K_i$ stand for “agent $i$ knows.” If in addition Alice trusts Bob to be a rational, reliable agent, she can deduce from the statement “I know that Bob knows that Eve doesn't know the key” that “I know that Eve doesn't know the key”, and forget about the source of information (Bob). This is expressed as

$$K_A(K_B \neg K_E k) \implies K_A \neg K_E k.$$

We should also allow Alice to make deductions of the type “since Eve does not know the secret key, and one would need to know the key in order to recover the encrypted message $m$, I conclude that Eve cannot know the secret message,” which can be encoded as

$$K_A\big[(\neg K_E k) \wedge (K_i m \implies K_i k, \ \forall i)\big] \implies K_A \neg K_E m.$$

Generalizing from this example, this gives us the following structure.

Definition 8.2.1 (Reasoning agents). An experimental setup with multiple agents $A_1, \ldots, A_N$ can be described by knowledge operators $K_1, \ldots, K_N$ and statements $\phi \in \Phi$, such that $K_i \phi$ denotes “agent $A_i$ knows $\phi$.” It should allow agents to make deductions, that is,

$$K_i[\phi \wedge (\phi \implies \psi)] \implies K_i \psi.$$

Furthermore, each experimental setup defines a trust relation between agents (Definition 2.6.3): we say that an agent $A_i$ trusts another agent $A_j$ (and denote it by $A_j \rightsquigarrow A_i$) iff for all statements $\phi$, we have

$$K_i(K_j \phi) \implies K_i \phi.$$

Footnote: Note that the deduction rule above is the distribution axiom (Axiom 1).
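The two rules of Definition 8.2.1 can be illustrated with a small forward-chaining sketch in plain Python. This is only an illustration under simplifying assumptions, not the formal framework of [152]: statements are encoded as nested tuples, knowing a conjunction is treated as knowing each conjunct separately, and the key example is simplified by giving Alice the implication "Eve does not know $k$ implies Eve does not know $m$" directly. The names `K`, `implies`, `close` and the agent labels are hypothetical.

```python
def K(agent, statement):
    """The statement 'agent knows statement'."""
    return ("K", agent, statement)


def implies(premise, conclusion):
    """The statement 'premise implies conclusion'."""
    return ("implies", premise, conclusion)


def _is(tag, statement):
    """True if statement is a compound statement with the given tag."""
    return isinstance(statement, tuple) and len(statement) == 3 and statement[0] == tag


def close(statements, trusts):
    """Close a set of statements under the trust and deduction rules.

    trusts holds pairs (i, j) meaning that agent i trusts agent j.
    """
    known = set(statements)
    while True:
        derived = set()
        for s in known:
            if not _is("K", s):
                continue
            _, i, phi = s
            # Trust: K_i(K_j phi) gives K_i(phi) when i trusts j.
            if _is("K", phi) and (i, phi[1]) in trusts:
                derived.add(K(i, phi[2]))
            # Deduction: K_i(phi => psi) together with K_i(phi) gives K_i(psi).
            if _is("implies", phi) and K(i, phi[1]) in known:
                derived.add(K(i, phi[2]))
        if derived <= known:
            return known
        known |= derived


no_key = "Eve does not know k"
no_msg = "Eve does not know m"
facts = {
    K("Alice", K("Bob", no_key)),
    K("Alice", implies(no_key, no_msg)),
}
with_trust = close(facts, trusts={("Alice", "Bob")})
without_trust = close(facts, trusts=set())
print(K("Alice", no_msg) in with_trust, K("Alice", no_msg) in without_trust)
```

In this sketch Alice reaches the conclusion about the message only when the pair `("Alice", "Bob")` is in the trust relation. The closure terminates because every derived statement is a sub-statement of one already present.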


Remark 8.2.1 (One human ≠ one agent). Note that in general ‘one human ≠ one agent.’ For example, consider a setting where we know that Alice's memory will be tampered with at time $\tau$ (much like the original Frauchiger-Renner experiment, or the sleeping beauty paradox [80]). We can define two different agents $A_{t<\tau}$ and $A_{t>\tau}$ to represent Alice before and after the tampering, and then for example Bob could trust pre-tampering (but not post-tampering) Alice, $A_{t<\tau} \rightsquigarrow B$.

Remark 8.2.2 (Complexity cost of reasoning). Note that in general, even the most rational physical agents may be limited by bounded processing power and memory capacity, and will not be able to chain an indefinite number of deductions within sensible time scales. That is, these axioms for reasoning are an idealization of absolutely rational agents with unbounded processing power (see [1] for an overview of this and related issues). If we would like modal logic to apply to realistic, physical agents, we might account for a cost (in time, or in memory) of each logical deduction, and require it to stay below a given threshold, much like a resource theory for complexity. However, in the examples of this chapter, agents only need to make a handful of logical deductions, and these complexity concerns do not play a significant role.

This condition is to be instantiated by each physical theory, and is the way that we incorporate the physical theory into the reasoning framework used by agents in a given setting. If all agents use the same theory to model the operational experiment (like quantum mechanics, special relativity, classical statistical physics, or box world), this is included in the common knowledge shared by the agents. For example, in the case of quantum theory, we have that “everyone knows that the probability of obtaining outcome $|x\rangle$ when measuring a state $|\psi\rangle$ is given by $|\langle x|\psi\rangle|^2$, and everyone knows that everyone knows this, and so on.”

Definition 8.2.2 (Common knowledge). We model a physical theory shared by all agents $\{A_i\}_i$ in a given setting as a set $T$ of statements that are common knowledge shared by all agents, i.e.

$$\phi \in T \iff (\{K_i\}_i)^n \phi, \quad \forall n \in \mathbb{N},$$

where $(\{K_i\}_i)^n$ is the set of all possible sequences of $n$ operators picked from $\{K_i\}_i$. For example, $(K_{i_1} K_{i_2} K_{i_3} K_{i_4}) \in (\{K_i\}_i)^4$ stands for “agent $A_{i_1}$ knows that agent $A_{i_2}$ knows that agent $A_{i_3}$ knows that agent $A_{i_4}$ knows.”

Note that the set $T$ of common knowledge may include statements about the settings of the experiment, as well as complex derivations. To find our paradoxical contradiction, we may only need a very weak version of a full physical theory: for example, Frauchiger and Renner only require a possibilistic version of the Born rule, which tells us whether an outcome will be observed with certainty [85]. This will also be the case in box world.

Footnote: One can alternatively model a physical theory as a subset $T_P$ of the set $T$ of common knowledge, $T_P \subseteq T$, in the case when details of the experimental setup are not relevant to the theoretical formalism.
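To make the set $(\{K_i\}_i)^n$ of Definition 8.2.2 concrete, the following plain-Python sketch enumerates all length-$n$ operator sequences and applies them to a statement. It truncates the quantifier over all $n \in \mathbb{N}$ at a finite depth, which is an assumption made purely for illustration, and it reuses a nested-tuple encoding of $K_i \phi$ chosen here for convenience. The function names are hypothetical.

```python
from itertools import product


def operator_sequences(agents, n):
    """All length-n sequences of knowledge operators, as tuples of agent labels."""
    return list(product(agents, repeat=n))


def wrap(sequence, statement):
    """Apply the operators in sequence to statement, outermost operator first."""
    for agent in reversed(sequence):
        statement = ("K", agent, statement)
    return statement


def common_knowledge_up_to(agents, statement, depth):
    """All statements K_{i_1} ... K_{i_n} statement with 1 <= n <= depth."""
    return {
        wrap(seq, statement)
        for n in range(1, depth + 1)
        for seq in operator_sequences(agents, n)
    }


born_rule = "the outcome probability is |<x|psi>|^2"
layer = common_knowledge_up_to(("A", "B"), born_rule, depth=3)
print(len(operator_sequences(("A", "B"), 3)), len(layer))
```

For $N$ agents there are $N^n$ sequences of length $n$, so the number of statements that common knowledge implies grows exponentially with the nesting depth, even though the examples of this chapter only use a few levels.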

In operational experiments, a reasoning agent can make statements about systems that she studies; consequently, the theory used by the agent must be able to produce a description or a model of such a system, namely, in terms of a set of states. For example, in quantum theory a two-state quantum system with a ground state $|0\rangle$ and an excited state $|1\rangle$ (a qubit) can be fully described by a set of states $\{|\psi\rangle\}$ in the Hilbert space $\mathbb{C}^2$, where $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ with $\alpha, \beta \in \mathbb{C}$ and $|\alpha|^2 + |\beta|^2 = 1$. Other examples of theories and respective descriptions of states of systems include: GPTs, where e.g. a generalised bit (gbit) is a system completely characterized by two binary measurements which can be performed on it [16] (a review of GPTs can be found in Section 2.4); algebraic quantum mechanics, with states defined as linear functionals $\rho: \mathcal{A} \to \mathbb{C}$, where $\mathcal{A}$ is a $C^*$-algebra [212]; or resource theories with some state space $\Omega$, and epistemically defined subsystems [73, 130].

Definition 8.2.3 (Systems). A “physical system” (or simply “system”) is anything that can be an object of a physical study. A system can be characterized, according to the theory $T$, by a set of possible states $\mathcal{P}_S$. In addition, a system is associated with a set of allowed operations, $\mathcal{O}_S: \mathcal{P}_S \mapsto \mathcal{P}_S$, on these states.

Definition 8.2.4 (Parallel composition). For any two systems $S_1$ and $S_2$, the union of the two defines a new system $S_1 \cup S_2$ or simply $S_1 S_2$. The operator $\parallel$ denotes parallel composition of states and operations such that $P_{S_1} \parallel P_{S_2} \in \mathcal{P}_{S_1 S_2}$ whenever $P_{S_1} \in \mathcal{P}_{S_1}$ and $P_{S_2} \in \mathcal{P}_{S_2}$, and similarly, $O_{S_1} \parallel O_{S_2} \in \mathcal{O}_{S_1 S_2}$ whenever $O_{S_1} \in \mathcal{O}_{S_1}$ and $O_{S_2} \in \mathcal{O}_{S_2}$. In other words, the state $P_{S_1} \parallel P_{S_2}$ of $S_1 S_2$ can be prepared by simply preparing the states $P_{S_1}$ and $P_{S_2}$ of the individual systems $S_1$ and $S_2$, and the operation $O_{S_1} \parallel O_{S_2}$ can be implemented by locally performing the operations $O_{S_1}$ and $O_{S_2}$ on the individual systems.

We assume no further structure to this operator. Note also that we do not assume that a given composite system can be split into or described in terms of its parts, even though combining individual systems in this manner allows us to define certain states of composite systems. Now we introduce agents into the picture.

Definition 8.2.5 (Agents). A physical setting may be associated with a set $\mathcal{A}$ of agents. An agent $A_i \in \mathcal{A}$ is described by a knowledge operator $K_i \in \mathcal{K}_{\mathcal{A}}$ and a physical system $M_i \in \mathcal{M}_{\mathcal{A}}$, which we call a “memory.” Each agent may study other systems according to the theory $T$. An agent's memory $M_i$ records the results and the consequences of the studies conducted by $A_i$. The memory may itself be an object of a study by other agents.

Footnote: We strive to be as general as possible and do not suppose or impose any structure on systems and connections between them; in particular, we do not make any assumptions about how composite systems are formally described in terms of their parts.

Footnote: In fact, in box world, we can consider operations on two initial systems that transform them into a new, larger system that can no longer be seen as being made up of two smaller systems. We call this “supergluing”; see Section 8.5 for a discussion.


Here we consider measurements both from the perspective of an agent who performs them, andthat of another agent who is modeling the ﬁrst agent’s memory. In an experiment involvingmeasurements, each agent has the subjective experience of only observing one outcome (inde-pendently of how others may model her memory), and we can see this as the deﬁnition of ameasurement: if there is no subjective experience of observing a single outcome, we don’t callit a measurement. We can express this experience as statements such as φ = “The outcomewas 0, and the system is now in state ∣ ⟩ .” We explain this further after the formal deﬁnition. Deﬁnition 8.2.6 (Measurements) . A measurement is a type of study that can be conductedby an agent A i on a system S , the essential result of which is the obtained “outcome” x ∈ X S .If witnessed by another agent A j (who knows that A i performed the measurement but does notknow the outcome), the measurement is characterized by a set of propositions { φ x } ∈ Φ, where φ x corresponds to the outcome x , satisfying:• K j ( K i (∃ x ∈ X S ∶ K i φ x )) ,• K j K i φ x (cid:212)⇒ K j K i ¬( φ y ) , ∀ y ≠ x .The ﬁrst condition tells us that A j knows that from A i ’s perspective, she must have observed oneoutcome x ∈ X , and A i would have used this knowledge to derive all the relevant conclusions,as expressed by the proposition φ x . For example, if the measurement represents a perfect Z measurement of a qubit, φ may include statements like “the qubit is now in state ∣ ⟩ ; beforethe measurement it was not in state ∣ ⟩ ; if I measure it again in the same way, I will obtainoutcome 0” and so on. Note that this condition does not imply that the measurement outcomestored in A i ’s memory is classical for A j . In fact, in the quantum case A j may see A i ’s memoryas a quantum system entangled with the system that A i measured. 
Despite this, A j knows thatfrom A i ’s perspective, this outcome appears to be classical, which is what the ﬁrst conditioncaptures. The second condition implements A i ’s experience of observing a single outcome,and the fact that the outside agent A j knows that this is the case from A i ’s perspective. If A i observes x , they conclude that the conclusions φ y that they would have derived had theyobserved a diﬀerent outcome y are not valid and A j knows that A i would do so. In the previousexample, they would know that it does not hold φ = “the qubit is now in state ∣ ⟩ ; before themeasurement it was not in state ∣ ⟩ ; if I measure it again I will see outcome 1.” This conditionalso ensures that the conclusions { φ x } x are mutually incompatible, i.e. that the measurementis tightly characterized.A measurement of another agent’s memory is also an example of a valid measurement. In otherwords, agent A j can choose A i ’s lab, consisting of A i ’s memory and another system S (which A i studies), as an object of her study. Thus, any agent’s memory can be modelled by theother agents as a physical system undergoing an evolution that correlates it with the measured HAPTER 8. MULTI-AGENT PARADOXES BEYOND QUANTUM THEORY system. In quantum theory, this corresponds to the unitary evolution ( N − ∑ x = p x ∣ x ⟩ system ) ⊗ ∣ ⟩ memory → N − ∑ x = p x ∣ x ⟩ system ⊗ ∣ x ⟩ memory ·„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„‚„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„„¶ =∶ ∣ ˜ x ⟩ SM . (8.1)The key aspect here is that the set of states of the joint system of observed system and memory, P SM = span {∣ x ⟩ system ⊗ ∣ x ⟩ memory } N − x = is post-measurement isomorphic to the set of states P S system alone. That is, for every transformation (cid:15) S that you could apply to the system beforethe measurement, there is a corresponding transformation (cid:15) SM acting on the P SM that isoperationally identical. 
By this we mean that an outside observer would not be able to tell whether they are operating with $\mathcal{E}_S$ on a single system before the measurement, or with $\mathcal{E}_{SM}$ on system and memory after the measurement. In particular, if $\mathcal{E}_S$ is itself another measurement on $S$ within a probabilistic theory, it should yield the same statistics as the post-measurement $\mathcal{E}_{SM}$.

For a quantum example that helps clarify these notions, consider $S$ to be a qubit initially in an arbitrary state $\alpha|0\rangle_S + \beta|1\rangle_S$. An agent Alice measures $S$ in the $Z$ basis and stores the outcome in her memory $A$. While she has a subjective experience of seeing only one possible outcome, an outside observer Bob could model the joint evolution of $S$ and $A$ as
\[
(\alpha|0\rangle_S + \beta|1\rangle_S) \otimes |0\rangle_A \ \longrightarrow\ \alpha|0\rangle_S|0\rangle_A + \beta|1\rangle_S|1\rangle_A .
\]
Suppose now that (before Alice's measurement) Bob was interested in performing an $X$ measurement on $S$. This would have been a measurement with projectors $\{|+\rangle\langle+|_S,\ |-\rangle\langle-|_S\}$, where $|\pm\rangle_S = \frac{1}{\sqrt{2}}(|0\rangle_S \pm |1\rangle_S)$. However, he arrived too late: Alice has already performed her $Z$ measurement on $S$. If Bob now simply measured $X$ on $S$, he would obtain uniform statistics, which would be uncorrelated with the initial state of $S$. So what can he do? It may not be very friendly, but he can measure $S$ and Alice's memory $A$ jointly, by projecting onto
\[
|+\rangle_{SA} = \tfrac{1}{\sqrt{2}}\big(|0\rangle_S|0\rangle_A + |1\rangle_S|1\rangle_A\big), \qquad
|-\rangle_{SA} = \tfrac{1}{\sqrt{2}}\big(|0\rangle_S|0\rangle_A - |1\rangle_S|1\rangle_A\big),
\]
which yields the same statistics as Bob's originally planned measurement on $S$ would have, had Alice not measured it first. In fact, this is precisely the $\{|\mathrm{ok}\rangle, |\mathrm{fail}\rangle\}$ basis in which the outside observers, Ursula and Wigner, measure in the Frauchiger-Renner case (cf. Section 7.1).
Ideally, this equivalence should also hold in the more general case where the observed system may have been previously correlated with some other reference system: such correlations would be preserved in the measurement process, as modelled by the "outside" observer Bob.

There are many options to formalize this notion that "every way that an outside observer could have manipulated the system before the measurement, he may now manipulate a subspace of 'system and observer's memory,' with the same results." A possible simplification to restrict our options is to take subsystems and the tensor product structure as primitives of the theory, which may apply to GPTs [16] but not to general physical theories (like field theories; for a discussion see [130]). Here, we will for now restrict ourselves to this case, and leave a more general formulation of this condition as future work. We also restrict ourselves to information-preserving measurements (excluding for now those where some information may have leaked to an environment external to Alice's memory), which are sufficient to derive the contradiction.

[Figure 8.1, panels (a) to (d): quantum circuit diagrams acting on the system $S$ (initial state $|\psi\rangle$) and the memories $A$ and $B$ (initial state $|0\rangle$).]
(a) Alice's perspective. The measurement in the $Z$ basis performed by Alice, who writes the classical result down to her memory $A$.
(b) Bob's perspective on Alice performing a measurement. The memory update of Alice, after she measures the system $S$ in the $Z$ basis, as seen from the point of view of the outside observer, corresponding to the memory update $U$.
(c) Bob's perspective. Bob performs a measurement in the $X$ basis of a system $S$ (operation $\mathcal{E}_S[Z, X]$).
(d) Bob's perspective on performing a measurement after Alice. Bob performing a measurement in the $\tilde{X}$ basis of systems $S$ and $A$, after Alice's memory update $U$ (operation $\mathcal{E}_{SA}[Z, X]$).

Figure 8.1: The measurement and memory update in quantum theory from different perspectives.

From Alice's point of view, the measurement of the system $S$ in the $Z$ basis yields a classical result, which she records to her memory $A$, performing a classical CNOT (Figure 8.1a). For an outside observer, Bob, who is not aware of Alice's measurement result, Alice's memory is entangled with the system, and the CNOT is a quantum entangling operation which corresponds to the memory update $U$ (Figure 8.1b). Further, there is no classical measurement outcome from Bob's perspective, even though he knows that Alice would perceive one. If Bob had access to the system $S$ prior to the measurement by $A$, and wanted to measure it in the $X$ basis ($\{|+\rangle_S, |-\rangle_S\}$), he would have to perform an operation $\mathcal{E}_S[Z, X]$ (and then copy the classical result into his memory $B$) (Figure 8.1c). If the system $S$ was initially in a state $|\psi\rangle = |+\rangle_S$, then a proposition which would correspond to this operation is $\phi[\mathcal{E}_S[Z, X](|\psi\rangle_S)]$ = "$s = +$". However, if the measurement in $Z$ has already been performed by $A$ and the result written to her memory, the whole process is described by Bob as a memory update $u$, and in order to comply with his initial wish to measure $S$ only, he can instead perform an operation $\mathcal{E}_{SA}[Z, X]$ on $S$ and $A$ together, which is a measurement in the $\{|+\rangle_{SA}, |-\rangle_{SA}\}$ basis (Figure 8.1d). The proposition which this operation yields is $\phi[\mathcal{E}_{SA}[Z, X] \circ u(|\psi\rangle_S \otimes |0\rangle_A)]$ = "$sa = +$" (as the updated state is $|\xi\rangle_{SA} = |+\rangle_{SA}$), which naturally follows from "$s = +$", given the structure of the memory update $u$.

Footnote: Figure 8.1 is taken from our paper [209] and was made by coauthors.

Definition 8.2.7 (Information-preserving memory update). Let $\mathcal{P}_S$ be a set of states of a system $S$ that is being studied by an agent $A_i$ with a memory $M_i$, and $\mathcal{P}_{SM_i}$ be a set of states of the joint system $SM_i$. If for a given initial state $Q^{\mathrm{in}}_{M_i} \in \mathcal{P}_{M_i}$ of the memory, there exists a corresponding map $U_Q : \mathcal{P}_{SM_i} \to \mathcal{P}_{SM_i}$ ($U_Q \in \mathcal{O}_{SM_i}$) that satisfies the following conditions (1) and (2), then $U_Q$ is called an information-preserving memory update.
1. Local operations on $S$ before the memory update can be simulated by joint operations on $S$ and $M_i$ after the update. That is, for all $P_S \in \mathcal{P}_S$, $O_S \in \mathcal{O}_S$, $A_j \in \mathcal{A}$ and $\phi$, there exists an operation $O_{SM_i} \in \mathcal{O}_{SM_i}$ such that
\[
K_j\, \phi[O_S(P_S)] \implies K_j\, \phi[O_{SM_i} \circ U_Q(P_S \,\|\, Q^{\mathrm{in}}_{M_i})],
\]
where $\phi[\ldots]$ are arbitrary statements that depend on the argument.
2. The memory update does not factorize into local operations. That is, there exist no operations $O'_S \in \mathcal{O}_S$ and $O'_{M_i} \in \mathcal{O}_{M_i}$ such that $U_Q = O'_S \,\|\, O'_{M_i}$.

Condition (1) was explained in the previous paragraphs. Condition (2) is required because the trivial map which entails doing nothing to the system and memory (i.e., the identity) satisfies Condition (1), even though such an operation should certainly not be regarded as a memory update. By requiring that $U_Q$ does not factorise into local operations over $S$ and $M_i$, Condition (2) rules out such trivial operations, which cannot be taken to represent a memory update. See Figure 8.1 for an example of $U_Q$ in the quantum case, where it is a reversible unitary operation and the initial state of the memory, $Q^{\mathrm{in}}_{M_i}$, is $|0\rangle_{M_i}$.
In general, the memory update map $U_Q$ need not be reversible; for example, in box world it is an irreversible transformation, as we will see later.

Note that it is enough to consider the memory update map $U_Q$ corresponding to a particular choice of initial state $Q^{\mathrm{in}}_{M_i}$, since the map $U_{Q'}$ corresponding to any other state $Q'^{\,\mathrm{in}}_{M_i} \in \mathcal{P}_{M_i}$ can be obtained by first locally transforming the memory state into $Q^{\mathrm{in}}_{M_i}$ and then applying $U_Q$. Thus, without loss of generality, we will consider only specific initial states in the chapter and drop the label $Q$ on this map, simply calling it $U$. For example, in the quantum case, it is enough to consider the memory update with the memory initialised to the state $|0\rangle_{M_i}$.

The characterization of measurements introduced in this section is rather minimal. In physical theories like classical and quantum mechanics, measurements have other natural properties that we do not require here. Two striking examples are "after her measurement, Alice's memory becomes correlated with the system measured in such a way that, for any subsequent operation that Bob could perform on the system, there is an equivalent operation he may perform on her memory" and "the correlations are such that there exists a joint operation on the system and Alice's memory that would allow Bob to conclude which measurement Alice performed." While these properties hold in the familiar classical and quantum worlds, we do not know of other physical theories where measurements can satisfy them, and they require Bob to be able to act independently on the system and on Alice's memory, which may not always be possible. For example, we will see that in box world, these two subsystems become superglued after Alice's measurement, and that Bob only has access to them as a whole and not as individual components. As such, we will not require these properties of measurements, for now. We revisit this discussion in Section 8.5.
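The quantum example of this section (Alice's $Z$ measurement followed by Bob's late measurement) can be checked numerically. The following is a minimal sketch in plain Python, not taken from the thesis: the amplitudes `alpha = 0.6`, `beta = 0.8` and all function names are illustrative assumptions. It compares Bob's originally planned $X$ measurement on $S$, his joint measurement in the $\{|+\rangle_{SA}, |-\rangle_{SA}\}$ basis after the memory update, and a naive $X$ measurement on $S$ alone after the update.

```python
import math

def inner(u, v):
    """Inner product <u|v> of two amplitude lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def prob(outcome_state, state):
    """Born-rule probability of projecting `state` onto `outcome_state`."""
    return abs(inner(outcome_state, state)) ** 2

# Illustrative (assumed) normalised amplitudes of the initial qubit state.
alpha, beta = 0.6, 0.8
r = 1 / math.sqrt(2)

# System S before Alice's measurement, basis order |0>, |1>.
psi_S = [alpha, beta]
plus_S, minus_S = [r, r], [r, -r]

# S and A after the memory update, basis order |00>, |01>, |10>, |11>.
psi_SA = [alpha, 0, 0, beta]
plus_SA, minus_SA = [r, 0, 0, r], [r, 0, 0, -r]

def prob_X_on_S_only(sign, state):
    """X measurement on S alone after the update (memory A is ignored)."""
    total = 0.0
    for m in (0, 1):  # sum over the memory basis states
        amp = r * (state[0 * 2 + m] + sign * state[1 * 2 + m])
        total += abs(amp) ** 2
    return total

planned = (prob(plus_S, psi_S), prob(minus_S, psi_S))
joint = (prob(plus_SA, psi_SA), prob(minus_SA, psi_SA))
naive = (prob_X_on_S_only(+1, psi_SA), prob_X_on_S_only(-1, psi_SA))

print("planned X on S      :", planned)
print("joint on S and A    :", joint)
print("X on S after update :", naive)
```

Under these assumptions the joint measurement reproduces the planned statistics, while the measurement on $S$ alone is uniform, as stated in the text.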

In Section 2.4, we reviewed Barrett's framework [16] for generalised probabilistic theories, in particular describing states, transformations and measurements in box world, a particular GPT. We will now instantiate our general conditions for agents, memories and measurements (Definitions 8.2.5 to 8.2.7) in box world. As there is no physical theory for the dynamics behind box world, there is plenty of freedom in the choice of implementation. In principle each such choice could represent a different physical theory leading to the same black-box behaviour in the limit of a single agent with an implicit memory. This is analogous to the way in which different versions of quantum theory (Bohmian mechanics, collapse theories, unitary quantum mechanics with von Neumann measurements) result in the same effective theory in that limit.

Definition 8.3.1 (Agents in box world). Let $T$ be the theory that describes box world, according to [16] (see also Section 2.4). As per Definition 8.2.5, an agent $A_i \in \mathcal{A}$ is described by a knowledge operator $K_i \in K_{\mathcal{A}}$ and a physical memory $M_i \in M_{\mathcal{A}}$.

We will focus on the case where the memory consists of bits or gbits. Each agent may study other systems according to the theory $T$. An agent's memory records the results and the consequences of the studies conducted by them, and may be an object of study by other agents.

It is worth mentioning that boxes do not correspond to physical systems, but to input/output functions that can only be evaluated once. As such, the post-measurement state of a physical system is described by a whole new box. The notion of an individual system itself, as we will see, may be unstable under measurements: some measurements glue the system to the observer's memory, in a way that makes individual access to the original system impossible.

Measurement: observer's perspective.

From the point of view of the observer who is measuring (say Alice), making a measurement on a system corresponds simply to running the box whose state vector encodes the measurement statistics. Alice may then commit the result of her measurement to a physical memory, like a notebook where she writes 'I measured observable $X$ and obtained outcome $a$.' To be useful, this should be a memory that may be consulted later, i.e. it could receive an input $Y$ = 'start: open and read the memory', and output the pair $(X, a)$. In the language of GPTs, this means that Alice, from her own perspective, prepares a new box with a trivial input $Y$ = 'start' and two outputs $(X', a')$, with the behaviour $\vec{P}(X', a' \,|\, Y) = \delta_{X,X'}\, \delta_{a,a'}$, which depends on her observations. She may later run this box (look at her notebook) and recover the measurement data. The exact dimension of the box will depend on how Alice perceives and models her own memory; for example it could consist of two bits, or two gbits, or, if we think that before the measurement she stored the information about the choice of observable elsewhere, it could be a single bit or gbit encoding only the outcome. We leave this open for now, as we do not want to constrain the theory too much at this stage.

Footnote: Thus the state space $\mathcal{P}_{SM_i}$ can also contain such "super-glued states".

[Figure 8.2, panels (a) and (b): diagrams of Alice's memory update as seen by Ursula at time $t=1$, producing a box for Ursula, which Ursula runs at time $t=3$.]
(a) Generally, in GPTs with some notion of subsystems, Ursula can think of the physical system measured by Alice and Alice's pre-measurement memory as two boxes, which Ursula could in principle run if Alice chose not to measure (left). From Ursula's perspective, Alice's measurement corresponds to a transformation of the two initial boxes that results in a new, post-measurement box available to Ursula, whose behaviour will depend on the concrete physical theory.
(b) In box world, suppose the two initial systems correspond to gbits, and Alice's memory is initialized as shown. If we want to preserve the global system dimensions, then the rules for allowed transformations (Definition 8.2.7) limit the statistics of Ursula's final box to be of the form shown on the right (Appendix 8.6.1). The asterisks represent arbitrary values, which will depend on the choice of implementation of Alice's measurement.

Figure 8.2: Memory update after a measurement: an outsider's perspective. Here Alice makes a measurement of a system (blue, top) at time $t = 1$.

Measurements: inferences.

To see the kind of inferences and conclusions that an agent can draw from a measurement in box world, it is convenient to look at the example where Alice and Bob share a PR box (Figure 2.4). Suppose that Alice measured her half of the box with input $X = 1$ and obtained outcome $a = 0$. From the PR correlations, $XY = a \oplus b$, she can conclude that if Bob measures $Y = 0$, he must obtain $b = 0$, and if he measures $Y = 1$, he must obtain $b = 1$. This is independent of whether Bob's measurement happens before or after Alice's (or even space-like separated from it). She could reach similar deterministic conclusions for her other choice of measurement and possible outcomes. In the language of Definition 8.2.6, we have
\[
\begin{aligned}
\phi_{X=0,\,a=0} &= \text{``}[Y = 0 \implies b = 0] \wedge [Y = 1 \implies b = 0]\text{''}, \\
\phi_{X=0,\,a=1} &= \text{``}[Y = 0 \implies b = 1] \wedge [Y = 1 \implies b = 1]\text{''}, \\
\phi_{X=1,\,a=0} &= \text{``}[Y = 0 \implies b = 0] \wedge [Y = 1 \implies b = 1]\text{''}, \\
\phi_{X=1,\,a=1} &= \text{``}[Y = 0 \implies b = 1] \wedge [Y = 1 \implies b = 0]\text{''}.
\end{aligned}
\]

Footnote: Figure 8.2 is taken from our paper [209] and was made by Lídia del Rio.

Measurement: memory update from an outsider's perspective.

Next we need to describe how an outside agent, Ursula, models Alice's measurement, in the case where Alice does not communicate her outcome to Ursula. Suppose that all agents share a time reference frame, and Alice makes her measurement at time $t = 1$. From Ursula's perspective, in the most general case, this will correspond to Alice preparing a new box, with some number of inputs and outputs, which Ursula can later run (Figure 8.2a). The exact form of this box will depend on the underlying physical theory for measurements: in the quantum case it corresponds to a box with the measurement statistics of a state that is entangled between the system measured and Alice's memory, as we saw. In classical mechanics, it will correspond to perfect classical correlations between those two subsystems. At the other extreme, we could imagine a theory of very destructive measurements, where after Alice's measurement, the physical system she had measured would vanish. From Ursula's perspective, this could be modelled by a box with a void associated distribution.

Now suppose that we would like to have a physical theory where the dimension of systems is preserved by measurements: for example, if the system that Alice measures is instantiated by a box with binary input and output (e.g. a gbit, or half of a PR box), and Alice's memory, where she stores the outcome of the measurement, is also represented as a gbit, then we would want the post-measurement box accessible to Ursula to have in total two binary inputs and two binary outputs (or more generally, four possible inputs and four possible outputs). Note that this is not a required condition for a theory to be physical per se; it is just a familiar rule of thumb that gives some persistent meaning to the notion of subsystems and dimensions. In such a theory that supports box world correlations, we find that the allowed statistics of Ursula's box must satisfy the conditions of Figure 8.2b (proof in Appendix 8.6.1). These conditions still leave us some wiggle room for different possible implementations.
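As a concrete check of the inference propositions $\phi_{X,a}$ from the PR-box example above, the following minimal sketch enumerates, for each of Alice's settings and outcomes, the outcome $b$ that the PR correlation $XY = a \oplus b$ forces on Bob for each of his settings. The function names are illustrative assumptions and do not appear in the thesis.

```python
def predicted_b(X, a, Y):
    """Bob's outcome b forced by the PR correlation X*Y = a XOR b."""
    return (X * Y) ^ a

def inference(X, a):
    """Alice's proposition phi_{X,a}, encoded as a dict {Y: b}."""
    return {Y: predicted_b(X, a, Y) for Y in (0, 1)}

for X in (0, 1):
    for a in (0, 1):
        implications = " and ".join(
            f"[Y={Y} => b={b}]" for Y, b in inference(X, a).items()
        )
        print(f"phi_(X={X}, a={a}) = {implications}")
```

Each printed line is one deterministic statement that Alice can make about Bob's outcome without communicating with him.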

Measurements: information-preserving memory update.

In order to find a multi-agent paradox, we will need a model of memory update that is information-preserving, in the sense of Definition 8.2.7. This does not imply that Alice's transformation (as seen by Ursula) is reversible. Firstly, it is known that all reversible dynamics in box world is trivial and cannot correlate product states [103], hence the memory update map (Definition 8.2.7) in box world cannot be reversible. Further, we find that in general, the memory update (from the outside perspective) can glue two subsystems such that Ursula will only be able to address them as a whole (since separating them could lead to a violation of no-signaling). But the relevant fact is that Ursula can apply some post-processing in order to obtain a new box with the same behaviour as the pre-measurement system that Alice observed. In Figure 8.3 we give an example of a model that satisfies these conditions, in addition to the conditions of Figure 8.2b. It is a minimal implementation among many possible ones, which already allows us to derive such a paradox beyond quantum theory. We further discuss some of the limitations of and alternatives to this model in Section 8.5.

Footnote: Naming convention: as we will see in Section 8.4, the proposed experiment features two "internal" agents, Alice and Bob, who will in turn be measured by two "external" agents, Ursula and Wigner. In the example of Section 8.2, the internal agent was Alice and the external one Bob, so that their different pronouns could help keep track of whose memory we were referring to.

[Figure 8.3, panels (a) and (b): diagrams of Ursula's box (prepared by Alice at time $t=1$) and of Ursula's post-processing, with second input fixed to 0, input $\tilde{X}$ and output $\tilde{a}$.]
(a) Alice measures a system (inner blue box) and stores the outcome in her memory (inner pink box). From Ursula's perspective, this appears as though Alice simply wired the outputs of the two boxes with a controlled-NOT gate, so that the measurement output is copied to the output of the memory. When Ursula later runs the outer green box, she provides two inputs, which go through this circuit, resulting in two identical outputs.
(b) In an information-preserving model, for every measurement $X$ on the initial state of the system measured by Alice, there exists a measurement $\tilde{X}$ that Ursula can do jointly on the system and Alice's memory that reproduces the same statistics. This is achieved, for example, by Ursula fixing her second input to 0, and simply discarding the second (trivial) output as shown.

Figure 8.3: Information-preserving memory update. This (trivial) physical implementation of Alice's measurement in box world satisfies the conditions of Figure 8.2b and is information-preserving (Definition 8.2.7). The crucial detail is that Ursula is not allowed to open her box (in green) and access the circuitry inside. Note that if Ursula can perform arbitrary measurements on the joint system, then the pre- and post-measurement states give different statistics: the former is a product state while the latter is not. Further, there are other possibilities for modelling measurements; this is the simplest one that still allows us to derive the paradox. Details and proofs are in Appendix 8.6.1.

Footnote: Figure 8.3 is taken from our paper [209] and was made by Lídia del Rio.

What is important here (and proven in Appendix 8.6.1) is that this model generalizes to the case where Alice measures half of a bipartite state, like a PR box. That is, suppose that Alice and Bob share a PR box. Imagine that at time $t = 1$ Alice makes her measurement $X$, obtaining (from her perspective) an outcome $a$, and that Bob makes his measurement $Y$ at time $t = 2$, obtaining outcome $b$. As usual, if Alice and Bob were to communicate at this point, they would find that $XY = a \oplus b$, and indeed the propositions $\phi_{X,a}$ and $\phi_{Y,b}$ that represent their subjective measurement experience would hold. But now suppose that Alice and Bob do not get the chance to communicate and compare their inputs and outputs; instead, at time $t = 3$, an observer Ursula, who models Alice's measurement as in Figure 8.3a, runs the box corresponding to Alice's half of the PR box and Alice's memory, and applies the post-processing of Figure 8.3b. Ursula's input is $\tilde{X}$ and her output is $\tilde{a}$. Then the claim is that $\tilde{X}Y = \tilde{a} \oplus b$: that is, Ursula and Bob effectively share a PR box. This is proven in Appendix 8.6.1. We now have all the ingredients needed to find a multi-agent epistemic paradox in box world.

We now describe a scenario in box world where reasoning, physical agents reach a logical paradox.
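The claim that Ursula and Bob effectively share a PR box can be illustrated with a small simulation of the wiring model of Figure 8.3. This is a hedged sketch, not the proof of Appendix 8.6.1: it assumes that Alice's memory gbit outputs 0 with certainty, that the wiring maps the memory output to $a' = a \oplus m$, and that Ursula fixes her second input and discards $a'$. All function names are illustrative.

```python
from itertools import product

def pr_box(a, b, X, Y):
    """P(a, b | X, Y) for a PR box: uniform over outputs with a XOR b = X*Y."""
    return 0.5 if (a ^ b) == (X * Y) else 0.0

def memory_gbit(m):
    """Assumed initial memory state: output m = 0 with certainty."""
    return 1.0 if m == 0 else 0.0

def lab_box(a, a_prime, b, X_tilde, Y):
    """P(a, a', b | X~, Y): Alice's lab as seen by Ursula, jointly with Bob.

    The controlled-NOT wiring on the outputs sets a' = a XOR m.
    """
    return sum(
        pr_box(a, b, X_tilde, Y) * memory_gbit(m)
        for m in (0, 1)
        if (a ^ m) == a_prime
    )

def ursula_bob_box(a_tilde, b, X_tilde, Y):
    """Ursula keeps the first output and discards a' (marginalises over it)."""
    return sum(lab_box(a_tilde, a_prime, b, X_tilde, Y) for a_prime in (0, 1))

# Compare the effective Ursula-Bob behaviour with the original PR box.
effective_pr = all(
    abs(ursula_bob_box(a, b, X, Y) - pr_box(a, b, X, Y)) < 1e-12
    for a, b, X, Y in product((0, 1), repeat=4)
)
print("Ursula and Bob share an effective PR box:", effective_pr)
```

In this toy model the post-processed box is identical to the original PR box, so $\tilde{X}Y = \tilde{a} \oplus b$ holds with certainty.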

Experimental setup.

The proposed thought experiment is similar in spirit to the one proposed by Frauchiger and Renner [85], with the PR box playing the role of the Hardy state (cf. Figure 7.1). Alice and Bob share a PR box (the corresponding box world state is given in Figure 2.4b); they each will measure their half of the PR box and store the outcomes in their local memories. Let Alice's lab be located inside the lab of another agent, Ursula, such that Ursula can now perform joint measurements on Alice's system (her half of the PR box) and memory, as seen in the previous section. Similarly, let Bob's lab be located inside Wigner's lab, such that Wigner can perform joint measurements on Bob's system and memory. We assume that Alice's and Bob's labs are isolated such that no information about their measurement outcomes leaks out. The protocol is the following:

t=1: Alice measures her half of the PR box, with measurement setting $X$, and stores the outcome $a$ in her memory $A$.
t=2: Bob measures his half of the PR box, with measurement setting $Y$, and stores the outcome $b$ in his memory $B$.
t=3: Ursula measures the box corresponding to Alice's lab (as in Figure 8.3b), with measurement setting $\tilde{X} = X \oplus 1$, obtaining outcome $\tilde{a}$.
t=4: Wigner measures the box corresponding to Bob's lab, with measurement setting $\tilde{Y} = Y \oplus 1$, obtaining outcome $\tilde{b}$.

Agents can agree on their measurement settings beforehand, but should not communicate once the experiment begins. The trust relation, which specifies which agents consider each other to be rational agents (as opposed to mere physical systems), is
\[
A_{t=\ldots} \leftrightsquigarrow B_{t=\ldots}, \qquad
B_{t=\ldots} \leftrightsquigarrow U_{t=\ldots}, \qquad
U_{t=\ldots} \leftrightsquigarrow W_{t=\ldots}, \qquad
W_{t=\ldots} \leftrightsquigarrow A_{t=\ldots}.
\]

Note that the time information is important here, as it tells us that the trust relations hold for the pre-tampering versions of the "inside" agents, Alice and Bob (who are measured by the "outside" agents Ursula and Wigner). The common knowledge $T$ shared by all four agents includes the PR box correlations, the way the external agents model Alice and Bob's measurements, and the trust structure above.

Reasoning.

Now the agents can reason about the events in other agents' labs. We take here the example where the measurement settings are $X = Y = 0$, $\tilde{X} = \tilde{Y} = 1$, and where Wigner obtained the outcome $\tilde{b} = 0$; the reasoning is analogous for the remaining cases.

1. Wigner reasons about Ursula's outcome. At time $t = 4$, Wigner knows that, by virtue of their information-preserving modelling of Alice and Bob's measurements, he and Ursula effectively shared a PR box (see Appendix 8.6.1 for a proof). He can therefore use the PR correlations $\tilde{X}\tilde{Y} = \tilde{a} \oplus \tilde{b}$ to conclude that Ursula's output must be 1,
\[ K_W(\tilde{b} = 0 \implies \tilde{a} = 1). \]

2. Wigner reasons about Ursula's reasoning. Now Wigner thinks about what Ursula may have concluded regarding Bob's outcome. He knows that at time $t = 3$, Ursula and Bob effectively shared a PR box (see Appendix 8.6.1 for a proof), satisfying $\tilde{X}Y = \tilde{a} \oplus b$, and that therefore Ursula must have concluded
\[ K_W K_U(\tilde{a} = 1 \implies b = 1). \]

3. Wigner reasons about Ursula's reasoning about Bob's reasoning. Next, Wigner wonders "What could Ursula, at time $t = 3$, conclude about Bob's reasoning at time $t = 2$?" He knows that Ursula knows that Bob, at time $t = 2$, shared a PR box with Alice, satisfying $XY = a \oplus b$, and therefore concludes
\[ K_W K_U K_B(b = 1 \implies a = 1). \]

4. Wigner reasons about Ursula's reasoning about Bob's reasoning about Alice's reasoning. Now Wigner thinks about Alice's perspective at time $t = 1$, through the lenses of Bob (at time $t = 2$) and Ursula ($t = 3$). Alice knows her outcome $a$, and that Wigner would model Bob's measurement in an information-preserving way, such that Alice (at time $t = 1$) and Wigner (at time $t = 4$) share an effective PR box (see Appendix 8.6.1 for a proof), satisfying $X\tilde{Y} = a \oplus \tilde{b}$, which results, in particular, in
\[ K_W K_U K_B K_A(a = 1 \implies \tilde{b} = 1). \]

5. Wigner applies trust relations. In order to combine the statements obtained above, we need to apply the trust relations described above, starting from the inside of each proposition, for example,
\[
\begin{aligned}
K_W K_U K_B K_A(a = 1 \implies \tilde{b} = 1)
&\implies K_W K_U K_B(a = 1 \implies \tilde{b} = 1) && (\because A \rightsquigarrow B) \\
&\implies K_W K_U(a = 1 \implies \tilde{b} = 1) && (\because B \rightsquigarrow U) \\
&\implies K_W(a = 1 \implies \tilde{b} = 1) && (\because U \rightsquigarrow W)
\end{aligned}
\]
and similarly for the other statements (where $A \rightsquigarrow B$ denotes that $B$ trusts $A$, cf. Definition 2.6.3), so that we obtain
\[
K_W\big[(\tilde{b} = 0 \implies \tilde{a} = 1) \wedge (\tilde{a} = 1 \implies b = 1) \wedge (b = 1 \implies a = 1) \wedge (a = 1 \implies \tilde{b} = 1)\big] \implies K_W(\tilde{b} = 0 \implies \tilde{b} = 1).
\]

We could have equally taken the point of view of any other observer, and started from any particular outcome or choice of measurement, and through similar reasoning chains reached the following contradictions,
\[
\begin{aligned}
&K_A[(a = 0 \implies a = 1) \wedge (a = 1 \implies a = 0)], \\
&K_B[(b = 0 \implies b = 1) \wedge (b = 1 \implies b = 0)], \\
&K_U[(\tilde{a} = 0 \implies \tilde{a} = 1) \wedge (\tilde{a} = 1 \implies \tilde{a} = 0)], \\
&K_W[(\tilde{b} = 0 \implies \tilde{b} = 1) \wedge (\tilde{b} = 1 \implies \tilde{b} = 0)].
\end{aligned}
\]

We have generalized the conditions of the Frauchiger-Renner theorem and made them applicable to arbitrary physical theories, including the framework of generalized probabilistic theories. We then applied these conditions to the GPT of box world and found an experimental setting that leads to a multi-agent logical paradox.

We showed that box world agents reasoning about each other's knowledge can come to a deterministic contradiction, which is stronger than the original paradox, as it can be reached without post-selection, from the point of view of every agent and for any measurement outcome obtained by them.
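The deterministic contradiction can be traced mechanically by following the chain of PR correlations used in the reasoning, for the settings $X = Y = 0$, $\tilde{X} = \tilde{Y} = 1$. The sketch below is only an illustration of that chain; it does not model the knowledge operators or the trust relations, and the function name `wigner_chain` is an assumption.

```python
def wigner_chain(b_tilde, X=0, Y=0, X_t=1, Y_t=1):
    """Follow the four effective PR boxes, starting from Wigner's outcome.

    Returns (a_tilde, b, a, b_tilde_inferred), where the last entry is the
    value of b_tilde that the chain of implications forces.
    """
    a_tilde = b_tilde ^ (X_t * Y_t)    # Wigner-Ursula: X~ Y~ = a~ XOR b~
    b = a_tilde ^ (X_t * Y)            # Ursula-Bob:    X~ Y  = a~ XOR b
    a = b ^ (X * Y)                    # Bob-Alice:     X  Y  = a  XOR b
    b_tilde_inferred = a ^ (X * Y_t)   # Alice-Wigner:  X  Y~ = a  XOR b~
    return a_tilde, b, a, b_tilde_inferred

for observed in (0, 1):
    a_tilde, b, a, inferred = wigner_chain(observed)
    print(
        f"b~ = {observed} => a~ = {a_tilde} => b = {b} => a = {a} "
        f"=> b~ = {inferred} (contradiction: {inferred != observed})"
    )
```

Both possible outcomes of Wigner lead to a contradiction, which is why no post-selection on a particular outcome is needed.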

Post-selection.

In contrast to the original Frauchiger-Renner experiment of [85], no post-selection was required to arrive at this contradictory chain of statements as, in fact, all the implications above are symmetric, for example
\[
\tilde{a} = 1 \iff b = 1 \iff a = 1 \iff \tilde{b} = 1 \iff \tilde{a} = 0 .
\]
As a result, one can arrive at a similar (symmetric) paradoxical chain of statements irrespective of the choice of agent and outcome for the first statement. In other words, irrespective of the outcomes observed by every agent, each agent will arrive at a contradiction when they try to reason about the outcomes of other agents. This is because, as shown in [4], the PR box exhibits strong contextuality: no global assignment of outcome values for all four measurements exists for any choice of local assignments. In contrast, the original paradox of [85] admits the same distribution as that of Hardy's paradox [109]. It is shown in [4] that this distribution is an example of logical contextuality, where for a particular choice of local assignments (the ones that are post-selected on in the original Frauchiger-Renner experiment), a global assignment of values compatible with the support of the distribution fails to exist, but this is not true for all local assignments. This makes the paradox even stronger in box world, since it can be found without post-selection and by any of the agents, for any outcome that they observe. In particular, the paradox would already arise in a single run of the experiment. For a simple method to enumerate all possible contradictory statements that the agents may make, see the analysis of the PR box presented in [4].
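The absence of a global assignment for the PR box can be verified by brute force. The following sketch is an illustration, not the method of [4]: it assigns one value to each of the four measurements (labelled here, as an assumption, $a_0, a_1$ for Alice's side and $b_0, b_1$ for Bob's side, which for the settings above correspond to $a, \tilde{a}, b, \tilde{b}$) and checks the PR condition $a_X \oplus b_Y = XY$ in every context.

```python
from itertools import product

CONTEXTS = list(product((0, 1), repeat=2))  # all setting pairs (X, Y)

def satisfied_contexts(a, b):
    """Contexts (X, Y) in which the assignment obeys a_X XOR b_Y = X*Y."""
    return [(X, Y) for X, Y in CONTEXTS if (a[X] ^ b[Y]) == X * Y]

def global_assignments():
    """All assignments (a0, a1, b0, b1) consistent with every context."""
    return [
        (a0, a1, b0, b1)
        for a0, a1, b0, b1 in product((0, 1), repeat=4)
        if len(satisfied_contexts((a0, a1), (b0, b1))) == len(CONTEXTS)
    ]

def max_satisfied():
    """Largest number of contexts any single assignment can satisfy."""
    return max(
        len(satisfied_contexts((a0, a1), (b0, b1)))
        for a0, a1, b0, b1 in product((0, 1), repeat=4)
    )

print("global assignments:", global_assignments())
print("max contexts satisfied at once:", max_satisfied())
```

No assignment satisfies all four contexts, whichever local values one starts from, which is the strong contextuality referred to in the text.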

Reversibility of the memory update map

As mentioned previously, the memory update map $U$ in the quantum case is the quantum CNOT gate, which is unitary and hence reversible. In box world, however, this map cannot be reversible, since it is known that all reversible maps in box world map product states to product states [103], and hence no reversible $U$ in box world could satisfy Definition 8.2.7 of an information-preserving memory update. The map we propose here for box world is clearly irreversible, as it leads to correlations between the initially uncorrelated system and memory.

Since we lack a physical theory to explain how measurements and transformations are instantiated for generalised non-signaling boxes, and only have access to their input/output behaviour, all allowed transformations consist of pre- and post-processing. In the quantum case, we have, in addition to a description of possible input-output correlations, a mathematical framework for the underlying states producing those correlations, the theory of von Neumann measurements, and transformations as CPTP maps. In box world, the introduction of dynamical features (for example, a memory update algorithm) is less intuitive and requires additional constructions. In the following, we outline the main limitations we found.

Systems vs boxes.

In quantum theory, a system corresponds to a physical substrate that can be acted on more than once. For example, Alice could measure a spin first in the $Z$ basis and then in the $X$ basis (obviously with different results than if she had measured first $X$ and then $Z$). The predictions for each subsequent measurement are represented by a different box in the GPT formalism, such that each box encodes the current state of the system in terms of the measurement statistics of a tomographically complete set of measurements. After each measurement, the corresponding box disappears, but quantum mechanics gives us a rule to compute the post-measurement state of the underlying system, which in turn specifies the box for future measurements. On the other hand, the default theory for box world lacks the notion of underlying physical systems and a definite rule to compute the post-measurement state vector of something that has been measured once. Indeed, Equations 2.44a-2.44c tell us that post-measurement states are only partially specified: for instance, if the measurement performed was fiducial, we know that the block corresponding to that measurement in the post-measurement state would have a "1" corresponding to the outcome obtained and "0" for all other outcomes in the block. However, we still have freedom in defining the entries in the remaining blocks. Our model proposes a possible physical mechanism for updating boxes (which could be read as updating the state of the underlying system), but so far only for the case where we compare the perspectives of different agents, and we leave it open whether Alice has a subjective update rule that would allow her to make subsequent measurements on the same physical system.

Verifying a measurement.

In our simple model, the external observer Ursula has no way to know which measurement Alice performed, or whether she measured anything at all: the connection between Alice's and Ursula's views is postulated rather than derived from a physical theory. Indeed, Alice could have simply wired the boxes as in Figure 8.3a without actually performing the measurement, and Ursula will not know the difference: she obtains the same joint state of Alice's memory and the system she measured. In contrast, consider the case of quantum mechanics with standard von Neumann measurements. There, Alice's memory gets entangled with the system, and the post-measurement state depends on the basis in which Alice measured her system. For example, if Alice's qubit S starts off in the normalised pure state $|\psi\rangle = \alpha|0\rangle_S + \beta|1\rangle_S$ and her memory M is initialised to $|0\rangle_M$, the initial state of her system and memory from Ursula's perspective is

$$|\Psi\rangle^{in}_{SM} = \big[\alpha|0\rangle_S + \beta|1\rangle_S\big] \otimes |0\rangle_M = \Big[\tfrac{\alpha+\beta}{\sqrt{2}}\,|+\rangle_S + \tfrac{\alpha-\beta}{\sqrt{2}}\,|-\rangle_S\Big] \otimes |0\rangle_M .$$

If Alice measures the system in the Z basis, the post-measurement state from Ursula's perspective is $|\Psi\rangle^{out,Z}_{SM} = \alpha|0\rangle_S|0\rangle_M + \beta|1\rangle_S|1\rangle_M$, which is an entangled state. If instead Alice measured in the Hadamard (X) basis, the post-measurement state would be $|\Psi\rangle^{out,X}_{SM} = \tfrac{\alpha+\beta}{\sqrt{2}}\,|+\rangle_S|0\rangle_M + \tfrac{\alpha-\beta}{\sqrt{2}}\,|-\rangle_S|1\rangle_M$. Clearly the measurement statistics of $|\Psi\rangle^{in}_{SM}$, $|\Psi\rangle^{out,Z}_{SM}$ and $|\Psi\rangle^{out,X}_{SM}$ are different, and Ursula can thus (in principle, with some probability) tell whether or not Alice performed a measurement and which measurement she performed. In the absence of a physical theory backing box world, we can still lift this degeneracy between the three situations (Alice didn't measure, she measured X = 0, or she measured X = 1) by adding another classical system to the circuitry of Figure 8.3a: for example, a trit that stores what Alice did, and which Ursula could consult independently of the glued box of system and Alice's memory. However, we would still have a postulated connection between what is stored in this trit and what Alice actually did, and not one that is physically motivated.
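The quantum comparison above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original analysis: the amplitudes `alpha = 0.6`, `beta = 0.8` are an arbitrary illustrative choice, and the helper names (`kron`, `combine`, `overlap`) are hypothetical. It assumes, as in the reconstruction above, that the memory records the X-basis outcomes + and - as the states 0 and 1. The pairwise overlaps of the three states from Ursula's perspective are all strictly below 1, which is why she can in principle tell them apart with some probability.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]

def combine(cu, u, cv, v):
    """Linear combination cu*u + cv*v of two vectors of equal length."""
    return [cu * a + cv * b for a, b in zip(u, v)]

def overlap(u, v):
    """Squared overlap |<u|v>|^2 between two normalised state vectors."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(u, v))) ** 2

# Illustrative (assumed) amplitudes with alpha^2 + beta^2 = 1.
alpha, beta = 0.6, 0.8
s = 1 / math.sqrt(2)

ket0, ket1 = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]

# Joint states of system S and memory M, ordered as |00>, |01>, |10>, |11>.
psi_in = kron([alpha, beta], ket0)                               # no measurement yet
psi_out_z = combine(alpha, kron(ket0, ket0), beta, kron(ket1, ket1))   # Z-basis update
psi_out_x = combine((alpha + beta) * s, kron(plus, ket0),
                    (alpha - beta) * s, kron(minus, ket1))       # X-basis update

for name, (u, v) in {
    "in vs out,Z": (psi_in, psi_out_z),
    "in vs out,X": (psi_in, psi_out_x),
    "out,Z vs out,X": (psi_out_z, psi_out_x),
}.items():
    print(name, round(overlap(u, v), 4))
```

Since none of the overlaps equals 1, the three situations correspond to distinct states for Ursula, in contrast to the box world model where they coincide.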

Supergluing of non-signaling boxes.

For the memory update circuit (from Ursula's perspective) of Figure 8.3a, and the initial state of Equation (8.3), the final state would be (following the notation of Section 2.4) $\vec{P}^{SM}_{fin} = (p\ \ 0\ \ 0\ \ 1-p \mid p\ \ 0\ \ 0\ \ 1-p \mid q\ \ 0\ \ 0\ \ 1-q \mid q\ \ 0\ \ 0\ \ 1-q)^T_{SM}$. Note that while the reduced final state of S does not depend on the input X' to M, the reduced final state on Alice's memory M, $\vec{P}^{M}_{fin}$, clearly depends on the input X of the system S if p ≠ q. If X = 0, $\vec{P}^{M}_{fin} = (p\ \ 1-p \mid p\ \ 1-p)^T$ and if X = 1, $\vec{P}^{M}_{fin} = (q\ \ 1-q \mid q\ \ 1-q)^T$, i.e., the systems S and M are signalling. This is expected since there is clearly a transfer of information from S to M during the measurement, as seen in Figure 2.5. However, this means that the state $\vec{P}^{SM}_{fin}$ is not a valid box world state of 2 systems S and M but a valid state of a single system SM, i.e., after Alice performs her wiring/measurement, it is not possible to physically separate Alice's system S from her memory M from Ursula's perspective. For if this were possible, there would be a violation of the no-signaling principle and the notion of relativistic causality. In quantum theory, on the other hand, it is always possible to perform separate measurements on Alice's system and her memory even after she measures. We call this feature supergluing of post-measurement boxes, where it is no longer possible for Ursula to separately measure S or M, but she can only jointly measure SM as though it were a single system. Note that this is only the case for p ≠ q. In our example with the PR box (Section 8.4), p = q = 1/2, so $\vec{P}^{SM}_{fin}$ remains a valid bipartite non-signaling state in this particular, fine-tuned case, and there is no supergluing in the example described in Section 8.4.
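The signalling claim above can be verified by computing marginals of the final state directly. The sketch below is illustrative only: the function names are hypothetical, and the block layout (blocks indexed by (X, X'), entries ordered by (a, a') = 00, 01, 10, 11) is the convention assumed in the reconstruction of the state vector above.

```python
def final_state(p, q):
    """Final state of SM for the circuit of Figure 8.3a, stored block by block.

    Keys are the fiducial measurements (X, X'); each block lists the
    probabilities of the outcomes (a, a') = 00, 01, 10, 11.
    """
    return {
        (0, 0): [p, 0.0, 0.0, 1 - p],
        (0, 1): [p, 0.0, 0.0, 1 - p],
        (1, 0): [q, 0.0, 0.0, 1 - q],
        (1, 1): [q, 0.0, 0.0, 1 - q],
    }

def marginal_S(state, X, Xp):
    """Distribution of the system outcome a for the settings (X, X')."""
    b = state[(X, Xp)]
    return (b[0] + b[1], b[2] + b[3])

def marginal_M(state, X, Xp):
    """Distribution of the memory outcome a' for the settings (X, X')."""
    b = state[(X, Xp)]
    return (b[0] + b[2], b[1] + b[3])

def signals_S_to_M(state, tol=1e-12):
    """True if the marginal on M changes with the setting X chosen on S."""
    return any(
        abs(marginal_M(state, 0, Xp)[i] - marginal_M(state, 1, Xp)[i]) > tol
        for Xp in (0, 1) for i in (0, 1)
    )

def signals_M_to_S(state, tol=1e-12):
    """True if the marginal on S changes with the setting X' chosen on M."""
    return any(
        abs(marginal_S(state, X, 0)[i] - marginal_S(state, X, 1)[i]) > tol
        for X in (0, 1) for i in (0, 1)
    )

print(signals_S_to_M(final_state(0.3, 0.7)))  # generic case, p != q
print(signals_S_to_M(final_state(0.5, 0.5)))  # fine-tuned case p = q = 1/2
```

For p ≠ q the memory marginal depends on X (signalling from S to M), whereas for p = q = 1/2 both marginals are independent of the other party's setting, matching the fine-tuned PR box case.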

A glass half full.

The above-mentioned features of the memory update in box world are certainly not desirable, and not what one would expect to find in a physical theory with meaningful notions of subsystems. An optimistic way to look at these limitations is to see them as providing us with further intuition for why PR boxes have not yet been found in nature. One of the main contributions of this chapter is the finding that despite these peculiar features of box world and the fact that it has no entangling bipartite joint measurements (a crucial step in the original quantum paradox), a consistent outside perspective of the memory update exists such that with our generalised assumptions, a multi-agent paradox can be recovered. This indicates that the reversibility of dynamics akin to quantum unitarity is not necessary to derive this kind of paradox. We suspect that contextuality, and the existence of a suitable information-preserving memory update (Definition 8.2.7) are necessary conditions for deriving such paradoxes. We discuss the relation to contextuality further in Section 8.5.4, and leave a detailed analysis to future work.

Other models for physical measurements.

Ours is not the first attempt at coming up with a (partial) physical theory that reproduces the statistics of box world. Here we review the approach of Skrzypczyk et al. in [194]. There the authors consider a variation of box world that has a reduced set of physical states (which the authors call genuine), which consists of the PR box and all the deterministic local boxes. The wealth of box world state vectors (i.e. the non-signaling polytope, or what we could call epistemic states) is recovered by allowing classical processing of inputs and outputs via classical wirings, as well as convex combinations thereof. In contrast, box world takes all convex combinations of maximally non-signaling boxes (of which the PR box is an example) to be genuine physical states; this becomes relevant as we require the allowed physical operations to map such states to each other. For the restricted state space of [194], the set of allowed operations is larger than in box world, particularly for multipartite settings. For example, there we are allowed maps that implement the equivalent of entanglement swapping: if Bob shares a PR box with Alice, and another with Charlie, there is an allowed map that he can apply on his two halves which leaves Alice and Charlie sharing a PR box, with some probability. It would be interesting to try to model memory update in this modified theory, to see (1) whether there is a more natural implementation of measurements within the extended set of operations, and (2) whether this theory allows for multi-agent paradoxes.

While we have shown that a consistency paradox, similar to the one arising in the Frauchiger-Renner setup, can also be adapted for box world in terms of GPTs, it still remains unclear how to characterize all possible theories where it is possible to find a setup leading to a contradiction. It seems that contextuality is a key property of such theories; this is discussed in more detail in Section 8.5.4. Another central ingredient seems to be information-preserving models for physical measurements such as our memory update of Definition 8.2.7, which allow us to replace counterfactuals with actual measurements, performed in sequence by different agents.

Beyond standard composition of systems.

Additionally, it is still an open problem to find an operational way to state the outside view of measurements (and a memory update operation) for theories without a prior notion of subsystems and a tensor rule for composing them. This would allow us to search for multi-agent logical paradoxes in field theories, for example. One possible direction is to use notions of effective and subjective locality, as outlined for example in [130].

Multi-agent logical paradoxes involve chains (or possibly more general structures) of statements that cannot be simultaneously true in a consistent manner. Contextuality, on the other hand, can often be expressed in terms of the inability to consistently assign definite outcome values to a set of measurements [126, 195]. The Frauchiger-Renner example in quantum theory and the present one in box world both arise in contextual theories. Given this, and the strong links between these, Hardy's paradox [108, 109] and logical contextuality [4] (also pointed out in Section 7.2), our hypothesis is that a contextual physical theory, when applied to systems that are themselves reasoning agents, can give rise to logical multi-agent paradoxes. The fact that such theories may allow a very different description of a measurement process from the point of view of an agent performing the measurement versus that of an outside agent (who analyses this agent and her system together) also has an important role to play in these paradoxes. In the quantum case this is closely linked to the measurement problem, the problem of reconciling unitary dynamics (outside view) and non-unitary "collapse" (inside view). The existence of a connection between multi-agent paradoxes and contextuality is hard to miss, but it is the nature of this connection that is unknown, i.e., are all proofs of multi-agent logical paradoxes proofs of contextuality, or vice-versa? Based on our results, and the analysis of the Frauchiger-Renner experiment presented in Section 7.2, our hunch is that contextuality is necessary but not sufficient for deriving such paradoxes: specific forms of contextuality, combined with suitable conditions on the memory update, might be required. These questions will be formally addressed in future work. Nevertheless, in the following, we provide an overview of further connections and some more specific open questions in this direction.

Liar cycles.

In [4] relations between logical paradoxes and quantum contextuality are explored; in particular, the authors point out a direct connection between contextuality and a type of classic semantic paradox called Liar cycles [65]. A Liar cycle of length N is a chain of statements of the form:

$$\varphi_1 = \text{``}\varphi_2 \text{ is true''},\quad \varphi_2 = \text{``}\varphi_3 \text{ is true''},\quad \ldots,\quad \varphi_{N-1} = \text{``}\varphi_N \text{ is true''},\quad \varphi_N = \text{``}\varphi_1 \text{ is false''}. \tag{8.2}$$

It can be shown that the patterns of reasoning which are used in finding a contradiction in the chain of statements above are similar to the reasoning we make use of in FR-type arguments, and can also be connected to the cases of the PR box (which corresponds to a Liar cycle of length 4) and Hardy's paradox. This further suggests that multi-agent paradoxes are closely linked to the notion of contextuality.

Relation to logical pre-post selection paradoxes.

In [176], it has been shown that every proof of a logical pre-post selection paradox is a proof of contextuality. The exact connection between FR-type paradoxes and logical pre-post selection paradoxes is not known, and this would be an interesting avenue to explore which would also provide insights into the relationship between FR paradoxes and contextuality.
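Returning to the Liar cycles of Equation (8.2), their defining feature can be illustrated by brute force. The sketch below is an illustration rather than anything taken from [4] or [65], and the function names are hypothetical. It encodes each statement as a constraint between the truth values of two statements, and confirms that no global truth assignment satisfies all N constraints, while dropping any single constraint leaves a satisfiable set. This "locally consistent, globally inconsistent" pattern is the one shared with contextuality arguments.

```python
from itertools import product

def liar_cycle_constraints(n):
    """Constraints of a Liar cycle of length n as (i, j, same) triples.

    Statement i asserts that statement j is true (same=True) or false
    (same=False), so a consistent assignment v needs v[i] == v[j] in the
    first case and v[i] != v[j] in the second.
    """
    constraints = [(i, i + 1, True) for i in range(n - 1)]
    constraints.append((n - 1, 0, False))  # last statement: "the first is false"
    return constraints

def consistent_assignments(n, constraints):
    """All truth assignments to n statements that satisfy every constraint."""
    return [
        v for v in product((False, True), repeat=n)
        if all((v[i] == v[j]) == same for i, j, same in constraints)
    ]

for n in range(1, 7):
    cons = liar_cycle_constraints(n)
    full = consistent_assignments(n, cons)
    # Drop each constraint in turn and count the surviving assignments.
    partial = [len(consistent_assignments(n, cons[:k] + cons[k + 1:]))
               for k in range(n)]
    print(n, len(full), partial)
```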

In this section, we describe how a box world agent would measure a system and store the result in a memory. From the perspective of an outside observer (who does not know the outcome of the agent's measurement), we describe the initial and final states of the system and memory before and after the measurement as well as the transformation that implements this memory update in box world. In the quantum case, any initial state of the system S is mapped to an isomorphic joint state of the system S and memory M (see Equation (8.1)) and hence the memory update map that maps the former to the latter (an isometry in this case) satisfies Definition 8.2.7 of an information-preserving memory update. We will now characterise the analogous memory update map in box world and show that it also satisfies Definition 8.2.7.

Theorem 8.6.1.

In box world, there exists a valid transformation U that maps every arbitrary, normalized state $\vec{P}^{S}_{in}$ of the system S to an isomorphic final state $\vec{P}^{SM}_{fin}$ of the system S and memory M and hence constitutes an information-preserving memory update (Definition 8.2.7).

Proof. To simplify the argument, we will describe the proof for the case where S and M are gbits. For higher dimensional systems, a similar argument holds; this will be explained at the end of the proof.

We start with the system in an arbitrary, normalized gbit state $\vec{P}^{S}_{in} = (p\ \ 1-p \mid q\ \ 1-q)^T$ (where the superscript T denotes transpose and $p, q \in [0, 1]$) and the memory initialised to one of the 4 pure states, say $\vec{P}^{M}_{in} = (1\ \ 0 \mid 1\ \ 0)^T$. Then the joint initial state, $\vec{P}^{SM}_{in} = (p\ \ 1-p \mid q\ \ 1-q)^T_S \otimes (1\ \ 0 \mid 1\ \ 0)^T_M$, of the system and memory can be written as follows, where $P_{in}(a=i, a'=j \mid X=k, X'=l)$ denotes the probability of obtaining the outcomes $a=i$ and $a'=j$ when performing the fiducial measurements $X=k$ and $X'=l$ on the system and memory respectively, in the initial state $\vec{P}^{SM}_{in}$.

$$
\vec{P}^{SM}_{in} =
\begin{pmatrix}
P_{in}(a=0,a'=0 \mid X=0,X'=0)\\ P_{in}(a=0,a'=1 \mid X=0,X'=0)\\ P_{in}(a=1,a'=0 \mid X=0,X'=0)\\ P_{in}(a=1,a'=1 \mid X=0,X'=0)\\
P_{in}(a=0,a'=0 \mid X=0,X'=1)\\ P_{in}(a=0,a'=1 \mid X=0,X'=1)\\ P_{in}(a=1,a'=0 \mid X=0,X'=1)\\ P_{in}(a=1,a'=1 \mid X=0,X'=1)\\
P_{in}(a=0,a'=0 \mid X=1,X'=0)\\ P_{in}(a=0,a'=1 \mid X=1,X'=0)\\ P_{in}(a=1,a'=0 \mid X=1,X'=0)\\ P_{in}(a=1,a'=1 \mid X=1,X'=0)\\
P_{in}(a=0,a'=0 \mid X=1,X'=1)\\ P_{in}(a=0,a'=1 \mid X=1,X'=1)\\ P_{in}(a=1,a'=0 \mid X=1,X'=1)\\ P_{in}(a=1,a'=1 \mid X=1,X'=1)
\end{pmatrix}_{SM}
=
\begin{pmatrix}
p\\ 0\\ 1-p\\ 0\\ p\\ 0\\ 1-p\\ 0\\ q\\ 0\\ 1-q\\ 0\\ q\\ 0\\ 1-q\\ 0
\end{pmatrix}_{SM}
\tag{8.3}
$$

The rest of the proof proceeds as follows: we first describe a final state $\vec{P}^{SM}_{fin}$ of the system and memory and a corresponding memory update map U that satisfy Definition 8.2.7 of a generalized information-preserving memory update. Then, we show that this map can be seen as an allowed box world transformation, which completes the proof.

Footnotes: (i) The quantum memory update is an isometry since it introduces an initial pure state on M, followed by a joint unitary on SM. (ii) It does not matter which pure state the memory is initialized in; a similar argument applies in all cases.


If an agent performs a measurement on the system, the state of the memory must be updated depending on the outcome, and the final state of the system and memory after the measurement must hence be a correlated (i.e., a non-product) state. Although the full state space of the 2 gbit system SM is characterised by the 4 fiducial measurements $(X, X') \in \{(0,0), (0,1), (1,0), (1,1)\}$, Definition 8.2.7 allows us to restrict possible final states to a useful subspace of this state space that contains correlated states of a certain form. The definition requires that for every map $\mathcal{E}_S$ on the system before measurement, there exists a corresponding map $\mathcal{E}_{SM}$ on the system and memory after the measurement that is operationally identical. Thus it suffices if the joint final state $\vec{P}^{SM}_{fin}$ belongs to a subspace of the 2 gbit state space for which only 2 of the 4 fiducial measurements are relevant for characterising the state, namely any 2 fiducial measurements on $\vec{P}^{SM}_{fin}$ that are isomorphic to the 2 fiducial measurements on $\vec{P}^{S}_{in}$. Note that by definition of fiducial measurements, the outcome probabilities of any measurement can be found given the outcome probabilities of all the fiducial measurements, and without loss of generality we will only consider the case where the agents perform fiducial measurements on their systems.

A natural isomorphism between fiducial measurements on $\vec{P}^{S}_{in}$ and those on $\vec{P}^{SM}_{fin}$ to consider here (in analogy with the quantum case) is: $X = i \Leftrightarrow (X, X') = (i, i)$, $\forall i \in \{0, 1\}$, i.e., only consider the cases where the fiducial measurements performed on S and M are the same. Now, in order for the states to be isomorphic or operationally equivalent, one requires that performing the fiducial measurements $(X, X') = (i, i)$ on $\vec{P}^{SM}_{fin}$ should give the same outcome statistics as measuring $X = i$ on $\vec{P}^{S}_{in}$. This can be satisfied through an identical isomorphism on the outcomes: $a = i \Leftrightarrow (a, a') = (i, i)$, $\forall i \in \{0, 1\}$.
Then the final state of the system and memory, $\vec{P}^{SM}_{fin}$, will be of the form

$$
\vec{P}^{SM}_{fin} =
\begin{pmatrix}
P_{fin}(a=0,a'=0 \mid X=0,X'=0)\\ P_{fin}(a=0,a'=1 \mid X=0,X'=0)\\ P_{fin}(a=1,a'=0 \mid X=0,X'=0)\\ P_{fin}(a=1,a'=1 \mid X=0,X'=0)\\
P_{fin}(a=0,a'=0 \mid X=0,X'=1)\\ P_{fin}(a=0,a'=1 \mid X=0,X'=1)\\ P_{fin}(a=1,a'=0 \mid X=0,X'=1)\\ P_{fin}(a=1,a'=1 \mid X=0,X'=1)\\
P_{fin}(a=0,a'=0 \mid X=1,X'=0)\\ P_{fin}(a=0,a'=1 \mid X=1,X'=0)\\ P_{fin}(a=1,a'=0 \mid X=1,X'=0)\\ P_{fin}(a=1,a'=1 \mid X=1,X'=0)\\
P_{fin}(a=0,a'=0 \mid X=1,X'=1)\\ P_{fin}(a=0,a'=1 \mid X=1,X'=1)\\ P_{fin}(a=1,a'=0 \mid X=1,X'=1)\\ P_{fin}(a=1,a'=1 \mid X=1,X'=1)
\end{pmatrix}_{SM}
=
\begin{pmatrix}
p\\ 0\\ 0\\ 1-p\\ *\\ *\\ *\\ *\\ *\\ *\\ *\\ *\\ q\\ 0\\ 0\\ 1-q
\end{pmatrix}_{SM},
\tag{8.4}
$$

where $*$ are arbitrary, normalised entries and where $P_{fin}(a=i, a'=j \mid X=k, X'=l)$ denotes the probability of obtaining the outcomes $a=i$ and $a'=j$ when performing the fiducial measurements $X=k$ and $X'=l$ on the system and memory respectively, in the final state $\vec{P}^{SM}_{fin}$. This final state can be compressed since the only relevant and non-zero probabilities in $\vec{P}^{SM}_{fin}$ occur when $X = X'$ and $a = a'$. We can then define new variables $\tilde{X}$ and $\tilde{a}$ such that $X = X' = i \Leftrightarrow \tilde{X} = i$ and $a = a' = j \Leftrightarrow \tilde{a} = j$ for $i, j \in \{0, 1\}$, and $\vec{P}^{SM}_{fin}$ can equivalently be written as in Equation (8.5), which is clearly of the same form as $\vec{P}^{S}_{in}$.

$$
\vec{P}^{SM}_{fin} \equiv
\begin{pmatrix}
P(\tilde{a}=0 \mid \tilde{X}=0)\\ P(\tilde{a}=1 \mid \tilde{X}=0)\\ P(\tilde{a}=0 \mid \tilde{X}=1)\\ P(\tilde{a}=1 \mid \tilde{X}=1)
\end{pmatrix}_{SM}
=
\begin{pmatrix}
p\\ 1-p\\ q\\ 1-q
\end{pmatrix}_{SM}
\tag{8.5}
$$

Hence the initial state of the system, $\vec{P}^{S}_{in} = (p\ \ 1-p \mid q\ \ 1-q)^T$ (which is an arbitrary gbit state), is isomorphic to the final state of the system and memory, $\vec{P}^{SM}_{fin}$ (as evident from Equation (8.5)), with the same outcome probabilities for the corresponding fiducial measurements $X = \tilde{X} = 0$ and $X = \tilde{X} = 1$. This implies that for every transformation $\mathcal{E}_S$ on the former, there exists a transformation $\mathcal{E}_{SM}$ on the latter such that for all outside agents $A_j$ and for all $p, q \in [0, 1]$ (i.e., all possible input gbit states on the system), $K_j\varphi[\mathcal{E}_S(\vec{P}^{S}_{in})] \Rightarrow K_j\varphi[\mathcal{E}_{SM} \circ \vec{P}^{SM}_{fin}]$, where $\vec{P}^{SM}_{fin} = U(\vec{P}^{S}_{in})$. Thus any map U that maps $\vec{P}^{SM}_{in} = \vec{P}^{S}_{in} \otimes \vec{P}^{M}_{in}$ to $\vec{P}^{SM}_{fin}$ satisfies Definition 8.2.7.

We now find a valid box world transformation that maps the initial state $\vec{P}^{SM}_{in}$ (Equation (8.3)) to any final state of the form $\vec{P}^{SM}_{fin}$ (Equation (8.4)), which would correspond to the memory update map U.

Noting that all bipartite transformations in box world can be decomposed to a classical circuit of a certain form (see Appendix 2.4.2 or the original paper [16] for details), in Figure 8.4 we construct an explicit circuit of this form that converts $\vec{P}^{SM}_{in}$ to $\vec{P}^{SM}_{fin}$. By construction, we only need to consider the case of $X = X'$ since for $X \neq X'$, the entries of $\vec{P}^{SM}_{fin}$ can be arbitrary and are irrelevant to the argument. For $X \neq X'$, one can consider any such circuit description and it is easy to see that $\vec{P}^{SM}_{in} = (p\ \ 1-p \mid q\ \ 1-q)^T_S \otimes (1\ \ 0 \mid 1\ \ 0)^T_M$ is indeed transformed into $\vec{P}^{SM}_{fin} = (p\ \ 0\ \ 0\ \ 1-p \mid *\ *\ *\ * \mid *\ *\ *\ * \mid q\ \ 0\ \ 0\ \ 1-q)^T_{SM}$ through the map U defined by this sequence of steps. For example, if the circuit description for the $X \neq X'$ case is the same as that for the $X = X'$ case, then the resultant memory update map is equivalent to the circuit of Figure 8.3a, which corresponds to performing a fixed fiducial measurement $X'$ on M and a classical CNOT on the output wire of M controlled by the output wire of S. The final state in that case is $(p\ \ 0\ \ 0\ \ 1-p \mid p\ \ 0\ \ 0\ \ 1-p \mid q\ \ 0\ \ 0\ \ 1-q \mid q\ \ 0\ \ 0\ \ 1-q)^T_{SM}$.

For higher dimensional systems S with $n > 2$, $X \in \{0, \ldots, n-1\}$ and $k > 2$, $a \in \{0, \ldots, k-1\}$, let $b_n$ and $b_k$ be the number of bits required to represent n and k in binary respectively. Then the memory M would be initialized to $b_k$ copies of the pure state $\vec{P}^{M}_{in,n} = (1\ \ 0 \mid \ldots \mid 1\ \ 0)^T_M$, which contains n identical blocks (one for each of the n fiducial measurements). One can then perform the procedure of Figure 8.4 "bitwise", combining each output bit with one pure state of M, and apply the same argument to obtain the result. For the specific case of the memory update transformation of Figure 8.3a, this would correspond to a bitwise CNOT on the output wires of S and M.

Footnote: The output wires of boxes carry classical information after the measurement.

So far, we have considered a single agent measuring a system in her lab. We can also consider situations where multiple agents jointly share a state and measure their local parts of the state, updating their corresponding memories. One might wonder whether the initial correlations in the shared state are preserved once the agents measure it to update their memories (clearly the local measurement probabilities remain unaltered, as we saw in this section). The answer is affirmative, and this is what allows us to formulate the Frauchiger-Renner paradox in box world as done in Section 8.4, even though a coherent copy analogous to the quantum case does not exist here.
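As an illustration of the single-agent memory update of Theorem 8.6.1 in the gbit case, the following sketch builds the joint initial state, applies a CNOT permutation inside every block (i.e., the special case of Figure 8.3a in which the arbitrary blocks are also CNOTs), and compresses the result back to a gbit-shaped vector. All function names are hypothetical, and the block and outcome orderings are the ones assumed in Equations (8.3)-(8.5); this is a sketch under those assumptions, not the thesis' own implementation.

```python
# Entries inside each block are ordered by the outcomes (a, a') = 00, 01, 10, 11.
CN = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],   # (a, a') = (1, 1) receives the weight of (1, 0)
    [0, 0, 1, 0],   # and vice versa: a' is flipped whenever a = 1
]
MEMORY_IN = [1.0, 0.0, 1.0, 0.0]  # pure memory state (1 0 | 1 0)^T

def joint_initial_state(p_s):
    """Product state P_S (x) P_M as blocks indexed by the settings (X, X')."""
    blocks = {}
    for X in (0, 1):
        for Xp in (0, 1):
            s = p_s[2 * X: 2 * X + 2]
            m = MEMORY_IN[2 * Xp: 2 * Xp + 2]
            blocks[(X, Xp)] = [s[a] * m[ap] for a in (0, 1) for ap in (0, 1)]
    return blocks

def apply_blockwise(matrix, blocks):
    """Apply the same 4x4 matrix to every block (Figure 8.3a special case)."""
    return {
        key: [sum(matrix[r][c] * blk[c] for c in range(4)) for r in range(4)]
        for key, blk in blocks.items()
    }

def compress(blocks):
    """Keep only X = X' and a = a', giving (P(0|0), P(1|0), P(0|1), P(1|1))."""
    out = []
    for i in (0, 1):
        blk = blocks[(i, i)]
        out += [blk[0], blk[3]]
    return out

p_s = [0.2, 0.8, 0.9, 0.1]          # arbitrary gbit state (p 1-p | q 1-q)
final = apply_blockwise(CN, joint_initial_state(p_s))
print(final[(0, 0)], final[(1, 1)])
print(compress(final))
```

The compressed final state reproduces the input gbit state entry by entry, which is the isomorphism used in the proof.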

Theorem 8.6.2.

Suppose that Alice and Bob share an arbitrary bipartite state $\vec{P}^{PR}_{in}$ (which may be correlated), locally perform a fiducial measurement on their half of the state and store the outcome in their local memories A and B. Then the final joint state $\vec{P}^{\tilde{A}\tilde{B}}_{fin}$ of the systems $\tilde{A} := PA$ and $\tilde{B} := RB$ as described by outside agents is isomorphic to $\vec{P}^{PR}_{in}$ with the systems $\tilde{A}$ and $\tilde{B}$ taking the role of the systems P and R, i.e., local memory updates by Alice and Bob preserve any correlations initially shared between them.

Proof. In the following, we describe the proof for the case where the bipartite system shared by Alice and Bob consists of 2 gbits; however, the result easily generalises to arbitrary higher dimensional systems by the argument presented in the last paragraph of the proof of Theorem 8.6.1.

Let $\vec{P}^{PR}_{in}$ be an arbitrary 2 gbit state with entries $P_{in}(ab = ij \mid XY = kl)$ ($i, j, k, l \in \{0, 1\}$), which correspond to the joint probabilities of Alice and Bob obtaining the outcomes $a = i$ and $b = j$ when measuring $X = k$ and $Y = l$ on the P and R subsystems when sharing that initial state. Let $X', a' \in \{0, 1\}$ and $Y', b' \in \{0, 1\}$ be the fiducial measurements and outcomes for the memory systems A and B (also gbits) respectively. We describe the measurement and memory update process for each agent separately and characterise the final state of Alice's and Bob's systems and memories after the process as it would appear to outside agents who do not have access to Alice and Bob's measurement outcomes. This analysis does not depend on the order in which Alice and Bob perform the measurement as the correlations are symmetric between them, so without loss of generality, we can consider Bob's measurement first and then Alice's.

Suppose that Bob's memory B is initialised to the state $\vec{P}^{B}_{in} = (1\ \ 0 \mid 1\ \ 0)^T_B$. Then the joint initial state of Alice's and Bob's system and Bob's memory as described by an agent Wigner outside Bob's lab is $\vec{P}^{PRB}_{in} = \vec{P}^{PR}_{in} \otimes \vec{P}^{B}_{in}$.
This can be expanded as follows, where $P_{in}(abb' = ijk \mid XYY' = lmn)$ represents the probability of obtaining the binary outcomes $a = i$, $b = j$, $b' = k$ when performing the binary fiducial measurements $X = l$, $Y = m$, $Y' = n$ on the initial state $\vec{P}^{PRB}_{in}$.

$$
\vec{P}^{PRB}_{in} =
\begin{pmatrix}
P_{in}(abb'=000 \mid XYY'=000)\\ P_{in}(abb'=001 \mid XYY'=000)\\ P_{in}(abb'=010 \mid XYY'=000)\\ P_{in}(abb'=011 \mid XYY'=000)\\
P_{in}(abb'=100 \mid XYY'=000)\\ P_{in}(abb'=101 \mid XYY'=000)\\ P_{in}(abb'=110 \mid XYY'=000)\\ P_{in}(abb'=111 \mid XYY'=000)\\
\vdots\\
P_{in}(abb'=000 \mid XYY'=111)\\ P_{in}(abb'=001 \mid XYY'=111)\\ P_{in}(abb'=010 \mid XYY'=111)\\ P_{in}(abb'=011 \mid XYY'=111)\\
P_{in}(abb'=100 \mid XYY'=111)\\ P_{in}(abb'=101 \mid XYY'=111)\\ P_{in}(abb'=110 \mid XYY'=111)\\ P_{in}(abb'=111 \mid XYY'=111)
\end{pmatrix}_{PRB}
=
\begin{pmatrix}
P_{in}(ab=00 \mid XY=00)\\ 0\\ P_{in}(ab=01 \mid XY=00)\\ 0\\
P_{in}(ab=10 \mid XY=00)\\ 0\\ P_{in}(ab=11 \mid XY=00)\\ 0\\
\vdots\\
P_{in}(ab=00 \mid XY=11)\\ 0\\ P_{in}(ab=01 \mid XY=11)\\ 0\\
P_{in}(ab=10 \mid XY=11)\\ 0\\ P_{in}(ab=11 \mid XY=11)\\ 0
\end{pmatrix}_{PRB}
\tag{8.6}
$$

$\vec{P}^{PRB}_{in}$ has 8 blocks $G_{XYY'}$, one for each value of $(X, Y, Y')$, and is a product state with 4 equal pairs of blocks, $G^{in}_{000} = G^{in}_{001}$, $G^{in}_{010} = G^{in}_{011}$, $G^{in}_{100} = G^{in}_{101}$, $G^{in}_{110} = G^{in}_{111}$, since both measurements on the initial state of B give the same outcome.

Figure 8.4: Classical circuit decomposition of the memory update map U as a box world transformation. [Circuit diagram not recoverable from the extracted text; its labels are the inputs $X_1$, $X_2$, the systems S and M, and the outputs $a = a_1$, $a' = a_1 \oplus a_2$.] The blue box represents the final state of the system S and memory M after the memory update, characterised by the fiducial measurements X and X' and the outcomes a, a'. Let U be the memory update map that maps the initial state $\vec{P}^{SM}_{in}$ to a final state $\vec{P}^{SM}_{fin}$. Noting that we only need to consider the case of $X = X'$, since for $X \neq X'$ the entries of $\vec{P}^{SM}_{fin}$ can be arbitrary, the action of U is equivalent to the circuit shown here, i.e., 1) Choose $X_1 = X\ (= X')$ and perform this fiducial measurement on the initial state of the system $\vec{P}^{S}_{in}$ to obtain the outcome $a_1$. 2) Fix $X_2 = 0$ (or $X_2 = 1$) and perform this fiducial measurement on the initial state of the memory $\vec{P}^{M}_{in} = (1\ \ 0 \mid 1\ \ 0)^T_M$ to obtain the outcome $a_2$. 3) Set $a = a_1$. 4) If $a_2 = 0$, set $a' = a_1$, otherwise set $a' = a_1 \oplus 1$, where $\oplus$ denotes modulo 2 addition.

Now, the outside observer Wigner will describe the transformation on RB through the memory update map U of Figure 8.4. Let $\vec{P}^{PRB}_{fin}$ be the final state that results from applying this map to the systems RB in the initial state $\vec{P}^{PRB}_{in}$. Any transformation on a system characterised by n fiducial measurements with k outcomes each can be represented by an $nk \times nk$ block matrix where each block is a $k \times k$ matrix (see [16] for further details). For the system RB, $n = k = 4$ and $U_{RB}$ would be a $16 \times 16$ block matrix of the following form, where each $U_{ij}$ is a $4 \times 4$ matrix.

$$
U_{RB} =
\begin{pmatrix}
U_{11} & \cdots & U_{14}\\
\vdots & \ddots & \vdots\\
U_{41} & \cdots & U_{44}
\end{pmatrix}_{RB}
$$

Here, the first 4 rows decide the entries in the first block of the transformed state, the next 4 the second block, and so on. Noting that the memory update transformation (Figure 8.4) merely permutes elements within the relevant blocks (and does not mix elements between different blocks), the only non-zero blocks of $U_{RB}$ are the diagonal ones $U_{ii}$. Further, by the same argument as in Theorem 8.6.1, the only relevant entries in the transformed state are those for which the same fiducial measurement is performed on Bob's system R and memory B, i.e., only cases where $Y = Y'$. The remaining measurement choices may be arbitrary for the final state (just as they are for $X \neq X'$ in Equation (8.4)). This means that among the 4 diagonal blocks, only 2 of them are relevant. The 4 fiducial measurements on RB are $YY' = 00, 01, 10, 11$ and, in that order, only the first and fourth are relevant since they correspond to $Y = Y'$. Within these relevant blocks (in this case $U_{11}$ and $U_{44}$), the operation is a CNOT on the output $b'$ controlled by the output $b$, and we have the following matrix representation of the memory update map U of Figure 8.4.

$$
U_{RB} =
\begin{pmatrix}
CN & 0 & 0 & 0\\
0 & * & 0 & 0\\
0 & 0 & * & 0\\
0 & 0 & 0 & CN
\end{pmatrix}_{RB},
\qquad
CN =
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 0 & 1\\
0 & 0 & 1 & 0
\end{pmatrix}
\tag{8.7}
$$

where 0 represents the $4 \times 4$ zero matrix and the blocks $*$ can be arbitrary. (Footnote: The memory update map corresponding to the circuit of Figure 8.3a is a specific case of this map where the arbitrary blocks $*$ are also equal to CN.) The final state $\vec{P}^{PRB}_{fin}$ as seen by Wigner is then

$$
\vec{P}^{PRB}_{fin} = (\mathcal{I}_P \otimes U_{RB})\, \vec{P}^{PRB}_{in} = (\mathcal{I}_P \otimes U_{RB}) \Big[ \vec{P}^{PR}_{in} \otimes (1\ \ 0 \mid 1\ \ 0)^T_B \Big],
\tag{8.8}
$$

where $\mathcal{I}_P$ is the identity transformation on the P system. Since the CN blocks are the only relevant blocks in $U_{RB}$ and each block of $\vec{P}^{PRB}_{in}$ has the same pattern of non-zero and zero entries (Equation (8.6)), it is enough to look at the action of $\mathcal{I}_P \otimes CN$ on the first block $G^{in}_{000}$ of $\vec{P}^{PRB}_{in}$. Noting that $\mathcal{I}_P$ acts within a block as the $2 \times 2$ identity matrix, we have

$$
(\mathcal{I}_P \otimes CN)\, G^{in}_{000} =
\begin{pmatrix} CN & 0\\ 0 & CN \end{pmatrix}
\begin{pmatrix}
P_{in}(ab=00 \mid XY=00)\\ 0\\ P_{in}(ab=01 \mid XY=00)\\ 0\\
P_{in}(ab=10 \mid XY=00)\\ 0\\ P_{in}(ab=11 \mid XY=00)\\ 0
\end{pmatrix}
= G^{fin}_{000} =
\begin{pmatrix}
P_{in}(ab=00 \mid XY=00)\\ 0\\ 0\\ P_{in}(ab=01 \mid XY=00)\\
P_{in}(ab=10 \mid XY=00)\\ 0\\ 0\\ P_{in}(ab=11 \mid XY=00)
\end{pmatrix}
=
\begin{pmatrix}
P_{fin}(abb'=000 \mid XYY'=000)\\ P_{fin}(abb'=001 \mid XYY'=000)\\ P_{fin}(abb'=010 \mid XYY'=000)\\ P_{fin}(abb'=011 \mid XYY'=000)\\
P_{fin}(abb'=100 \mid XYY'=000)\\ P_{fin}(abb'=101 \mid XYY'=000)\\ P_{fin}(abb'=110 \mid XYY'=000)\\ P_{fin}(abb'=111 \mid XYY'=000)
\end{pmatrix},
$$

where $P_{fin}(abb' = ijk \mid XYY' = lmn)$ represents the probability of obtaining the outcomes $a = i$, $b = j$, $b' = k$ when performing the fiducial measurements $X = l$, $Y = m$, $Y' = n$ on the final state $\vec{P}^{PRB}_{fin}$ and $G^{fin}_{000}$ is the first block of this final state. Clearly the only non-zero outcome probabilities are those with $b = b'$, and this allows us to compress the final state by defining $\tilde{b} = i \Leftrightarrow b = b' = i$ for $i \in \{0, 1\}$, and we have the following.
$$
(\mathcal{I}_P \otimes CN)\, G^{in}_{000} \equiv
\begin{pmatrix}
P_{in}(ab=00 \mid XY=00)\\ P_{in}(ab=01 \mid XY=00)\\ P_{in}(ab=10 \mid XY=00)\\ P_{in}(ab=11 \mid XY=00)
\end{pmatrix}
=
\begin{pmatrix}
P_{fin}(a\tilde{b}=00 \mid XYY'=000)\\ P_{fin}(a\tilde{b}=01 \mid XYY'=000)\\ P_{fin}(a\tilde{b}=10 \mid XYY'=000)\\ P_{fin}(a\tilde{b}=11 \mid XYY'=000)
\end{pmatrix}
= G^{in}_{00}
$$

Here $G^{in}_{00}$ is the first block of the initial state $\vec{P}^{PR}_{in}$, and we have that the first block of the final state of PRB is equivalent (up to zero entries) to the first block of the initial state over PR alone, or $G^{fin}_{000} \equiv G^{in}_{00}$. Among the 8 blocks of $\vec{P}^{PRB}_{fin}$, only the 4 blocks $G^{fin}_{000}$, $G^{fin}_{011}$, $G^{fin}_{100}$ and $G^{fin}_{111}$ are the relevant ones (since $Y = Y'$ for these) and we can similarly show that $G^{fin}_{011} \equiv G^{in}_{01}$, $G^{fin}_{100} \equiv G^{in}_{10}$ and $G^{fin}_{111} \equiv G^{in}_{11}$ for the remaining 3 relevant blocks. Defining $\tilde{Y} = i \Leftrightarrow Y = Y' = i$ for $i \in \{0, 1\}$, we obtain

$$
\vec{P}^{PRB}_{fin} = \vec{P}^{P\tilde{B}}_{fin} \equiv
\begin{pmatrix}
P_{fin}(a\tilde{b}=00 \mid X\tilde{Y}=00)\\ P_{fin}(a\tilde{b}=01 \mid X\tilde{Y}=00)\\ P_{fin}(a\tilde{b}=10 \mid X\tilde{Y}=00)\\ P_{fin}(a\tilde{b}=11 \mid X\tilde{Y}=00)\\
P_{fin}(a\tilde{b}=00 \mid X\tilde{Y}=01)\\ P_{fin}(a\tilde{b}=01 \mid X\tilde{Y}=01)\\ P_{fin}(a\tilde{b}=10 \mid X\tilde{Y}=01)\\ P_{fin}(a\tilde{b}=11 \mid X\tilde{Y}=01)\\
P_{fin}(a\tilde{b}=00 \mid X\tilde{Y}=10)\\ P_{fin}(a\tilde{b}=01 \mid X\tilde{Y}=10)\\ P_{fin}(a\tilde{b}=10 \mid X\tilde{Y}=10)\\ P_{fin}(a\tilde{b}=11 \mid X\tilde{Y}=10)\\
P_{fin}(a\tilde{b}=00 \mid X\tilde{Y}=11)\\ P_{fin}(a\tilde{b}=01 \mid X\tilde{Y}=11)\\ P_{fin}(a\tilde{b}=10 \mid X\tilde{Y}=11)\\ P_{fin}(a\tilde{b}=11 \mid X\tilde{Y}=11)
\end{pmatrix}
=
\begin{pmatrix}
P_{in}(ab=00 \mid XY=00)\\ P_{in}(ab=01 \mid XY=00)\\ P_{in}(ab=10 \mid XY=00)\\ P_{in}(ab=11 \mid XY=00)\\
P_{in}(ab=00 \mid XY=01)\\ P_{in}(ab=01 \mid XY=01)\\ P_{in}(ab=10 \mid XY=01)\\ P_{in}(ab=11 \mid XY=01)\\
P_{in}(ab=00 \mid XY=10)\\ P_{in}(ab=01 \mid XY=10)\\ P_{in}(ab=10 \mid XY=10)\\ P_{in}(ab=11 \mid XY=10)\\
P_{in}(ab=00 \mid XY=11)\\ P_{in}(ab=01 \mid XY=11)\\ P_{in}(ab=10 \mid XY=11)\\ P_{in}(ab=11 \mid XY=11)
\end{pmatrix}
= \vec{P}^{PR}_{in}
\tag{8.9}
$$

Equation (8.9) shows that the final state $\vec{P}^{P\tilde{B}}_{fin}$ of Alice's system P, Bob's system R and Bob's memory B after Bob's local memory update is isomorphic to the initial state $\vec{P}^{PR}_{in}$ shared by Alice and Bob, having the same outcome probabilities as the latter for all the relevant measurements. Thus the initial correlations present in $\vec{P}^{PR}_{in}$ are preserved after Bob locally updates his memory according to the update procedure of Figure 8.4.
One can now repeat the same argument for Alice's local memory update, taking $\vec{P}^{P\tilde{B}}_{fin} \otimes (1\ \ 0 \mid 1\ \ 0)^T_A$ to be the initial state. By analogously defining $\tilde{s} = i \Leftrightarrow s = s' = i$ for $s \in \{a, X\}$, $i \in \{0, 1\}$, we have the required result that the final state after both parties perform their local memory updates (as described by outside agents Ursula and Wigner) is isomorphic and operationally equivalent to the initial state shared by the parties before the memory update.

$$\vec{P}^{PARB}_{fin} = \vec{P}^{\tilde{A}\tilde{B}}_{fin} \equiv \vec{P}^{PR}_{in} \tag{8.10}$$
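The statement of Theorem 8.6.2 can be checked on a concrete example. The sketch below starts from a PR box, attaches a memory to Bob and then to Alice, applies the CNOT relabelling of outcomes for every choice of memory setting (the Figure 8.3a special case), and compresses the result to the entries with equal settings and equal outcomes on each system-memory pair. The dictionary representation and all function names are hypothetical choices made for this illustration.

```python
from itertools import product

BITS = (0, 1)

def pr_box():
    """PR box: a XOR b = X AND Y, each allowed outcome pair with probability 1/2.

    States are dicts keyed by (outcomes, settings) tuples.
    """
    return {((a, b), (x, y)): (0.5 if (a ^ b) == (x & y) else 0.0)
            for a, b, x, y in product(BITS, repeat=4)}

def attach_memory(state):
    """Append a memory gbit that gives outcome 0 for either fiducial measurement."""
    new = {}
    for (outs, sets), pr in state.items():
        for m, s in product(BITS, repeat=2):
            new[(outs + (m,), sets + (s,))] = pr if m == 0 else 0.0
    return new

def cnot(state, control, target):
    """Relabel outcomes: the target outcome is XORed with the control outcome."""
    new = {}
    for (outs, sets), pr in state.items():
        flipped = list(outs)
        flipped[target] ^= outs[control]
        new[(tuple(flipped), sets)] = pr
    return new

def compress(state, pairs):
    """Keep entries where each (system, memory) pair has equal settings and outcomes."""
    new = {}
    for (outs, sets), pr in state.items():
        if all(outs[i] == outs[j] and sets[i] == sets[j] for i, j in pairs):
            key = (tuple(outs[i] for i, _ in pairs),
                   tuple(sets[i] for i, _ in pairs))
            new[key] = pr
    return new

# Positions in the tuples: 0 = P (Alice's system), 1 = R (Bob's system).
state = pr_box()
state = cnot(attach_memory(state), control=1, target=2)  # Bob's memory B at position 2
state = cnot(attach_memory(state), control=0, target=3)  # Alice's memory A at position 3
compressed = compress(state, pairs=[(0, 3), (1, 2)])

print(compressed == pr_box())
```

The same functions accept any normalised two-gbit state in place of `pr_box()`, so the check is not specific to the PR box.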

Chapter 9. Conclusions and outlook

This two-part thesis proposes new techniques for analysing acyclic and cyclic causal structures as well as multi-agent logical paradoxes in non-classical theories. By developing these techniques, we have derived interesting preliminary insights and have identified a number of possible directions for future work on these aspects of quantum and post-quantum theories, which have both foundational as well as practical applications.

The results reported in Part I pertain to the entropic analysis of causal structures (Chapters 4 and 5) and to the connections between the notions of causation and space-time (Chapter 6). In the former case, we have developed a systematic method for analysing causal structures using generalised entropies, providing, in particular, a new set of Tsallis entropic inequalities for the Triangle causal structure. We have also extended our analysis to post-selected causal structures and identified families of distributions whose non-classicality can be detected in Bell scenarios with non-binary outcomes. Further, the properties of classical and quantum Tsallis entropies derived in Chapter 4 can be potentially useful beyond the analysis of causal structures, since Tsallis entropies are widely employed in other areas of information theory as well as non-extensive statistical physics.

On the other hand, our results also reveal significant drawbacks of the entropic technique for certifying the non-classicality of causal structures, both with and without the additional post-selection method. We identified the main reasons for the computational intractability of the entropy vector method using Tsallis entropies. Although we were able to circumvent this issue to derive new Tsallis entropic inequalities for the Triangle scenario, no quantum violations of these inequalities were found even by known non-classical distributions in the scenario. We noted that Rényi entropies do not seem to overcome these limitations either, due to the lack of linear constraints required to construct a good initial outer approximation to the Rényi entropy cone. In post-selected causal structures, we provided numerical and analytical evidence suggesting that both Shannon and Tsallis entropic inequalities are in general insufficient for detecting non-classicality in Bell scenarios with binary inputs and three or more outcomes per party.
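As a small illustration of why Tsallis entropies behave differently from the Shannon entropy under independence, the following Python sketch (not code from the thesis) evaluates the standard Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q-1)$ and checks the well-known pseudo-additivity relation for a product distribution. The function names `tsallis_entropy` and `product_distribution`, the natural-logarithm convention in the limit q → 1, and the example distributions are assumptions made for this sketch.

```python
import math
from itertools import product


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a discrete distribution (natural-log convention)."""
    probs = [p for p in probs if p > 0]
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit recovers the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def product_distribution(p_a, p_b):
    """Joint distribution of two independent variables, flattened to a list."""
    return [a * b for a, b in product(p_a, p_b)]


# Hypothetical example distributions for two independent variables A and B.
p_a = [0.5, 0.3, 0.2]
p_b = [0.6, 0.4]
q = 2.0

s_a = tsallis_entropy(p_a, q)
s_b = tsallis_entropy(p_b, q)
s_ab = tsallis_entropy(product_distribution(p_a, p_b), q)

# Pseudo-additivity: S_q(AB) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B).
pseudo_additive = s_a + s_b + (1.0 - q) * s_a * s_b

print(s_a, s_b, s_ab, pseudo_additive)
```

For q > 1 the joint entropy of independent variables is strictly smaller than the sum of the marginal entropies, so an independence relation becomes a non-linear rather than a linear constraint on Tsallis entropies, in contrast to the Shannon case q = 1.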


Limitations for distinguishing between classical and non-classical correlations in causal structures using Shannon entropies have been previously encountered in the absence of post-selection [215, 216]. By showing that similar limitations arise even for generalised entropies and in the presence of post-selection, our results shed further light on the debate regarding the usefulness of the entropic method for this purpose. It appears that the advantages of working in entropy space, such as convexity and better scaling of the number of bounding inequalities, come at the cost of losing valuable information about the non-classical nature of the correlations, even in post-selected causal structures. Our results illuminate some of the merits as well as limitations of the entropic technique and point to the need for developing novel methods for understanding causation in non-classical theories.

In general, this is a difficult task: fully characterising the bounding inequalities for the set of classical correlations in a given causal structure is known to be an NP-complete problem [170]. Therefore, an increase in the computational power of current technologies and improvements to the performance of variable elimination algorithms would naturally increase the set of scenarios that can be characterised. For example, developing a variable elimination algorithm (such as Fourier-Motzkin) capable of dealing with systems of non-linear constraints would allow for tighter entropic characterisations through the inclusion of non-linear inequalities. One can also consider other algorithms for polyhedral projection that can sometimes produce results even when Fourier-Motzkin Elimination (FME) becomes intractable [99]. We have provided a method to bypass the FME to derive generalised entropic inequalities in cases where Shannon entropic inequalities are known. Our Tsallis inequalities thus derived depend on the cardinality of the variables in the Triangle causal structure. Hence one could, in principle, investigate dimension-dependent distinctions between classical and quantum correlations using these, while also taking into account the new class of non-local distributions recently discovered in the Triangle network [179, 129]. Other possibilities include employing the inflation technique [224] in combination with the entropic method, constructing more efficient algorithms (than the random search typically employed) for finding quantum violations, and considering an alternative definition of the Tsallis conditional entropies [3] that satisfy simpler causal constraints but violate the chain rule. These methods could lead to improvements but are not guaranteed to be computationally tractable.

Cryptographic and other information-processing applications have greatly benefited from the characterisation of the sets of correlations in the Bell scenario [79, 145, 17, 169, 60, 59], which are relatively well understood due to their convexity. Often, these applications also go beyond the Bell scenario and involve networks of parties sharing multiple sources of entanglement, where the set of observed correlations in probability space is no longer convex. The entanglement swapping protocol [26, 230] is an important example, which also plays a role in the construction of quantum repeater networks [35] used in quantum communication scenarios and the quantum internet. Therefore, exploring different techniques for certifying non-classicality in and beyond the Bell causal structure, and understanding their advantages and limitations, as done in this thesis, is also of practical importance. At the foundational level, this reveals fundamental distinctions between the principles of causality that manifest themselves in classical and quantum regimes. The information-theoretic approach to causal structures also allows us to consider post-quantum theories and identify physical principles other than relativistic ones that single out quantum theory among this spectrum of theories.
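The variable elimination step discussed in this chapter can be illustrated on a toy system. The following is a minimal Python sketch of a single Fourier-Motzkin elimination step for linear inequalities of the form coeffs · x ≤ bound. It is not the implementation used in the thesis, it performs no redundancy removal, and the names `eliminate` and `interval_for_variable` as well as the example system are hypothetical.

```python
from fractions import Fraction


def eliminate(inequalities, k):
    """Project out variable k from a list of (coeffs, bound) pairs,
    each meaning sum_i coeffs[i] * x[i] <= bound."""
    pos, neg, rest = [], [], []
    for coeffs, bound in inequalities:
        if coeffs[k] > 0:
            pos.append((coeffs, bound))
        elif coeffs[k] < 0:
            neg.append((coeffs, bound))
        else:
            rest.append((coeffs, bound))
    # Combine each positive row with each negative row using positive
    # weights chosen so that the coefficient of variable k cancels.
    for cp, bp in pos:
        for cn, bn in neg:
            wp, wn = -cn[k], cp[k]
            combined = [wp * a + wn * b for a, b in zip(cp, cn)]
            rest.append((combined, wp * bp + wn * bn))
    return rest


def interval_for_variable(inequalities, j):
    """Read off (lower, upper) for variable j, assuming every other
    variable has already been eliminated."""
    upper = min(b / c[j] for c, b in inequalities if c[j] > 0)
    lower = max(b / c[j] for c, b in inequalities if c[j] < 0)
    return lower, upper


F = Fraction
# Hypothetical example in two variables (x, y).
system = [
    ([F(1), F(1)], F(4)),    # x + y <= 4
    ([F(-1), F(0)], F(0)),   # x >= 0
    ([F(0), F(-1)], F(0)),   # y >= 0
    ([F(1), F(-1)], F(1)),   # x - y <= 1
]

projected = eliminate(system, 1)          # eliminate y
lower, upper = interval_for_variable(projected, 0)
print(lower, upper)
```

Each step pairs every inequality with a positive coefficient on the eliminated variable with every inequality that has a negative one, so the number of inequalities can grow roughly quadratically per eliminated variable. This growth is one reason the procedure can become intractable for larger causal structures.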


In Chapter 6, we have further explored this fundamental aspect by developing a causal modelling framework for a general class of physical theories, and formalising relationships between causality, space-time and relativistic principles (such as the inability to signal superluminally) therein. This allowed us to analyse scenarios involving superluminal influences and causal loops that nevertheless do not lead to superluminal signalling at the operational level. In particular, our framework provides an initial tool set for causally modelling post-quantum theories admitting jamming non-local correlations [105, 116], which have not been rigorously analysed before, and for identifying further physical principles underlying such theories.

Moreover, these results also generate scope for interesting future investigations into causality and space-time structure. One promising avenue would be to extend this work to construct a more complete framework for causally modelling cyclic and fine-tuned causal structures in non-classical theories, which is likely to be of independent interest. Furthermore, this work can aid an operational analysis of closed time-like curves (CTCs) in the presence of fine-tuned causal influences, and an investigation of whether fine-tuned and non-fine-tuned interpretations of quantum theory behave differently in the presence of CTCs. Yet another direction is to examine simulations of indefinite causal structures [111, 154, 11] using CTCs and fine-tuning, and analyse their embeddings within a fixed space-time structure. Indefinite causal structures can in theory be implemented through a quantum superposition of gravitating masses [231]; they have also been claimed to be physically implemented by superposing the orders of quantum operations on a target system [174, 185].
The space-time information associated with these operations plays an important role in distinguishing between the two implementations [156] (as we also find in: Vilasini, V., del Rio, L. and Renner, R., Causality in definite and indefinite space-times, in preparation (2020), https://wdi.centralesupelec.fr/users/valiron/qplmfps/papers/qs01t3.pdf). Further, general relativity is known to admit CTC solutions [205, 100] even in the absence of quantum effects, while certain fundamental principles of quantum theory such as the no-cloning principle and linearity may no longer hold in the presence of CTCs [75]. A deeper understanding of causality and space-time structure in non-classical theories, in the presence of causal loops, would hence provide insights not only into quantum theory, but potentially also into its unification with general relativity, which has been a longstanding open problem in theoretical physics. The thesis provides a small contribution towards these larger goals.

Part II of this thesis contributes to an understanding of multi-agent logic in scenarios where reasoning agents model each other's memories as systems of a physical theory. Generalising the conditions of the Frauchiger-Renner quantum experiment to generalised probabilistic theories allowed us to derive a stronger paradox in the particular GPT of box world. Our result reveals that reversibility of the memory update akin to quantum unitarity is not necessary for deriving such multi-agent paradoxes. Rather, a logical form of contextuality along with an appropriate notion of an information-preserving memory update appears to be more fundamental for deriving a deterministic paradox. This work paves the way for a more general characterisation of Frauchiger-Renner type paradoxes, their relationships to contextuality and the structure of multi-agent logic in quantum and post-quantum theories.

Apart from the implications for logic, these multi-agent paradoxes also challenge standard approaches to causal modelling that we have discussed in the first part of this thesis. In these settings, a system, such as a measurement outcome, that appears classical to one agent may be modelled as an entangled subsystem (entangled with the first agent's memory) by another agent. Therefore the distinction between observed and unobserved systems in a causal structure can become subjective, and a detailed investigation of causality in these general settings is yet to be undertaken.

The fact that the agents do not necessarily model the measurement outcomes of other agents as classical systems implies that one cannot assume the existence of a joint distribution over all these outcomes. As we have also seen in Part I of this thesis, no joint distribution can in general be assigned to non-coexisting systems in a non-classical causal structure. It is precisely this inability to make a global assignment of values to all measurement outcomes that results in the apparent paradox, suggesting that these outcomes cannot be seen as objective facts about the world, but only subjective to the agents who observed them. In both the quantum and box-world versions of the paradox, the states and measurements involved produce non-classical correlations in the bipartite Bell causal structure. The treatment of agents as systems of a non-classical physical theory promotes this non-classicality to the level of the outcomes observed by the agents; however, it appears that not all non-classical correlations can lead to such paradoxes. A better understanding of non-classical correlations in causal structures would hence enhance our knowledge about these general settings involving complex quantum systems that are capable of logical deductions.

With sufficient technological progress in creating and manipulating stable many-qubit superpositions, the physical implementation of multi-agent paradoxes may be a distinct possibility in the near future, since a relatively small quantum computer would fulfil the role of an agent for these purposes.
There are global efforts towards achieving such macroscopic superpositions, both for the practical purposes of scalable quantum computing and for pushing the horizons of fundamental physics by probing the quantum effects of gravity. These make it imperative to develop more general theoretical frameworks for modelling multi-agent paradoxes as well as causality and space-time structure in non-classical theories. We believe there are several deep connections between causality and multi-agent logical paradoxes in non-classical theories that are yet to be discovered, and hope that the results presented in both parts of this thesis contribute towards a more complete physical framework or theory in the future.

Bibliography

[1] Aaronson, S. Why philosophers should care about computational complexity. arXiv:1108.1791 (2011).
[2] Aaronson, S., and Watrous, J. Closed timelike curves make quantum and classical computing equivalent. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 465, 2102 (2008), pp. 631–647.
[3] Abe, S., and Rajagopal, A. Nonadditive conditional entropy and its significance for local realism. Physica A: Statistical Mechanics and its Applications 289, 1 (2001), pp. 157–164.
[4] Abramsky, S., Barbosa, R. S., Kishida, K., Lal, R., and Mansfield, S. Contextuality, Cohomology and Paradox. (2015), Ed. Stephan Kreutzer, pp. 211–228.
[5] Acin, A., and Masanes, L. Certified randomness in quantum physics. Nature 540, 7632 (2016), pp. 213–219.
[6] Allen, J.-M. A., Barrett, J., Horsman, D. C., Lee, C. M., and Spekkens, R. W. Quantum common causes and quantum causal models. Physical Review X 7, 3 (2017), p. 031021.
[7] Araki, H., and Lieb, E. H. Entropy inequalities. Communications in Mathematical Physics 18, 2 (1970), pp. 160–170.
[8] Araújo, M., Branciard, C., Costa, F., Feix, A., Giarmatzi, C., and Brukner, Č. Witnessing causal nonseparability. New Journal of Physics 17, 10 (2015), p. 102001.
[9] Araújo, M., Costa, F., and Brukner, Č. Computational Advantage from Quantum-Controlled Ordering of Gates. Physical Review Letters 113, 25 (2014), p. 250402.
[10] Araújo, M., Feix, A., Navascués, M., and Brukner, Č. A purification postulate for quantum mechanics with indefinite causal order. Quantum 1 (2016), p. 10.
[11] Araújo, M., Guérin, P. A., and Baumeler, A. Quantum computation with indefinite causal structures. Physical Review A 96 (2017), p. 052315.

[12] Arnon-Friedman, R., Dupuis, F., Fawzi, O., Renner, R., and Vidick, T. Practical device-independent quantum cryptography via entropy accumulation. Nature Communications 9, 1 (2018), p. 459.
[13] Audenaert, K. Subadditivity of q-entropies for q >. Journal of Mathematical Physics 48 (2007), p. 083507.
[14] Avis, D., Imai, H., Ito, T., and Sasaki, Y. Deriving tight Bell inequalities for 2 parties with many 2-valued observables from facets of cut polytopes. arXiv:quant-ph/0404014 (2004).
[15] Bancal, J.-D., Gisin, N., and Pironio, S. Looking for symmetric Bell inequalities. Journal of Physics A: Mathematical and Theoretical 43, 38 (2010), p. 385303.
[16] Barrett, J. Information processing in generalized probabilistic theories. Physical Review A 75 (2007), p. 032304.
[17] Barrett, J., Hardy, L., and Kent, A. No signaling and quantum key distribution. Physical Review Letters 95 (2005), p. 010503.
[18] Barrett, J., Lorenz, R., and Oreshkov, O. Cyclic quantum causal models. arXiv:2002.1215731 (2020).
[19] Baumann, V., and Wolf, S. On Formalisms and Interpretations. Quantum 2 (2018), p. 99.
[20] Baumeler, A., Degorre, J., and Wolf, S. Bell correlations and the common future. Quantum Foundations, Probability and Information (2018), pp. 255–268.
[21] Baumeler, Ä., and Wolf, S. The space of logically consistent classical processes without causal order. New Journal of Physics , 1 (2016), p. 013036.
[22] Bell, J. S. On the Einstein Podolsky Rosen paradox. Physics Physique Fizika 1 (1964), pp. 195–200.
[23] Bemporad, A., Fukuda, K., and Torrisi, F. D. Convexity recognition of the union of polyhedra. Computational Geometry 18, 3 (2001), pp. 141–154.
[24] Bennett, C., Brassard, G., Crepeau, C., and Maurer, U. Generalized privacy amplification. IEEE Transactions on Information Theory 41 (1995), pp. 1915–1923.
[25] Bennett, C., and Schumacher, B. Talk at QUPON Wien, (2005).
[26] Bennett, C. H., Brassard, G., Crépeau, C., Jozsa, R., Peres, A., and Wootters, W. K. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 70 (1993), pp. 1895–1899.
[27] Bohm, D. A suggested interpretation of the quantum theory in terms of “hidden” variables. I. Physical Review 85 (1952), pp. 166–179.

[28] Bohr, N. Discussions with Einstein on epistemological problems in atomic physics. Albert Einstein: Philosopher-Scientist, The Library of Living Philosophers, Ed. Schilpp, P. A. (1949), pp. 200–241.
[29] Boltzmann, L. Über die mechanische bedeutung des zweiten hauptsatzes der wärmetheorie. Wiener Berichte 53 (1866), pp. 195–220.
[30] Bong, K.-W., Utreras-Alarcón, A., Ghafari, F., Liang, Y.-C., Tischler, N., Cavalcanti, E. G., Pryde, G. J., and Wiseman, H. M. A strong no-go theorem on the Wigner’s friend paradox. Nature Physics (2020), https://doi.org/10.1038/s41567-020-0990-x.
[31] Boyd, S., and Vandenberghe, L. Convex optimization. Cambridge University Press (2004).
[32] Branciard, C., Araújo, M., Feix, A., Costa, F., and Brukner, Č. The simplest causal inequalities and their violation. New Journal of Physics 18, 1 (2016), p. 13008.
[33] Branciard, C., Rosset, D., Gisin, N., and Pironio, S. Bilocal versus nonbilocal correlations in entanglement-swapping experiments. Physical Review A 85 (2012), p. 032119.
[34] Braunstein, S. L., and Caves, C. M. Information-theoretic Bell inequalities. Physical Review Letters 61 (1988), pp. 662–665.
[35] Briegel, H.-J., Dür, W., Cirac, J. I., and Zoller, P. Quantum repeaters: The role of imperfect local operations in quantum communication. Physical Review Letters 81 (1998), pp. 5932–5935.
[36] Brukner, Č. A no-go theorem for observer-independent facts. Entropy 20, 350 (2018).
[37] Brukner, Č., Taylor, S., Cheung, S., and Vedral, V. Quantum entanglement in time. arXiv:quant-ph/0402127 (2004).
[38] Brulé, J. A causation coefficient and taxonomy of correlation/causation relationships. arXiv:1708.05069 (2017).
[39] Brunner, N., Cavalcanti, D., Pironio, S., Scarani, V., and Wehner, S. Bell nonlocality. Reviews of Modern Physics 86 (2014), pp. 419–478.
[40] Buhrman, H., Cleve, R., Massar, S., and de Wolf, R. Nonlocality and communication complexity. Reviews of Modern Physics 82 (2010), pp. 665–698.
[41] Carnap, R. Meaning and necessity: a study in semantics and modal logic. University of Chicago Press (1988).
[42] Castro-Ruiz, E., Giacomini, F., Belenchia, A., and Brukner, Č. Quantum clocks and the temporal localisability of events in the presence of gravitating quantum systems. Nature Communications 11, 1 (2020), p. 2672.

[43] Cerf, N. J., and Adami, C. Negative entropy and information in quantum mechanics. Physical Review Letters 79 (1997), pp. 5194–5197.
[44] Chaves, R. Entropic inequalities as a necessary and sufficient condition to noncontextuality and locality. Physical Review A 87 (2013), p. 022102.
[45] Chaves, R., Carvacho, G., Agresti, I., Di Giulio, V., Aolita, L., Giacomini, S., and Sciarrino, F. Quantum violation of an instrumental test. Nature Physics 14, 3 (2018), pp. 291–296.
[46] Chaves, R., and Fritz, T. Entropic approach to local realism and noncontextuality. Physical Review A 85 (2012), pp. 032113.
[47] Chaves, R., Luft, L., and Gross, D. Causal structures from entropic information: geometry and novel scenarios. New Journal of Physics 16, 4 (2014), p. 043001.
[48] Chaves, R., Majenz, C., and Gross, D. Information-theoretic implications of quantum causal structures. Nature Communications 6, 1 (2015), p. 5766.
[49] Chernikov, S. N. Contraction of systems of linear inequalities. Doklady Akademii Nauk SSSR 131, 3 (1960), pp. 518–521.
[50] Chernikov, S. N. The convolution of finite systems of linear inequalities. USSR Computational Mathematics and Mathematical Physics 5 (1965), pp. 1–24.
[51] Chiribella, G., Banik, M., Bhattacharya, S. S., Guha, T., Alimuddin, M., Roy, A., Saha, S., Agrawal, S., and Kar, G. Indefinite causal order enables perfect quantum communication with zero capacity channel. arXiv:1810.10457 (2018).
[52] Chiribella, G., D’Ariano, G. M., and Perinotti, P. Probabilistic theories with purification. Physical Review A 81 (2010), p. 062348.
[53] Chiribella, G., D’Ariano, G. M., Perinotti, P., and Valiron, B. Quantum computations without definite causal structure. Physical Review A 88, 2 (2013), p. 022318.
[54] Choi, M.-D. Completely positive linear maps on complex matrices. Linear Algebra and its Applications 10, 3 (1975), pp. 285–290.
[55] Christof, T., and Loebel, A. POlyhedron Representation Transformation Algorithm (PORTA) (1997).
[56] Clauser, J. F., and Horne, M. A. Experimental consequences of objective local theories. Physical Review D 10 (1974), pp. 526–535.
[57] Clauser, J. F., Horne, M. A., Shimony, A., and Holt, R. A. Proposed experiment to test local hidden-variable theories. Physical Review Letters 23 (1969), pp. 880–884.
[58] Clausius, R. Ueber die bewegende kraft der wärme und die gesetze, welche sich daraus für die wärmelehre selbst ableiten lassen. Annalen der Physik 155, 3 (1850), pp. 368–397.

[59] Colbeck, R. Quantum and relativistic protocols for secure multi-party computation. PhD Dissertation, University of Cambridge. arXiv:0911.3814 (2007).
[60] Colbeck, R., and Kent, A. Private randomness expansion with untrusted devices. Journal of Physics A 44, 9 (2011), p. 095305.
[61] Colbeck, R., and Renner, R. A short note on the concept of free choice. arXiv:1302.4446 (2013).
[62] Colbeck, R., and Vilasini, V. LPAssumptions (Mathematica package). https://github.com/rogercolbeck/LPAssumptions.
[63] Collins, D., and Gisin, N. A relevant two qubit Bell inequality inequivalent to the CHSH inequality. Journal of Physics A: Mathematical and General 37, 5 (2004), p. 1775.
[64] Collins, D., Gisin, N., Linden, N., Massar, S., and Popescu, S. Bell inequalities for arbitrarily high-dimensional systems. Physical Review Letters 88 (2002), p. 040404.
[65] Cook, R. T. Patterns of paradox. The Journal of Symbolic Logic 69, 3 (2004), pp. 767–774.
[66] Cope, T., and Colbeck, R. Bell inequalities from no-signaling distributions. Physical Review A 100 (2019), p. 022114.
[67] Costa, F., and Shrapnel, S. Quantum causal modelling. New Journal of Physics 18, 6 (2016), p. 63032.
[68] Curado, E. M. F., and Tsallis, C. Generalized statistical mechanics: connection with thermodynamics. Journal of Physics A 24 (1991), pp. L69–L72.
[69] Dantzig, G. B. Linear programming and extensions. Princeton University Press (1963).
[70] Dantzig, G. B. Origins of the simplex method. A History of Scientific Computing (1990), pp. 141–151.
[71] Daróczy, Z. Generalized information functions. Information and Control 16, 1 (1970), pp. 36–51.
[72] De Broglie, L. La mécanique ondulatoire et la structure atomique de la matiére et du rayonnement. Journal of Physics Radium 8, 5 (1952), pp. 225–241.
[73] del Rio, L., Krämer, L., and Renner, R. Resource theories of knowledge. arXiv:1511.08818 (2015).
[74] Deutsch, D. Quantum theory as a universal physical theory. International Journal of Theoretical Physics 24, 1 (1985), pp. 1–41.
[75] Deutsch, D. Quantum mechanics near closed timelike lines. Physical Review D 44 (1991), pp. 3197–3217.
[76] DeWitt, B. S. Quantum mechanics and reality. Physical Today 23 (1970), pp. 155–165.

[77] Dourdent, H. A quantum Gödelian hunch. arXiv:2005.04274 (2020).
[78] Dürr, D., and Teufel, S. Bohmian mechanics: The physics and mathematics of quantum theory. Springer (2009).
[79] Ekert, A. K. Quantum cryptography based on Bell’s theorem. Physical Review Letters 67 (1991), pp. 661–663.
[80] Elga, A. Self-locating belief and the sleeping beauty problem. Analysis 60, 2 (2000), pp. 143–147.
[81] Everett, H. “Relative state” formulation of quantum mechanics. Reviews of Modern Physics 29 (1957), pp. 454–462.
[82] Fagin, R., Halpern, J. Y., Moses, Y., and Vardi, M. Reasoning about knowledge. MIT press (2004).
[83] Fehr, S., and Berens, S. On the conditional Rényi entropy. IEEE Transactions on Information Theory 60 (2014), pp. 6801–6810.
[84] Forré, P., and Mooij, J. M. Markov properties for graphical models with cycles and latent variables. arXiv:1710.08775 (2017).
[85] Frauchiger, D., and Renner, R. Quantum theory cannot consistently describe the use of itself. Nature Communications 9, 1 (2018), p. 3711.
[86] Freedman, S. J., and Clauser, J. F. Experimental test of local hidden-variable theories. Physical Review Letters 28 (1972), pp. 938–941.
[87] Friedman, M. The fed’s thermostat. The Wall Street Journal (2003).
[88] Fritz, T. Beyond Bell’s theorem: correlation scenarios. New Journal of Physics 14, 10 (2012), p. 103001.
[89] Fritz, T., and Chaves, R. Entropic inequalities and marginal problems. IEEE Transactions on Information Theory 59, 2 (2013), pp. 803–817.
[90] Fritz, T., and Chaves, R. Entropic inequalities and marginal problems. IEEE Transactions on Information Theory 59 (2013), pp. 803–817.
[91] Fuchs, C. A., and Schack, R. Quantum-bayesian coherence. Reviews of Modern Physics 85 (2013), pp. 1693–1715.
[92] Fukuda, K. Polyhedral computation (lecture notes), Spring 2016.
[93] Furuichi, S. Information theoretical properties of Tsallis entropies. Journal of Mathematical Physics 47 (2004).
[94] Furuichi, S., Yanagi, K., and Kuriyama, K. Fundamental properties of Tsallis relative entropy. Journal of Mathematical Physics 45 (2004).

[95] Garson, J. Modal logic. The Stanford Encyclopedia of Philosophy (2016), Ed. Edward N. Zalta.
[96] Geiger, D. Towards the formalization of informational dependencies. Tech. rep. 880053. UCLA Computer Science (1987).
[97] Gibbs, J. W. Elementary principles in statistical mechanics. Charles Scribner’s Sons (1902).
[98] Giustina, M., Versteegh, M. A., Wengerowsky, S., Handsteiner, J., Hochrainer, A., Phelan, K., Steinlechner, F., Kofler, J., Larsson, J.-Å., Abellán, C., Amaya, W., Pruneri, V., Mitchell, M. W., Beyer, J., Gerrits, T., Lita, A. E., Shalm, L. K., Nam, S. W., Scheidl, T., Ursin, R., Wittmann, B., and Zeilinger, A. Significant-loophole-free test of Bell’s theorem with entangled photons. Physical Review Letters 115, 25 (2015), p. 250401.
[99] Gläßle, T., Gross, D., and Chaves, R. Computational tools for solving a marginal problem with applications in Bell non-locality and causal modeling. Journal of Physics A: Mathematical and Theoretical 51, 48 (2018), p. 484002.
[100] Gödel, K. An example of a new type of cosmological solutions of einstein’s field equations of gravitation. Reviews of Modern Physics 21 (1949), pp. 447–450.
[101] Goldblatt, R. Mathematical modal logic: A view of its evolution. Handbook of the History of Logic 7 (2006), pp. 1–98.
[102] Greenberger D.M., Horne M.A., Z. A. Going beyond Bell’s theorem. Quantum Theory and Conceptions of the Universe. Fundamental Theories of Physics 37 (1989).
[103] Gross, D., Müller, M., Colbeck, R., and Dahlsten, O. C. O. All reversible dynamics in maximally nonlocal theories are trivial. Physical Review Letters 104 (2010), p. 080402.
[104] Grötschel, M., Lovasz, L., and Shrijver, A. Geometric algorithms and combinatorial optimization. Springer Science and Business Media (2012).
[105] Grunhaus, J., Popescu, S., and Rohrlich, D. Jamming nonlocal quantum correlations. Physical Review A 53 (1996), pp. 3781–3784.
[106] Guérin, P. A., and Brukner, Č. Observer-dependent locality of quantum events. New Journal of Physics 20, 10 (2018), p. 103031.
[107] Guérin, P. A., Feix, A., Araújo, M., and Brukner, Č. Exponential communication complexity advantage from quantum superposition of the direction of communication. Physical Review Letters 117 (2016), p. 100502.
[108] Hardy, L. Quantum mechanics, local realistic theories, and Lorentz-invariant realistic theories. Physical Review Letters 68 (1992), pp. 2981–2984.
[109] Hardy, L. Nonlocality for two particles without inequalities for almost all entangled states. Physical Review Letters 71 (1993), pp. 1665–1668.

[110] Hardy, L. Quantum theory from five reasonable axioms. arXiv:quant-ph/0101012 (2001).
[111] Hardy, L. Probability theories with dynamic causal structure: A new framework for quantum gravity. arXiv:gr-qc/0509120 (2005).
[112] Hardy, L. Operational general relativity: Possibilistic, probabilistic, and quantum. arXiv:1608.06940 (2016).
[113] Heisenberg, W. Ist eine deterministische ergänzung der quantenmechanik möglich? Wolfgang Pauli. Wissenschaftlicher Briefwechsel mit Bohr, Einstein, Heisenberg, Springer 2 (1935), Ed. Hermann, A. and von Meyenn, K. and Weisskopf, V. F. pp. 409–418.
[114] Hensen, B., Bernien, H., Dréau, A. E., Reiserer, A., Kalb, N., Blok, M. S., Ruitenberg, J., Vermeulen, R. F. L., Schouten, R. N., Abellán, C., Amaya, W., Pruneri, V., Mitchell, M. W., Markham, M., Twitchen, D. J., Elkouss, D., Wehner, S., Taminiau, T. H., and Hanson, R. Loophole-free Bell inequality violation using electron spins separated by 1.3 kilometres. Nature 526, 7575 (2015), pp. 682–686.
[115] Henson, J., Lal, R., and Pusey, M. F. Theory-independent limits on correlations from generalized bayesian networks. New Journal of Physics 16, 11 (2014), p. 113043.
[116] Horodecki, P., and Ramanathan, R. The relativistic causality versus no-signaling paradigm for multi-party correlations. Nature Communications 10, 1 (2019), p. 1701.
[117] Horsman, D., Heunen, C., Pusey, M. F., Barrett, J., and Spekkens, R. W. Can a quantum state over time resemble a quantum state at a single time? Proceedings of the Royal Society A 473 (2017), p. 20170395.
[118] Horst, R., and Thoai, N. V. DC Programming: Overview. Journal of Optimization Theory and Applications 103, 1 (1999), pp. 1–43.
[119] Hu, X., and Ye, Z. Generalized quantum entropy. Journal of Mathematical Physics 47, 2 (2006), p. 023502.
[120] Iwamoto, M., and Shikata, J. Revisiting conditional Rényi entropies and generalizing Shannon’s bounds in information theoretically secure encryption? Technical report, Cryptology ePrint Archive 440 (2013).
[121] Jamiołkowski, A. Linear transformations which preserve trace and positive semidefiniteness of operators. Reports on Mathematical Physics 3, 4 (1972), pp. 275–278.
[122] Kaszlikowski, D., Kwek, L. C., Chen, J.-L., Zukowski, M., and Oh, C. H. Clauser-horne inequality for three-state systems. Physical Review A 65 (2002), p. 032118.
[123] Katta, M. Linear Programming. John Wiley and Sons (1983), p. 482.
[124] Khachiyan, L., Boros, E., Borys, K., Elbassioni, K., and Gurvich, V. Generating all vertices of a polyhedron is hard. Discrete and Computational Geometry , 1 (2008), pp. 174–190.

[125] Kim, J. S. Tsallis entropy and general polygamy of multi-party quantum entanglement in arbitrary dimensions. Physical Review A 94 (2016).
[126] Kochen, S., and Specker, E. Logical structures arising in quantum theory. In Addison, J., L. Henkin, and A. Tarski (eds.), The theory of models, North-Holland, Amsterdam (1967), pp. 177–189.
[127] Kochen, S., and Specker, E. The problem of hidden variables in quantum mechanics. Indiana University Mathematics Journal 17 (1968), pp. 59–87.
[128] König, R., and Renner, R. A de Finetti representation for finite symmetric quantum states. Journal of Mathematical Physics 46 (2005), p. 122108.
[129] Kraft, T., Designolle, S., Ritz, C., Brunner, N., Gühne, O., and Huber, M. Quantum entanglement in the triangle network. arXiv:2002.03970 (2020).
[130] Krämer, L., and del Rio, L. Operational locality in global theories. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 376, 2123 (2018).
[131] Kripke, S. A. Semantical considerations on modal logic. Universal Logic: An Anthology (2012), pp. 197–208.
[132] Leifer, M. S., and Spekkens, R. W. Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Physical Review A 88 (2013), p. 052130.
[133] Lewis, C. I. A survey of symbolic logic. University of California Press (1918).
[134] Lewis, C. I., Langford, C. H., and Lamprecht, P. Symbolic logic. Dover Publications New York (1959).
[135] Lieb, E., and Ruskai, M. Proof of the strong subadditivity of quantum-mechanical entropy. Journal of Mathematical Physics 14 (1973), pp. 1938–1941.
[136] Linden, N., Mosonyi, M., and Winter, A. The structure of rényi entropic inequalities. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 469, 2158 (2013), p. 20120737.
[137] Lloyd, S., Maccone, L., Garcia-Patron, R., Giovannetti, V., and Shikano, Y. Quantum mechanics of time travel through post-selected teleportation. Physical Review D 84 (2011), p. 025007.
[138] Lloyd, S., Maccone, L., Garcia-Patron, R., Giovannetti, V., Shikano, Y., Pirandola, S., Rozema, L. A., Darabi, A., Soudagar, Y., Shalm, L. K., and Steinberg, A. M. Closed timelike curves via postselection: Theory and experimental test of consistency. Physical Review Letters 106 (2011), p. 040403.
[139] Ludwig, G. Versuch einer axiomatischen grundlegung der quantenmechanik und allgemeinerer physikalischer theorien. Zeitschrift für Physik 181 (1964), pp. 233–260.

[140]

Ludwig, G.

Attempt of an axiomatic foundation of quantum mechanics and more general theories II.

Communications in Mathematical Physics 4, 5 (1967), pp. 331–348.[141]

Ludwig, G.

Attempt of an axiomatic foundation of quantum mechanics and more general theories III.

Communications in Mathematical Physics 9 (1968), pp. 1–12.[142]

Martin, R. K.

Large scale linear and integer optimization: A unified approach.

Springer Science and Business Media (1999).[143]

Masanes, L.

Tight Bell inequality for d-outcome measurements correlations.

Quantum Information and Computation 3 (2002), p. 345.[144]

Masanes, L., Acin, A., and Gisin, N.

General properties of nonsignaling theories.

Physical Review A 73 (2006), p. 012112.[145]

Mayers, D., and Yao, A.

Quantum cryptography with imperfect apparatus.

Proceedings of the 39th Annual Symposium on Foundations of Computer Science (FOCS-98) (1998), pp. 503–509.[146]

Megidish, E., Halevy, A., Shacham, T., Dvir, T., Dovrat, L., and Eisenberg, H. S.

Entanglement swapping between photons that have never coexisted.

Physical Review Letters 110 (2013), p. 210403.[147]

Mermin, N. D.

Simple unified form for the major no-hidden-variables theorems.

Physical Review Letters 65 (1990), pp. 3373–3376.[148]

Müller-Lennert, M., Dupuis, F., Szehr, O., Fehr, S., and Tomamichel, M.

On quantum Rényi entropies: A new generalization and some properties.

Journal of Mathematical Physics 54, 12 (2013), p. 122203.[149]

Navascues, M., and Wolfe, E.

The inflation technique completely solves the classical inference problem. arXiv:1707.06476 (2017).[150]

Neal, R. M.

On deducing conditional independence from d-separation in causal graphs with feedback (research note).

Journal of Artificial Intelligence Research 12 (2000), pp. 87–91.[151]

Nielsen, M. A., and Chuang, I. L.

Quantum Computation and Quantum Information: 10th Anniversary Edition.

Cambridge University Press (2011).[152]

Nurgalieva, N., and del Rio, L.

Inadequacy of modal logic in quantum settings.

In Proceedings QPL 2018, EPTCS 287 (2019), pp. 267–297.[153]

Oreshkov, O.

Time-delocalized quantum subsystems and operations: on the existence of processes with indefinite causal structure in quantum mechanics.

Quantum 3 (2019), p. 206.[154]

Oreshkov, O., Costa, F., and Brukner, Č.

Quantum correlations with no causal order.

Nature Communications 3 (2012), p. 1092.

[155]

Oreshkov, O., and Giarmatzi, C.

Causal and causally separable processes.

New Journal of Physics 18, 9 (2016), p. 093020.[156]

Paunković, N., and Vojinović, M.

Causal orders, quantum circuits and spacetime: distinguishing between definite and superposed causal orders.

Quantum 4 (2020), p. 275.[157]

Pawłowski, M., Paterek, T., Kaszlikowski, D., Scarani, V., Winter, A., and Zukowski, M.

Information causality as a physical principle.

Nature 461, 7267 (2009), pp. 1101–1104.[158]

Pearl, J.

Causal diagrams for empirical research.

Biometrika 82, 4 (1995), pp. 669–688.[159]

Pearl, J.

On the testability of causal models with latent and instrumental variables.

UAI’95: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence (1995).[160]

Pearl, J.

Causality: Models, reasoning, and inference.

Second edition, Cambridge University Press (2009).[161]

Pearl, J., and Dechter, R.

Identifying independencies in causal graphs with feedback.

UAI’96: Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence (2013).[162]

Peres, A.

Incompatible results of quantum measurements.

Physics Letters A 151, 3 (1990), pp. 107–108.[163]

Petz, D., and Virosztek, D.

Some inequalities for quantum Tsallis entropy related to the strong subadditivity.

Mathematical Inequalities and Applications 18 (2014), p. 555.[164]

Pienaar, J.

A time-reversible quantum causal model. arXiv:1902.00129 (2019).[165]

Pienaar, J.

Quantum causal models via quantum Bayesianism.

Physical Review A 101 (2020), p. 012104.[166]

Pienaar, J., and Brukner, Č.

A graph-separation theorem for quantum causal models.

New Journal of Physics 17, 7 (2015), p. 073020.[167]

Pirandola, S., Andersen, U. L., Banchi, L., Berta, M., Bunandar, D., Colbeck, R., Englund, D., Gehring, T., Lupo, C., Ottaviani, C., Pereira, J., Razavi, M., Shaari, J. S., Tomamichel, M., Usenko, V. C., Vallone, G., Villoresi, P., and Wallden, P.

Advances in quantum cryptography. arXiv:1906.01645 (2019).[168]

Pironio, S.

Lifting Bell inequalities.

Journal of Mathematical Physics 46 (2005), p. 062112.[169]

Pironio, S., Acín, A., Massar, S., Boyer de la Giroday, A., Matsukevich, D., Maunz, P., Olmschenk, S., Hayes, D., Luo, L., A Manning, T., and Monroe, C.

Random numbers certified by Bell’s theorem.

Nature 464 (2010), pp. 1021–4.

[170]

Pitowski, I.

Quantum probability – quantum logic.

Springer-Verlag Berlin Heidelberg 321 (1989).[171]

Popescu, S., and Rohrlich, D.

Quantum nonlocality as an axiom.

Foundations of Physics 24, 3 (1994), pp. 379–385.[172]

Portmann, C., Matt, C., Maurer, U., Renner, R., and Tackmann, B.

Causal Boxes: Quantum Information-Processing Systems Closed under Composition.

IEEE Transactions on Information Theory 63, 5 (2017), pp. 3277–3305.[173]

Preskill, J.

Quantum Shannon theory. arXiv:1604.07450 (2016).[174]

Procopio, L. M., Moqanaki, A., Araújo, M., Costa, F., Alonso Calafell, I., Dowd, E. G., Hamel, D. R., Rozema, L. A., Brukner, Č., and Walther, P.

Experimental superposition of orders of quantum gates.

Nature Communications 6 (2015), p. 7913.[175]

Pusey, M. F.

An inconsistent friend.

Nature Physics 14, 10 (2018), pp. 977–978.[176]

Pusey, M. F., and Leifer, M. S.

Logical pre- and post-selection paradoxes are proofs of contextuality.

In Proceedings QPL 2015, EPTCS 195 (2015), pp. 295–306.[177]

Reichenbach, H.

The direction of time.

Dover Publications (1956).[178]

Renner, R.

Security of quantum key distribution.

PhD Dissertation, ETH Zürich. arXiv:quant-ph/0512258 (2006).[179]

Renou, M.-O., Bäumer, E., Boreiri, S., Brunner, N., Gisin, N., and Beigi, S.

Genuine quantum nonlocality in the triangle network.

Physical Review Letters 123 (2019), p. 140401.[180]

Rényi, A.

On measures of information and entropy.

Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, University of California Press (1961), pp. 547–561.[181]

Richardson, T.

A discovery algorithm for directed cyclic graphs.

Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence (1996), pp. 454–461.[182]

Ried, K., Agnew, M., Vermeyden, L., Janzing, D., Spekkens, R. W., and Resch, K. J.

A quantum advantage for inferring causal structure.

Nature Physics 11, 5 (2015), pp. 414–420.[183]

Rosset, D., Gisin, N., and Wolfe, E.

Universal bound on the cardinality of local hidden variables in networks.

Quantum Information and Computation 18 (2018), pp. 0910–0926.[184]

Rowe, N.

Why there’s so little good evidence that fiscal (or monetary) policy works (on-line), (2009). https://worthwhile.typepad.com/worthwhile_canadian_initi/2009/01/why-theres-so-little-good-evidence-that-fiscal-or-monetary-policy-works.html .[185]

Rubino, G., Rozema, L. A., Feix, A., Araújo, M., Zeuner, J. M., Procopio, L. M., Brukner, Č., and Walther, P.

Experimental verification of an indefinite causal order.

Science Advances 3, 3 (2017).[186]

Salazar, R., Kamon, M., Goyeneche, D., Horodecki, K., Saha, D., Ramanathan, R., and Horodecki, P.

A no-go theorem for device-independent security in relativistic causal theories. arXiv:1712.01030 (2020).[187]

Scheines, R.

An introduction to causal inference.

In McKim and Turner (eds.) (1997), pp. 185–99.[188]

Schrijver, A.

Theory of linear and integer programming.

John Wiley and Sons, Inc. (1986).[189]

Segal, I. E.

Postulates for general quantum mechanics.

Annals of Mathematics 48, 4 (1947), pp. 930–948.[190]

Shalm, L. K., Meyer-Scott, E., Christensen, B. G., Bierhorst, P., Wayne, M. A., Stevens, M. J., Gerrits, T., Glancy, S., Hamel, D. R., Allman, M. S., Coakley, K. J., Dyer, S. D., Hodge, C., Lita, A. E., Verma, V. B., Lambrocco, C., Tortorici, E., Migdall, A. L., Zhang, Y., Kumor, D. R., Farr, W. H., Marsili, F., Shaw, M. D., Stern, J. A., Abellán, C., Amaya, W., Pruneri, V., Jennewein, T., Mitchell, M. W., Kwiat, P. G., Bienfang, J. C., Mirin, R. P., Knill, E., and Nam, S. W.

Strong Loophole-Free Test of Local Realism.

Physical Review Letters 115, 25 (2015), p. 250402.[191]

Shannon, C. E.

A mathematical theory of communication.

The Bell System Technical Journal 27, 3 (1948), pp. 379–423.[192]

Shor, P. W.

Algorithms for quantum computation: discrete logarithms and factoring.

Proceedings 35th Annual Symposium on Foundations of Computer Science (1994), pp. 124–134.[193]

Short, A. J., Popescu, S., and Gisin, N.

Entanglement swapping for generalized nonlocal correlations.

Physical Review A 73, 1 (2006), p. 012101.[194]

Skrzypczyk, P., Brunner, N., and Popescu, S.

Emergence of quantum correlations from non-locality swapping.

Physical Review Letters 102 (2009), p. 110402.[195]

Spekkens, R. W.

Contextuality for preparations, transformations, and unsharp measurements.

Physical Review A 71, 5 (2005), p. 052108.[196]

Spekkens, R. W.

Evidence for the epistemic view of quantum states: A toy theory.

Physical Review A 75 (2007), p. 032110.[197]

Sudbery, A.

Single-world theory of the extended Wigner’s friend experiment.

Foundations of Physics 47, 5 (2017), pp. 658–669.[198]

Sudbery, A.

The hidden assumptions of Frauchiger and Renner. arXiv:1905.13248 (2019).

[199]

Svetlichny, G.

Time Travel: Deutsch vs. Teleportation.

International Journal of Theoretical Physics 50, 12 (2011), pp. 3903–3914.[200]

Tomamichel, M., Colbeck, R., and Renner, R.

A fully quantum asymptotic equipartition property.

IEEE Transactions on Information Theory 55, 12 (2009), pp. 5840–5847.[201]

Tsallis, C.

Possible generalization of Boltzmann-Gibbs statistics.

Journal of Statistical Physics 52, 1 (1988), pp. 479–487.[202]

Tsirelson, B.

Some results and problems on quantum Bell-type inequalities.

Hadronic Journal Supplement (1993), pp. 329–345.[203]

Tsirelson, B.

Some results and problems on quantum Bell-type inequalities.

Hadronic Journal Supplement 8 (1993), pp. 329–345.[204]

Van Himbeeck, T., Bohr Brask, J., Pironio, S., Ramanathan, R., Sainz, A. B., and Wolfe, E.

Quantum violations in the Instrumental scenario and their relations to the Bell scenario.

Quantum 3 (2019), p. 186.[205]

van Stockum, W. J.

The gravitational field of a distribution of particles rotating about an axis of symmetry.

Proceedings of the Royal Society of Edinburgh 57 (1937), pp. 135–154.[206]

Verma, T., and Pearl, J.

Causal networks: Semantics and expressiveness.

Proceedings of the Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI ’88) (1990), pp. 69–78.[207]

Vilasini, V., and Colbeck, R.

Analyzing causal structures using Tsallis entropies.

Physical Review A 100 (2019), p. 062108.[208]

Vilasini, V., and Colbeck, R.

Limitations of entropic inequalities for detecting nonclassicality in the postselected Bell causal structure.

Physical Review Research 2 (2020), p. 033096.[209]

Vilasini, V., Nurgalieva, N., and del Rio, L.

Multi-agent paradoxes beyond quantum theory.

New Journal of Physics 21, 11 (2019), p. 113028.[210]

Vilasini, V., Portmann, C., and del Rio, L.

Composable security in relativistic quantum cryptography.

New Journal of Physics 21, 4 (2019), p. 043057.[211]

Vitter, J. S., Larmore, L., Leighton, T., and Raz, R.

Exponential separation of quantum and classical communication complexity.

Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing (1999), pp. 358–367.[212]

Von Neumann, J.

Mathematical foundations of quantum mechanics.

Princeton University Press (1955).[213]

von Neumann, J., Wheeler, N. A., and Beyer, R. T.

Mathematical foundations of quantum mechanics.

Princeton University Press (1932).

[214]

Wajs, M., Kurzynski, P., and Kaszlikowski, D.

Information-theoretic Bell inequalities based on Tsallis entropy.

Physical Review A 91 (2015), p. 012114.[215]

Weilenmann, M., and Colbeck, R.

Inability of the entropy vector method to certify nonclassicality in linelike causal structures.

Physical Review A 94 (2016), p. 042112.[216]

Weilenmann, M., and Colbeck, R.

Analysing causal structures with entropy.

Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 473, 2207 (2017).[217]

Weilenmann, M., and Colbeck, R.

Non-Shannon inequalities in the entropy vector approach to causal structures.

Quantum 2 (2018), p. 57.[218]

Weilenmann, M., and Colbeck, R.

Analysing causal structures in generalised probabilistic theories.

Quantum 4 (2020), p. 236.[219]

Weilenmann, M., and Colbeck, R.

Self-testing of physical theories, or, is quantum theory optimal with respect to some information-processing task?

Physical Review Letters 125 (2020), p. 060406.[220]

Wheeler, J. A.

Assessment of Everett’s “relative state” formulation of quantum theory.

Reviews of Modern Physics 29 (1957), pp. 463–465.[221]

Wigner, E. P.

Remarks on the mind-body question.

The Scientist Speculates, Heineman (1961), Ed. I. J. Good.[222]

Williams, H. P.

Fourier’s method of linear programming and its dual.

The American Mathematical Monthly 93, 9 (1986), pp. 681–695.[223]

Wolfe, E., Schmid, D., Sainz, A. B., Kunjwal, R., and Spekkens, R. W.

Quantifying Bell: the resource theory of nonclassicality of common-cause boxes.

Quantum 4 (2020), p. 280.[224]

Wolfe, E., Spekkens, R. W., and Fritz, T.

The inflation technique for causal inference with latent variables.

Journal of Causal Inference 7 (2019).[225]

Wood, C. J., and Spekkens, R. W.

The lesson of causal discovery algorithms for quantum correlations: causal explanations of Bell-inequality violations require fine-tuning.

New Journal of Physics 17, 3 (2015), p. 033002.[226]

Yeung, R. W.

A framework for linear information inequalities.

IEEE Transactions on Information Theory 43, 6 (1997), pp. 1924–1934.[227]

Zhalama, Zhang, J., and Mayer, W.

Weakening faithfulness: some heuristic causal discovery algorithms.

International Journal of Data Science and Analytics 3, 2 (2017), pp. 93–104.[228]

Zhang, Z., and Yeung, R. W.

A non-Shannon-type conditional inequality of information quantities.

IEEE Trans. Information Theory 43 (1997), pp. 1982–1986.

[229]

Zukowski, M., Kaszlikowski, D., Baturo, A., and Larsson, J.-Å.

Strengthening the Bell theorem: conditions to falsify local realism in an experiment. arXiv:quant-ph/9910058 (1999).[230]

Zukowski, M., Zeilinger, A., Horne, M. A., and Ekert, A. K. “Event-ready-detectors” Bell experiment via entanglement swapping.

Physical Review Letters 71 (1993), pp. 4287–4290.[231]

Zych, M., Costa, F., Pikovski, I., and Brukner, Č.

Bell’s theorem for temporal order.

Nature Communications 10, 1 (2019), p. 3772.