Two-qubit causal structures and the geometry of positive qubit-maps

Abstract

We study quantum causal inference in a set-up proposed by Ried et al. [Nat. Phys. 11, 414 (2015)] in which a common-cause scenario can be mixed with a cause-effect scenario, and for which it was found that quantum mechanics can bring an advantage in distinguishing the two scenarios: Whereas in classical statistics, interventions such as randomized trials are needed, a quantum observational scheme can be enough to detect the causal structure if the common cause results from a maximally entangled state. We analyze this setup in terms of the geometry of unital positive but not completely positive qubit-maps, arising from the mixture of qubit-channels and steering maps. We find the range of mixing parameters that can generate given correlations, and prove a quantum advantage in a more general setup, allowing arbitrary unital channels and initial states with fully mixed reduced states. This is achieved by establishing new bounds on signed singular values of sums of matrices. Based on the geometry, we quantify and identify the origin of the quantum advantage depending on the observed correlations, and discuss how additional constraints can lead to a unique solution of the problem.



Jonas Kübler and Daniel Braun

Institut für theoretische Physik, Universität Tübingen, 72076 Tübingen, Germany

I. INTRODUCTION

Imagine a scenario where two experimenters, Alice and Bob, sit in two distinct laboratories. At one point Alice opens the door of her laboratory, obtains a coin, checks whether it shows heads or tails and puts it back out of the laboratory. Some time later Bob also obtains a coin and checks whether it shows heads or tails. This experiment is repeated many times (ideally: infinitely many times), and afterwards they meet and analyze their joint outcomes. Assuming their joint probability distribution entails correlations, there must be some underlying causal mechanism which causally connects their coins [1]. This could be an unobserved confounder (acting as a common cause), in which case they actually measured two distinct coins influenced by the confounder.
Or it could be that Alice’s coin was propagated by some mechanism to Bob’s laboratory, and hence they actually measured the same coin, with the consequence that manipulations of the coin by Alice can directly influence Bob’s result (cause-effect scenario). The task of Alice and Bob is to determine the underlying causal structure, i.e. to distinguish the two scenarios. This would be rather easy if Alice could prepare her coin after the observation as she chooses and then check whether this influences the joint probability (so-called “interventionist scheme”). In the present scenario, however, we assume that this is not allowed (so-called “observational scheme”). All that Alice and Bob have are therefore the given correlations, and from those alone they cannot, in general, solve this task without additional assumptions. Ried et al. [2] showed that in a similar quantum scenario involving qubits the above task can actually be accomplished in certain cases even in an observational scheme (see below for a discussion of how the idea of an observational scheme can be generalized to quantum mechanics).

In the present work we consider the same setup as in [2], and allow arbitrary convex combinations of the two scenarios: The common-cause scenario is realized with probability $p$, the cause-effect scenario with probability $1 - p$. Our main results are statements about the ranges of the parameter $p$ for which observed correlations can be explained with either one of the scenarios, or both. For this, we cast the problem in the language of affine representations of unital positive qubit maps [3], in which all the information is encoded in a $3 \times 3$ real matrix, as is standard in quantum information theory for completely positive unital qubit maps [4].

The paper is structured as follows: In section II we introduce causal models for classical random variables and for quantum systems. Therein we define what we consider a quantum observational scheme.
Section III introduces the mathematical framework of ellipsoidal representations of qubit quantum-channels and qubit steering-maps. In section IV we define our problem mathematically and prove the main results, which we then comment on in the last section V.

II. CAUSAL INFERENCE: CLASSICAL VERSUS QUANTUM

A. Classical causal inference

At the heart of a classical causal model is a set of random variables $X_1, X_2, \ldots, X_N$. The observation of a specific value of a variable, $X_i = x_i$, is associated with an event. Correlations between events hint at some kind of causal mechanism that links the events [1]. Such a mechanism can be a deterministic law, as for example $x_i = f(x_j)$, or a probabilistic process described by conditional probabilities $P(x_i | x_j)$, i.e. the probability to find $X_i = x_i$ given that $X_j = x_j$ was observed. The causal mechanism need not be merely a direct causal influence from one observed event on the other, but may be due to common causes that lead with a certain probability to both events, or to a mixture of both scenarios. Hence, by merely analysing correlations $P(x_1, x_2, \ldots, x_n)$, i.e. the joint probability distribution of all events, one cannot, in general, uniquely determine the causal mechanism that leads to the observed correlations without prior knowledge of the data-generating process (purely observational scheme). To remedy this, an intervention is often necessary, where the value of a variable $X_i$ whose causal influence one wants to investigate is set by an experimentalist to different values, in order to see whether this changes the statistics of the remaining events (interventionist scheme). One strategy for reducing the influence of other, unknown factors is to randomize the samples. This is for example a typical approach in clinical studies, where one group of randomly selected participants receives a treatment whose efficacy one wants to investigate, and a randomly selected control group receives a placebo. If the percentage of cured people in the first group is significantly larger than in the second group, one can believe in a positive causal effect of the treatment.
The probabilities obtained in this interventionist scheme are so-called “do-probabilities” (or “causal conditional probabilities”) [5]: $P(x_i | \mathrm{do}(x_j))$ is the probability to find $X_i = x_i$ if an experimentalist intervened and set the value of $X_j$ to $x_j$. This is different from $P(x_i | x_j)$, as a possible causal influence from some other unknown event on $X_j = x_j$ is cut, i.e. one deliberately modifies the underlying causal structure to better understand a part of it. If $X_j = x_j$ was the only direct cause of $X_i = x_i$, then $P(x_i | x_j) = P(x_i | \mathrm{do}(x_j))$. If instead the event $X_i = x_i$ was a cause of $X_j = x_j$, then intervening on $X_j$ cannot change $X_i$: $P(x_i) = P(x_i | \mathrm{do}(x_j)) = P(x_i | \mathrm{do}(\bar{x}_j))$, where $\bar{x}_j$ is a value different from $x_j$. If the correlation between $X_i = x_i$ and $X_j = x_j$ is purely due to a common cause, then no intervention on $X_i$ or $X_j$ will change the probability to find a given value of the other: $P(x_i) = P(x_i | \mathrm{do}(x_j))$ for all $x_j$, and $P(x_j) = P(x_j | \mathrm{do}(x_i))$ for all $x_i$. Observing these do-probabilities, one can hence draw conclusions about the causal influences behind the correlations observed in the occurrence of $X_i = x_i$ and $X_j = x_j$.

In practice, direct causation in one direction is often excluded by time-ordering and need not be investigated. For example, when doubting that one can conclude that smoking causes lung cancer from the observed correlations between these two events, it does not make sense to claim that having lung cancer causes smoking, as usually smoking comes before developing lung cancer. But even dividing a large number of people randomly into two groups and forcing one of them to smoke and the other not to smoke in order to find out if there is a common cause for both would be ethically unacceptable. The needed do-probabilities can therefore not always be obtained by experiment.
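The difference between the observational probability $P(x_i | x_j)$ and the do-probability $P(x_i | \mathrm{do}(x_j))$ can be made concrete with a small numerical sketch. The snippet below assumes a purely common-cause model $X \leftarrow C \rightarrow Y$ with binary variables; the function names and all probability values are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def joint_common_cause(p_c, p_x_given_c, p_y_given_c):
    """Joint P(c, x, y) for the assumed DAG X <- C -> Y (binary variables)."""
    joint = {}
    for c, x, y in product((0, 1), repeat=3):
        joint[(c, x, y)] = p_c[c] * p_x_given_c[c][x] * p_y_given_c[c][y]
    return joint


def conditional_y_given_x(joint, x, y):
    """Observational P(y | x), obtained by marginalising over the hidden C."""
    p_xy = sum(p for (_, xx, yy), p in joint.items() if xx == x and yy == y)
    p_x = sum(p for (_, xx, _), p in joint.items() if xx == x)
    return p_xy / p_x


def do_y_given_x(p_c, p_y_given_c, y):
    """P(y | do(x)) in the common-cause DAG.

    The intervention cuts the arrow C -> X, so the value forced on X
    cannot influence Y and the result equals the marginal P(y).
    """
    return sum(p_c[c] * p_y_given_c[c][y] for c in (0, 1))


# Hypothetical numbers: C strongly biases both X and Y.
p_c = {0: 0.5, 1: 0.5}
p_x_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
p_y_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

joint = joint_common_cause(p_c, p_x_given_c, p_y_given_c)
p_obs = conditional_y_given_x(joint, x=1, y=1)    # correlation via C
p_do = do_y_given_x(p_c, p_y_given_c, y=1)        # intervention on X
p_marginal_y = sum(p for (_, _, y), p in joint.items() if y == 1)
```

With these assumed numbers the observed conditional probability differs from the marginal of $Y$, while the do-probability coincides with it, which is the signature of a pure common cause described above.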
Interestingly, the causal-probability calculus allows one in certain cases, depending notably on the graph structure, to calculate do-probabilities from observed correlations without having to do the intervention. Conversely, apart from only predicting the conditional probabilities for a random variable, say $X_i$, given the observation of $X_j = x_j$, denoted as $P(x_i | x_j)$, a causal model can also predict the do-probabilities, i.e. the distribution of $X_i$ if one were to intervene on the variable $X_j$ and set its value to $x_j$. This is crucial for deriving informed recommendations for actions targeted at modifying certain probabilities, e.g. recommending not to smoke in order to reduce the risk of cancer.

The structure of a causal model can be depicted by a graph. Each random variable is represented by a vertex of the graph. Causal connections are represented by directed arrows and imply that signaling along the direction of the arrow is possible. In a classical causal model it is assumed that events happen at specific points in space and time; therefore bidirectional signaling is not possible, as it would imply signaling backward in time. Hence the graph cannot contain cycles and is therefore a directed acyclic graph (DAG) [5], see FIG. 1.

The set of parents $PA_j$ of the random variable $X_j$ is defined as the set of all variables that have an immediate arrow pointing towards $X_j$, and $pa_j$ denotes a possible value of $PA_j$. The causal model is then defined through its graph with random variables $X_i$ at its vertices and the weights $P(x_j | pa_j)$ of each edge, i.e. the probabilities that $X_j = x_j$ happens under the condition that $PA_j = pa_j$ occurred. The model generates the entire correlation function according to

$$P(x_1, \ldots, x_n) = \prod_{j=1}^{n} P(x_j | pa_j), \tag{1}$$

which is referred to as the causal Markov condition [5]. When all $P(x_1, \ldots, x_n)$ are given, then all conditional probabilities follow, hence all $P(x_j | pa_j)$ that appear in a given graph, but in general not all correlations nor all $P(x_j | pa_j)$ are known (see below). The causal inference problem consists in finding a graph structure that allows one to satisfy eq. (1) for given data $P(x_1, \ldots, x_n)$ and all known $P(x_j | pa_j)$, where the unknown $P(x_j | pa_j)$ can be considered fit parameters in case of incomplete data. With access to the full joint probability distribution, the causal inference only needs to determine the graph. In practice, however, one often has only incomplete data: as long as a common cause has not been determined yet, one will not have data involving correlations of the corresponding variable. For example, one may have strong correlations between getting lung cancer (random variable $X_1 \in \{0, 1\}$) and smoking (random variable $X_2 \in \{0, 1\}$), but if there is an unknown common cause $X_3$ for both, one typically has no information about $P(x_1, x_2, x_3)$: One will only start collecting data about correlations between the presence of a certain gene, say, and the habit of smoking or developing lung cancer once one suspects that gene to be a cause of at least one of these. In this case $P(x_1 | x_3)$ and $P(x_2 | x_3)$ are fit parameters of the model as well. The possibility of extending a causal model through inclusion of unknown random variables is one reason why in general there is no unique solution to the causal inference problem based on correlations alone. Interventions on $X_i$ make it possible, on the other hand, to cut $X_i$ from its parents and hence eliminate unknown causes one by one for all random variables.

FIG. 1: Simple DAG in a four-party scenario. The parental structure is $PA_A = \{\}$, $PA_B = \{A\}$, $PA_C = \{A, B\}$, $PA_D = \{C\}$. According to the causal Markov condition, eq. (2), the probability distribution then factorizes as $P(a, b, c, d | i_A, i_B, i_C, i_D) = P(d | c, i_D) \, P(c | a, b, i_C) \, P(b | a, i_B) \, P(a | i_A)$.

Once a causal model is known, one can calculate all distributions

$$P(x_1, \ldots, x_n | i_1, \ldots, i_n) = \prod_{j=1}^{n} P(x_j | pa_j, i_j), \tag{2}$$

for all possible combinations of interventions and observations, where the $i_j$ are the values of the intervention variable $I_j$ for the event $X_j$, $i_j = \mathrm{idle}$ or $i_j = \mathrm{do}(x_j)$. Here, $P(x_j | pa_j, i_j = \mathrm{do}(\tilde{x}_j)) = \delta_{x_j, \tilde{x}_j}$ reflects that an intervention on $X_j$ deterministically sets its value, independently of the observed values of its causal parents. If $I_j = \mathrm{idle}$, then the value of $X_j$ only depends on its causal parents $PA_j$, i.e. $P(x_j | \{x_i\}_{i \neq j}, i_j = \mathrm{idle}) = P(x_j | pa_j, i_j = \mathrm{idle})$.

The field of causal discovery or causal inference aims at providing methods to determine the causal model, that is the DAG and the joint-probability distributions entering (1), for a given scenario. Different combinations of the $I_j$ correspond to different strategies. If all the interventions are set to idle, and hence all the outcomes are determined by the causal parents, one has the purely observational approach. In multivariate scenarios, where more than two random variables are involved, the observation of the joint probability distribution alone can still contain hints of the causal structure based on conditional independencies [5]. Nevertheless, in the bivariate scenario, i.e. when only two random variables are involved, classical correlations obtained by observations do not comprise any causal information. Only if assumptions, for example on the noise distribution, are made a priori can information on the causal model be obtained from observational data [6].

B. Quantum causal inference

The notion of causal models does not easily translate to quantum mechanics.
The main problem is that in quantum systems not all observables can have predefined values independent of observation. Similar to an operational formulation of quantum mechanics [7], the process matrix formalism was introduced [8] and a quantum version of an event defined. In [9] this is reviewed for the purpose of causal models. In place of the random variables in the classical case there are local laboratories. Within a process each laboratory obtains a quantum system as input and produces a quantum system as output. A quantum event corresponds to information which is obtained within a laboratory and is associated with a completely positive (CP) map mapping the input Hilbert space to the output Hilbert space of the laboratory. The possible events depend on the choice of instrument. An instrument is a set of CP maps that sum to a completely positive trace preserving (CPTP) map. For example, an instrument can be a projective measurement in a specific basis, with the events being the possible outcomes. The possibility to choose different instruments mirrors the possibility of interventions in the classical case [9, 3.3]. The whole information about mechanisms, which are represented as CPTP maps, and the causal connections is contained in a so-called process matrix. Besides its analogy to a classical causal model, the process framework goes beyond classical causal structures as it does not assume such a fixed causal structure [8]. This has recently stimulated a lot of research [10–13]. For a more detailed introduction we refer the reader especially to reference [9], where a comprehensive description is provided.

The analogue of causal inference in the classical case is the reconstruction of a process matrix. This can be done using informationally complete sets of instruments, theoretically described in [9, 4.1] and experimentally implemented in [2]. Defining a quantum observational scheme in analogy to the classical one is not straightforward.
In general a quantum measurement destroys much of the state’s character and hence can almost never be considered a passive observation. For example, if the system was initially in a pure state $|\psi\rangle$ but one measures in a basis such that $|\psi\rangle$ is not an eigenstate of the projectors onto the basis states, then the measurement truly changes the state of the system and the original state is not reproduced in the statistical average. In [9, sect. 5] an observational scheme is simply defined as projective measurements in a fixed basis, in particular without assumptions about the incoming state of a laboratory and thus without assumptions about the underlying process. Another possibility to define an observational scheme is based on the idea that in the classical world observations reveal pre-existing properties of physical systems and that quantum observations should reproduce this. As a consequence, if one mixes the post-measurement states with the probabilities of the corresponding measurement outcomes, one should obtain the same state as before the measurement. That is ensured if and only if only operations that do not destroy the quantum character of the state are allowed, as coherences cannot be restored by averaging. Ried et al. [2] formalized this notion as “informational symmetry”, but considered only preservation of local states. For the special case of locally completely mixed states, they showed that projective measurements in arbitrary bases possess informational symmetry. This definition of a quantum observational scheme is problematic for two reasons: Firstly, the allowed class of instruments depends on the incoming state, i.e. one can only apply projective measurements that are diagonal in the same basis as the state itself. This is at variance with the typical motivation for an observational scheme, namely that the instruments are restricted a priori for practical reasons.
Moreover, having measurements depend on the state requires prior knowledge about the state of the system, but finding out the state of the system is part of the causal inference (e.g.: are the correlations based on a state shared by Alice and Bob?). Hence, in general one cannot assume sufficient knowledge of the state for restricting the measurements such that they do not destroy coherences.

Secondly, the definition is unnaturally restrictive as it only considers the local state and not the global state. For example, if Alice and Bob share a singlet state $|\psi\rangle = (|01\rangle - |10\rangle)/\sqrt{2}$, then both local states are completely mixed. Hence, according to the informational symmetry, they are allowed to perform projective measurements in arbitrary bases. If Alice and Bob now both measure in the computational basis, they will each obtain both outcomes with probability $1/2$, and their local states will remain invariant in the statistical average, $\rho'_A = \rho_A = \mathbb{1}/2 = \rho'_B = \rho_B$. However, the global state does not remain intact. The post-measurement state is given as $\rho'_{AB} = \frac{1}{2} \left( |01\rangle\langle 01| + |10\rangle\langle 10| \right)$, which is not even entangled anymore. But even defining a “global informational symmetry”, i.e. requiring the global state to remain invariant, does not settle the issue in a convenient way, as this would not allow any local measurements by Alice and Bob.

| Scheme | arbitrary instruments | arbitrary projections | fixed basis projection | signaling | causal inference |
|---|---|---|---|---|---|
| Q-interventionist | √ | √ | √ | √ | √ |
| Active Q-observational | X | √ | √ | √ | (√) |
| Passive Q-observational | X | X | √ | X | X |

TABLE I: Quantum schemes for causal inference: An overview of instruments allowed within different quantum schemes defined in this section. √ indicates allowed/possible, X indicates not allowed/impossible. In the active quantum-observational scheme signaling is possible in principle. However, in the scenarios considered in this work signaling is not possible, and still causal inference can be successful. The potential of causal inference in the active quantum-observational scheme is discussed in the main part of this paper. In the passive quantum-observational scheme no more causal inference than classical is possible.

Here we propose three different schemes, ranging from full quantum interventions, over a quantum-observational scheme with the possibility of an active choice of measurements, to a passive quantum-observational scheme in a fixed basis that comes closest to the classical observational scheme.

The definitions are based on restricting the allowed set of instruments. An instrument is to be understood in the process-matrix context. In all three schemes the set of allowed instruments is independent of the actual underlying processes, which is a reasonable assumption, since the motivation for causal inference comes from the fact that states or processes are not known in the first place.

Quantum interventionist scheme:

Arbitrary instruments can be applied in local laboratories. These include for example deterministic operations such as state preparations or simply projective measurements. An appropriate choice of the instruments enables one to detect causal structure in arbitrary scenarios, i.e. to reconstruct the process matrix [9]. This scheme resembles most closely an interventionist scheme in a classical scenario but offers additional quantum-mechanical possibilities of intervention.

Active quantum-observational scheme:

Only projective measurements in arbitrary orthogonal bases are allowed, but no post-processing of the state after the measurement. The latter requirement translates the idea of not intervening to the quantum realm, as it is not possible to deterministically change the state by the experimenter’s choice. Depending on the state and the instrument, the state may change during the measurement, hence the scheme is invasive, but the difference from the classical observational scheme arises solely from the possible destruction of quantum coherences. This is a quantum effect without classical correspondence and hence opens up a new possibility of defining an observational scheme that has no classical analogue. Repeated application of the same measurement within a single run always gives the same output. Furthermore, we allow projective measurements in different bases in different runs of the experiment. This freedom allows one to completely characterize the incoming state.

This scheme allows for signaling, i.e. there exist processes for which Alice’s choice of instrument changes the statistics that Bob observes. As an example consider the process where Alice always obtains a qubit in the state $|1\rangle$. She applies her instrument to it, and then the outcome is propagated to Bob by the identity channel. Bob measures in the basis where $|1\rangle$ is an eigenstate. If Alice measures in the same basis as Bob, then both of them deterministically obtain 1 as result. If Alice instead measures in the basis $\left\{ |\pm\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm |1\rangle) \right\}$, then Bob obtains 1 only with probability $1/2$. This is considered as signaling according to the definition in [9]. Clearly, signaling presents a direct quantum advantage for causal inference compared to a classical observational scheme, and motivates the attribute “active” of the scheme.
In the present work we focus on this scheme, but exclude such a direct quantum advantage by considering exclusively unital channels and a completely mixed incoming state for Alice, as was done also in [2]. It is then impossible for Alice to send a signal to Bob if her instruments are restricted to quantum observations, even if she is allowed to actively set her measurement basis. One might wonder whether the quantum-observational scheme can be generalized to POVM measurements. However, these do not fit into the framework of instruments that transmit an input state to an output state, as POVM measurements do not specify the post-measurement state.

Passive quantum-observational scheme:

For the whole setup a fixed basis is selected. Only projective measurements with respect to this basis are permitted, and it is forbidden to change the basis in different runs of the experiment. This is also what is used in [9] to obtain classical causal models as a limit of quantum causal models. Since the basis is fixed independently of the underlying process, the measurement can still be invasive in the sense that it can destroy coherences, and hence it is still not a pure observational scheme in the classical sense. Nevertheless, Alice cannot signal to Bob here as she has no possibility of actively encoding information in the quantum state, regardless of the nature of the state, which motivates the name “passive quantum-observational scheme”. As without any change of basis it is impossible to exploit stronger-than-classical quantum correlations, this scheme comes closest to a classical observational scheme. And due to the restriction to observing at most classical correlations, it is not possible to infer anything more about the causal structure than classically possible.
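The signaling example given for the active quantum-observational scheme, and its absence for a completely mixed input, can be checked with a few lines of Python. This is only a sketch under simplifying assumptions: states are real two-component amplitude vectors, the channel from Alice to Bob is the identity, Bob measures in the computational basis, and all function names are hypothetical.

```python
import math


def bob_probability_of_one(alice_basis, incoming=(0.0, 1.0)):
    """Probability that Bob finds |1> after Alice's projective measurement.

    alice_basis: two orthonormal real vectors (Alice's measurement basis).
    incoming:    real amplitudes of Alice's pure input state, default |1>.
    Alice's post-measurement state reaches Bob through the identity
    channel, and Bob measures in the computational basis.
    """
    total = 0.0
    for basis_vector in alice_basis:
        overlap = sum(b * a for b, a in zip(basis_vector, incoming))
        p_alice = overlap ** 2             # Born rule for Alice's outcome
        p_bob_one = basis_vector[1] ** 2   # |<1|basis_vector>|^2
        total += p_alice * p_bob_one
    return total


def bob_probability_mixed_input(alice_basis):
    """Same quantity for a completely mixed input (equal mixture of |0>, |1>)."""
    return 0.5 * (bob_probability_of_one(alice_basis, (1.0, 0.0))
                  + bob_probability_of_one(alice_basis, (0.0, 1.0)))


s = 1.0 / math.sqrt(2.0)
computational = [(1.0, 0.0), (0.0, 1.0)]
plus_minus = [(s, s), (s, -s)]

# Input |1>: Bob's statistics depend on Alice's basis choice (signaling).
p_same_basis = bob_probability_of_one(computational)
p_rotated_basis = bob_probability_of_one(plus_minus)

# Completely mixed input: Bob's statistics no longer depend on Alice's basis.
p_mixed_same = bob_probability_mixed_input(computational)
p_mixed_rotated = bob_probability_mixed_input(plus_minus)
```

For the pure input the two basis choices give different probabilities for Bob, whereas for the completely mixed input they agree, in line with the restriction to completely mixed incoming states adopted in this work.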

III. AFFINE REPRESENTATION OF QUANTUM CHANNELS AND STEERING MAPS

In this section we introduce the tools of quantum information theory that we need to analyze the problem of causal inference in section IV.

A. Bloch-sphere representation of qubits

A qubit is a quantum system with a two-dimensional Hilbert space with basis states denoted as $|0\rangle$ and $|1\rangle$. An arbitrary state of the qubit is described by a density operator $\rho$, a positive linear operator with unit trace, $\rho \geq 0$, $\mathrm{tr}[\rho] = 1$. Every single-qubit state can be represented geometrically by its Bloch vector $\mathbf{r} = \mathrm{tr}[\rho \boldsymbol{\sigma}]$, with $|\mathbf{r}| \leq 1$, as

$$\rho = \frac{\mathbb{1} + \mathbf{r} \cdot \boldsymbol{\sigma}}{2}, \tag{3}$$

where $\boldsymbol{\sigma} = (\sigma_1, \sigma_2, \sigma_3)^T$ denotes the vector of Pauli matrices.

B. Channels

A quantum channel $\mathcal{E}$ is a completely positive trace preserving map (CPTP map). A quantum channel maps a density operator in the space of linear operators $\rho \in L(H)$ on the Hilbert space $H$ to a density operator in the space of linear operators $\rho' \in L(H')$ on a (potentially different) Hilbert space $H'$,

$$\mathcal{E}: \rho \to \mathcal{E}(\rho) \equiv \rho', \qquad \rho, \rho' \geq 0, \qquad \mathrm{tr}[\rho] = \mathrm{tr}[\rho'] = 1.$$

This formalism describes any physical dynamics of a quantum system. Every quantum channel can be understood as the unitary evolution of the system coupled to an environment [4]. The constraint of complete positivity can be understood in the following way. If we extend the map $\mathcal{E}$ with the identity operation of arbitrary dimension, the composed map $\mathcal{E} \otimes \mathbb{1}$, which acts on a larger system, should still be positive. An example of a map that is positive but not completely positive is the transposition map, which, if extended to a larger system, can map entangled states to operators that are not positive semi-definite [3, chapter 11.1].

Geometrical representation of qubit maps

Every qubit channel $\mathcal{E}$ (a quantum channel mapping a qubit state onto a qubit state) can be described completely by its action on the Bloch sphere, see [14–16], and is completely described by the matrix $\Theta_{\mathcal{E}}$ mapping the 4D Bloch vector $(1, \mathbf{r})$,

$$\Theta_{\mathcal{E}} = \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{t}_{\mathcal{E}} & T_{\mathcal{E}} \end{pmatrix}, \tag{4}$$

where the upper left 1 ensures trace preservation. A state $\rho$ described by its Bloch vector $\mathbf{r}$ is then mapped by the quantum channel $\mathcal{E}$ to the new state $\rho'$ with Bloch vector $\mathbf{r}' = T_{\mathcal{E}} \mathbf{r} + \mathbf{t}_{\mathcal{E}}$. A qubit channel is called unital if it leaves the completely mixed state invariant: $\mathcal{E}(\rho_{\mathrm{mixed}}) = \rho_{\mathrm{mixed}}$, with $\rho_{\mathrm{mixed}} = \mathbb{1}/2$, i.e. $\mathbf{r}_{\mathrm{mixed}} = \mathbf{0}$. For unital channels $\mathbf{t}_{\mathcal{E}}$ vanishes. The whole information is then contained in the $3 \times 3$ real matrix $T_{\mathcal{E}}$, which we refer to as the correlation matrix of the channel. The matrix $T$ (from now on we drop the index $\mathcal{E}$) can be expressed by writing it in its signed singular value decomposition [15, eq. (9)], [3, eq. (10.78)] (see also the appendix around equation (44)),

$$T = R_1 \eta R_2. \tag{5}$$

Here, $R_1$ and $R_2$ are proper rotations (elements of the $SO(3)$ group), corresponding to unitary channels, that is $R_i R_i^T = \mathbb{1}$ with $\det(R_i) = 1$, and $\eta = \mathrm{diag}(\eta_1, \eta_2, \eta_3)$ is a real diagonal matrix. This can be interpreted rather easily. A unital qubit channel maps the Bloch sphere onto an ellipsoid, centered around the origin, that fits inside the Bloch sphere. First the Bloch sphere is rotated by $R_2$, then it is compressed along the coordinate axes by the factors $\eta_i$. The resulting ellipsoid is then rotated again by $R_1$. Hence, apart from unitary freedom in the input and output, the unital quantum channel is completely characterized by its signed singular values (SSV) [15, II.B]. The CPTP property restricts the allowed values of $\boldsymbol{\eta} \equiv (\eta_1, \eta_2, \eta_3)^T$. These restrictions are commonly known as the Fujiwara-Algoet conditions [14, 15],

$$1 + \eta_3 \geq |\eta_1 + \eta_2|, \qquad 1 - \eta_3 \geq |\eta_1 - \eta_2|. \tag{6}$$

The allowed values for $\boldsymbol{\eta}$ lie inside a tetrahedron $\mathcal{T}_{\mathrm{CP}}$ (the index CP stands for completely positive),

$$\mathcal{T}_{\mathrm{CP}} \equiv \mathrm{Conv}\left( \left\{ \mathbf{v}_i^{\mathrm{CP}} \right\}_i \right), \tag{7}$$

where $\mathrm{Conv}(\{x_i\}_i) \equiv \left\{ \sum_i p_i x_i \,\middle|\, p_i \geq 0, \sum_i p_i = 1 \right\}$ denotes the convex hull of the set $\{x_i\}_i$ and the vertices are defined as

$$\mathbf{v}_1^{\mathrm{CP}} = (1, 1, 1)^T, \quad \mathbf{v}_2^{\mathrm{CP}} = (-1, -1, 1)^T, \quad \mathbf{v}_3^{\mathrm{CP}} = (-1, 1, -1)^T, \quad \mathbf{v}_4^{\mathrm{CP}} = (1, -1, -1)^T. \tag{8}$$

For a more detailed discussion of qubit maps we refer the reader to chapter 10.7 of [3].

C. Steering

In quantum mechanics, measurement outcomes on two spatially separated partitions of a composed quantum system can be highly correlated [17], and further the choice of measurement operator on one side can strongly influence or even determine the outcome on the other side [18], a phenomenon known as “steering”. Suppose Alice and Bob share the two-qubit state $\rho_{AB}$. If Alice performs a measurement on it, leaving her qubit in the state $\rho_A$, then Bob’s qubit is steered to the state $\rho_B$ proportional to the (unnormalized) state $\mathrm{tr}_A[\rho_{AB} (\rho_A \otimes \mathbb{1})]$ [19, p. 2]. This defines a positive linear trace preserving map $\mathcal{S}: \rho_A \to \mathcal{S}(\rho_A) = \rho_B$, called steering map, that depends on the state $\rho_{AB}$.

Steering maps have been intensely studied, especially in terms of entanglement characterization [19, 20]. In analogy to the treatment of qubit channels, we can associate a unique ellipsoid inside the Bloch sphere with a two-qubit state, known as the steering ellipsoid, that encodes all the information about the bipartite state [19].

Every bipartite two-qubit state can be expanded in the Pauli basis (with $\sigma_0 = \mathbb{1}$) as

$$\rho_{AB} = \frac{1}{4} \sum_{\mu, \nu = 0}^{3} \Theta_{\mu\nu} \, \sigma_\nu \otimes \sigma_\mu, \quad \text{where} \quad \Theta_{\mu\nu} = \mathrm{tr}[\rho_{AB} \, \sigma_\nu \otimes \sigma_\mu]. \tag{9}$$

Note that we defined $\Theta$ to be the transpose of the one defined in [19], since we want to treat steering from Alice to Bob. The matrix contains all the information about the bipartite state and can be written as

$$\Theta = \begin{pmatrix} 1 & \mathbf{a}^T \\ \mathbf{b} & T_{\mathcal{S}} \end{pmatrix},$$

where $\mathbf{a}$ ($\mathbf{b}$) denotes the Bloch vector of Alice’s (Bob’s) reduced state. $T_{\mathcal{S}}$ is a $3 \times 3$ real matrix (in general not orthogonal) that encodes all the information about the correlations, and we will refer to it as the correlation matrix of the steering map.

In this work we only consider bipartite qubit states which have completely mixed reduced states, $\mathrm{tr}_A[\rho_{AB}] = \mathrm{tr}_B[\rho_{AB}] = \mathbb{1}/2$, or equivalently $\mathbf{a} = \mathbf{b} = \mathbf{0}$. In analogy to unital channels we call such states unital two-qubit states and the corresponding maps unital steering maps.
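As a concrete illustration of eq. (9), the matrix $\Theta$ can be evaluated for the singlet state, which is a unital two-qubit state. The following standard-library Python sketch is only meant as a plausibility check; the helper names (`kron`, `theta_matrix`, `satisfies_fujiwara_algoet`) are hypothetical, and reading the signed singular values directly off the diagonal is an assumption that is valid here only because the resulting correlation matrix is already diagonal.

```python
import math

# Pauli matrices, with sigma_0 the identity.
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
PAULIS = [I2, SX, SY, SZ]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (first factor: Alice)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def trace_of_product(a, b):
    """tr[a b] for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def theta_matrix(rho):
    """Theta_{mu nu} = tr[rho (sigma_nu x sigma_mu)], following eq. (9)."""
    return [[trace_of_product(rho, kron(PAULIS[nu], PAULIS[mu])).real
             for nu in range(4)] for mu in range(4)]


def satisfies_fujiwara_algoet(eta, tol=1e-12):
    """Conditions (6): 1 + eta_3 >= |eta_1 + eta_2|, 1 - eta_3 >= |eta_1 - eta_2|."""
    e1, e2, e3 = eta
    return (1 + e3 + tol >= abs(e1 + e2)) and (1 - e3 + tol >= abs(e1 - e2))


# Singlet (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
# The amplitudes are real, so no complex conjugation is needed.
amp = 1 / math.sqrt(2)
psi = [0.0, amp, -amp, 0.0]
rho_singlet = [[psi[i] * psi[j] for j in range(4)] for i in range(4)]

theta = theta_matrix(rho_singlet)
a_vec = theta[0][1:]                                  # Alice's Bloch vector
b_vec = [theta[mu][0] for mu in (1, 2, 3)]            # Bob's Bloch vector
eta_singlet = [theta[k][k] for k in (1, 2, 3)]        # diagonal of T_S
is_channel_like = satisfies_fujiwara_algoet(eta_singlet)
```

In this sketch both Bloch vectors vanish and the correlation matrix comes out as minus the identity, so its signed singular values violate the channel conditions (6): the steering map of the singlet is positive but cannot be realized as a unital qubit channel.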
Up to local unitary operations on the two partitions, the correlation matrix $T_S$ is characterized by its signed singular values $\eta_1, \eta_2, \eta_3$. The allowed values of these are given through the positivity constraint on the density operator $\rho_{AB}$, defined up to local unitaries as (cf. equation (6) in [20])

$$\rho_{AB} = \frac{1}{4}\left(\mathbb{1}\otimes\mathbb{1} + \sum_{i=1}^{3} \eta_i\, \sigma_i \otimes \sigma_i\right). \tag{10}$$

The positivity of $\rho_{AB}$ implies the conditions (the derivation is analogous to the derivation of (10)-(15) in [15])

$$1 + \eta_3 \geq |\eta_1 - \eta_2|, \qquad 1 - \eta_3 \geq |\eta_1 + \eta_2|. \tag{11}$$

These are the same as for unital qubit channels (eq. (6)) up to a sign flip, and define the tetrahedron $\mathcal{T}_{CcP}$ of unital completely co-positive trace preserving maps (CcPTP) [3, 15],

$$\mathcal{T}_{CcP} \equiv \mathrm{Conv}\left(\{\mathbf{v}^{CcP}_i\}_i\right), \tag{12}$$

with the vertices

$$\mathbf{v}^{CcP}_1 = (-1,-1,-1)^T,\quad \mathbf{v}^{CcP}_2 = (-1,1,1)^T,\quad \mathbf{v}^{CcP}_3 = (1,-1,1)^T,\quad \mathbf{v}^{CcP}_4 = (1,1,-1)^T. \tag{13}$$

CcPTP maps are exactly CPTP maps with a preceding transposition map, i.e. for every steering map $\mathcal{S}$ there exists a quantum channel $\mathcal{E}$ such that $\mathcal{S} = \mathcal{E} \circ T$, where $T$ is the transposition map with respect to an arbitrary but fixed basis (see e.g. [3]).

D. Positive maps

We have seen that a quantum channel is a CPTP map and that a steering map is a CcPTP map. Both of them are necessarily positive maps. But are there positive maps that are neither CcP nor CP? Or are there maps that are even both? This issue is nicely worked out in [3, chapter 11]. We briefly review it for unital qubit maps. Since we still deal with linear maps, every unital positive one-qubit map can also be described by a $3\times 3$ correlation matrix. Hence we can also analyze its SSV. The allowed SSV are inside the cube $\mathcal{C}$ defined by [3, FIG.11.3]

$$\mathcal{C} \equiv \{\mathbf{x} \mid -1 \leq x_i \leq 1 \text{ for } i = 1,2,3\}. \tag{14}$$

This is illustrated in FIG. 2. Note again that we only treat unital maps.

We see that there are positive maps which are neither CP nor CcP. According to the Størmer-Woronowicz theorem (see e.g. [3, p. 258]) every positive qubit map is decomposable, i.e. it can be written as a convex combination of a CP and a CcP map. Maps that are both CP and CcP are called superpositive (SP). The set of allowed SSV of the correlation matrices of these maps forms an octahedron (green region in FIG. 2) given as

$$\mathcal{O}_{SP} = \mathrm{Conv}\left(\{\pm\hat{\mathbf{e}}_i \mid i \in \{x,y,z\}\}\right), \tag{15}$$

where $\hat{\mathbf{e}}_i$ denotes the unit vector along the $i$-axis. These correlations are generated by entanglement breaking quantum channels [21] and steering maps based on separable states [19]. When such classical correlations are observed one cannot infer anything about the causal structure [2, p.10 of supplementary information].

FIG. 2: Geometry of positive maps: For positive trace preserving single-qubit maps, the allowed signed singular values lie within a cube $\mathcal{C}$ defined in (14). Quantum channels corresponding to CPTP maps lie within the blue tetrahedron $\mathcal{T}_{CP}$ defined in (7), steering maps corresponding to CcPTP maps lie within the yellow tetrahedron $\mathcal{T}_{CcP}$ defined in (12). The maps with SSV inside the intersection of $\mathcal{T}_{CP}$ and $\mathcal{T}_{CcP}$ (green octahedron) are called superpositive. These maps only produce classical correlations corresponding to separable states or entanglement breaking channels, but can also be generated by mixtures of quantum correlations.

For higher dimensional systems things change. Already for three dimensional maps, i.e. qutrit maps, there exist positive maps that cannot be represented as a convex combination of a CP and a CcP map [3, chapter 11.1]. In the next section we discuss how much information about causal influences we can obtain by looking only at the SSV related to the correlations Alice and Bob can observe in a bipartite experiment.

IV. CAUSAL EXPLANATION OF UNITAL POSITIVE MAPS

A. Setting

We now tackle the problem of causal inference in the two-qubit scenario [2]. The setting is as follows. An experimenter, Alice, sits in her laboratory. She opens her door just long enough to obtain a qubit in a (locally) completely mixed state and closes the door again. She performs a projective measurement in the eigenbasis of one of the Pauli matrices, records her outcome, opens her door again and puts the qubit, now in the collapsed state, outside. Apart from the qubit she has no way of interacting with the environment. Some time later another experimenter, Bob, opens the door of his laboratory and obtains a qubit. He also measures in the eigenbasis of one of the Pauli matrices and records the outcome. They repeat this procedure a large (ideally: an infinite) number of times. Then they meet and analyze their joint measurement outcomes. These define the probabilities $P(a,b \mid j,i)$ for the outcomes $a \in \{-1,1\}$ and $b \in \{-1,1\}$ of Alice's and Bob's measurements, given they measured in the eigenbasis of the $j$th and $i$th Pauli matrix, respectively. For the marginals we assume $P(a \mid j,i) = \sum_b P(a,b \mid j,i) = 1/2\ \forall a \in \{-1,1\}$, and accordingly for Bob.
They are thus able to define a correlation matrix $M$ with elements

$$M_{ij} = 2P(b=1 \mid j,i,a=1) - 1 = \langle \sigma_j \sigma_i \rangle, \tag{16}$$

where $P(b=1 \mid j,i,a=1)$ is the probability that Bob obtains outcome 1 when measuring the observable $\sigma_i$, conditioned on Alice's measurement of $\sigma_j$ with outcome 1, and $\langle \sigma_j \sigma_i \rangle$ denotes the expectation value of the product of Alice's $\sigma_j$ and Bob's $\sigma_i$ measurement outcomes.

The correlation matrix defines a unique positive trace preserving unital map $\mathcal{M}: \rho_A \mapsto \rho_B$. They are promised that one of the following three possibilities holds: either they measured the same qubit, which was propagated by a unital quantum channel $\mathcal{E}$ from Alice to Bob; or they each measured one of the two qubits of a unital bipartite state $\rho_{AB}$ acting as a common cause, so that the correlations were caused by the corresponding steering map $\mathcal{S}$; or the map from $\rho_A$ to $\rho_B$ is a probabilistic mixture in which the steering map $\mathcal{S}$ was realized with probability $p$ and the quantum channel $\mathcal{E}$ with probability $(1-p)$, that is

$$\mathcal{M} = (1-p)\,\mathcal{E} + p\,\mathcal{S}, \tag{17}$$

with the "causality parameter" $p \in [0,1]$. The task of Alice and Bob is now to find the true value of $p$ and possibly also the nature of $\mathcal{S}$ and $\mathcal{E}$. In general there does not exist a unique solution, and in this case they want to find the values of $p$ for which maps of the form (17) explain the observed correlations.

FIG. 3: DAG: The DAGs of our setting. On the left side, with probability $(1-p)$, a quantum channel $\mathcal{E}$ is realized, causing correlations between Alice ($A$) and Bob ($B$). On the right side, occurring with probability $p$, the correlations are caused by an unobserved source $C$ that outputs the state $\rho_{AB}$, generating correlations through the steering map $\mathcal{S}$.

As we mentioned in the previous section, every positive one-qubit map is decomposable, so a possible explanation always exists. The decomposition (17) can be given a causal interpretation, where $\mathcal{E}$ is considered to be a cause-effect explanation of the correlations and $\mathcal{S}$ a common cause.

In the following subsections we give bounds on the causality parameter $p$ and then consider some extremal cases. In subsection IV D we generalize a part of the work of Ried et al. [2] and see how additional assumptions on the nature of $\mathcal{E}$ and $\mathcal{S}$ can lead to a unique solution.

B. Possible causal explanations

Definition IV.1 ($p$-causality/$p$-decomposability): A single-qubit unital positive trace preserving map $\mathcal{M}$ is called $p$-causal/$p$-decomposable with $p \in [0,1]$ if it can be written as

$$\mathcal{M} = (1-p)\,\mathcal{E} + p\,\mathcal{S}, \tag{18}$$

with $\mathcal{E}$ ($\mathcal{S}$) being a CPTP (CcPTP) unital qubit map. Eq. (18) is called a $p$-decomposition of $\mathcal{M}$.

In the following let $M, E, S$ denote the correlation matrices of $\mathcal{M}, \mathcal{E}, \mathcal{S}$, and $\boldsymbol{\eta}^M, \boldsymbol{\eta}^E, \boldsymbol{\eta}^S$ the SSV of $M, E, S$, respectively. We first investigate, for a fixed $p$, what the possible SSV of the correlation matrix of a map $\mathcal{M}$ are such that $\mathcal{M}$ is $p$-causal. This leads to the following theorem:

Theorem IV.1

(Signed singular values of $p$-causal maps) Let $\mathcal{M}$ be a positive unital trace preserving qubit map with associated SSV given by $\boldsymbol{\eta}^M$. Let $p \in [0,1]$ be fixed. Then the following statement holds:

$$\mathcal{M} \text{ is } p\text{-causal} \Leftrightarrow \boldsymbol{\eta}^M \in \mathcal{C}_p, \tag{19}$$

where

$$\mathcal{C}_p = \mathrm{Conv}\left(\left\{(1-p)\,\mathbf{v}^{CP}_i + p\,\mathbf{v}^{CcP}_j \mid i,j \in \{1,2,3,4\}\right\}\right), \tag{20}$$

where the vertices $\mathbf{v}^{CP}_i$ of CP maps are given in (8), and the vertices $\mathbf{v}^{CcP}_j$ of CcP maps in (13).

Proof. "$\Leftarrow$": From (20) we see that

$$\boldsymbol{\eta}^M \in \mathcal{C}_p \Leftrightarrow \exists \left(p_{ij} \geq 0,\ \sum_{i,j=1}^{4} p_{ij} = 1\right): \boldsymbol{\eta}^M = \sum_{i,j=1}^{4} p_{ij}\left((1-p)\,\mathbf{v}^{CP}_i + p\,\mathbf{v}^{CcP}_j\right).$$

Now define $q_i \equiv \sum_j p_{ij}$ and $r_j \equiv \sum_i p_{ij}$. Clearly $q_i, r_j \geq 0$ and $\sum_i q_i = \sum_j r_j = 1$. We can then write

$$\boldsymbol{\eta}^M = (1-p)\sum_i q_i \mathbf{v}^{CP}_i + p \sum_j r_j \mathbf{v}^{CcP}_j = (1-p)\,\boldsymbol{\eta}^E + p\,\boldsymbol{\eta}^S,$$

with $\boldsymbol{\eta}^E \equiv \sum_i q_i \mathbf{v}^{CP}_i \in \mathcal{T}_{CP}$ and $\boldsymbol{\eta}^S \equiv \sum_j r_j \mathbf{v}^{CcP}_j \in \mathcal{T}_{CcP}$. We have thus explicitly constructed a $p$-decomposition of $\mathcal{M}$ in which the correlation matrices of $\mathcal{E}$ and $\mathcal{S}$ have an SSV decomposition involving the same rotations as the SSV decomposition of the correlation matrix of $\mathcal{M}$.

"$\Rightarrow$": Let $p$ be fixed. Suppose that $\mathcal{E}$ and $\mathcal{S}$ are both extremal maps, i.e. $\boldsymbol{\eta}^E$ and $\boldsymbol{\eta}^S$ are given by one of the vertices defined in (8) and (13), respectively, and without loss of generality we assume that these are $\mathbf{v}^{CP}_1$ and $\mathbf{v}^{CcP}_1$ (this is justified as taking another vertex leads to the same result). Define $A = (1-p)E$ and $B = pS$, where $A$ has SSV $(1-p, 1-p, 1-p)$ and $B$ has SSV $(-p, -p, -p)$. In the Appendix we prove theorem VI.1, which restricts the possible SSV of $A + B$. For our case it gives $\mathrm{SSV}(M) \in \mathcal{C}_p$.

Now suppose $\mathcal{E}$ and $\mathcal{S}$ are not extremal maps. Since the SSV of those are simply convex combinations of the SSV of the extremal maps, it follows that also for such maps the signed singular values of $M$ lie within $\mathcal{C}_p$.

FIG. 4: Signed singular values of $p$-causal maps: Set of attainable vectors of signed singular values associated with $\mathcal{M}$ in (17) for different values of $p$. By theorem IV.1, for fixed $p$ there exists a CPTP map $\mathcal{E}$ and a CcPTP map $\mathcal{S}$ such that $\mathcal{M}$ is given by (17) if and only if the vector of signed singular values $\boldsymbol{\eta}^M$ of the correlation matrix of $\mathcal{M}$ is in $\mathcal{C}_p$ defined in (20).

We have seen that for a given value of $p$ the allowed SSV associated with a positive map $\mathcal{M}$ that is $p$-causal lie within $\mathcal{C}_p$ given in (20). We now turn the task around and go back to the causal inference scenario. Given a positive map $\mathcal{M}$, we want to tell if we can bound the causality parameter $p$. We will do this based on the following definition:

Definition IV.2

(Causal interval $I_{\mathcal{M}}$) For a given positive unital qubit map $\mathcal{M}$ we define the interval of possible causal explanations (for short: the causal interval) $I_{\mathcal{M}}$, such that $\mathcal{M}$ is $p$-causal if and only if $p \in I_{\mathcal{M}}$.

Since every positive qubit map is decomposable [3, p.258], the causal interval is always non-empty, $I_{\mathcal{M}} \neq \emptyset$.

Theorem IV.2

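Let $\mathcal{M}$ be a positive unital qubit map with associated signed singular values $\boldsymbol{\eta}^M$ (we assume $\eta^M_i \geq 0$ for $i = 1,2$). Then the causal interval of $\mathcal{M}$ is given by $I_{\mathcal{M}} = [p_{\min}, p_{\max}]$ with

$$p_{\max} = \min\left(\frac{3 - \boldsymbol{\eta}^M \cdot \mathbf{v}^{CP}_1}{2},\ 1\right), \tag{21}$$

$$p_{\min} = \max\left(\frac{\boldsymbol{\eta}^M \cdot \mathbf{v}^{CcP}_4 - 1}{2},\ 0\right), \tag{22}$$

with $\mathbf{v}^{CP}_1 = (1,1,1)^T$ ($\mathbf{v}^{CcP}_4 = (1,1,-1)^T$) defining a vertex of the CPTP (CcPTP) tetrahedron $\mathcal{T}_{CP}$ ($\mathcal{T}_{CcP}$).

Note that the assumption $\eta^M_i \geq 0$ for $i = 1,2$ can always be met by a suitable use of the unitary freedom in the decomposition.

Proof.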

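We show the theorem for $p_{\max}$; the determination of $p_{\min}$ can be treated in an analogous way.

First we check if $\mathcal{M}$ is a CcPTP map, by checking if $\boldsymbol{\eta}^M \in \mathcal{T}_{CcP}$. If it is CcPTP then $p_{\max} = 1$, trivially.

Now suppose it is not CcPTP. Then $p_{\max}$ is such that $\boldsymbol{\eta}^M \in \mathcal{C}_{p_{\max}}$ but $\boldsymbol{\eta}^M \notin \mathcal{C}_{p'}$ for $p' \in (p_{\max}, 1]$. This implies that $\boldsymbol{\eta}^M$ lies on the surface of $\mathcal{C}_{p_{\max}}$. Since we assumed $\eta^M_i \geq 0$ for $i = 1,2$, the critical facet of $\mathcal{C}_{p_{\max}}$ is the one which is perpendicular to $\mathbf{v}^{CP}_1$ and has the vertices $(1,1,1-2p_{\max})^T$, $(1,1-2p_{\max},1)^T$, $(1-2p_{\max},1,1)^T$ (see FIG. 5). Since this facet is perpendicular to $\mathbf{v}^{CP}_1$, $\boldsymbol{\eta}^M$ lies on this facet if its projection onto $\mathbf{v}^{CP}_1$ equals the vector pointing from the origin to the intersection of the facet and $\mathbf{v}^{CP}_1$, given as $\mathbf{u} \equiv (1 - (2/3)\,p_{\max})\,\mathbf{v}^{CP}_1$, see FIG. 5. Hence we get the following equation

$$\tfrac{1}{3}\,\mathbf{v}^{CP}_1\left(\mathbf{v}^{CP}_1 \cdot \boldsymbol{\eta}^M\right) \overset{!}{=} \mathbf{u} \tag{23}$$

$$\Leftrightarrow\ \mathbf{v}^{CP}_1 \cdot \boldsymbol{\eta}^M = 3 - 2p_{\max} \tag{24}$$

$$\Leftrightarrow\ p_{\max} = \frac{3 - \boldsymbol{\eta}^M \cdot \mathbf{v}^{CP}_1}{2}. \tag{25}$$

FIG. 5: Sketch for proof of theorem IV.2: The value of $p_{\max}$ is determined through the projection of $\boldsymbol{\eta}^M$ onto $\mathbf{v}^{CP}_1$, which is given by $\mathbf{u}$. The red triangle is one of the facets of $\mathcal{C}_{p_{\max}}$.

C. Extremal cases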

In the previous section we found the general form of the causal interval $I_{\mathcal{M}}$ for an observed map $\mathcal{M}$. We now analyze the extremal cases where the interval either reduces to a single value or is given as $I_{\mathcal{M}} = [0,1]$.

As already noted in [2, Table 1.], there are extremal cases that allow for a complete solution of the problem even without any additional constraints. This is the case if $\boldsymbol{\eta}^M$ equals one of the vertices of the cube of positive maps, see FIG. 2. The solution is then either $p = 0$ (pure cause-effect) if the SSV are all positive or exactly two are negative, or $p = 1$ (pure common-cause) if the SSV are all negative or exactly one is negative. The exact reconstruction of $\mathcal{E}$ or $\mathcal{S}$ in these cases is trivial.

Interestingly, with theorem IV.2 we can show that every point on the edges of the cube $\mathcal{C}$ defined in (14) gives us a unique solution without additional constraints:

Proof.

Let $\mathcal{M}$ be a positive map and $M$ be the corresponding correlation matrix with $M = R_1 \eta_M R_2$, where $\eta_M = \mathrm{diag}(\boldsymbol{\eta}^M)$ with the signed singular values $\boldsymbol{\eta}^M = (1,1,1-2p)^T$, $p \in [0,1]$, and two rotations $R_1, R_2 \in SO(3)$. Due to the freedom in $R_1$ and $R_2$ this describes all maps with corresponding vector of SSV on one of the edges of the cube $\mathcal{C}$. Equations (21) and (22) give

$$p_{\max} = \min\left(\frac{3 - \boldsymbol{\eta}^M \cdot \mathbf{v}^{CP}_1}{2},\ 1\right) = \frac{3 - (2 + (1-2p))}{2} = p, \tag{26}$$

$$p_{\min} = \max\left(\frac{\boldsymbol{\eta}^M \cdot \mathbf{v}^{CcP}_4 - 1}{2},\ 0\right) = \frac{2 - (1-2p) - 1}{2} = p. \tag{27}$$

By theorem VI.1 it follows that the maps $\mathcal{E}$ and $\mathcal{S}$ in the decomposition (17) necessarily correspond to extremal points in $\mathcal{T}_{CP}$ and $\mathcal{T}_{CcP}$ defined in (7) and (12) (unitary channel and maximally entangled state). It is then obvious that

$$M = R_1\left((1-p)\,\mathrm{diag}(\mathbf{v}^{CP}_1) + p\,\mathrm{diag}(\mathbf{v}^{CcP}_4)\right) R_2 \tag{28}$$

is the only possible solution.

In the other extreme case, if the map $\mathcal{M}$ is superpositive, i.e. CP and CcP (see FIG. 2), it could be explained by a pure CPTP map, a pure CcPTP map, or any convex combination of those two. Therefore one cannot give any restriction on the possible values of $p$ [2, III.E of supplementary information].

Proof.

Let $\mathcal{M}$ be a superpositive map. There exists an SSV decomposition of its correlation matrix for which $\boldsymbol{\eta}^M \in \mathcal{O}_{SP}$, defined in (15), and for which $\eta^M_i \geq 0$ for $i = 1,2$. Hence we can write $\boldsymbol{\eta}^M = p_1 \hat{\mathbf{e}}_x + p_2 \hat{\mathbf{e}}_y + p_3 \hat{\mathbf{e}}_z + p_4(-\hat{\mathbf{e}}_z)$, with $p_i \geq 0$ and $\sum_i p_i = 1$. The scalar product of each of these four vectors with $\mathbf{v}^{CP}_1 = (1,1,1)^T$ is upper bounded by 1. Hence we have $\boldsymbol{\eta}^M \cdot \mathbf{v}^{CP}_1 \leq 1$, and with that eq. (21) evaluates to $p_{\max} = 1$. Analogously one finds $p_{\min} = 0$.

D. Additional assumptions / Causal inference with constrained classical correlations

So far we only assumed that our data is generated by a unital channel and a unital state (a state whose local partitions are completely mixed). We have seen that in some extreme cases a unique solution to the problem can be found. Ried et al. showed that one can always find a unique solution for $p$ if one restricts the channel to unitary channels and the bipartite states to maximally entangled pure states [2]. Furthermore, it is then possible to reconstruct the channel and the state up to binary ambiguity, meaning there are two explanations leading to the same observed correlations. The ellipsoids associated with unitary channels and maximally entangled states are spheres with unit radius, and the SSV of their correlation matrices correspond to the vertices of $\mathcal{T}_{CP}$ and $\mathcal{T}_{CcP}$, respectively.

In the following we investigate this scenario again, but add a known amount of noise in the channel or in the bipartite state. For the channel this is done by mixing the unitary evolution with a completely depolarizing channel [4]. The completely depolarizing channel maps every Bloch vector to the origin, $\rho \mapsto \mathbb{1}/2$, and hence is represented by the zero matrix. The ellipsoid associated with the mixture of a completely depolarizing channel with a unitary channel is thus a shrunk sphere. For strong enough noise the result eventually becomes an entanglement breaking channel, which only produces "classical" correlations [21]. Due to the unitary freedom compared to standard depolarizing channels, we call these channels generalized depolarizing channels. For the state we mix a pure maximally entangled state with the completely mixed state, whose correlation matrix is given by the zero matrix. We call the state a generalized Werner state, in the sense that instead of a convex combination of a singlet and a completely mixed state [22] we allow the convex combination of an arbitrary maximally entangled state with the completely mixed state.
Above a certain threshold of noise these states become separable and the correlations become "classical" [19]. We will then see that even when confronted with purely classical correlations, if we have enough a priori knowledge about the data generation, i.e. we know the amount of noise, we can still find a solution analogous to [2], in the sense of determining uniquely the parameter $p$, and the channel and the state up to binary ambiguity [Note1]. We will first keep the unitary channel and start with a generalized Werner state and show how one can recreate the scenario of Ried et al. Then we will add the noise in the channel.
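
As a rough numerical illustration of why prior knowledge of the noise matters, the sketch below evaluates the causal interval of theorem IV.2, Eqs. (21) and (22), for a depolarized identity channel with signed singular values $(1-\epsilon)(1,1,1)$. The function name `causal_interval` and the sampled noise values are illustrative assumptions, not part of the original analysis.

```python
def causal_interval(eta):
    """Return (p_min, p_max) according to Eqs. (21) and (22).

    Assumption: eta = (eta_1, eta_2, eta_3) lies in the cube [-1, 1]^3 and
    its first two entries are non-negative, as required by theorem IV.2.
    """
    e1, e2, e3 = eta
    if e1 < 0 or e2 < 0:
        raise ValueError("theorem IV.2 assumes eta_1, eta_2 >= 0")
    if max(abs(e1), abs(e2), abs(e3)) > 1:
        raise ValueError("eta must lie inside the cube of positive maps")
    dot_cp = e1 + e2 + e3    # eta . v_1^CP  with v_1^CP  = (1, 1, 1)
    dot_ccp = e1 + e2 - e3   # eta . v_4^CcP with v_4^CcP = (1, 1, -1)
    p_max = min((3 - dot_cp) / 2, 1.0)
    p_min = max((dot_ccp - 1) / 2, 0.0)
    return p_min, p_max


if __name__ == "__main__":
    # Depolarized identity channel: SSV (1 - eps) * (1, 1, 1).
    for eps in (0.0, 0.25, 0.5, 0.75):
        s = 1 - eps
        print(eps, causal_interval((s, s, s)))
```

In this sketch the interval widens with the noise and becomes $[0,1]$ once the SSV enter the octahedron of superpositive maps, so without knowing $\epsilon$ nothing can be said about $p$ in that regime.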

1. Solution of the causal inference problem using generalized Werner states

The analysis follows closely in spirit section III.D in the supplementary information of [2]. We start again with equation (17) and assume that the steering map $\mathcal{S}$ is generated by a shared generalized Werner state $\rho_{AB} = \epsilon\,\frac{\mathbb{1}}{4} + (1-\epsilon)\,|\psi\rangle\langle\psi|$, where the parameter $\epsilon \in [0,1]$ is known and fixed in advance and $|\psi\rangle$ is an unknown maximally entangled pure state. The map $\mathcal{E}$ is generated by an unknown unitary channel $U$.

Since $\epsilon$ is fixed, the class of allowed explanations is completely defined up to unitary freedom in the channel and in the state. Hence the number of free parameters is the same as in the case considered in [2], which coincides with the case $\epsilon = 0$. For $\epsilon > 2/3$ the state $\rho_{AB}$ becomes separable, i.e. is not entangled anymore, see [22] and Fig. 5 in the supplementary information of [19]. But the reconstruction works independently of $\epsilon$. Hence, we see here that the possibility of reconstruction hinges not on the entanglement in $\rho_{AB}$ but on the prior knowledge we have about $\rho_{AB}$.

Note1: Strictly speaking, only for $p \neq 1/2$ can one always determine the unitary and the state. For $p = 1/2$ there is an infinite number of channels and states (all those where every point is diametrically opposed for the unitary channel and the state) for which the ellipsoid reduces to a single point, and hence the correlation matrix is the zero matrix. The parameter $p = 1/2$ can then be restored, but not the unitary and the state.

The correlation matrix corresponding to the generalized Werner state is simply the one of a maximally entangled state shrunk by a factor $1-\epsilon$ and will thus be denoted $(1-\epsilon)S$, where $S$ is the correlation matrix corresponding to a maximally entangled state. Thus in our scenario the information Alice and Bob obtain characterizes the matrix

$$M = p(1-\epsilon)\,S + (1-p)\,E. \tag{29}$$

The ellipsoid is described by the eigenvalues and eigenvectors of $M M^T$. The eigenvectors correspond to the directions of the semi-axes and the square roots of the eigenvalues are their lengths. There is one degenerate pair of semi-axes and a single non-degenerate one. The eigenvector corresponding to the non-degenerate semi-axis is parallel to $\hat{\mathbf{n}}$, which is defined as the axis on which the images of $S$ and $E$ are diametrically opposed. Hence the length of this semi-axis is $l = |1 - p - p(1-\epsilon)|$. Furthermore we have

$$\mathrm{sign}(\det M) = \mathrm{sign}\left(1 - p - p(1-\epsilon)\right), \quad \text{if } l > 0,$$

and $\det M = 0$ if $l = 0$.
Thus if we calculate the length of this semi-axis we can already determine the causality parameter $p$ as

$$l = |1 - 2p + p\epsilon| \ \Leftrightarrow\ p = \frac{1 \mp l}{2 - \epsilon}, \tag{30}$$

where the ambiguity is resolved by considering the sign of $\det M$.

Now that we have $p$ and $\epsilon$ at hand we can define a new map with correlation matrix

$$M' \equiv \frac{1}{1 - p\epsilon}\,M = \frac{p(1-\epsilon)}{1 - p\epsilon}\,S + \frac{1-p}{1 - p\epsilon}\,E \equiv p' S + (1 - p') E, \tag{31}$$

where we defined

$$p' \equiv \frac{p(1-\epsilon)}{1 - p\epsilon}, \tag{32}$$

$$1 - p' = 1 - \frac{p(1-\epsilon)}{1 - p\epsilon} = \frac{1 - p\epsilon - p(1-\epsilon)}{1 - p\epsilon} = \frac{1-p}{1 - p\epsilon}. \tag{33}$$

The properties of the ellipsoid can also be found in the SSV decomposition of the correlation matrix

$$M = R_1 D R_2, \quad \text{with } D = \mathrm{diag}(\boldsymbol{\eta}^M) \text{ and } R_1, R_2 \in SO(3). \tag{34}$$

The absolute values of the entries of $\boldsymbol{\eta}^M$ equal the lengths of the semi-axes of the ellipsoid, and we choose $R_1$ and $R_2$ such that $\eta^M_1 = \eta^M_2$. The axis on which the images of $S$ and $E$ are diametrically opposed is then given by the last column of $R_1$, i.e. $\hat{\mathbf{n}} = R_1 \hat{\mathbf{e}}_3$. The length of this axis is $l = |\eta^M_3|$.

In (31) the promise is given that $S$ is the correlation matrix of a maximally entangled state and that $E$ is the correlation matrix of a unitary channel. The reconstruction of those is extensively studied in the supplementary information of [2]. With the method presented there we find the value of $p'$ and can restore the correlation matrices corresponding to $U$ and $|\psi\rangle$ up to a binary ambiguity, and hence solve the causal inference problem. We review this in terms of SSV and discuss where the binary ambiguity arises.

Starting from the l.h.s. of (31) the goal is to determine $p'$, $S$, and $E$ on the r.h.s. Consider the SSV decomposition of the correlation matrix

$$M' = R'_1 D' R'_2, \quad \text{with } D' = \mathrm{diag}(\boldsymbol{\eta}^{M'}) \text{ and } R'_1, R'_2 \in SO(3). \tag{35}$$

The absolute values of the entries of $\boldsymbol{\eta}^{M'}$ equal the lengths of the semi-axes of the ellipsoid, and we choose $R'_1, R'_2$ such that $\eta^{M'}_1 = \eta^{M'}_2$. The axis on which the images of $S$ and $E$ are diametrically opposed is then given by the last column of $R'_1$, i.e. $\hat{\mathbf{n}}' = R'_1 \hat{\mathbf{e}}_3$. The length of this axis is $l' = |\eta^{M'}_3|$. However, the direction of $\hat{\mathbf{n}}'$, depending on the choice of $R'_1$ and $R'_2$, cannot be determined uniquely and allows two possible solutions $\pm\hat{\mathbf{n}}'$. The parameter $p'$ is determined by the length $l'$ and can be calculated as

$$p' = \frac{1 - \mathrm{sign}(\det M')\, l'}{2}, \tag{36}$$

and if $\det(M') = 0$ we have $p' = 1/2$. If $p' = 0$ or $p' = 1$ the reconstruction is trivial (of course in these cases one cannot reconstruct $S$ or $E$, respectively). If $p' \in (0,1)$, we can define [2]

$$r' = |\eta^{M'}_1|, \tag{37}$$

$$\gamma'_1 = 2 \arcsin\left(\sqrt{\frac{1 - r'^2}{4p'(1-p')}}\right), \tag{38}$$

$$\gamma'_2 = \arccos\left(\frac{\sqrt{r'^2 - \left[p' \sin\gamma'_1\right]^2}}{r'}\right). \tag{39}$$

The reconstruction of the correlation matrices $S$ and $E$ can then be done, cf. eq. (58) and (59) in the supplementary information of [2]:

$$E = R_{\hat{\mathbf{n}}',\gamma'_2}\, S^{\perp}_{\hat{\mathbf{n}}',1/r'}\, S_{\hat{\mathbf{n}}',1/(1-2p')}\, M', \tag{40}$$

$$S = R_{\hat{\mathbf{n}}',-\gamma'_1+\gamma'_2}\, S^{\perp}_{\hat{\mathbf{n}}',1/r'}\, S_{\hat{\mathbf{n}}',1/(2p'-1)}\, M', \tag{41}$$

where $R_{\hat{\mathbf{n}},\alpha}$ indicates a rotation about the axis $\hat{\mathbf{n}}$ with rotation angle $\alpha$, $S_{\hat{\mathbf{n}}',1/(1-2p')}$ a scaling along $\hat{\mathbf{n}}'$ by a factor $1/(1-2p')$, and $S^{\perp}_{\hat{\mathbf{n}}',1/r'}$ a scaling of the plane perpendicular to $\hat{\mathbf{n}}'$ by a factor $1/r'$.
From (40) and (41) we see that a reconstruction of $E$ and $S$ is not possible if $p' = 1/2$.

Let us summarize what we can infer about the causation of $M$ given in (29):

• The causality parameter $p$ can be determined uniquely in all cases, see eq. (30).
• If $r' = 0$ or $p' = 1/2$ then $S$ and $E$ cannot be determined,
• else we can determine two sets of solutions for $E$ and $S$ given by (40) and (41), distinguished by the choice of direction of $\hat{\mathbf{n}}'$.

On the other hand, if we do not have prior knowledge of $\epsilon$, then in general we cannot determine $p$ with (30). This ambiguity can easily be illustrated by looking at an example: Take $U = \sigma_x$ and $|\psi\rangle = (|00\rangle - |11\rangle)/\sqrt{2}$. We then have $E = \mathrm{diag}(1,-1,-1)$ and $S = \mathrm{diag}(-1,1,1)$. Inserting this into (29) for arbitrary $\epsilon$ and $p$ gives

$$M = (1 - 2p + p\epsilon)\,\mathrm{diag}(1,-1,-1).$$

For all pairs $(p, \epsilon)$ with $2p - p\epsilon = \mathrm{const.}$, the measurement statistics for Alice and Bob are exactly the same and there is no way to distinguish different pairs of values.

Analogously to using a generalized Werner state for the steering map, we can also use a generalized depolarizing channel. Then, with prior knowledge of the amount of noise, we can still find a complete solution even though the resulting channel might be entanglement breaking.
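
The role of the prior knowledge of $\epsilon$ can be made concrete with a few lines of code. The sketch below assumes diagonal correlation matrices $E = \mathrm{diag}(1,-1,-1)$ and $S = \mathrm{diag}(-1,1,1)$, builds $M$ from Eq. (29), and inverts Eq. (30) for a given $\epsilon$. All function names are hypothetical and chosen for illustration only.

```python
def observed_diagonal(p, eps, E=(1, -1, -1), S=(-1, 1, 1)):
    """Diagonal of M = p (1 - eps) S + (1 - p) E, Eq. (29), for diagonal E, S."""
    return tuple((1 - p) * e + p * (1 - eps) * s for e, s in zip(E, S))


def causality_parameter(m_diag, eps, axis=0):
    """Invert Eq. (30) for a known noise level eps.

    Assumptions: M is diagonal, `axis` indexes the non-degenerate semi-axis,
    and 0 <= eps <= 1.
    """
    length = abs(m_diag[axis])                 # l in Eq. (30)
    det = m_diag[0] * m_diag[1] * m_diag[2]
    sign = (det > 0) - (det < 0)               # sign(det M), 0 if det M = 0
    return (1 - sign * length) / (2 - eps)


def rescaled_parameter(p, eps):
    """p' = p (1 - eps) / (1 - p eps), Eq. (32)."""
    return p * (1 - eps) / (1 - p * eps)


if __name__ == "__main__":
    m_a = observed_diagonal(0.25, 0.0)
    m_b = observed_diagonal(0.5, 1.0)
    print(m_a == m_b)                          # identical statistics
    print(causality_parameter(m_a, 0.0), causality_parameter(m_b, 1.0))
```

Here the pairs $(p, \epsilon) = (0.25, 0)$ and $(0.5, 1)$ yield the same diagonal, so the same data are compatible with either value of $p$ unless $\epsilon$ is fixed beforehand; once $\epsilon$ is supplied, `causality_parameter` returns the corresponding $p$.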

2. Generalized depolarizing channel and generalized Werner state

We shall now consider the case where both the channel and the state are mixed with a known amount of noise. We therefore take $S' = (1-\epsilon_c) S$ for a generalized Werner state (thus $S$ corresponds again to a rotated and inverted Bloch sphere) and $E' = (1-\epsilon_e) E$ for a generalized depolarizing channel. We again assume $\epsilon_e \in (0,1)$ and $\epsilon_c \in (0,1)$ to be known. We then have

$$M = (1-p)(1-\epsilon_e)\,E + p(1-\epsilon_c)\,S. \tag{42}$$

The reconstruction works as follows. Without loss of generality we assume $\epsilon_e \leq \epsilon_c$ (in the other case we carry out the reconstruction discussed in the previous subsection for the generalized depolarizing channel instead of the Werner state). The only thing we have to do is to divide by $(1-\epsilon_e)$ to restore the problem of the previous subsection,

$$M' = \frac{M}{1-\epsilon_e} = (1-p)\,E + p\,\frac{1-\epsilon_c}{1-\epsilon_e}\,S \equiv (1-p)\,E + p(1-\epsilon)\,S,$$

with $1 - \epsilon \equiv \frac{1-\epsilon_c}{1-\epsilon_e}$. The rest can then be solved as in the previous subsection.

Again we remark that nothing changes if we have $\epsilon_c > 2/3$ and $\epsilon_e > 2/3$, even though at that transition the states become separable and the channels entanglement breaking, respectively.

V. DISCUSSION

In this work we extended the results initially found by Ried et al. [2]. We introduced an active and a passive quantum-observational scheme as analogues of the classical observational scheme. The passive quantum-observational scheme does not allow for an advantage over classical causal inference. In the active quantum-observational scheme Alice and Bob can freely choose their measurement bases, which in principle allows for signaling. However, we investigated the quantum advantage over classical causal inference in a scenario where signaling is not possible in the active quantum-observational scheme, as Alice's incoming state is completely mixed.

We showed how the geometry of the set of signed singular values (SSV) of correlation matrices representing positive maps of the density operator $\rho_A \mapsto \rho_B$ determines the possibility to reconstruct the causal structure linking $\rho_A$ and $\rho_B$. We showed that there are more cases than previously known for which a complete solution of the causal inference problem can be found without additional constraints, namely all correlations created by maps for which the signed singular values of the correlation matrix lie on the edges of the cube of positive maps $\mathcal{C}$ defined in (14). A necessary and sufficient condition for this is that the state is maximally entangled, that the channel is unitary, and that the corresponding correlation matrices have an SSV decomposition involving the same rotations.

For correlations guaranteed to be produced by a mixture of a unital channel and a unital bipartite state, we quantified the quantum advantage by giving the intervals for possible values of the causality parameter $p$. Here, in order to constrain $p$, and hence have an advantage over classical causal inference, it is necessary that the correlations were caused by an entangled state and/or an entanglement preserving channel.
This is because correlations caused by any mixture of a separable state and an entanglement breaking quantum channel always describe superpositive maps. According to theorem IV.2 the causal interval for any superpositive map $\mathcal{M}$ is $I_{\mathcal{M}} = [0,1]$. Hence, superpositive maps do not allow any causal inference.

Things change when we further strengthen the assumptions on the data generating processes and allow only unitary freedom in the state, corresponding to a generalized Werner state with given degree of noise $\epsilon_c$, or unitary freedom in the channel, corresponding to a generalized depolarizing channel with given degree of noise $\epsilon_e$. We showed that in this scenario the causality parameter $p$ can always be uniquely determined and in most cases the state and the channel can be reconstructed up to binary ambiguity. For $\epsilon_c > 2/3$ the state becomes separable and for $\epsilon_e > 2/3$ the channel becomes entanglement breaking, but causal inference is still feasible. Therefore entanglement and entanglement preservation are not a necessary condition in this scenario. The assumptions on the data generating processes, i.e. a priori knowledge of $\epsilon_c$ and $\epsilon_e$, are strong enough that even correlations corresponding to superpositive maps reveal the underlying causal structure.

VI. APPENDIX

Signed singular values of sums of matrices

Let $A$ be an $n \times n$ real matrix. A possible singular value decomposition (SVD) of $A$ is given as

$$A = O_1 D O_2, \tag{43}$$

where $O_{1,2}$ are orthogonal matrices ($O_i O_i^T = \mathbb{1}$) and $D$ is a positive semi-definite diagonal matrix $D = \mathrm{diag}(\sigma^A_1, ..., \sigma^A_n)$, with $\sigma^A_i$ called the (absolute) singular values (SV) of $A$. The matrices in (43) are not uniquely defined, and all possible permutations of the singular values on the diagonal of $D$ are possible for different orthogonal matrices $O_1$ and $O_2$. We use this freedom to write the SV in canonical order, $\sigma^A_1 \geq \sigma^A_2 \geq ... \geq \sigma^A_n$.

Example:

We give two different SVDs of a $3 \times 3$ matrix $B$:

$$B \equiv \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

The last decomposition gives the singular values of $B$ in canonical order, $\sigma^B_1 = 3$, $\sigma^B_2 = 2$ and $\sigma^B_3 = 1$.

Next we call

$$A = R_1 D' R_2 \tag{44}$$

the signed singular value decomposition (also called real singular values [23]) of $A$, where $R_i \in SO(n)$ are orthogonal matrices with determinant equal to one. In the $3 \times 3$ scenario these correspond to proper rotations in $\mathbb{R}^3$. The diagonal matrix $D'$ contains the signed singular values (SSV) of $A$. The SSV have the same absolute values as the SV but additionally can have negative signs. Concretely, the freedom in choosing $R_1$ and $R_2$ allows one to get any permutation of the SV on the diagonal of $D'$ together with an even or odd number of minus signs, depending on whether $A$ has positive or negative determinant, respectively. If at least one singular value equals 0, the number of signs becomes completely arbitrary. Using the same matrix $B$ as above we give two different signed singular value decompositions as an example:

$$B \equiv \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{45}$$

$$= \begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}. \tag{46}$$

For the SSV decomposition we define a canonical order with the absolute values of the singular values sorted in decreasing order and only a negative sign on the last entry if the matrix has negative determinant, as in (46). The rotational freedom in (44) allows for arbitrary permutations of the order of singular values and addition of any even number of minus signs. Confusion may arise since, for example, a permutation matrix on $\mathbb{R}^3$ corresponding to a permutation of exactly two coordinates has determinant $-1$, so why would it be allowed? The point is that we do not want to permute the elements of a vector, but the diagonal elements of a matrix. We illustrate this by permuting two components of i) a vector and ii) a diagonal matrix.

$$P_{yz} \equiv \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \det P_{yz} = -1, \tag{47}$$

$$\text{i)}\quad P_{yz} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ c \\ b \end{pmatrix}, \tag{48}$$

$$\text{ii)}\quad P_{yz} \cdot \mathrm{diag}(a,b,c) \cdot P_{yz} = \mathrm{diag}(a,c,b) = (-P_{yz})\,\mathrm{diag}(a,b,c)\,(-P_{yz}). \tag{49}$$

I.e., as $-P_{yz} = R_{\hat{x}}(\pi/2) \cdot R_{\hat{y}}(\pi)$, the effect of permuting the second and third diagonal entry of a diagonal matrix can also be obtained by proper rotations, and correspondingly for other permutations of the SSV. Hence all permutations of the SSV are allowed.

Fan [24] gave bounds on the SV of $A + B$ given the SV of two real matrices $A$ and $B$, derived from the corresponding results for eigenvalues of Hermitian matrices and using that the matrix $\tilde{A} \equiv \begin{pmatrix} 0_{n \times n} & A \\ A^T & 0_{n \times n} \end{pmatrix}$ has the singular values of $A$ and their negatives as eigenvalues [25, p.243 for review]. In the main part of this work we need a more constraining statement using the SSV, and thus taking the determinant of $A$, $B$, and $A + B$ into account as well. This leads to theorem VI.1. In the following we will denote with $\tilde{\boldsymbol{\sigma}}(A)$ the vector of canonical SSV of the $n \times n$ real matrix $A$.
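
The canonical ordering can be checked by hand for small examples. The following sketch, written with plain Python lists, verifies one signed singular value decomposition of the matrix $B$ with rows $(-1,0,0)$, $(0,0,-3)$, $(0,2,0)$, whose singular values are 3, 2 and 1. The helper names and the particular rotations `R1`, `R2` are illustrative choices, not taken from the original text.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det3(X):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = X
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def canonical_ssv(singular_values, det):
    """Sort by decreasing magnitude; the last entry is negative iff det < 0."""
    ssv = sorted((abs(s) for s in singular_values), reverse=True)
    if det < 0:
        ssv[-1] = -ssv[-1]
    return ssv


B = [[-1, 0, 0], [0, 0, -3], [0, 2, 0]]
R1 = [[0, 0, -1], [-1, 0, 0], [0, 1, 0]]   # proper rotation, det = +1
R2 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]    # proper rotation, det = +1
ssv = canonical_ssv([1, 2, 3], det3(B))     # det B is negative
D = [[ssv[i] if i == j else 0 for j in range(3)] for i in range(3)]
```

With these choices `matmul(matmul(R1, D), R2)` reproduces `B`, and both `R1` and `R2` have determinant one, so the single minus sign required by the negative determinant of `B` sits on the last diagonal entry of `D`.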
Since the product of two rotations is again a rotation, it follows directly from (44) that

$$\tilde{\sigma}(Q_1 A Q_2) = \tilde{\sigma}(A), \quad \forall\, Q_1, Q_2 \in SO(n). \tag{50}$$

Let $w$ be an $n$-dimensional vector. We define

$$\Delta_w \equiv \mathrm{Conv}\left( \left\{ \left( s_1 w_{\pi(1)}, \dots, s_n w_{\pi(n)} \right)^T \,\middle|\, s_\nu \in \{-1, 1\} : \prod_\nu s_\nu = 1,\ \pi \in S_n \right\} \right) \tag{51}$$

as the convex hull of all possible permutations $\pi \in S_n$ of the components of $w$ multiplied with an even number of minus signs. Let now $w_1$ and $w_2$ be two $n$-dimensional vectors. We define

$$\Sigma_{w_1, w_2} \equiv \{ a + b \mid a \in \Delta_{w_1},\ b \in \Delta_{w_2} \}. \tag{52}$$

Figure 6 presents an illustration of the case $n = 2$.

Theorem VI.1

Let $A$ and $B$ be two $n \times n$ real matrices whose SSV are known. Then

$$\tilde{\sigma}(A + B) \in \Sigma_{\tilde{\sigma}(A), \tilde{\sigma}(B)}. \tag{53}$$

Proof.

Let $A$ be an $n \times n$ real matrix and let $d(A)$ denote the vector of diagonal entries of $A$. Thompson showed the following two statements about the diagonal elements of $A$ [26, theorems 7 and 8]:

$$\text{i)} \quad d(A) \in \Delta_{\tilde{\sigma}(A)}, \tag{54}$$

$$\text{ii)} \quad \forall\, d \in \Delta_{\tilde{\sigma}(A)}\ \exists\, R_1, R_2 \in SO(n) : d = d(R_1 A R_2). \tag{55}$$

Now let $A$ and $B$ be two $n \times n$ real matrices. Let $R_1, R_2 \in SO(n)$ be such that $d(R_1 (A + B) R_2) = \tilde{\sigma}(A + B)$. We then have

$$\tilde{\sigma}(A + B) = d(R_1 (A + B) R_2) = d(R_1 A R_2) + d(R_1 B R_2) \in \Sigma_{\tilde{\sigma}(R_1 A R_2), \tilde{\sigma}(R_1 B R_2)} = \Sigma_{\tilde{\sigma}(A), \tilde{\sigma}(B)},$$

where the second equality follows from the linearity of matrix addition in every element, the membership follows from (54) applied to $R_1 A R_2$ and $R_1 B R_2$ together with the definition (52), and the last equality follows from (50).

FIG. 6: Illustration of theorem VI.1: Suppose we have two $2 \times 2$ matrices $A$ and $B$ with SSV $w_1$ and $w_2$, respectively. The red and the yellow sets correspond to $\Delta_{w_1}$ and $\Delta_{w_2}$ defined by (51). By theorem VI.1 the vector of SSV of $A + B$ then lies within the blue set, defined by (52).

As mentioned above, results for the absolute singular values of $A + B$ have been known before. For completeness, we show that the above proof works analogously for the corresponding statement on absolute singular values: Let $\sigma(A)$ denote the vector of canonical absolute singular values of an $n \times n$ real matrix $A$, $\sigma_1(A) \geq \sigma_2(A) \geq \dots \geq \sigma_n(A)$. Let $B$ be another $n \times n$ real matrix. Then [25, chapter 9 G.1.d.]

$$\sigma(A + B) \prec_w \sigma(A) + \sigma(B), \tag{56}$$

i.e. the vector of canonical singular values of $A + B$ is weakly majorized by the sum of the vectors of canonical singular values of $A$ and $B$. Weak majorization for two vectors $x$ and $y$ with $x_1 \geq x_2 \geq \dots \geq x_n$ and $y_1 \geq y_2 \geq \dots \geq y_n$ is defined as

$$x \prec_w y \ \Leftrightarrow\ \sum_{i=1}^{k} x_i \leq \sum_{i=1}^{k} y_i \quad \forall\, k \in \{1, 2, \dots, n\}. \tag{57}$$

To see (56), define $\Delta'_w$ analogously to (51) but without the constraint $\prod_\nu s_\nu = 1$, i.e. allowing arbitrary sign flips. The analogous statements of (54) and (55) hold if we exchange the SSV with the absolute singular values, proper rotations (elements of $SO(n)$) with orthogonal matrices (elements of $O(n)$), and $\Delta_w$ with $\Delta'_w$. We then find that $\sigma(A + B) \in \Sigma'_{\sigma(A), \sigma(B)}$, with $\Sigma'_{w_1, w_2} \equiv \{ a + b \mid a \in \Delta'_{w_1},\ b \in \Delta'_{w_2} \}$. Since by definition the absolute singular values are non-negative, we can further restrict $\Sigma'$ to the first hyperoctant. On the other hand, for two vectors $x, y \in \mathbb{R}^n_+$ we have (proposition C.2. of chapter 4 in [25])

$$x \prec_w y \ \Leftrightarrow\ x \in \mathrm{Conv}\left( \left\{ \left( s_1 y_{\pi(1)}, \dots, s_n y_{\pi(n)} \right) \,\middle|\, s_\nu \in \{0, 1\},\ \pi \in S_n \right\} \right). \tag{58}$$

The set on the r.h.s.
coincides with the restriction of $\Sigma'$ to the first hyperoctant if we take $y = \sigma(A) + \sigma(B)$. Taking $x = \sigma(A + B)$, eq. (56) follows.

[1] H. Reichenbach, The direction of time (University of California Press, Berkeley, 1971).
[2] K. Ried, M. Agnew, L. Vermeyden, D. Janzing, R. W. Spekkens, and K. J. Resch, Nat. Phys. 11, 414 (2015), arXiv:1406.5036.
[3] I. Bengtsson and K. Życzkowski, Geometry of quantum states: an introduction to quantum entanglement (Cambridge University Press, Cambridge, 2006).
[4] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th ed. (Cambridge University Press, Cambridge, 2010).
[5] J. Pearl, Causality, 2nd ed. (Cambridge University Press, Cambridge, 2009).
[6] J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf, J. Mach. Learn. Res. , 1 (2016), arXiv:1412.3773.
[7] G. Chiribella, G. M. D’Ariano, and P. Perinotti, Phys. Rev. A , 012311 (2011), arXiv:1011.6451.
[8] O. Oreshkov, F. Costa, and Č. Brukner, Nat. Commun. , 1092 (2012), arXiv:1105.4464v3.
[9] F. Costa and S. Shrapnel, New J. Phys. , 063032 (2016), arXiv:1512.07106.
[10] O. Oreshkov and C. Giarmatzi, New J. Phys. , 093020 (2016), arXiv:1506.05449.
[11] L. M. Procopio, A. Moqanaki, M. Araujo, F. Costa, I. Alonso Calafell, E. G. Dowd, D. R. Hamel, L. A. Rozema, C. Brukner, and P. Walther, Nat. Commun. , 7913 (2015), arXiv:1412.4006.
[12] G. Chiribella, Phys. Rev. A , 040301 (2012), arXiv:1109.5154v3.
[13] P. A. Guérin, A. Feix, M. Araújo, and Č. Brukner, Phys. Rev. Lett. , 100502 (2016), arXiv:1605.07372.
[14] A. Fujiwara and P. Algoet, Phys. Rev. A , 3290 (1999).
[15] D. Braun, O. Giraud, I. Nechita, C. E. Pellegrini, and M. Znidaric, J. Phys. A Math. Theor. , 135302 (2014), arXiv:1306.0495v2.
[16] M. Beth Ruskai, S. Szarek, and E. Werner, Linear Algebra Appl. , 159 (2002), arXiv:0101003v2 [quant-ph].
[17] J. S. Bell, Physics , 195 (1964).
[18] E. Schrödinger, Math. Proc. Cambridge Phil. Soc. , 555 (1935).
[19] S. Jevtic, M. Pusey, D. Jennings, and T. Rudolph, Phys. Rev. Lett. , 020402 (2014), arXiv:1303.4724.
[20] A. Milne, S. Jevtic, D. Jennings, H. Wiseman, and T. Rudolph, New J. Phys. , 083017 (2014), arXiv:1403.0418.
[21] M. B. Ruskai, Rev. Math. Phys. , 643 (2003), arXiv:0302032 [quant-ph].
[22] R. F. Werner, Phys. Rev. A , 4277 (1989).
[23] A. R. Amir-Moez and A. Horn, Am. Math. Mon. , 742 (1958).
[24] K. Fan, Proceedings of the National Academy of Sciences of the United States of America , 760 (1951).
[25] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications (Academic Press, New York, 1979).
[26] R. Thompson, SIAM J. Appl. Math. , 39 (1977).