Certifying the Classical Simulation Cost of a Quantum Channel
Brian Doolittle and Eric Chitambar
(Dated: February 26, 2021)

A fundamental objective in quantum information science is to determine the cost in classical resources of simulating a particular quantum system. The classical simulation cost is quantified by the signaling dimension, which specifies the minimum amount of classical communication needed to perfectly simulate a channel's input-output correlations when unlimited shared randomness is held between encoder and decoder. This paper provides a collection of device-independent tests that place lower and upper bounds on the signaling dimension of a channel. Among them, a single family of tests is shown to determine when a noisy classical channel can be simulated using an amount of communication strictly less than either its input or its output alphabet size. In addition, a family of eight Bell inequalities is presented that completely characterizes when any four-outcome measurement channel, such as a Bell measurement, can be simulated using one bit of communication and shared randomness. Finally, we bound the signaling dimension for all partial replacer channels in d dimensions. The bounds are found to be tight for the special case of the erasure channel.

I. INTRODUCTION
The transmission of quantum states between devices is crucial for many quantum network protocols. In the near term, quantum memory limitations will restrict quantum networks to "prepare and measure" functionality [1], which allows for quantum communication between separated parties but requires measurement immediately upon reception. Prepare and measure scenarios exhibit quantum advantages for tasks that involve distributed information processing [2] or establishing nonlocal correlations which cannot be reproduced by bounded classical communication and shared randomness [3]. These nonlocal correlations lead to quantum advantages in random access codes [4, 5], randomness expansion [6], device self-testing [7], semi-device-independent key distribution [8], and dimensionality witnessing [9, 10].

The general communication process is depicted in Fig. 1(a) with Alice (the sender) and Bob (the receiver) connected by some quantum channel N^{A→B}. Alice encodes a classical input x ∈ X into a quantum state ρ_x and sends it through the channel to Bob, who then measures the output using a positive operator-valued measure (POVM) {Π_y}_{y∈Y} to obtain a classical message y ∈ Y. The induced classical channel, denoted by P_N, has transition probabilities

    P_N(y|x) = Tr[Π_y N(ρ_x)].    (1)

A famous result by Holevo implies that the communication capacity of P_N is limited by log d, where d is the input Hilbert space dimension of N [11]; hence a noiseless classical channel transmitting d messages has a capacity no less than that of P_N.

However, channel capacity is just one figure of merit, and there may be other features of P_N that do not readily admit a classical simulation. The strongest form of simulation is an exact replication of the transition probabilities P_N(y|x) for any set of states {ρ_x}_{x∈X} and POVM {Π_y}_{y∈Y}.
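The induced channel of Eq. (1) is simple to compute numerically. Below is a minimal numpy sketch; the two signal states and the computational-basis POVM are our own toy choices for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical prepare-and-measure setup: two qubit signal states and a
# computational-basis POVM (our own toy example).
rho = [np.array([[1, 0], [0, 0]], dtype=complex),          # |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)]  # |+><+|
Pi  = [np.array([[1, 0], [0, 0]], dtype=complex),          # Pi_0 = |0><0|
       np.array([[0, 0], [0, 1]], dtype=complex)]          # Pi_1 = |1><1|

def induced_channel(states, povm, N=lambda r: r):
    """Column-stochastic matrix P_N(y|x) = Tr[Pi_y N(rho_x)] of Eq. (1).
    The quantum channel N defaults to the noiseless (identity) map."""
    return np.array([[np.trace(E @ N(r)).real for r in states] for E in povm])

P = induced_channel(rho, Pi)   # [[1.0, 0.5], [0.0, 0.5]]
```

Each column of the returned matrix sums to one, as required of a classical channel; passing a different map as `N` models a noisy channel between the devices.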
This problem falls in the domain of zero-error quantum information theory [12-16], which considers the classical and quantum resources needed to perfectly simulate a given channel. Unlike the capacity, a zero-error simulation of P_N typically requires additional communication beyond the input dimension of N. For example, a noiseless qubit channel id can generate channels P_id that cannot be faithfully simulated using one bit of classical communication [3].

The simulation question becomes more interesting if "static" resources are used for the channel simulation [17, 18], in addition to the "dynamic" resource of noiseless classical communication. For example, shared randomness is a relatively inexpensive classical resource that Alice and Bob can use to coordinate the encoding and decoding maps used in the simulation protocol shown in Fig. 1(b). Using shared randomness, a channel can be exactly simulated with a forward noiseless communication rate that asymptotically approaches the channel capacity, a fact known as the Classical Reverse Shannon Theorem [19].

FIG. 1. A general classical communication process. We represent classical information as blue double lines, quantum information as black solid lines, and shared randomness as dotted red lines. (a) A classical channel P_N is generated from a quantum channel N via Eq. (1). A classical-quantum encoder Ψ maps the classical input x ∈ X into a quantum state ρ_x. A quantum-classical decoder Π implements POVM {Π_y}_{y∈Y}. (b) Channel P_N is simulated using shared randomness and a noiseless classical channel via Eq. (2). Alice encodes input x into classical message m with probability T_λ(m|x) while Bob decodes message m into output y with probability R_λ(y|m). The protocol is coordinated using a shared random value λ drawn from sample space Λ with probability q(λ).
More powerful static resources such as shared entanglement or non-signaling correlations could also be considered [15, 20, 21].

While the Classical Reverse Shannon Theorem describes many-copy channel simulation, this work focuses on zero-error channel simulation in the single-copy case. The minimum amount of classical communication (with unlimited shared randomness) needed to perfectly simulate every classical channel P_N having the form of Eq. (1) is known as the signaling dimension of N [22]. Significant progress in understanding the signaling dimension was made by Frenkel and Weiner, who showed that every d-dimensional quantum channel requires no more than d classical messages to perfectly simulate [23]. This result is a "fine-grained" version of Holevo's Theorem for channel capacity mentioned above. However, the Frenkel-Weiner bound is not tight in general. For example, consider the completely depolarizing channel on d dimensions, D(ρ) = I/d. For any choice of inputs {ρ_x}_x and POVM {Π_y}_y, the Frenkel-Weiner protocol yields a simulation of P_D that uses a forward transmission of d messages. However, this is clearly not optimal since P_D can be reproduced with no forward communication whatsoever; Bob just samples from the distribution P(y) = Tr[Π_y]/d. A fundamental problem is then to understand when a noisy classical channel sending d messages from Alice to Bob actually requires d noiseless classical messages for zero-error simulation. As a main result of this paper, we provide a family of simple tests that determine when this amount of communication is needed. In other words, we characterize the conditions in which the simulation protocol of Frenkel and Weiner is optimal for the purposes of sending d messages over a d-dimensional quantum channel.

This work pursues a device-independent certification of signaling dimension similar to previous approaches used for the device-independent dimensionality testing of classical and quantum devices [24-28].
Specifically, we obtain Bell inequalities that stipulate necessary conditions on the signaling dimension of N in terms of the probabilities P_N(y|x), with no assumptions made about the quantum states {ρ_x}_x, POVM {Π_y}_y, or channel N [29]. Complementary results have been obtained by Dall'Arno et al., who approached the simulation problem from the quantum side and characterized the set of channels P_N that can be obtained using binary encodings for special types of quantum channels N [29]. In this paper, we compute a wide range of Bell inequalities using the adjacency decomposition technique [30], recovering prior results of Frenkel and Weiner [23] and generalizing work by Heinosaari and Kerppo [31]. For certain cases we prove that these inequalities are complete, i.e., providing both necessary and sufficient conditions for signaling dimension. As a further application, we compute bounds for the signaling dimension of partial replacer channels. Proofs for our main results are found in the Appendix, while our supporting software is found on GitHub [32].

II. SIGNALING POLYTOPES
We begin our investigation by reviewing the structure of channels that use noiseless classical communication and shared randomness. Let P^{n→n'} denote the family of channels having input set X = [n] := {1, ..., n} and output set Y = [n']. A channel P ∈ P^{n→n'} is represented by an n' × n column stochastic matrix, and we thus identify P^{n→n'} as a subset of R^{n'×n}, the set of all n' × n real matrices. In general we refer to a column (or row) of a matrix as being stochastic if its elements are non-negative and sum to unity, and a column (resp. row) stochastic matrix has only stochastic columns (resp. rows). The elements of a real matrix G ∈ R^{n'×n} are denoted by G_{y,x}, while those of a column stochastic matrix P ∈ P^{n→n'} are denoted by P(y|x) to reflect their status as conditional probabilities. The Euclidean inner product between G, P ∈ R^{n'×n} is expressed as ⟨G, P⟩ := Σ_{x,y} G_{y,x} P(y|x), and for any G ∈ R^{n'×n} and γ ∈ R, we let the tuple (G, γ) denote the linear inequality ⟨G, P⟩ ≤ γ.

Consider now a scenario in which Alice and Bob have access to a noiseless channel capable of sending d messages. They can use this channel to simulate a noisy channel by applying pre- and post-processing maps. If they coordinate these maps using a shared random variable λ with probability mass function q(λ), then they can simulate any channel P that decomposes as

    P(y|x) = Σ_λ q(λ) Σ_{m∈[d]} R_λ(y|m) T_λ(m|x),    (2)

where m ∈ [d] is the message sent from Alice to Bob and T_λ(m|x) (resp. R_λ(y|m)) is an element of Alice's encoder T_λ ∈ P^{n→d} (resp. Bob's decoder R_λ ∈ P^{d→n'}).

Definition 1.
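As matrices, the decomposition of Eq. (2) is a convex mixture of products R_λ T_λ. A minimal numpy sketch (the function name and the specific encoders/decoders are our own illustration) shows how two trivial one-message protocols mixed by shared randomness reproduce the uniform channel, the classical analogue of simulating the depolarizing channel with no forward communication:

```python
import numpy as np

def simulated_channel(q, encoders, decoders):
    """Channel of Eq. (2): P(y|x) = sum_lambda q(lambda) * (R_lambda @ T_lambda),
    with encoders T_lambda (d x n) and decoders R_lambda (n' x d) column stochastic."""
    return sum(p * R @ T for p, T, R in zip(q, encoders, decoders))

# d = 1 message, i.e. no signaling: Alice's encoder is trivial and Bob's decoder
# deterministically outputs y = lambda.  Mixing the two protocols with equal
# shared randomness reproduces the uniform channel P(y|x) = 1/2.
T = [np.ones((1, 2)), np.ones((1, 2))]                    # T_lambda(m=1|x) = 1
R = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]  # R_lambda(y|m=1)
P = simulated_channel([0.5, 0.5], T, R)                   # [[0.5, 0.5], [0.5, 0.5]]
```

The same routine accepts any number of shared-randomness values λ, so it can evaluate arbitrary decompositions of the form of Eq. (2).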
For given positive integers n, n', and d, the set of all channels satisfying Eq. (2) constitutes the signaling polytope, denoted by C^{n→n'}_d.

The signaling polytope C^{n→n'}_d is a convex polytope of dimension n(n' − 1) whose vertices V ∈ P^{n→n'} have 0/1 matrix elements and rank(V) ≤ d. We define ⟨G, P⟩ ≤ γ as a Bell inequality for C^{n→n'}_d if C^{n→n'}_d ⊂ {P ∈ P^{n→n'} | ⟨G, P⟩ ≤ γ}, and it is a "tight" Bell inequality if the equation ⟨G, P⟩ = γ is also solved by n(n' − 1) affinely independent vertices. When the latter holds, the solution space to ⟨G, P⟩ = γ is called a facet of C^{n→n'}_d. The Weyl-Minkowski Theorem ensures that a complete set of tight Bell inequalities {⟨G_k, P⟩ ≤ γ_k}_{k=1}^{r} exists such that P ∈ C^{n→n'}_d iff it satisfies all inequalities in this set [33]. Additional details about signaling polytopes are found in Appendix B.

Having introduced signaling polytopes, we can now define the signaling dimension of a channel. This terminology is adopted from recent work by Dall'Arno et al. [22], who defined the signaling dimension of a system in generalized probability theories; an analogous quantity without shared randomness has also been studied by Heinosaari et al. [34]. In what follows, we assume that N : S(A) → S(B) is a completely positive trace-preserving (CPTP) map, with S(A) denoting the set of density operators (i.e. trace-one positive operators) on system A, and similarly for S(B).

Definition 2.
Let P^{n→n'}_N be the set of all classical channels P_N ∈ P^{n→n'} generated from N via Eq. (1). The n → n' signaling dimension of N, denoted by κ^{n→n'}(N), is the smallest d such that P^{n→n'}_N ⊂ C^{n→n'}_d. The signaling dimension of a channel N, denoted by κ(N), is the smallest d such that P^{n→n'}_N ⊂ C^{n→n'}_d for all n, n'.

For any channel N, a trivial upper bound on the n → n' signaling dimension is given by

    κ^{n→n'}(N) ≤ min{n, n'}.    (3)

Indeed, using d = min{n, n'} messages Alice and Bob can simulate any P ∈ P^{n→n'}: either Alice applies channel P on her input and sends the output to Bob, or she sends the input to Bob and he applies P on his end. In Theorem 1 we provide necessary and sufficient conditions for when this trivial upper bound is attained. For a quantum channel N, the trivial upper bound is

    κ(N) ≤ min{d_A, d_B},    (4)

where d_A and d_B are the Hilbert space dimensions of Alice and Bob's systems. This bound is a direct consequence of Frenkel and Weiner's result [23], which can be restated in our terminology as κ(id_d) = d, where id_d is the noiseless channel on a d-dimensional quantum system. To prove Eq. (4), Alice can either send the states {ρ_x}_x to Bob who then performs the POVM {N†(Π_y)}_y, or she can send the states {N(ρ_x)}_x to Bob who then performs the POVM {Π_y}_y. Here N† denotes the adjoint map of N. Another relationship we observe is

    κ^{n→n'}(N) = κ^{n→d_B²}(N)  for all n' ≥ d_B².    (5)

This follows from Carathéodory's Theorem [35], which implies that every POVM on a d_B-dimensional system can be expressed as a convex combination of POVMs with no more than d_B² outcomes [36].
Since shared randomness is free, Alice and Bob can always restrict their attention to POVMs with no more than d_B² outcomes for the purposes of simulating any channel in P^{n→n'}_N when n' ≥ d_B².

The notion of signaling dimension also applies to noisy classical channels. A classical channel from set X to Y can be represented by a CPTP map N : S(C^{|X|}) → S(C^{|Y|}) that completely dephases its input and output in fixed orthonormal bases {|x⟩}_{x∈X} and {|y⟩}_{y∈Y}, respectively. The transition probabilities of N are then given by Eq. (1) as P_N(y|x) = Tr[|y⟩⟨y| N(|x⟩⟨x|)]. The channel N can be used to generate another channel N' with input and output alphabets X' and Y' by performing a pre-processing map T : X' → X and post-processing map R : Y → Y', thereby yielding the channel P_{N'} = R P_N T. When this relationship holds, P_{N'} is said to be ultraweakly majorized by P_N [31, 34], and the signaling dimension of P_{N'} is no greater than that of P_N [15].

In practice, the channel connecting Alice and Bob may be unknown or not fully characterized. This is the case in most experimental settings where unpredictable noise affects the encoded quantum states. In such scenarios it is desirable to ascertain certain properties of the channel without having to perform full channel tomography, a procedure that requires trust in the state preparation device on Alice's end and the measurement device on Bob's side. A device-independent approach infers properties of the channel by analyzing the observed input-output classical correlations P(y|x), obtained as sample averages over many uses of the memoryless channel [29].
The Bell inequalities introduced in the next section can be used to certify the signaling dimension of the channel: if the correlations P(y|x) are shown to violate a Bell inequality of C^{n→n'}_d, then the signaling dimension κ^{n→n'}(N) > d. If these correlations arise from some untrusted quantum channel N^{A→B}, by Eq. (4) it then follows that min{d_A, d_B} > d. Hence a device-independent certification of signaling dimension leads to a device-independent certification of the physical input/output Hilbert spaces of the channel connecting Alice and Bob.

III. BELL INEQUALITIES FOR SIGNALING POLYTOPES
In this section we discuss Bell inequalities for signaling polytopes. Since signaling polytopes are invariant under the relabeling of inputs and outputs, each discussed inequality represents a family of inequalities whose elements are obtained by permutations of the inputs and/or outputs. Additionally, a Bell inequality for one signaling polytope can be lifted to a polytope having more inputs and/or outputs [37, 38] (see Fig. 2). Formally, a Bell inequality ⟨G, P⟩ ≤ γ is said to be input lifted to ⟨G'', P⟩ ≤ γ if G'' ∈ R^{n'×m} is obtained from G ∈ R^{n'×n} by padding it with (m − n) all-zero columns. On the other hand, a Bell inequality ⟨G, P⟩ ≤ γ is said to be output lifted to ⟨G', P⟩ ≤ γ if G' ∈ R^{m'×n} is obtained from G ∈ R^{n'×n} by copying rows; i.e., there exists a surjective function f : [m'] → [n'] such that G'_{y,x} = G_{f(y),x} for all y ∈ [m'] and x ∈ [n]. Note that m' > n' and m > n in these examples.

FIG. 2. (a) Input and (b) output liftings of the identity matrix G = I.

To obtain polytope facets, it is typical to first enumerate the vertices, then use a transformation technique such as Fourier-Motzkin elimination to derive the facets [33]. Software such as PORTA [39, 40] assists in this computation, but the large number of vertices leads to impractical run times. To improve efficiency, we utilize the adjacency decomposition technique, which heavily exploits the permutation symmetry of signaling polytopes [30] (see Appendix C). Our software and computed facets are publicly available on GitHub [32], while a catalog of general tight Bell inequalities is provided in Appendix D. We now turn to a specific family of Bell inequalities motivated by our computational results.

A. Ambiguous Guessing Games
For k ∈ [0, n'] and d ≤ min{n, n'}, let G^{n,n'}_{k,d} be any n' × n matrix such that (i) k rows are stochastic with 0/1 elements, and (ii) the remaining (n' − k) rows have 1/(n − d + 1) in each column. As explained below, it will be helpful to refer to rows of type (i) as "guessing rows" and rows of type (ii) as "ambiguous rows." For example, if n = n' = 6, k = 5, and d = 2, then up to a permutation of rows and columns we have

    G^{6,6}_{5,2} = [  1    0    0    0    0    0
                       0    1    0    0    0    0
                       0    0    1    0    0    0
                       0    0    0    1    0    0
                       0    0    0    0    1    0
                      1/5  1/5  1/5  1/5  1/5  1/5 ].    (6)

For any channel P ∈ C^{n→n'}_d, the Bell inequality

    ⟨G^{n,n'}_{k,d}, P⟩ ≤ d    (7)

is satisfied. To prove this bound, suppose without loss of generality that the first k rows of G^{n,n'}_{k,d} are guessing rows. Let V be any vertex of C^{n→n'}_d where t of its first k rows are nonzero. If t = d, then clearly Eq. (7) holds. Otherwise, if t < d, then ⟨G^{n,n'}_{k,d}, V⟩ ≤ t + (n − t)/(n − d + 1) ≤ d, where the last inequality follows after some algebraic manipulation.

Equation (7) can be interpreted as the score of a guessing game that Bob plays with Alice. Suppose that Alice chooses a channel input x ∈ [n] with uniform probability and sends it through a channel P. Based on the channel output y, Bob guesses the value of x. Formally, Bob computes x̂ = f(y) for some guessing function f, and if x̂ = x then he receives one point. In this game, Bob may also declare Alice's input as being ambiguous or indistinguishable, meaning that f : [n'] → [n] ∪ {?}, with "?" denoting Bob's declaration of an ambiguous input. However, whenever Bob declares "?" he only receives 1/(n − d + 1) points. Then, Eq. (7) says that whenever P ∈ C^{n→n'}_d, Bob's average score is bounded by d/n. Note that there is a one-to-one correspondence between each G^{n,n'}_{k,d} and the particular guessing function f that Bob performs. If y labels a guessing row of G^{n,n'}_{k,d}, then f(y) = x̂, where x̂ labels the only nonzero column of row y. On the other hand, if y labels an ambiguous row, then f(y) = "?".

We define the (k, d)-ambiguous polytope A^{n→n'}_{k,d} as the collection of all channels P ∈ P^{n→n'} satisfying Eq. (7) for every G^{n,n'}_{k,d}. Naturally, C^{n→n'}_d ⊂ A^{n→n'}_{k,d} for all k ∈ [0, n']; therefore, if P ∉ A^{n→n'}_{k,d}, then P ∉ C^{n→n'}_d.
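The bound of Eq. (7) is easy to probe numerically: sampling channels from C^{6→6}_2 via the decomposition of Eq. (2) and scoring them against a representative game matrix of the form of Eq. (6) never exceeds d. A minimal numpy sketch, where the sampling scheme and function names are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 6, 2

# A representative G^{6,6}_{5,2}: five guessing rows and one ambiguous
# row whose entries are all 1/(n - d + 1) = 1/5 (cf. Eq. (6)).
G = np.vstack([np.eye(5, 6), np.full((1, 6), 1 / 5)])

def random_channel_in_C(n, d, n_lambda=50):
    """Random element of C^{n->n}_d via Eq. (2): a shared-randomness mixture
    of deterministic d-message encoder/decoder pairs."""
    q = rng.dirichlet(np.ones(n_lambda))
    P = np.zeros((n, n))
    for ql in q:
        enc = rng.integers(d, size=n)   # T_lambda: input x -> message m
        dec = rng.integers(n, size=d)   # R_lambda: message m -> output y
        for x in range(n):
            P[dec[enc[x]], x] += ql
    return P

# Every sampled channel respects <G, P> <= d, in agreement with Eq. (7).
scores = [np.sum(G * random_channel_in_C(n, d)) for _ in range(200)]
```

Vertices of the polytope saturate the bound, e.g. routing one input to its guessing row and the remaining five inputs to the ambiguous row scores 1 + 5·(1/5) = 2.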
Based on the discussion above, it is easy to decide membership of A^{n→n'}_{k,d}.

Proposition 1.
A channel P ∈ P^{n→n'} belongs to A^{n→n'}_{k,d} iff

    max_{π∈S_{n'}} [ Σ_{i=1}^{k} ‖r_{π(i)}‖_∞ + (1/(n − d + 1)) Σ_{i=k+1}^{n'} ‖r_{π(i)}‖ ] ≤ d,    (8)

where the maximization is taken over all permutations on [n'], r_i denotes the i-th row of P, ‖r_i‖_∞ is the largest element in r_i, and ‖r_i‖ is the row sum of r_i.

The maximization on the LHS of Eq. (8) can be performed efficiently using the following procedure. To each row r_i we assign a pair (a_i, b_i), where a_i = ‖r_i‖_∞ and b_i = ‖r_i‖/(n − d + 1). Define δ_i = a_i − b_i, and relabel the rows of P in non-increasing order of the δ_i. Then, according to this sorting, we have an ambiguous guessing game score of Σ_{i=1}^{k} a_i + Σ_{i=k+1}^{n'} b_i, which we claim attains the maximum on the LHS of Eq. (8). Indeed, for any other row permutation π, the guessing game score is given by

    Σ_{i∈{1,...,k}: π(i)∈{1,...,k}} a_i + Σ_{i∈{1,...,k}: π(i)∈{k+1,...,n'}} b_i + Σ_{i∈{k+1,...,n'}: π(i)∈{1,...,k}} a_i + Σ_{i∈{k+1,...,n'}: π(i)∈{k+1,...,n'}} b_i.    (9)

Hence the difference between these two scores is

    Σ_{i∈{1,...,k}: π(i)∈{k+1,...,n'}} (a_i − b_i) − Σ_{i∈{k+1,...,n'}: π(i)∈{1,...,k}} (a_i − b_i) ≥ 0,    (10)

where the inequality follows from the fact that we have ordered the indices in non-increasing order of (a_i − b_i), and the number of terms in each summation is the same since π is a bijection.

A special case of the ambiguous guessing games arises when k = n'. Then, up to a normalization factor 1/n, we interpret the LHS of Eq. (8) as the success probability when Bob performs maximum likelihood estimation of Alice's input value x given his outcome y (i.e. he chooses the value x that maximizes P(y|x)).
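The sorting procedure above translates directly into code. The following sketch is our own illustration (independent of the SignalingDimension.jl package); as a worked example it scores the 4 × 3 channel of Eq. (20), which satisfies the k = 4 maximum-likelihood test but exceeds the k = 3 bound:

```python
import numpy as np

def ambiguous_score(P, k, d):
    """LHS of Eq. (8) for a column-stochastic n' x n array P: assign the k rows
    with the largest a_i - b_i to guessing rows and the rest to ambiguous rows."""
    n = P.shape[1]
    a = P.max(axis=1)                  # a_i = ||r_i||_inf
    b = P.sum(axis=1) / (n - d + 1)    # b_i = ||r_i|| / (n - d + 1)
    order = np.argsort(b - a)          # i.e. non-increasing order of a_i - b_i
    return a[order[:k]].sum() + b[order[k:]].sum()

# The channel of Eq. (20), induced by a 50:50 erasure channel on three inputs.
P = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5],
              [0.5, 0.5, 0.5]])

ml_score  = ambiguous_score(P, k=4, d=2)   # 2.0:  Eq. (11) is satisfied
amb_score = ambiguous_score(P, k=3, d=2)   # 2.25: Eq. (8) is violated for d = 2
```

A score exceeding d for any k certifies that P lies outside C^{n→n'}_d, so membership in every ambiguous polytope can be checked in O(n' log n') time per k.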
We hence define M^{n→n'}_d := A^{n→n'}_{n',d} as the maximum likelihood (ML) estimation polytope. Using Proposition 1 we see that

    P ∈ M^{n→n'}_d  ⇔  Σ_{y=1}^{n'} max_{x∈[n]} P(y|x) ≤ d.    (11)

An important question is whether the ambiguous guessing Bell inequalities of Eq. (7) are tight for a signaling polytope C^{n→n'}_d. In general this will not be the case. For instance, ⟨G^{n,n'}_{k,d}, P⟩ ≤ d is trivially satisfied whenever k = 0. Nevertheless, in many cases we can establish tightness of these inequalities. A demonstration of the following facts is carried out in Appendix E.

Proposition 2.

(i) For min{n, n'} > d and k = n', Eq. (7) is a tight Bell inequality of C^{n→n'}_d iff G^{n,n'}_{k,d} can be obtained by performing input/output liftings and row/column permutations on an m × m identity matrix I_m, with min{n, n'} ≥ m > d.

(ii) For n' > k ≥ n > d > 1, Eq. (7) is a tight Bell inequality of C^{n→n'}_d iff G^{n,n'}_{k,d} can be obtained from the (n + 1) × n matrix

    [       I_n
      1/(n−d+1) ... 1/(n−d+1) ],    (12)

i.e. the identity I_n with an appended row whose entries all equal 1/(n − d + 1), by performing output liftings and row/column permutations.

Note that the input/output liftings are used to manipulate the identity matrix I_m and the matrix of Eq. (12) into an n' × n matrix G^{n,n'}_{k,d}. The tight Bell inequalities described in Proposition 2(i) completely characterize the ML polytope M^{n→n'}_d. For this reason, we refer to any G^{n,n'}_{k,d} satisfying the conditions of Proposition 2(i) as a maximum likelihood (ML) facet (see Appendix D 2). Likewise, we refer to any G^{n,n'}_{k,d} satisfying the conditions of Proposition 2(ii) as an ambiguous guessing facet (see Appendix D 3).

B. Complete Sets of Bell Inequalities
In general, we are unable to identify the complete set of tight Bell inequalities that bound each signaling polytope C^{n→n'}_d. However, we can solve the problem analytically in special cases.

Theorem 1.
Let n and n' be arbitrary integers.

(i) If d = n' − 1, then C^{n→n'}_d = M^{n→n'}_d.

(ii) If d = n − 1, then C^{n→n'}_d = ∩_{k=n}^{n'} A^{n→n'}_{k,d}.

In other words, to decide whether a channel can be simulated using an amount of classical messages strictly less than the size of its input/output alphabets, it suffices to consider the ambiguous guessing games. Moreover, by Eq. (8) it is simple to check whether these conditions are satisfied for a given channel P. A proof of Theorem 1 is found in Appendix F.

We also characterize the C^{n→4}_2 signaling polytope. As an application, this case can be used to understand the classical simulation cost of performing Bell measurements on a two-qubit system, since this process induces a classical channel with four outputs.

FIG. 3. Generator facets for the C^{6→4}_2 signaling polytope. Each inequality is expressed as γ ≥ G, where the inner product ⟨G, P⟩ is implied: (a) γ = 2, an input/output-lifted ML facet; (b) γ = 2, an output-lifted ML facet; (c) γ = 3, an output-lifted anti-guessing facet; (d) γ = 5, a k-guessing facet; (e) γ = 4, an output-lifted ambiguous guessing facet; (f-h) γ = 4, rescalings of the output-lifted ambiguous guessing facet. General forms of these tight Bell inequalities are derived in Appendix D.

Theorem 2.
For any integer n, a channel P ∈ P^{n→4} belongs to C^{n→4}_2 iff it satisfies the eight Bell inequalities depicted in Fig. 3 and all their input/output permutations.

Remarkably, this result shows that no new facet classes for C^{n→4}_2 are found when n > 6. Consequently, to demonstrate that a channel P ∈ P^{n→4} requires more than one bit for simulation, it suffices to consider input sets of size no greater than six. For n < 6, the facet classes of C^{n→4}_2 are given by the facets in Fig. 3 having (6 − n) all-zero columns. We conjecture that, in general, no more than (n' choose d) inputs are needed to certify that a channel P ∈ P^{n→n'} has a signaling dimension larger than d. A proof of Theorem 2 is found in Appendix G.

C. The Signaling Dimension of Replacer Channels
In the device-independent scenario, Alice and Bob make minimal assumptions about the channel N^{A→B} connecting them; they simply try to lower bound the dimensions of N using input-output correlations P_N(y|x). Applying the results of the previous section, if ⟨G, P⟩ ≤ γ is a Bell inequality for C^{n→n'}_d and

    max_{{ρ_x}_x, {Π_y}_y} Σ_{y,x} G_{y,x} Tr[Π_y N(ρ_x)] > γ,    (13)

then min{d_A, d_B} ≥ κ(N) > d. Eq. (13) describes a conic optimization problem that can be analytically solved only in special cases [41]. Hence deciding whether a given quantum channel can violate a particular Bell inequality is typically quite challenging.

Despite this general difficulty, we nevertheless establish bounds for the signaling dimension of partial replacer channels. A d-dimensional partial replacer channel has the form

    R_µ(X) = µX + (1 − µ) Tr[X] σ,    (14)

where 1 ≥ µ ≥ 0 and σ is some fixed density matrix. The partial depolarizing channel D_µ corresponds to σ being the maximally mixed state, whereas the partial erasure channel E_µ corresponds to σ being an erasure flag |E⟩⟨E| with |E⟩ orthogonal to {|1⟩, ..., |d⟩}.

Theorem 3.
The signaling dimension of a partial replacer channel is bounded by

    ⌈µd + 1 − µ⌉ ≤ κ(R_µ) ≤ min{d, ⌈µd + 1⌉}.    (15)

Moreover, for the partial erasure channel, the upper bound is tight for all µ ∈ [0, 1].

Proof.
We first prove the upper bound in Eq. (15). The trivial bound κ(R_µ) ≤ d was already observed in Eq. (4). To show that κ(R_µ) ≤ ⌈µd + 1⌉, let {ρ_x}_x be any collection of inputs and {Π_y}_y a POVM. Then

    P_{R_µ}(y|x) = µP(y|x) + (1 − µ)S(y),    (16)

where P(y|x) = Tr[Π_y ρ_x] and S(y) = Tr[Π_y σ]. From Ref. [23], we know that P(y|x) can be decomposed as in Eq. (2). Substituting this into Eq. (16) yields

    P_{R_µ}(y|x) = Σ_λ q(λ) Σ_{m=1}^{d} T_λ(m|x) [µR_λ(y|m) + (1 − µ)S(y)].    (17)

For r = ⌈µd + 1⌉, let ν be a random variable uniformly distributed over the (d choose r−1) subsets of [d] having size r − 1. For a given λ, ν, and input x, Alice performs the channel T_λ. If m ∈ ν, Alice sends message m' = m; otherwise, Alice sends message m' = 0. Upon receiving m', Bob does the following: if m' ≠ 0 he performs channel R_λ with probability µd/(r − 1) and samples from distribution S(y) with probability 1 − µd/(r − 1); if m' = 0 he samples from S(y) with probability one. Since Pr{m ∈ ν} = (r − 1)/d, this protocol faithfully simulates P_{R_µ}.

To establish the lower bound in Eq. (15), suppose that Alice sends orthogonal states {|1⟩, ..., |d⟩} and Bob measures in the same basis. Then

    Σ_{i=1}^{d} ⟨i| R_µ(|i⟩⟨i|) |i⟩ = dµ + (1 − µ),    (18)

which will violate Eq. (7) for the ML polytope M^{d→d}_r whenever r < µd + (1 − µ). Hence any zero-error simulation will require at least ⌈µd + 1 − µ⌉ classical messages. For the erasure channel, this lower bound can be tightened by considering the score for other ambiguous games, as detailed in Appendix H.

DISCUSSION
In this work, we have presented the signaling dimension of a channel as its classical simulation cost. In doing so, we have advanced a device-independent framework for certifying the signaling dimension of a quantum channel as well as its input/output dimensions. While this work focuses on communication systems, our framework also applies to computation and memory tasks.

The family of ambiguous guessing games includes the maximum likelihood facets, which say that Σ_{y=1}^{n'} max_{x∈[n]} P(y|x) ≤ d for all P ∈ C^{n→n'}_d. Since the results of Frenkel and Weiner imply that P^{n→n'}_N ⊂ C^{n→n'}_d whenever d ≥ min{d_A, d_B} for channel N^{A→B} [23], it follows that

    max_{{ρ_x}_{x∈[n]}, {Π_y}_{y∈[n]}} Σ_{x=1}^{n} Tr[Π_x N(ρ_x)] ≤ d,    (19)

an observation also made in Ref. [27]. Despite the simplicity of this bound, in general it is too loose to certify the input/output Hilbert space dimensions of a channel. For example, consider the 50:50 erasure channel E_{1/2} acting on a d_A = 3 system. It can be verified that P^{n→n'}_{E_{1/2}} ⊂ M^{n→n'}_2, i.e. Σ_x Tr[Π_x E_{1/2}(ρ_x)] ≤ 2 for all {ρ_x}_x and {Π_y}_y. Hence maximum likelihood estimation yields only the lower bound κ(E_{1/2}) ≥ 2. On the other hand, the classical channel

    P_{E_{1/2}} = [ 0.5   0    0
                     0   0.5   0
                     0    0   0.5
                    0.5  0.5  0.5 ],    (20)

generated by orthonormal input states {|1⟩, |2⟩, |3⟩} and a measurement in the orthonormal basis {|1⟩, |2⟩, |3⟩, |E⟩}, violates Eq. (8) for the A^{3→4}_{3,2} ambiguous polytope. Hence P_{E_{1/2}} ∉ A^{3→4}_{3,2}, and it follows that κ^{3→4}(E_{1/2}) ≥ 3.

Supporting Software
This work is supported by SignalingDimension.jl [32]. This software package includes our signaling polytope computations, numerical facet verification, and signaling dimension certification examples. SignalingDimension.jl is publicly available on GitHub and written in the Julia programming language [42]. The software is documented, tested, and reproducible on a laptop computer. The interested reader should review the software documentation, as it elucidates many details of our work.
Acknowledgements
We thank Marius Junge for enlightening discussions during the preparation of this paper. We acknowledge NSF Award

[1] S. Wehner, D. Elkouss, and R. Hanson, Science (2018).
[2] H. Buhrman, R. Cleve, S. Massar, and R. de Wolf, Rev. Mod. Phys., 665 (2010).
[3] J. I. de Vicente, Physical Review A, 012340 (2017).
[4] A. Ambainis, D. Leung, L. Mancinska, and M. Ozols, arXiv preprint arXiv:0810.2937 (2008).
[5] A. Tavakoli, A. Hameedi, B. Marques, and M. Bourennane, Physical Review Letters, 10.1103/physrevlett.114.170502 (2015).
[6] H.-W. Li, Z.-Q. Yin, Y.-C. Wu, X.-B. Zou, S. Wang, W. Chen, G.-C. Guo, and Z.-F. Han, Phys. Rev. A, 034301 (2011).
[7] A. Tavakoli, J. Kaniewski, T. Vértesi, D. Rosset, and N. Brunner, Phys. Rev. A, 062307 (2018).
[8] M. Pawłowski and N. Brunner, Physical Review A, 010302 (2011).
[9] N. Brunner, S. Pironio, A. Acín, N. Gisin, A. A. Méthot, and V. Scarani, Physical Review Letters, 210503 (2008).
[10] M. Hendrych, R. Gallego, M. Mičuda, N. Brunner, A. Acín, and J. P. Torres, Nature Physics, 588 (2012).
[11] A. S. Holevo, Problemy Peredachi Informatsii, 3 (1973).
[12] J. Körner and A. Orlitsky, IEEE Transactions on Information Theory, 2207 (1998).
[13] R. Duan, Super-activation of zero-error capacity of noisy quantum channels (2009), arXiv:0906.2527.
[14] T. S. Cubitt, J. Chen, and A. W. Harrow, IEEE Transactions on Information Theory, 8114 (2011).
[15] T. S. Cubitt, D. Leung, W. Matthews, and A. Winter, IEEE Transactions on Information Theory, 5509 (2011).
[16] R. Duan and A. Winter, IEEE Transactions on Information Theory, 891 (2016).
[17] I. Devetak and A. Winter, IEEE Transactions on Information Theory, 3183 (2004).
[18] I. Devetak, A. W. Harrow, and A. J. Winter, IEEE Transactions on Information Theory, 4587 (2008).
[19] C. Bennett, P. Shor, J. Smolin, and A. Thapliyal, IEEE Transactions on Information Theory, 2637 (2002).
[20] X. Wang and M. M. Wilde, Phys. Rev. Lett., 040502 (2020).
[21] K. Fang, X. Wang, M. Tomamichel, and M. Berta, IEEE Transactions on Information Theory, 2129 (2020).
[22] M. Dall'Arno, S. Brandsen, A. Tosini, F. Buscemi, and V. Vedral, Phys. Rev. Lett., 020401 (2017).
[23] P. E. Frenkel and M. Weiner, Communications in Mathematical Physics, 563 (2015).
[24] R. Gallego, N. Brunner, C. Hadley, and A. Acín, Phys. Rev. Lett., 230501 (2010).
[25] M. Dall'Arno, E. Passaro, R. Gallego, and A. Acín, Phys. Rev. A, 042312 (2012).
[26] J. Ahrens, P. Badziąg, A. Cabello, and M. Bourennane, Nature Physics, 592 (2012).
[27] N. Brunner, M. Navascués, and T. Vértesi, Phys. Rev. Lett., 150501 (2013).
[28] M. Dall'Arno, S. Brandsen, F. Buscemi, and V. Vedral, Phys. Rev. Lett., 250501 (2017).
[29] M. Dall'Arno, S. Brandsen, and F. Buscemi, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 20160721 (2017).
[30] T. Christof and G. Reinelt, International Journal of Computational Geometry & Applications, 423 (2001).
[31] T. Heinosaari and O. Kerppo, Journal of Physics A: Mathematical and Theoretical, 395301 (2019).
[32] B. Doolittle, SignalingDimension.jl, https://github.com/ChitambarLab/SignalingDimension.jl (2020).
[33] G. Ziegler, Lectures on Polytopes, Graduate Texts in Mathematics (Springer New York, 2012).
[34] T. Heinosaari, O. Kerppo, and L. Leppäjärvi, Journal of Physics A: Mathematical and Theoretical, 435302 (2020).
[35] A. Barvinok, A Course in Convexity (American Mathematical Society, 2002).
[36] E. Davies, IEEE Transactions on Information Theory, 596 (1978).
[37] S. Pironio, Journal of Mathematical Physics, 062112 (2005).
[38] D. Rosset, J.-D. Bancal, and N. Gisin, Journal of Physics A: Mathematical and Theoretical, 424022 (2014).
[39] T. Christof and A. Löbel, PORTA, http://porta.zib.de/ (1997).
[40] B. Doolittle and B. Legat, XPORTA.jl, https://github.com/JuliaPolyhedra/XPORTA.jl (2020).
[41] Manuscript in preparation (2021).
[42] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, SIAM Review, 65 (2017).

Appendix A: Notation Glossary
• $\mathcal{P}^{n \to n'}$ (Set of Classical Channels): The subset of $\mathbb{R}^{n' \times n}$ containing column stochastic matrices.
• $P$ (Classical Channel): An element of $\mathcal{P}^{n \to n'}$ that represents a classical channel with $n$ inputs and $n'$ outputs.
• $\mathcal{N}$ (Quantum Channel): A completely positive trace-preserving map.
• $\mathcal{P}^{n \to n'}_{\mathcal{N}}$ (Set of Classical Channels Generated from $\mathcal{N}$): The subset of $\mathcal{P}^{n \to n'}$ which decomposes as Eq. (1) for some quantum channel $\mathcal{N}$.
• $\mathcal{C}^{n \to n'}_d$ (Signaling Polytope): The subset of $\mathcal{P}^{n \to n'}$ containing channels that decompose as Eq. (2) (see Def. 1).
• $(\mathbf{G}, \gamma)$ (Linear Bell Inequality): A tuple describing the linear inequality $\langle \mathbf{G}, P \rangle \leq \gamma$ where $\mathbf{G} \in \mathbb{R}^{n' \times n}$, $\gamma \in \mathbb{R}$, and $P \in \mathcal{P}^{n \to n'}$.
• $\kappa^{n \to n'}(\mathcal{N})$ (The $n \to n'$ Signaling Dimension of $\mathcal{N}$): The smallest integer $d$ such that $\mathcal{P}^{n \to n'}_{\mathcal{N}} \subset \mathcal{C}^{n \to n'}_d$ (see Def. 2).
• $\kappa(\mathcal{N})$ (The Signaling Dimension of $\mathcal{N}$): The smallest integer $d$ such that $\mathcal{P}^{n \to n'}_{\mathcal{N}} \subset \mathcal{C}^{n \to n'}_d$ for all positive integers $n$ and $n'$ (see Def. 2).
• $(\mathbf{G}^{n,n'}_{k,d}, d)$ (Ambiguous Guessing Game): A signaling polytope Bell inequality where $\mathbf{G}^{n,n'}_{k,d} \in \mathbb{R}^{n' \times n}$ has $k$ rows that are row stochastic with 0/1 elements and $(n' - k)$ rows with each column containing $1/(n - d + 1)$.
• $\mathcal{A}^{n \to n'}_{k,d}$ (Ambiguous Polytope): The subset of $\mathcal{P}^{n \to n'}$ which is tightly bound by inequalities of the form $(\mathbf{G}^{n,n'}_{k,d}, d)$.
• $\mathcal{M}^{n \to n'}_d$ (Maximum Likelihood Estimation Polytope): The subset of $\mathcal{P}^{n \to n'}$ defined as the ambiguous polytope $\mathcal{A}^{n \to n'}_{k,d}$ where $k = n'$.
• $\mathcal{R}_\mu$ (Partial Replacer Channel): A quantum channel that replaces the input state $\rho_x$ with quantum state $\sigma$ with probability $(1 - \mu)$.
• $\mathcal{E}_\mu$ (Partial Erasure Channel): A partial replacer channel that replaces the input with $\sigma = |E\rangle\langle E|$ where $|E\rangle$ is orthogonal to the input Hilbert space.
• $\mathcal{V}^{n \to n'}_d$ (Signaling Polytope Vertices): The subset of $\mathcal{C}^{n \to n'}_d$ containing classical channels with 0/1 elements.
• $\mathcal{F}^{n \to n'}_d$ (Signaling Polytope Facets): The complete set of tight Bell inequalities for $\mathcal{C}^{n \to n'}_d$.
• $\mathcal{G}^{n \to n'}_d$ (Signaling Polytope Generator Facets): The subset of $\mathcal{F}^{n \to n'}_d$ containing a representative of each facet class in $\mathcal{F}^{n \to n'}_d$ (see Appendix B 5).
• $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K)$ ($k$-Guessing Facet): Tight Bell inequality for signaling polytopes (see Appendix D 1).
• $(\mathbf{G}^{n'}_{ML}, d)$ (Maximum Likelihood Facet): Tight Bell inequality for signaling polytopes (see Appendix D 2).
• $(\mathbf{G}^{n',d}_?, \gamma^{n',d}_?)$ (Ambiguous Guessing Facet): Tight Bell inequality for signaling polytopes (see Appendix D 3).
• $(\mathbf{G}^{\varepsilon,m'}_A, \gamma^{\varepsilon,d}_A)$ (Anti-Guessing Facet): Tight Bell inequality for signaling polytopes (see Appendix D 4).

TABLE I. Notation used throughout this work.

Appendix B: Signaling Polytope Structure
In this section we provide details about the structure of signaling polytopes (see Definition 1). The signaling polytope, denoted by $\mathcal{C}^{n \to n'}_d$, is a subset of $\mathcal{P}^{n \to n'}$. Therefore, a channel $P \in \mathcal{C}^{n \to n'}_d$ has matrix elements $P(y|x)$ subject to the constraints of non-negativity, $P(y|x) \geq 0$, and normalization, $\sum_{y \in [n']} P(y|x) = 1$, for all $y \in [n']$ and $x \in [n]$. Furthermore, since channels $P \in \mathcal{C}^{n \to n'}_d$ are permitted the use of shared randomness, the set $\mathcal{C}^{n \to n'}_d$ is convex.

In the two extremes of communication, the signaling polytope admits a simple structure. For maximum communication, $d = \min\{n, n'\}$, any channel $P \in \mathcal{P}^{n \to n'}$ can be realized, hence $\mathcal{C}^{n \to n'}_{\min\{n,n'\}} = \mathcal{P}^{n \to n'}$. For no communication, $d = 1$, Bob's output $y$ is independent of Alice's input $x$, meaning that $P(y|x) = P(y|x')$ for any choice of $x, x' \in [n]$ and $y \in [n']$. This added constraint simplifies the signaling polytope $\mathcal{C}^{n \to n'}_1$ to $\mathcal{P}^{1 \to n'}$, which is formally an $(n'-1)$-simplex [33]. For all other cases, $\min\{n, n'\} > d > 1$, the signaling polytope $\mathcal{C}^{n \to n'}_d$ takes on a more complicated structure.
1. Vertices
The vertices of the signaling polytope are denoted by $\mathcal{V}^{n \to n'}_d$. Signaling polytopes are convex and therefore described as the convex hull of their vertices, $\mathcal{C}^{n \to n'}_d = \text{conv}(\mathcal{V}^{n \to n'}_d)$. As noted in the main text, a vertex $V \in \mathcal{V}^{n \to n'}_d$ is an $n' \times n$ column stochastic matrix with 0/1 elements and $\text{rank}(V) \leq d$. For instance,

$$V = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \quad (B1)$$

is a vertex $V \in \mathcal{V}^{3 \to 3}_2$. Naturally, each vertex $V \in \mathcal{V}^{n \to n'}_d$ has no more than $d$ nonzero rows. A straightforward counting argument shows that $\mathcal{V}^{n \to n'}_d$ contains $\sum_{c=1}^{d} \left\{ {n \atop c} \right\} \binom{n'}{c} c!$ vertices (see Supplemental Material of Ref. [22]), where $\left\{ {n \atop c} \right\}$ denotes Stirling's number of the second kind and $\binom{n'}{c}$ a binomial coefficient. An important observation is that the number of vertices in $\mathcal{V}^{n \to n'}_d$ grows exponentially in the number of inputs, $n$, and factorially in the number of outputs, $n'$. The large number of vertices represents a key challenge in characterizing the signaling polytope.
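The counting argument above can be checked directly for small polytopes. The sketch below (function names are illustrative, not taken from the supporting software) compares the closed-form count against brute-force enumeration of 0/1 column stochastic matrices, which correspond to functions from inputs to outputs whose image has size at most $d$:

```python
from itertools import product
from math import comb, factorial

def stirling2(n, c):
    """Stirling number of the second kind: partitions of [n] into c blocks."""
    return sum((-1) ** j * comb(c, j) * (c - j) ** n for j in range(c + 1)) // factorial(c)

def vertex_count(n, n_out, d):
    """Closed-form |V^{n -> n'}_d| = sum_{c=1}^{d} S(n,c) * binom(n',c) * c!."""
    return sum(stirling2(n, c) * comb(n_out, c) * factorial(c) for c in range(1, d + 1))

def brute_force_count(n, n_out, d):
    """Count 0/1 column stochastic n' x n matrices with at most d nonzero rows,
    i.e. functions [n] -> [n'] whose image has size at most d."""
    return sum(1 for f in product(range(n_out), repeat=n) if len(set(f)) <= d)

print(vertex_count(3, 3, 2))                             # 21
assert vertex_count(3, 3, 2) == brute_force_count(3, 3, 2)
assert vertex_count(4, 4, 2) == brute_force_count(4, 4, 2)
```

The brute-force enumeration also illustrates why vertex sets quickly become intractable: the loop already visits $n'^{\,n}$ candidate assignments.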
2. Polytope Dimension
The dimension of the signaling polytope satisfies $\dim(\mathcal{C}^{n \to n'}_d) \leq \dim(\mathcal{P}^{n \to n'}) = n(n'-1)$ because $\mathcal{C}^{n \to n'}_d \subseteq \mathcal{P}^{n \to n'}$ and any $P \in \mathcal{P}^{n \to n'}$ must satisfy $n$ normalization constraints, one for each column of $P$. Naively, $\mathcal{P}^{n \to n'} \subset \mathbb{R}^{n' \times n}$ where $\dim(\mathbb{R}^{n' \times n}) = nn'$; however, the $n$ normalization constraints restrict $\mathcal{P}^{n \to n'}$ to $\dim(\mathcal{P}^{n \to n'}) = n(n'-1)$. To determine $\dim(\mathcal{C}^{n \to n'}_d)$ with greater precision, the number of affinely independent vertices in $\mathcal{V}^{n \to n'}_d$ can be counted, where $\dim(\mathcal{C}^{n \to n'}_d)$ is one less than the number of affinely independent vertices. When $d \geq 2$, one can count $n(n'-1) + 1$ affinely independent vertices in $\mathcal{V}^{n \to n'}_d$; therefore, $\dim(\mathcal{C}^{n \to n'}_d) = n(n'-1)$. When $d = 1$, each of the $n'$ vertices is affinely independent and $\dim(\mathcal{C}^{n \to n'}_1) = n' - 1$. This result is not surprising because, as noted before, $\mathcal{C}^{n \to n'}_1 = \mathcal{P}^{1 \to n'}$ and $\dim(\mathcal{P}^{1 \to n'}) = n' - 1$.
3. Facets
A linear Bell inequality is represented as a tuple $(\mathbf{G}, \gamma)$ with $\mathbf{G} \in \mathbb{R}^{n' \times n}$ and $\gamma \in \mathbb{R}$, where the inequality $\langle \mathbf{G}, P \rangle = \sum_{x,y} G_{y,x} P(y|x) \leq \gamma$ is formed by the Euclidean inner product with a channel $P \in \mathcal{P}^{n \to n'}$. For convenience, we identify two polyhedra of channels:

$$C(\mathbf{G}, \gamma) := \{P \in \mathcal{P}^{n \to n'} \mid \langle \mathbf{G}, P \rangle \leq \gamma\}, \quad (B2)$$
$$F(\mathbf{G}, \gamma) := \{P \in \mathcal{P}^{n \to n'} \mid \langle \mathbf{G}, P \rangle = \gamma\}. \quad (B3)$$

Lemma 1.
An inequality $(\mathbf{G}, \gamma)$ is a tight Bell inequality of the $\mathcal{C}^{n \to n'}_d$ signaling polytope if and only if:

1. $\mathcal{C}^{n \to n'}_d \subset C(\mathbf{G}, \gamma)$;
2. $\dim\left(\mathcal{C}^{n \to n'}_d \cap F(\mathbf{G}, \gamma)\right) = \dim\left(\mathcal{C}^{n \to n'}_d\right) - 1$.

Condition 1 requires that $C(\mathbf{G}, \gamma)$ contains all channels $P \in \mathcal{C}^{n \to n'}_d$, while Condition 2 requires that inequality $(\mathbf{G}, \gamma)$ is both a proper half-space and a facet of $\mathcal{C}^{n \to n'}_d$. Tight Bell inequalities and facets are closely related and described by the same inequality $(\mathbf{G}, \gamma)$. The key difference is that a tight Bell inequality is a half-space inequality $\langle \mathbf{G}, P \rangle \leq \gamma$, whereas a facet is the polytope $\mathcal{C}^{n \to n'}_d \cap F(\mathbf{G}, \gamma)$. The complete set of signaling polytope facets is denoted by $\mathcal{F}^{n \to n'}_d$, and the signaling polytope is simply the intersection of all tight Bell inequalities $(\mathbf{G}_m, \gamma_m) \in \mathcal{F}^{n \to n'}_d$:

$$\mathcal{C}^{n \to n'}_d = \bigcap_{m=1}^{r} C(\mathbf{G}_m, \gamma_m). \quad (B4)$$

The number of facet inequalities $r$ is typically larger than the number of vertices in $\mathcal{V}^{n \to n'}_d$, presenting another challenge in the characterization of signaling polytopes.

Remark.
A given Bell inequality $(\mathbf{G}, \gamma) \in \mathcal{F}^{n \to n'}_d$ does not have a unique form. Therefore, it is convenient to establish a normal form for a given facet inequality [30]. First, observe that multiplying an inequality $(\mathbf{G}, \gamma)$ by a positive scalar $a \in \mathbb{R}$ does not change the inequality, that is, $C(\mathbf{G}, \gamma) = C(a\mathbf{G}, a\gamma)$. Second, observe that the vertices in $\mathcal{V}^{n \to n'}_d$ have 0/1 elements, and the rational arithmetic in Fourier-Motzkin elimination [33, 39] results in the matrix coefficients of $\mathbf{G}$ being rational. Therefore, there exists a rational scalar $a$ such that $aG_{y,x}$ and $a\gamma$ are integers for all $x \in [n]$ and $y \in [n']$. Third, observe that the normalization and non-negativity constraints for channels $P \in \mathcal{P}^{n \to n'}$ allow the equivalence between the following two inequalities:

$$\gamma \geq \langle \mathbf{G}, P \rangle \iff \gamma + 1 \geq \langle \mathbf{G}, P \rangle + \sum_{y \in [n']} P(y|x') \quad (B5)$$

for any $x' \in [n]$. Therefore, it is always possible to find a form of inequality $(\mathbf{G}, \gamma)$ where $G_{y,x} \geq 0$ for all $y \in [n']$ and $x \in [n]$. Hence we define a normal form for any tight Bell inequality $(\mathbf{G}, \gamma) \in \mathcal{F}^{n \to n'}_d$:

• Inequality $(\mathbf{G}, \gamma)$ is scaled such that $\gamma$ and all $G_{y,x}$ are integers with a greatest common factor of 1.
• Normalization constraints are added or subtracted from all columns using Eq. (B5) such that $G_{y,x} \geq 0$ and the smallest element of each column of $\mathbf{G}$ is zero.
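The normal form procedure above can be sketched in a few lines using exact rational arithmetic. The function name `normal_form` is illustrative rather than taken from the supporting software, and the sketch assumes rational input coefficients:

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def normal_form(G, gamma):
    """Shift each column by the normalization constraint (Eq. B5) so its
    minimum entry is zero, then rescale so all coefficients are coprime
    integers. A sketch of the normal form described above."""
    G = [[Fraction(g) for g in row] for row in G]
    gamma = Fraction(gamma)
    n_out, n = len(G), len(G[0])
    # Shifting column x by a constant c changes gamma by c (Eq. B5 applied c times).
    for x in range(n):
        m = min(G[y][x] for y in range(n_out))
        for y in range(n_out):
            G[y][x] -= m
        gamma -= m
    # Scale to integers and divide out the greatest common factor.
    denoms = [c.denominator for row in G for c in row] + [gamma.denominator]
    lcm = reduce(lambda a, b: a * b // gcd(a, b), denoms, 1)
    ints = [[int(c * lcm) for c in row] for row in G]
    g_int = int(gamma * lcm)
    common = reduce(gcd, [abs(v) for row in ints for v in row] + [abs(g_int)]) or 1
    return [[v // common for v in row] for row in ints], g_int // common

# An ambiguous guessing game with a 1/2-valued row, in integer normal form:
print(normal_form([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5]], 2))
# -> ([[2, 0, 0], [0, 2, 0], [0, 0, 2], [1, 1, 1]], 4)
```

Note that the integer form of the half-valued ambiguous game has exactly the shape of the ambiguous guessing facets discussed in Appendix D.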
4. Permutation Symmetry
The input and output values $x$ and $y$ are merely labels for a channel $P \in \mathcal{P}^{n \to n'}$; therefore, swapping labels $x \leftrightarrow x'$ and $y \leftrightarrow y'$, where $x, x' \in [n]$ and $y, y' \in [n']$, does not affect $\mathcal{P}^{n \to n'}$ [38]. The relabeling operation is implemented using elements from the set of doubly stochastic $k \times k$ permutation matrices $S_k$. For example,

$$P' = \pi_Y P \pi_X, \quad (B6)$$

where $P, P' \in \mathcal{P}^{n \to n'}$, $\pi_X \in S_n$, and $\pi_Y \in S_{n'}$. Note that permuting the rows or columns of a matrix cannot change the rank of the matrix; therefore, if $V \in \mathcal{V}^{n \to n'}_d$ and $V' = \pi_Y V \pi_X$, then $V' \in \mathcal{V}^{n \to n'}_d$. It follows that this permutation symmetry holds for any channel in the signaling polytope: $P, P' \in \mathcal{C}^{n \to n'}_d$ where $P'$ is a permutation of $P$. Likewise, a facet inequality $(\mathbf{G}, \gamma) \in \mathcal{F}^{n \to n'}_d$ can be permuted into a new facet inequality $(\mathbf{G}', \gamma) \in \mathcal{F}^{n \to n'}_d$ where $\mathbf{G}' = \pi_Y \mathbf{G} \pi_X$.
5. Generator Facets
Permutation symmetry motivates the notion of a facet class, defined as a collection of facet inequalities formed by taking all permutations of a canonical facet $(\mathbf{G}^\star, \gamma) \in \mathcal{F}^{n \to n'}_d$, which we refer to as a generator facet. The choice of canonical facet is arbitrary; thus we define the generator facet as the lexicographic normal form [30, 38] of the facet class. The set of generator facets, denoted by $\mathcal{G}^{n \to n'}_d := \{(\mathbf{G}^\star_i, \gamma_i)\}_{i=1}^{r'}$, is the subset of $\mathcal{F}^{n \to n'}_d$ containing the generator facet of each facet class bounding $\mathcal{C}^{n \to n'}_d$. Since the numbers of input and output permutations scale as factorials of $n$ and $n'$ respectively, the set of generator facets is considerably smaller than $\mathcal{F}^{n \to n'}_d$ and, therefore, provides a convenient simplification of $\mathcal{F}^{n \to n'}_d$. To recover the complete set of facets from $\mathcal{G}^{n \to n'}_d$, we take all row and column permutations of each generator facet $(\mathbf{G}^\star, \gamma) \in \mathcal{G}^{n \to n'}_d$. As a final remark, we note that $\mathcal{V}^{n \to n'}_d$ can also be reduced to a set of generator vertices; however, this set is not required for our current discussion of signaling polytopes.

Appendix C: Adjacency Decomposition
This section provides an overview of the adjacency decomposition technique [30]. In our work, we use an adjacency decomposition algorithm to compute the generator facets of the signaling polytope. Our implementation can be found in our supporting software [32]. The adjacency decomposition provides a few key advantages in the computation of Bell inequalities:

1. The algorithm stores only the generator facets $\mathcal{G}^{n \to n'}_d$ instead of the complete set of facets $\mathcal{F}^{n \to n'}_d$. This considerably reduces the required memory.
2. New generator facets are derived in each iteration of the computation; hence, the algorithm does not need to run to completion to provide value.
3. The algorithm can be widely parallelized [30].
1. Adjacency Decomposition Algorithm
The adjacency decomposition is an iterative algorithm which requires as input the signaling polytope vertices $\mathcal{V}^{n \to n'}_d$ and a seed generator facet $(\mathbf{G}^\star_{\text{seed}}, \gamma_{\text{seed}}) \in \mathcal{G}^{n \to n'}_d$. The algorithm maintains a list of generator facets $\mathcal{G}_{\text{list}}$ where each facet $(\mathbf{G}^\star, \gamma) \in \mathcal{G}_{\text{list}}$ is marked either as considered or unconsidered. The generator facet is defined as the lexicographic normal form of the facet class [30, 38]. Before the algorithm begins, $(\mathbf{G}^\star_{\text{seed}}, \gamma_{\text{seed}})$ is added to $\mathcal{G}_{\text{list}}$ and marked as unconsidered. In each iteration, the algorithm proceeds as follows [30]:

1. An unconsidered generator facet $(\mathbf{G}^\star, \gamma) \in \mathcal{G}_{\text{list}}$ is selected.
2. All facets adjacent to $(\mathbf{G}^\star, \gamma)$ are computed.
3. Each adjacent facet is converted into its lexicographic normal form.
4. Any new generator facets identified are marked as unconsidered and added to $\mathcal{G}_{\text{list}}$.
5. Facet $(\mathbf{G}^\star, \gamma)$ is marked as considered.

The procedure repeats until all facets in $\mathcal{G}_{\text{list}}$ are marked as considered. If run to completion, then $\mathcal{G}_{\text{list}} = \mathcal{G}^{n \to n'}_d$ and all generator facets of the signaling polytope $\mathcal{C}^{n \to n'}_d$ are identified. The algorithm is guaranteed to find all generator facets due to the permutation symmetry of the signaling polytope. By this symmetry, any representative of a given facet class has the same fixed set of facet classes adjacent to it. For the permutation symmetry to hold for all facets in the signaling polytope, there cannot be two disjoint sets of generator facets where the members of one set do not lie adjacent to the members of the other. The inputs of the adjacency decomposition are easy to produce computationally. A seed facet can always be constructed using the lifting rules for signaling polytopes (see Fig. 2) and the signaling polytope vertices $\mathcal{V}^{n \to n'}_d$ can be easily computed (see supporting software [32]).
Note, however, that the exponential growth of $\mathcal{V}^{n \to n'}_d$ eventually hinders the performance of the adjacency decomposition algorithm.
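Step 3 of the algorithm relies on a lexicographic normal form that collapses a facet class to a single representative. A brute-force sketch of this step, adequate only for very small matrices (unlike the optimized routines of Refs. [30, 32]; the function name is illustrative), is:

```python
from itertools import permutations

def lex_normal_form(G):
    """Toy lexicographic normal form of a facet class: the smallest matrix,
    compared as a tuple of rows, over all row/column relabelings."""
    n_out, n = len(G), len(G[0])
    return min(
        tuple(tuple(G[r][c] for c in cp) for r in rp)
        for rp in permutations(range(n_out))
        for cp in permutations(range(n))
    )

# Two relabelings of the same facet collapse to a single generator class.
G1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
G2 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # rows cycled
assert lex_normal_form(G1) == lex_normal_form(G2)
```

Because every member of a facet class maps to the same representative, comparing normal forms suffices to decide in step 4 whether an adjacent facet is new.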
2. Facet Adjacency
A key step in the adjacency decomposition algorithm is to compute the set of facets adjacent to a given facet $(\mathbf{G}, \gamma)$. In this section, we define facet adjacency and outline the method used to compute the adjacent facets.

Lemma 2.
Two facets $(\mathbf{G}_1, \gamma_1), (\mathbf{G}_2, \gamma_2) \in \mathcal{F}^{n \to n'}_d$ are adjacent if and only if they share a ridge $H$ defined as:

1. $H := F(\mathbf{G}_1, \gamma_1) \cap F(\mathbf{G}_2, \gamma_2) \cap \mathcal{C}^{n \to n'}_d$,
2. where $\dim(H) = \dim(\mathcal{C}^{n \to n'}_d) - 2$.

A ridge is a facet of the polytope $\mathcal{C}^{n \to n'}_d \cap F(\mathbf{G}, \gamma)$. Therefore, to compute the ridges of a given facet $(\mathbf{G}, \gamma) \in \mathcal{F}^{n \to n'}_d$, we take the typical approach for computing facets. Namely, the set of vertices $\{V \in \mathcal{V}^{n \to n'}_d \mid \langle \mathbf{G}, V \rangle = \gamma\}$ is constructed and PORTA [39, 40] is used to compute the ridges of $(\mathbf{G}, \gamma)$. A facet adjacent to $(\mathbf{G}, \gamma)$ is computed from each ridge using a rotation algorithm described by Christof and Reinelt [30]. Given the signaling polytope vertices $\mathcal{V}^{n \to n'}_d$, this procedure computes the complete set of facets adjacent to $(\mathbf{G}, \gamma)$.

Appendix D: Tight Bell Inequalities
In this section we discuss the general forms for each of the signaling polytope facets in Fig. 3. Each facet class is described by a generator facet (see Appendix B 5), where all permutations and input/output liftings of these generator facets are also tight Bell inequalities. To prove that an inequality $(\mathbf{G}, \gamma)$ is a facet of $\mathcal{C}^{n \to n'}_d$, both conditions of Lemma 1 must hold. The proofs contained in this section verify Condition 2 of Lemma 1 by constructing a set of $\dim(\mathcal{C}^{n \to n'}_d) = n(n'-1)$ affinely independent vertices $\{V \in \mathcal{V}^{n \to n'}_d \mid \langle \mathbf{G}, V \rangle = \gamma\}$. These enumerations are verified numerically in our supporting software [32]. To assist with the enumeration of affinely independent vertices, we introduce a simple construction for affinely independent vectors with 0/1 elements.

Lemma 3.
Consider an $n$-element binary vector $\vec{b}_k \in \{0,1\}^n$ with $n_0$ null elements and $n_1$ unit elements, where $n_0 + n_1 = n$. A set of $n$ affinely independent vectors $\{\vec{b}_k\}_{k=1}^{n}$ is constructed as follows:

• Let $\vec{b}_1$ be the binary vector where the first $n_0$ elements are null and the next $n_1$ elements are unit values.
• For $k \in [2, n_0 + 1]$, $\vec{b}_k$ is derived from $\vec{b}_1$ by swapping the unit element at index $(n_0 + 1)$ with the null element at index $(k - 1)$.
• For $k \in [n_0 + 2, n]$, $\vec{b}_k$ is derived from $\vec{b}_1$ by swapping the null element at index $n_0$ with the unit element at index $k$.

For example, when $n = 5$, $n_0 = 2$, and $n_1 = 3$ the enumeration yields

$$\left\{ \vec{b}_1 = [0,0,1,1,1], \; \vec{b}_2 = [1,0,0,1,1], \; \vec{b}_3 = [0,1,0,1,1], \; \vec{b}_4 = [0,1,1,0,1], \; \vec{b}_5 = [0,1,1,1,0] \right\}. \quad (D1)$$

Proof.
To verify the affine independence of $\{\vec{b}_k\}_{k=1}^{n}$ it is sufficient to show the linear independence of $\{\vec{b}_1 - \vec{b}_k\}_{k=2}^{n}$. Note that each $(\vec{b}_1 - \vec{b}_k)$ has two nonzero elements, one of which occurs at an index that is zero for all $(\vec{b}_1 - \vec{b}_{k'})$ where $k \neq k'$. Therefore, the vectors in $\{\vec{b}_1 - \vec{b}_k\}_{k=2}^{n}$ are linearly independent and $\{\vec{b}_k\}_{k=1}^{n}$ is affinely independent.

1. k-Guessing Facets

Consider a guessing game with $k$ correct answers out of $n'$ possible answers. In this game, Alice has $n = \binom{n'}{k}$ inputs, where each value $x$ corresponds to a unique set of $k$ correct answers. Given an input $x \in [n]$, Alice signals to Bob using a message $m \in [d]$ and Bob makes a guess $y \in [n']$. A correct guess scores 1 point while an incorrect guess scores 0 points. This type of guessing game is described by Heinosaari et al. [31, 34] and used to test the communication performance of a particular theory. In this work, we treat this $k$-guessing game as a Bell inequality $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K)$ of the signaling polytope $\mathcal{C}^{n \to n'}_d$ where

$$\gamma^{n',k,d}_K = \binom{n'}{k} - \binom{n'-d}{k} \quad (D2)$$

and $\mathbf{G}^{n',k}_K \in \mathbb{R}^{n' \times \binom{n'}{k}}$ is a matrix with each column containing a unique distribution of $k$ unit elements and $(n'-k)$ null elements. For example,

$$\mathbf{G}^{4,2}_K = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}. \quad (D3)$$

This general Bell inequality for signaling polytopes was identified by Frenkel and Weiner [23], who showed that given a channel $P \in \mathcal{C}^{n \to n'}_d$, the bounds of this inequality are

$$\binom{n'}{k} - \binom{n'-d}{k} \geq \langle \mathbf{G}^{n',k}_K, P \rangle \geq \binom{n'-d}{n'-k}. \quad (D4)$$

However, we only focus on the upper bound $\gamma^{n',k,d}_K$.
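For small parameters, the upper bound of Eq. (D4) can be confirmed by brute force. The sketch below (illustrative helper names, assuming the game definition above) enumerates the vertices of $\mathcal{C}^{n \to n'}_d$ as assignments of one output to each input using at most $d$ distinct outputs:

```python
from itertools import combinations, product
from math import comb

def k_guessing_game(n_out, k):
    """Matrix G^{n',k}_K: one column per k-subset of [n'] (the correct answers)."""
    cols = list(combinations(range(n_out), k))
    return [[1 if y in col else 0 for col in cols] for y in range(n_out)]

def max_vertex_score(G, d):
    """Brute-force max of <G, V> over vertices of C^{n -> n'}_d, i.e. over all
    assignments f of an output to each input using at most d distinct outputs."""
    n_out, n = len(G), len(G[0])
    return max(
        sum(G[f[x]][x] for x in range(n))
        for f in product(range(n_out), repeat=n)
        if len(set(f)) <= d
    )

n_out, k, d = 4, 2, 2
gamma = comb(n_out, k) - comb(n_out - d, k)   # Eq. (D2): 6 - 1 = 5
assert max_vertex_score(k_guessing_game(n_out, k), d) == gamma
```

By convexity of the signaling polytope, maximizing over vertices suffices to establish the bound for all channels in $\mathcal{C}^{n \to n'}_d$.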
We now show conditions for which $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K) \in \mathcal{F}^{n \to n'}_d$.

Proposition 3.
The inequality $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K)$ is a facet of $\mathcal{C}^{n \to n'}_d$ with $n = \binom{n'}{k}$, $n' - 1 \geq k \geq 1$, and $d = n' - k$.

Proof.
To prove that $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K)$ is a facet of $\mathcal{C}^{n \to n'}_d$ we construct a set of $\dim(\mathcal{C}^{n \to n'}_d) = (n'-1)\binom{n'}{k}$ affinely independent vertices $\{V \in \mathcal{V}^{n \to n'}_d \mid \gamma^{n',k,d}_K = \langle \mathbf{G}^{n',k}_K, V \rangle\}$. Observe that separating the first row from the rest of $\mathbf{G}^{n',k}_K$ results in a block matrix of the form

$$\mathbf{G}^{n',k}_K = \begin{bmatrix} \vec{1} & \vec{0} \\ \mathbf{G}^{(n'-1),(k-1)}_K & \mathbf{G}^{(n'-1),k}_K \end{bmatrix}, \quad \text{e.g.} \quad \mathbf{G}^{4,2}_K = \begin{bmatrix} \vec{1} & \vec{0} \\ \mathbf{G}^{3,1}_K & \mathbf{G}^{3,2}_K \end{bmatrix}, \quad (D5)$$

where we refer to $\mathbf{G}^{(n'-1),(k-1)}_K$ and $\mathbf{G}^{(n'-1),k}_K$ as the left and right $k$-guessing blocks respectively. The left and right $k$-guessing blocks suggest a recursive approach to our construction of affinely independent vertices. Namely, we construct $\binom{n'}{k}$ vertices by targeting the first row of $\mathbf{G}^{n',k}_K$ while Proposition 3 is recursively applied to enumerate the remaining vertices using the left and right $k$-guessing blocks. The recursion requires two base cases to be addressed:

1. When $d = 2$ and $n' = k + d$, the construction of affinely independent vertices is described in Proposition 4.
2. When $k = 1$, the construction of affinely independent vertices is described in Proposition 5.

An iteration of this recursive construction proceeds as follows. First, we construct an affinely independent vertex for each of the $\binom{n'}{k}$ elements in the first row of $\mathbf{G}^{n',k}_K$. For each index $x'$ in the $\vec{0}$ block, a vertex $V$ is constructed by setting all $V(1|x) = 1$ where $x \neq x'$ and $V(y|x') = 1$ where $y > 1$ and $G_{y,x'} = 1$. The remaining rows of $V$ are filled to maximize the right $k$-guessing block. Then, for each index $x'$ in the $\vec{1}$ block, a vertex $V$ is constructed by setting $V(1|x') = 1$ and all $V(1|x) = 1$ where $G_{1,x} = 1$. The remaining $(d-1)$ rows of $V$ are filled to maximize the right $k$-guessing block. This procedure enumerates $\binom{n'}{k}$ affinely independent vertices.

Then, the remaining $(n'-2)\binom{n'}{k}$ vertices are found by individually targeting the left and right $k$-guessing blocks. To construct a vertex $V_L$ using the left block $\mathbf{G}^{(n'-1),(k-1)}_K$, the first row of $V_L$ is not used. The left block is then a $(k-1)$-guessing game with $(n'-1)$ outputs where $d = (n'-1) - (k-1) = n' - k$; hence, Proposition 3 holds and $(n'-2)\binom{n'-1}{k-1}$ affinely independent vertices are enumerated using the described recursive process. Note that for each vertex of form $V_L$, the remaining elements are filled to maximize the right $k$-guessing block $\mathbf{G}^{(n'-1),k}_K$. Similarly, to construct a vertex $V_R$ using the right block $\mathbf{G}^{(n'-1),k}_K$, we set all elements $V_R(1|x) = 1$ where $G^{n',k}_{1,x} = 1$. The remaining $(d-1)$ rows of $V_R$ are filled by optimizing the $\mathbf{G}^{(n'-1),k}_K$ block. Since $d = n' - k$ and $(d-1) = (n'-1) - k$, Proposition 3 holds, and recursively applying this procedure constructs $(n'-2)\binom{n'-1}{k}$ vertices of form $V_R$ using the right $k$-guessing block.

Finally, vertices of forms $V$, $V_L$, and $V_R$ are easily verified to be affinely independent. Summing these vertices yields $(n'-2)\binom{n'-1}{k-1} + (n'-2)\binom{n'-1}{k} + \binom{n'}{k} = (n'-1)\binom{n'}{k}$ affinely independent vertices; therefore, the $k$-guessing Bell inequality is proven to be tight when $n' = k + d$.

Proposition 4.
The $k$-guessing game Bell inequality $(\mathbf{G}^{n',k}_K, \gamma^{n',k,d}_K)$ is a tight Bell inequality of all signaling polytopes $\mathcal{C}^{n \to n'}_d$ with $n = \binom{n'}{k}$, $d = 2$, and $k = n' - 2$.

Proof.
To prove the tightness we construct a set containing $(n'-1)\binom{n'}{k}$ affinely independent vertices $\{V \in \mathcal{V}^{n \to n'}_2 \mid \langle \mathbf{G}^{n',(n'-2)}_K, V \rangle = \binom{n'}{n'-2} - 1\}$. To help illustrate this proof, we use the example of $(\mathbf{G}^{5,3}_K, \gamma^{5,3,2}_K)$ where

$$\mathbf{G}^{5,3}_K = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \end{pmatrix} \quad (D6)$$

and $\gamma^{5,3,2}_K = \binom{5}{3} - 1 = 9$. Since $d = 2$, we consider vertices $V \in \mathcal{V}^{n \to n'}_d$ with $\text{rank}(V) = 2$ where each vertex uses two rows $y$ and $y'$ where $y < y'$. In general, each of the $\binom{n'}{2}$ two-row selections from $\mathbf{G}^{n',(n'-2)}_K$ has a unique column $x_0$ containing null elements in both rows $y$ and $y'$. Therefore, for each unique pair $y$ and $y'$, two affinely independent vertices $V_1$ and $V_2$ are constructed by setting $V_1(y|x_0) = 1$ and $V_2(y'|x_0) = 1$ while the remaining terms are arranged such that all unit elements in row $y$ and the remaining elements in row $y'$ are selected to achieve the optimal score. Performing this procedure for the first two rows of $\mathbf{G}^{5,3}_K$ ($y = 1$ and $y' = 2$) constructs the vertices

$$V_1 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad V_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad (D7)$$

where $x_0 = 10$ in this example. Repeating this procedure for each of the $\binom{n'}{2}$ row selections produces $2\binom{n'}{2} = 2\binom{n'}{k}$ affinely independent vertices, one for each null element in $\mathbf{G}^{n',(n'-2)}_K$.

The remaining vertices are constructed by selecting a target row $y \in [n'-1]$. For each index $x'$ where $G_{y,x'} = 1$, a vertex $V_3$ is constructed by setting $V_3(y|x) = 1$ for all $x \neq x'$ that satisfy $G_{y,x} = 1$. A secondary row $y' > y$ of $V_3$ is chosen where $y'$ is the smallest index satisfying $G_{y',x'} = 1$. We then set $V_3(y'|x') = 1$ while the remaining elements of $V_3$ are set to achieve the optimal score. For selected rows $y$ and $y'$, the null column at index $x_0$ is set in the target row as $V_3(y|x_0) = 1$. For example, considering $\mathbf{G}^{5,3}_K$ with the target row $y = 1$ and $x' = 4$, we construct the vertex

$$V_3 = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \quad (D8)$$

Note that all secondary row indices $y' \leq y + 3$ are required to construct a vertex $V_3$ for each unit element in the target row $y$. Let $\Delta y = y' - y$; then $\sum_{\Delta y = 1}^{3} \binom{n'-1-\Delta y}{d+1-\Delta y}$ vertices are constructed for target row $y$. For $y = n'-2$ and $y = n'-1$, the sum terminates at $\Delta y = 2$ and $\Delta y = 1$ respectively, because the vertices are only affinely independent if the secondary row has index $y' > y$. Thus, this process produces

$$\sum_{\Delta y = 1}^{3} (n' - \Delta y) \binom{n'-1-\Delta y}{d+1-\Delta y} = (n'-3)\binom{n'}{d} \quad (D9)$$

affinely independent vertices, where the identities $\frac{m}{l}\binom{l}{m} = \binom{l-1}{m-1}$ and $\frac{l+1-m}{m}\binom{l}{m} = \binom{l}{m-1}$ are used to convert the binomial coefficients to the form $\binom{n'}{d} = \binom{n'}{k}$. Combining the vertices of forms $V_1$, $V_2$, and $V_3$ yields a set of $2\binom{n'}{k} + (n'-3)\binom{n'}{k} = (n'-1)\binom{n'}{k}$ affinely independent vertices. Therefore, when $d = 2$ and $k = n'-2$, $(\mathbf{G}^{n',(n'-2)}_K, \binom{n'}{n'-2} - 1)$ is a tight Bell inequality of the $\mathcal{C}^{\binom{n'}{k} \to n'}_2$ signaling polytope.
2. Maximum Likelihood Facets
In this section, we discuss the conditions for which maximum likelihood games (see main text) are tight Bell inequalities. The maximum likelihood Bell inequality $(\mathbf{G}^{n'}_{ML}, d)$ is a $(k=1)$-guessing game where $\mathbf{G}^{n'}_{ML} = \mathbf{G}^{n',1}_K$. For simplicity, this section considers unlifted forms of $\mathbf{G}^{n'}_{ML}$; that is, $\mathbf{G}^{n'}_{ML}$ is an $n' \times n'$ doubly stochastic matrix with 0/1 elements, such as the $n' \times n'$ identity matrix. For any vertex $V \in \mathcal{V}^{n \to n'}_d$,

$$\langle \mathbf{G}^{n'}_{ML}, V \rangle \leq d \quad (D10)$$

is satisfied because $\text{rank}(V) \leq d$ and $\mathbf{G}^{n'}_{ML}$ is doubly stochastic. By the convexity of $\mathcal{C}^{n \to n'}_d$, inequality (D10) must hold for all $P \in \mathcal{C}^{n \to n'}_d$. We now discuss the conditions for which $(\mathbf{G}^{n'}_{ML}, d)$ is a tight Bell inequality.

Proposition 5.
The maximum likelihood (ML) Bell inequality $(\mathbf{G}^{n'}_{ML}, d)$ is a facet of all signaling polytopes $\mathcal{C}^{n \to n'}_d$ with $n = n'$ and $n' > d > 1$.

Proof.
To prove that $(\mathbf{G}^{n'}_{\text{ML}}, d)$ is a tight Bell inequality of $\mathcal{C}^{n \to n'}_d$, we construct a set of $\dim(\mathcal{C}^{n' \to n'}_d) = n'(n'-1)$ affinely independent vertices $\{\mathbf{V} \in \mathcal{V}^{n \to n'}_d \mid \langle \mathbf{G}^{n'}_{\text{ML}}, \mathbf{V} \rangle = d\}$. Taking $\mathbf{G}^{n'}_{\text{ML}}$ to be the $n' \times n'$ identity matrix, a vertex $\mathbf{V}$ satisfies $d = \langle \mathbf{G}^{n'}_{\text{ML}}, \mathbf{V} \rangle$ when $d$ unit elements of $\mathbf{V}$ lie along the diagonal. In this case, $(n'-d)$ unit elements of $\mathbf{V}$ can be freely distributed in the remaining columns of the $d$ selected rows. For simplicity, we place all free elements in a single row with index $y \in [n']$, which we refer to as the target row. In the target row, we set $V(y|y) = 1$ while the off-diagonals $V(y|x \neq y)$ with $x \in [n']$ contain $(n'-d)$ unit elements and $(d-1)$ null elements. Lemma 3 describes a construction of $(n'-1)$ affinely independent vectors $\{\vec{b}_k\}_{k \in [n'-1]}$ to set as the off-diagonals in the target row. Then, for each $x \in [n']$ where $V(y|x \neq y) = 0$, we set $V(x|x) = 1$. This procedure achieves the upper bound in Eq. (D10) and constructs an affinely independent vertex for each of the $(n'-1)$ binary vectors in $\{\vec{b}_k\}_{k \in [n'-1]}$; for example, targeting row $y = 3$ when $d = 3$ yields four affinely independent vertices (Eq. (D11)). Repeating the procedure for each $y \in [n']$ results in $n'(n'-1)$ affinely independent vertices. The vertices enumerated for each target row $y$ are affinely independent from those of all other target rows because free unit elements are only allowed in the target row. As a final note, this procedure does not work in the case $d = 1$, because there are only $n'$ vertices in $\mathcal{V}^{n \to n'}_d$, or in the case $d = n'$, because only one vertex $\mathbf{V} = \mathbf{G}^{n'}_{\text{ML}}$ maximizes Eq. (D10). Since $n'(n'-1) = \dim(\mathcal{C}^{n' \to n'}_d)$ affinely independent vertices are constructed, $(\mathbf{G}^{n'}_{\text{ML}}, d)$ is proven to be a tight Bell inequality of all signaling polytopes with $n' > d > 1$.
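As a quick numerical sanity check of the bound in Eq. (D10), the sketch below (the helper names are ours, not from the paper) enumerates all vertices of a small signaling polytope, i.e. the 0/1 column-stochastic matrices with at most $d$ nonzero rows, and confirms that the identity ML game never scores above $d$:

```python
import itertools

import numpy as np

def signaling_vertices(n, n_out, d):
    """Yield the 0/1 column-stochastic n_out x n matrices with at most d
    nonzero rows (the vertices of the signaling polytope C_d^{n -> n_out})."""
    for outputs in itertools.product(range(n_out), repeat=n):
        if len(set(outputs)) <= d:  # number of nonzero rows bounds rank(V)
            V = np.zeros((n_out, n))
            for x, y in enumerate(outputs):
                V[y, x] = 1.0
            yield V

n_prime, d = 4, 2
G_ML = np.eye(n_prime)  # unlifted maximum likelihood game
scores = [float(np.sum(G_ML * V)) for V in signaling_vertices(n_prime, n_prime, d)]
print(max(scores))  # 2.0, matching the bound <G_ML, V> <= d
```

Brute-force enumeration scales as $n'^{\,n}$, so this is only practical for the small cases treated in the examples.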
3. Ambiguous Guessing Facets
In this section, we discuss the conditions for which ambiguous guessing games (see main text) are tight Bell inequalities. Consider the ambiguous guessing Bell inequality $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$, where $\mathbf{G}^{n',d}_{?} \in \mathbb{R}^{n' \times (n'-1)}$,

$$\mathbf{G}^{n',d}_{?} = \sum_{x \in [n'-1]} \Big[(n'-d)\,|x\rangle\langle x| + |n'\rangle\langle x|\Big] \quad\text{and}\quad \gamma^{n',d}_{?} = d(n'-d). \quad \text{(D12)}$$

This Bell inequality is best regarded as a 1-guessing game, in which a correct answer scores $(n'-d)$ points, extended by an ambiguous row for which 1 point is scored for choosing the ambiguous output. For example, when $n' = 6$ and $d = 2$ we have

$$\mathbf{G}^{n',d}_{?} = \begin{bmatrix} (n'-d)\,\mathbf{G}^{(n'-1)}_{\text{ML}} \\ \vec{1}^{\,T} \end{bmatrix}, \quad e.g. \quad \mathbf{G}^{6,2}_{?} = \begin{bmatrix} 4&0&0&0&0 \\ 0&4&0&0&0 \\ 0&0&4&0&0 \\ 0&0&0&4&0 \\ 0&0&0&0&4 \\ 1&1&1&1&1 \end{bmatrix}, \quad \text{(D13)}$$

where we refer to the rows of the $(n'-d)\mathbf{G}^{(n'-1)}_{\text{ML}}$ block as guessing rows and the $\vec{1}^{\,T}$ row as the ambiguous row. Note that $\mathbf{G}^{n',d}_{?}$ is a special case of the ambiguous guessing game $\mathbf{G}^{(k)}_{?}$ (see main text), and without loss of generality, we express $\mathbf{G}^{n',d}_{?}$ in a normal form where all elements $G_{y,x}$ are non-negative integers. For any vertex $\mathbf{V} \in \mathcal{V}^{n \to n'}_d$, the inequality

$$\langle \mathbf{G}^{n',d}_{?}, \mathbf{V} \rangle \leq d(n'-d) \quad \text{(D14)}$$

is satisfied. We now prove the conditions for which $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$ is a facet of $\mathcal{C}^{n \to n'}_d$.

Proposition 6.
The inequality $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$ is a facet of $\mathcal{C}^{n \to n'}_d$ when $n = n'-1$ and $n'-2 \geq d \geq 2$.

Proof.
To prove that $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$ is a facet of $\mathcal{C}^{(n'-1) \to n'}_d$, we construct a set of $\dim(\mathcal{C}^{(n'-1) \to n'}_d) = (n'-1)^2$ affinely independent vertices $\{\mathbf{V} \in \mathcal{V}^{(n'-1) \to n'}_d \mid d(n'-d) = \langle \mathbf{G}^{n',d}_{?}, \mathbf{V} \rangle\}$. Using the vertex construction of Proposition 5, we can easily enumerate $(n'-1)(n'-2)$ affinely independent vertices that optimize the $\mathbf{G}^{(n'-1)}_{\text{ML}}$ block. The remaining vertices are constructed using the ambiguous row and $(d-1)$ guessing rows. In these vertices, the ambiguous row has $(d-1)$ null elements and $(n'-d)$ unit elements; hence, Lemma 3 can be used to construct $(n'-1)$ affinely independent arrangements of the ambiguous row. For each of the $(n'-1)$ arrangements, a vertex $\mathbf{V}_?$ is constructed by setting $V_?(x|x) = 1$ for each $x \in [n'-1]$ where $V_?(n'|x) = 0$. Combining the $(n'-1)(n'-2)$ vertices from the $\mathbf{G}^{(n'-1)}_{\text{ML}}$ block and the $(n'-1)$ vertices from the ambiguous row, a total of $(n'-1)^2$ affinely independent vertices are found. Therefore, $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$ is a tight Bell inequality of $\mathcal{C}^{(n'-1) \to n'}_d$. The upper bound $n'-2 \geq d$ follows from the fact that if $d \geq (n'-1)$, the required number of affinely independent vertices cannot be found.

a. Rescalings of Ambiguous Guessing facets

An ambiguous guessing facet $(\mathbf{G}^{n',d}_{?}, \gamma^{n',d}_{?})$ as defined in Proposition 6 can be rescaled to $\mathbf{G}'_{?} \in \mathbb{R}^{n' \times (n+1)}$ by taking a guessing row $y$ where $G_{y,x} = (n'-d)$ and distributing this value between two columns such that $G'_{y,x} = 1$ and $G'_{y,x'} = (n'-d)-1$, where $x' = n'$ indexes a new column. This rescaling is a non-trivial input lifting rule; the bound of the input-lifted facet is the same as that of the unlifted version. For example, when $n' = 5$ and $d = 2$, $\mathbf{G}^{5,2}_{?}$ is rescaled along its 4th row as shown in Eq. (D15). This rescaling input lifting is a general trend observed in our computed signaling polytope facets [32]; however, it is not clear how broadly this lifting rule applies or generalizes.
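The tightness of the ambiguous-guessing bound $\gamma^{n',d}_{?} = d(n'-d)$ can likewise be checked by brute force. The following sketch (the helper names are ours) builds the game matrix of Eq. (D13) and maximizes $\langle \mathbf{G}^{n',d}_{?}, \mathbf{V} \rangle$ over all vertices with at most $d$ nonzero rows:

```python
import itertools

import numpy as np

def ambiguous_guessing_game(n_prime, d):
    """G_?^{n',d}: (n'-1) guessing rows scoring (n'-d) points on the diagonal,
    plus one ambiguous row scoring 1 point in every column (Eq. (D13))."""
    return np.vstack([(n_prime - d) * np.eye(n_prime - 1),
                      np.ones(n_prime - 1)])

def max_vertex_score(G, d):
    """Maximize <G, V> over 0/1 column-stochastic V with at most d nonzero
    rows: fix the d allowed rows, then let each column pick its best row."""
    n_out = G.shape[0]
    return max(G[list(rows), :].max(axis=0).sum()
               for rows in itertools.combinations(range(n_out), d))

n_prime, d = 6, 2
G = ambiguous_guessing_game(n_prime, d)
print(max_vertex_score(G, d))  # 8.0 = d * (n' - d)
```

The maximum is attained both by two guessing rows and by one guessing row paired with the ambiguous row, reflecting the two vertex families used in the proof of Proposition 6.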
4. Anti-Guessing Facets
Another special case of the $k$-guessing game is the anti-guessing game Bell inequality $(\mathbf{G}^{n'}_{A}, n')$ where $\mathbf{G}^{n'}_{A} = \mathbf{G}^{n'}_{(n'-1)}$. For any channel $\mathbf{P} \in \mathcal{P}^{n \to n'}$ with $n = n'$, the anti-guessing Bell inequality $\langle \mathbf{G}^{n'}_{A}, \mathbf{P} \rangle \leq n'$ is satisfied; therefore anti-guessing games are not very useful for witnessing signaling dimension. That said, the anti-guessing game is significant because it can be combined with a maximum likelihood game in block form to construct a facet of the $d = (n'-2)$ signaling polytope $\mathcal{C}^{n \to n'}_{(n'-2)}$. We denote these anti-guessing facets by $\mathbf{G}^{\epsilon,m'}_{A}$, where the facet is constructed as

$$\mathbf{G}^{\epsilon,m'}_{A} = \begin{bmatrix} \mathbf{G}^{\epsilon}_{A} & \hat{0} \\ \hat{0} & \mathbf{G}^{m'}_{\text{ML}} \end{bmatrix}, \quad \text{(D16)}$$

where $\mathbf{G}^{\epsilon,m'}_{A} \in \mathbb{R}^{n' \times n'}$, $n' = \epsilon + m'$, and $\hat{0}$ is a matrix block of zeros. For any channel $\mathbf{P} \in \mathcal{C}^{n' \to n'}_d$,

$$\langle \mathbf{G}^{\epsilon,m'}_{A}, \mathbf{P} \rangle \leq \epsilon + d - 2 =: \gamma^{\epsilon,d}_{A}. \quad \text{(D17)}$$

This upper bound follows from the fact that no more than two rows are required to score $\epsilon$ in the $\mathbf{G}^{\epsilon}_{A}$ block, while each of the remaining $(d-2)$ rows scores one point in the $\mathbf{G}^{m'}_{\text{ML}}$ block.

Proposition 7.
The inequality $(\mathbf{G}^{\epsilon,m'}_{A}, \gamma^{\epsilon,d}_{A})$ is a facet of $\mathcal{C}^{n \to n'}_d$ where $n = n'$, $n'-2 \geq d \geq 2$, and $n'-d+1 \geq \epsilon \geq 3$.

Proof.
To prove the tightness of the anti-guessing Bell inequality, we show a row-by-row construction of $\dim(\mathcal{C}^{n \to n'}_d) = n'(n'-1)$ affinely independent vertices $\{\mathbf{V} \in \mathcal{V}^{n \to n'}_d \mid \langle \mathbf{G}^{\epsilon,m'}_{A}, \mathbf{V} \rangle = \gamma^{\epsilon,d}_{A}\}$. For convenience, we refer to the first $\epsilon$ rows of $\mathbf{G}^{\epsilon,m'}_{A}$ as anti-guessing rows and the remaining $m'$ rows as guessing rows. We treat anti-guessing and guessing rows individually because each admits its own vertex construction. To help illustrate this proof, we draw upon the example where $\epsilon = m' = d = 3$,

$$\mathbf{G}^{3,3}_{A} = \begin{bmatrix} 0&1&1&0&0&0 \\ 1&0&1&0&0&0 \\ 1&1&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \end{bmatrix} \quad\text{and}\quad \gamma^{3,3}_{A} = 4. \quad \text{(D18)}$$

For a target anti-guessing row $y \in [1,\epsilon]$, we construct $(n'-1)$ vertices, where $(\epsilon-1)$ vertices are constructed using the $\mathbf{G}^{\epsilon}_{A}$ block and $m'$ vertices are constructed using the $\hat{0}$ block in the top right. Note that a vertex achieves the upper bound $\gamma^{\epsilon,d}_{A}$ only if two or fewer anti-guessing rows are used. A vertex $\mathbf{V}_A$ is constructed using the $\mathbf{G}^{\epsilon}_{A}$ block by setting $V_A(y|x) = 1$ for all $x$ that satisfy $G^{\epsilon,m'}_{y,x} = 1$, then selecting a secondary row $y' \neq y$ with $y' \in [1,\epsilon]$ and setting $V_A(y'|x') = 1$, where $x'$ is the index of the null element in the target row, $G^{\epsilon,m'}_{y,x'} = 0$. All remaining elements of $\mathbf{V}_A$ are set so that the first $(d-2)$ diagonal elements of the $\mathbf{G}^{m'}_{\text{ML}}$ block are selected, and any remaining terms are set as unit elements in the target row. An affinely independent vertex is constructed for each of the $(\epsilon-1)$ choices of secondary row $y'$; for example, when targeting row $y = 1$ we enumerate the two vertices of Eq. (D19).

For a target anti-guessing row $y$, an additional $m'$ vertices of the form $\mathbf{V}_{A,\hat{0}}$ are constructed using the $\hat{0}$ block in the top right. If $m' > (d-1)$, we set $V_{A,\hat{0}}(y|x) = 1$ for all $x \in [1,\epsilon]$. The remaining $(d-1)$ rows are then used to maximize the $\mathbf{G}^{m'}_{\text{ML}}$ block. Using Lemma 3, a set of $m'$ affinely independent vectors $\{\vec{b}_k\}_{k=1}^{m'}$ with $(d-1)$ null elements and $(m'-d+1)$ unit elements can be constructed and used in the $\hat{0}$ block of $\mathbf{V}_{A,\hat{0}}$ by setting $V_{A,\hat{0}}(y|[\epsilon+1,n']) = \vec{b}_k$. All remaining null elements in the target row of $\mathbf{V}_{A,\hat{0}}$ are then set along the diagonal of the $\mathbf{G}^{m'}_{\text{ML}}$ block. Since there are $m'$ choices of $\vec{b}_k$, that many affinely independent vertices can be constructed; for example, when targeting row $y = 1$ we enumerate the three vertices of Eq. (D20). If $m' = (d-1)$, a secondary row $y'$ is selected, and the anti-guessing rows are set as $V_{A,\hat{0}}(y|x) = 1$ and $V_{A,\hat{0}}(y'|x') = 1$, where $x, x' \in [1,\epsilon]$, $G^{\epsilon,m'}_{y,x} = 1$, and $G^{\epsilon,m'}_{y,x'} = 0$. The remainder of the procedure is the same as in the $m' > (d-1)$ case. Note that in the $m' = (d-1)$ case, one of the $\mathbf{V}_{A,\hat{0}}$ vertices is redundant with a $\mathbf{V}_A$ vertex. To reconcile this conflict, another vertex must be added which maximizes $\mathbf{G}^{m'}_{\text{ML}}$ with $V(y|x) = 1$ for all $x \in [1,\epsilon]$ and $V(x'|x') = 1$ for all $x' \in [\epsilon+1,n']$. By this procedure, $(\epsilon-1) + m' = (n'-1)$ affinely independent vertices are constructed for each target row $y \in [1,\epsilon]$. Thus, $\epsilon(n'-1)$ affinely independent vertices are constructed for the anti-guessing rows of $\mathbf{G}^{\epsilon,m'}_{A}$.

For a target guessing row $y \in [\epsilon+1,n']$, we construct $(n'-1)$ vertices, where $\epsilon$ are constructed using the $\hat{0}$ block in the lower left and $(m'-1)$ vertices using the $\mathbf{G}^{m'}_{\text{ML}}$ block. Starting with the lower-left $\hat{0}$ block, we construct a vertex $\mathbf{V}_{\text{ML},\hat{0}}$ for each $x \in [1,\epsilon]$ by setting $V_{\text{ML},\hat{0}}(y|x) = 1$ and $V_{\text{ML},\hat{0}}(y|y) = 1$. Of the remaining $(d-1)$ rows, one is used to maximize the $\mathbf{G}^{\epsilon}_{A}$ block and $(d-2)$ rows maximize the $\mathbf{G}^{m'}_{\text{ML}}$ block. Any unspecified unit terms of $\mathbf{V}_{\text{ML},\hat{0}}$ are set in the target row $y$. Since there are $\epsilon$ values of $x$ to consider, this procedure produces $\epsilon$ affinely independent vertices; for example, when targeting row $y = 4$ we enumerate the three vertices of Eq. (D21). Next, we use the $\mathbf{G}^{m'}_{\text{ML}}$ block to construct a vertex $\mathbf{V}_{\text{ML}}$. If $m' > (d-1)$, we set $V_{\text{ML}}(1|x) = 1$ for all $x \in [1,\epsilon]$ and use the procedure in Proposition 5 to enumerate $(m'-1)$ affinely independent vertices that optimize the $\mathbf{G}^{m'}_{\text{ML}}$ block in the target row. If $m' = (d-1)$, one row is used to maximize the $\mathbf{G}^{\epsilon}_{A}$ block, while the procedure in Proposition 5 is applied to the remaining $(d-2)$ rows to construct $(m'-1)$ affinely independent vertices that optimize the $\mathbf{G}^{m'}_{\text{ML}}$ block in the target row. For example, when targeting row $y = 4$ we enumerate the two vertices of Eq. (D22). Each guessing row thus produces $\epsilon + (m'-1) = (n'-1)$ affinely independent vertices, giving $m'(n'-1)$ vertices in total for the guessing rows.

In total, $\epsilon(n'-1) + m'(n'-1) = n'(n'-1)$ affinely independent vertices are constructed. Therefore, we prove that $(\mathbf{G}^{\epsilon,m'}_{A}, \gamma^{\epsilon,d}_{A})$ is a tight Bell inequality. We now address the bounds on $d$ and $\epsilon$. The lower bound $\epsilon \geq 3$ holds because when $\epsilon = 2$, $\mathbf{G}^{2}_{A}$ is a permutation of $\mathbf{G}^{2}_{\text{ML}}$, meaning the anti-guessing game is indistinguishable from the maximum likelihood game. The upper bound $n'-d+1 \geq \epsilon$ follows from the fact that $m' \geq (d-1)$ must be satisfied, or else $n'(n'-1)$ affinely independent vertices cannot be found, because the entire diagonal of the $\mathbf{G}^{m'}_{\text{ML}}$ block must be used by every vertex to satisfy $\langle \mathbf{G}^{\epsilon,m'}_{A}, \mathbf{V} \rangle = \epsilon + d - 2$. The upper bound $n'-2 \geq d$ results from the lower bound on $\epsilon$ and the fact that $d$ cannot be so large that $n'-d+1 < 3$.

Appendix E: Proof of Proposition 2
In this section, we prove the conditions for which the ambiguous guessing game $(\mathbf{G}^{n,n'}_{k,d}, d)$ is a facet of $\mathcal{C}^{n \to n'}_d$.
1. Proof of Proposition 2(i)
Proof.
To prove Proposition 2(i), we consider the general form of an ambiguous guessing Bell inequality $(\mathbf{G}^{n,n'}_{k,d}, d)$ where $\mathbf{G}^{n,n'}_{k,d} \in \mathbb{R}^{n' \times n}$ is row stochastic and contains $k = n'$ guessing rows (see main text). Note that the matrix $\mathbf{G}^{n,n'}_{k,d}$ is row stochastic and therefore describes any input/output lifting and permutation of the maximum likelihood game $\mathbf{G}^{m'}_{\text{ML}} = \mathbb{I}_{m'}$, where $\min\{n,n'\} \geq m' \geq 1$; the matrices in Eq. (E1) are all instances of such games. By Proposition 5, $(\mathbf{G}^{m'}_{\text{ML}}, d)$ is a facet of $\mathcal{C}^{m' \to m'}_d$ iff $m' > d > 1$, that is, $\mathrm{rank}(\mathbf{G}^{m'}_{\text{ML}}) > d$. When the trivial lifting rules (see Fig. 2) are applied to $\mathbf{G}^{m'}_{\text{ML}}$, the rank of the lifted matrix does not change. Therefore, any Bell inequality $(\mathbf{G}^{n,n'}_{n',d}, d)$ with $\mathrm{rank}(\mathbf{G}^{n,n'}_{n',d}) > d$ is a facet of $\mathcal{C}^{n \to n'}_d$ that has been lifted from $\mathcal{C}^{m' \to m'}_d$, where $\min\{n,n'\} \geq m' > d$. Conversely, if $\mathrm{rank}(\mathbf{G}^{n,n'}_{n',d}) < d$, then $\langle \mathbf{G}^{n,n'}_{n',d}, \mathbf{V} \rangle < d$ for any $\mathbf{V} \in \mathcal{V}^{n \to n'}_d$. Likewise, if $\mathrm{rank}(\mathbf{G}^{n,n'}_{n',d}) = d$, then there is an insufficient number of affinely independent vertices $\mathbf{V} \in \mathcal{V}^{n \to n'}_d$ satisfying $\langle \mathbf{G}^{n,n'}_{n',d}, \mathbf{V} \rangle = d$, because $d$ columns must have fixed values in $\mathbf{V}$. Thus we conclude that when $\min\{n,n'\} > d > 1$, $(\mathbf{G}^{n,n'}_{n',d}, d)$ is a tight Bell inequality of $\mathcal{C}^{n \to n'}_d$ iff $\mathrm{rank}(\mathbf{G}^{n,n'}_{n',d}) > d$.

Remark.
Proposition 2(i) is significant because it allows one to easily find a facet of any signaling polytope $\mathcal{C}^{n \to n'}_d$. This enables the use of adjacency decomposition for any signaling polytope (see Appendix C).
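Proposition 2(i) reduces facet verification to a rank computation. The sketch below (our own construction; normalization of the rows is omitted since it does not affect the rank) applies the trivial lifting rules to the ML game and checks the rank condition:

```python
import numpy as np

# Trivial liftings of the m' x m' ML game (the identity): input lifting
# reuses existing columns for additional inputs, output lifting appends
# all-zero rows.  Neither operation changes the matrix rank.
m_prime, d = 4, 2
cols = [0, 1, 2, 3, 0, 1]                      # input lifting: n = 6 inputs
G = np.eye(m_prime)[:, cols]                   # shape (4, 6)
G = np.vstack([G, np.zeros((1, len(cols)))])   # output lifting: n' = 5 outputs

# Proposition 2(i): (G, d) remains a tight Bell inequality iff rank(G) > d.
print(np.linalg.matrix_rank(G) > d)  # True
```

Since the rank check costs only a singular-value decomposition, it scales to games far larger than those amenable to vertex enumeration.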
2. Proof of Proposition 2(ii)
Proof.
To prove Proposition 2(ii), we consider the ambiguous guessing game Bell inequalities $(\mathbf{G}^{n,n'}_{k,d}, d)$ with $k$ guessing rows and $(n'-k)$ ambiguous rows (see main text). Note that the ambiguous rows of $\mathbf{G}^{n,n'}_{k,d}$ span the entire width of the matrix; the matrices in Eq. (E2), whose ambiguous rows contain the element $\frac{1}{3}$ in every column, are all instances of such games. Furthermore, any ambiguous guessing facet $(\mathbf{G}^{n',d}_{?}, d(n'-d))$ of $\mathcal{C}^{(n'-1) \to n'}_d$ described by Proposition 6 can be converted into an inequality $(\mathbf{G}^{(n'-1),n'}_{(n'-1),d}, d)$ simply by dividing the inequality by $(n'-d)$; hence, these two matrices describe the same inequality. It follows that any ambiguous guessing facet $\mathbf{G}^{m',d}_{?}$ can be input lifted from $\mathcal{C}^{(m'-1) \to m'}_d$ to $\mathcal{C}^{(m'-1) \to n'}_d$, where $n' \geq m'$. Since the rank of the $(m'-1)$ guessing rows of $\mathbf{G}^{m',d}_{?}$ is $(m'-1)$, and input liftings do not affect the matrix rank, any $\mathbf{G}^{(m'-1),n'}_{k,d}$ with $n' > k \geq (m'-1)$ and a similar rank for its guessing rows must be a facet of $\mathcal{C}^{(m'-1) \to n'}_d$. Finally, if the rank of the guessing rows of $\mathbf{G}^{n,n'}_{k,d}$ is less than $n$, then $\mathbf{G}^{n,n'}_{k,d}$ cannot be a facet of $\mathcal{C}^{n \to n'}_d$ because there is an insufficient number of affinely independent vertices in $\{\mathbf{V} \in \mathcal{V}^{n \to n'}_d \mid \langle \mathbf{G}^{n,n'}_{k,d}, \mathbf{V} \rangle = d\}$. This is true because Proposition 5 implies that we can enumerate $(k-1)n$ affinely independent vertices using only the guessing rows of $\mathbf{G}^{n,n'}_{k,d}$; the remaining $(n'-k)n$ affinely independent vertices must then be enumerated using $(d-1)$ guessing rows and one ambiguous row. However, as exemplified in the proof of Proposition 6, this cannot be done unless there is a nonzero element in each column of the $k$ guessing rows of $\mathbf{G}^{n,n'}_{k,d}$. Thus we conclude that for $n' > k \geq n$ and $n > d$, $\mathbf{G}^{n,n'}_{k,d}$ is a facet of $\mathcal{C}^{n \to n'}_d$ iff the rank of its guessing rows is $n$.

Remark.
In our proof, we do not consider input liftings of $\mathbf{G}^{m',d}_{?}$ because they result in matrices which deviate in form from $\mathbf{G}^{n,n'}_{k,d}$. Input liftings append an all-zero column to $\mathbf{G}^{m',d}_{?}$, while $\mathbf{G}^{n,n'}_{k,d}$ is defined to have a nonzero element in each column of every ambiguous row. Therefore, input liftings of ambiguous guessing facets $\mathbf{G}^{n',d}_{?}$ are incompatible with the ambiguous guessing games $\mathbf{G}^{n,n'}_{k,d}$ described in the main text.

Appendix F: Proof of Theorem 1
Our proofs of parts (i) and (ii) of Theorem 1 follow the same approach. In both cases, we want to show that the signaling polytope $\mathcal{C}^{n \to n'}_d$ is equivalent to some convex polytope defined by certain Bell inequalities. We establish this by showing that the extreme points of the latter are also extreme points of the former; the converse has already been shown in Eq. (7). Recall that the extreme points of $\mathcal{C}^{n \to n'}_d$ consist of all extreme points of $\mathcal{P}^{n \to n'}$ having rank no greater than $d$. In other words, $\mathbf{P}$ is extremal in $\mathcal{C}^{n \to n'}_d$ iff it is column stochastic with 0/1 elements and at most $d$ nonzero rows. We rely heavily on the following general characterization of extreme points.

Proposition 8.
Let $\mathcal{S} \subset \mathbb{R}^{n' \times n}$ be some convex polytope. Then $\mathbf{P}$ is an extreme point of $\mathcal{S}$ iff there does not exist some nonzero $\mathbf{D} \in \mathbb{R}^{n' \times n}$ such that $\mathbf{P} \pm \mathbf{D} \in \mathcal{S}$.

In our application of Proposition 8, we will refer to $\mathbf{D} \in \mathbb{R}^{n' \times n}$ as a "valid" perturbation of $\mathbf{P}$ if $\mathbf{P} \pm \mathbf{D} \in \mathcal{S}$; hence, if $\mathbf{D}$ is a valid perturbation then $\mathbf{P}$ cannot be extremal. Some other terminology used in our proofs is the following. For a channel $\mathbf{P} \in \mathcal{P}^{n \to n'}$, an element $P(y|x)$ is called non-extremal if it lies in the open interval $(0,1)$. An element $P(y|x)$ is a row maximizer if it attains the largest value in row $y$ of $\mathbf{P}$. It is further called a unique row maximizer if there are no other elements in row $y$ having this value. Finally, we define the maximum likelihood estimation (ML) sum

$$\phi(\mathbf{P}) := \sum_{y=1}^{n'} \|r_y\|_\infty, \quad \text{(F1)}$$

where $r_y$ denotes row $y$ of $\mathbf{P}$ and $\|r_y\|_\infty$ is its row maximizer. Then the maximum likelihood estimation (ML) polytope can be expressed as

$$\mathcal{M}^{n \to n'}_d = \{\mathbf{P} \in \mathcal{P}^{n \to n'} \mid \phi(\mathbf{P}) \leq d\}. \quad \text{(F2)}$$

Note that $\phi$ is a convex function, so that if $\phi(\mathbf{P}) = d$ with $\mathbf{P} = \sum_\lambda p_\lambda \mathbf{V}_\lambda$ for extreme points $\mathbf{V}_\lambda \in \mathcal{M}^{n \to n'}_d$ and non-negative numbers $p_\lambda$, then necessarily $\phi(\mathbf{V}_\lambda) = d$ for every $\lambda$.
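The ML sum of Eq. (F1) and membership in the ML polytope of Eq. (F2) are straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def ml_sum(P):
    """phi(P) of Eq. (F1): the sum over rows y of the row maximizer ||r_y||_inf."""
    return P.max(axis=1).sum()

# A column-stochastic channel (rows index outputs y, columns index inputs x).
P = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
assert np.allclose(P.sum(axis=0), 1.0)  # each input's output distribution sums to 1

print(ml_sum(P))  # 1.5, so P lies in the ML polytope M_d whenever d >= 1.5
```

Because $\phi$ is a maximum of linear functions summed over rows, it is convex, which is the property invoked repeatedly in the proofs below.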
1. Proof of Theorem 1(i)
The proof of Theorem 1(i) follows immediately from the following lemma due to the convexity of the ML andsignaling polytopes.
Lemma 4.
For arbitrary $n$ and $n'$, the extreme points of $\mathcal{M}^{n \to n'}_{n'-1}$ are extreme points of $\mathcal{C}^{n \to n'}_{n'-1}$.

Proof.
We first show the conclusion of Lemma 4 is true for any extreme point $\mathbf{V}$ of $\mathcal{M}^{n \to n'}_{n'-1}$ having ML sum $\phi(\mathbf{V}) < n'-1$. If $\mathbf{V}$ is not extremal in $\mathcal{P}^{n \to n'}$, then $\mathbf{V}$ must have at least one column $x$ with two non-extremal elements $V(y_1|x)$ and $V(y_2|x)$. However, we could then take two perturbations $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$, with $\epsilon$ chosen sufficiently small so that the ML sum remains $< n'-1$, contradicting the extremality of $\mathbf{V}$ in $\mathcal{M}^{n \to n'}_{n'-1}$. Hence $\mathbf{V}$ must be extremal in $\mathcal{P}^{n \to n'}$ with rank clearly less than $n'-1$.

Next, consider an extreme point $\mathbf{V}$ of $\mathcal{M}^{n \to n'}_{n'-1}$ for which $\phi(\mathbf{V}) = n'-1$. Since $\phi(\mathbf{V}) = n'-1$ and $\mathbf{V}$ has $n'$ rows, $\mathbf{V}$ must have at least two non-extremal row maximizers (possibly in different columns). We will again introduce perturbations, but care is needed to ensure that the perturbations are valid; i.e., the perturbed channels must remain in $\mathcal{M}^{n \to n'}_{n'-1}$. There are two cases to consider.

Case (a): Suppose that two non-extremal row maximizers occur in the same column, say $V(y_1|x)$ and $V(y_2|x)$ are both row maximizers in column $x$. Since these values account for the contributions of rows $y_1$ and $y_2$ to the ML sum, and since there are only $n'$ total rows in this sum, we must have that all other row maximizers equal one. Hence we introduce perturbations $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$. If $V(y_1|x)$ and $V(y_2|x)$ are unique row maximizers, then this perturbation is valid. On the other hand, if there are columns $x', x''$ such that $V(y_1|x) = V(y_1|x')$ and/or $V(y_2|x) = V(y_2|x'')$ (with possibly $x' = x''$), then we must also introduce a corresponding perturbation $V(y_1|x') \to V(y_1|x') \pm \epsilon$ and/or $V(y_2|x'') \to V(y_2|x'') \mp \epsilon$. To preserve normalization in columns $x'$ and/or $x''$, we will have to introduce an off-setting perturbation to some other row in $x'$ and/or $x''$. This can always be done since either $x' = x''$, or $x'$ and/or $x''$ have a non-extremal element in some other row which is not a row maximizer (since all other row maximizers equal one).

Case (b): No column has two non-extremal row maximizers, and $\mathbf{V}$ has at least two non-extremal row maximizers that belong to different columns. For each row $y$ with a non-extremal row maximizer, add perturbations $\pm\epsilon_y$ to all the row maximizers in that row. Since each column has at most one row maximizer, a normalization-preserving perturbation $\mp\epsilon_y$ can be added to another non-extremal element in any column having a row maximizer in row $y$. Finally, choose the $\epsilon_y$ so that $\sum_{y=1}^{n'} \epsilon_y = 0$.

2. Proof of Theorem 1(ii)

We now turn to the ambiguous polytopes $\mathcal{A}^{n \to n'}_{\cap} := \cap_{k=n}^{n'} \mathcal{A}^{n \to n'}_{k,n-1}$. Recall that $\mathcal{A}^{n \to n'}_{k,n-1}$ is the polytope of channels $\mathbf{P} \in \mathcal{P}^{n \to n'}$ satisfying all Bell inequalities of the form

$$\langle \mathbf{G}^{n,n'}_{k,n-1}, \mathbf{P} \rangle \leq n-1, \quad \text{(F3)}$$

with $\mathbf{G}^{n,n'}_{k,n-1}$ having $k$ guessing rows and $(n'-k)$ ambiguous rows. In this case, all the elements in an ambiguous row are equal to $\frac{1}{n-d+1} = \frac{1}{2}$. To prove Theorem 1(ii), we apply the following lemma to show that the extreme points of $\mathcal{A}^{n \to n'}_{\cap}$ are the same as those of $\mathcal{C}^{n \to n'}_{n-1}$. Then, by convexity of $\mathcal{A}^{n \to n'}_{\cap}$ and $\mathcal{C}^{n \to n'}_{n-1}$, we must have $\mathcal{A}^{n \to n'}_{\cap} = \mathcal{C}^{n \to n'}_{n-1}$.

Lemma 5. For arbitrary $n' \geq n$, the extreme points of $\mathcal{A}^{n \to n'}_{\cap}$ are extreme points of $\mathcal{C}^{n \to n'}_{n-1}$.

Proof.
We first argue that the conclusion of Lemma 5 holds for any extreme point $\mathbf{V}$ of $\mathcal{A}^{n \to n'}_{\cap}$ for which every inequality in Eq. (F3) is strict, $\langle \mathbf{G}^{n,n'}_{k,n-1}, \mathbf{V} \rangle < n-1$. In this case, sufficiently small perturbations cannot violate any inequality in Eq. (F3), so $\mathbf{V}$ must be extremal in $\mathcal{P}^{n \to n'}$; the $k = n'$ inequality then bounds the ML sum as $\phi(\mathbf{V}) < n-1$, which for a 0/1 matrix implies $\mathrm{rank}(\mathbf{V}) \leq n-2$, and so $\mathbf{V} \in \mathcal{C}^{n \to n'}_{n-1}$.

It remains to prove the conclusion of Lemma 5 whenever Eq. (F3) is tight for some $\mathcal{A}^{n \to n'}_{k,n-1}$. The lengthiest part of this argument is when $k = n'$ and tightness in Eq. (F3) corresponds to the ML sum equaling $n-1$. In this case, Proposition 10 below shows that $\mathbf{V}$ must be an extreme point of $\mathcal{C}^{n \to n'}_{n-1}$. However, before proving this result, we apply it to show that Lemma 5 holds whenever Eq. (F3) is tight for some other $\mathbf{G}^{n,n'}_{k,n-1}$ with $k < n'$. Specifically, we will perform a lifting technique on any vertex $\mathbf{V}$ satisfying $\langle \mathbf{G}^{n,n'}_{k,n-1}, \mathbf{V} \rangle = n-1$ for some $k < n'$ while $\phi(\mathbf{V}) < n-1$. The matrix $\mathbf{G}^{n,n'}_{k,n-1}$ identifies $(n'-k)$ ambiguous rows, and suppose that $y$ is an ambiguous row such that $\frac{1}{2}\|r_y\|_1 > \|r_y\|_\infty$, with $r_y$ being the $y$th row of $\mathbf{V}$. To be concrete, let us suppose without loss of generality that the components of row $r_y$ are arranged in non-increasing order (i.e., $V(y|x_i) \geq V(y|x_{i+1})$), and let $k$ be the smallest index such that

$$\frac{1}{2}\left(-\sum_{i=1}^{k-1} V(y|x_i) + \sum_{i=k}^{n} V(y|x_i)\right) \leq V(y|x_k). \quad \text{(F4)}$$

By the assumption $\frac{1}{2}\|r_y\|_1 > \|r_y\|_\infty$, we have $k > 1$. Also, since $k$ is the smallest integer satisfying Eq. (F4), we have

$$\frac{1}{2}\left(-\sum_{i=1}^{k-2} V(y|x_i) + \sum_{i=k-1}^{n} V(y|x_i)\right) > V(y|x_{k-1}). \quad \text{(F5)}$$

Subtracting $V(y|x_{k-1})$ from both sides of this inequality implies that the LHS of Eq. (F4) is strictly positive. Hence, there exists some $\lambda \in (0,1]$ such that

$$\lambda V(y|x_k) = \frac{1}{2}\left(-\sum_{i=1}^{k-1} V(y|x_i) + \sum_{i=k}^{n} V(y|x_i)\right). \quad \text{(F6)}$$

Consider then the new matrix $\widetilde{\mathbf{V}}$ formed from $\mathbf{V}$ by splitting row $y$ into $k$ rows as follows:

$$r_y \to \begin{bmatrix} V(y|x_1) & 0 & \cdots & (1-\lambda)V(y|x_k) & V(y|x_{k+1}) & \cdots & V(y|x_n) \\ 0 & V(y|x_2) & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda V(y|x_k) & 0 & \cdots & 0 \end{bmatrix}. \quad \text{(F7)}$$

Notice that we can obtain $\mathbf{V}$ from $\widetilde{\mathbf{V}}$ by coarse-graining over these rows. Moreover, this decomposition was constructed so that

$$\sum_{i=1}^{k} \|\tilde{r}_{y_i}\|_\infty = \sum_{i=1}^{k-1} V(y|x_i) + \lambda V(y|x_k) = \frac{1}{2}\|r_y\|_1, \quad \text{(F8)}$$

where the $\tilde{r}_{y_i}$ are the rows in Eq. (F7). Essentially, this transformation allows us to replace an ambiguous row with a collection of guessing rows so that the overall guessing score does not change. We perform this row-splitting process on all ambiguous rows of $\mathbf{V}$, thereby obtaining a new matrix $\widetilde{\mathbf{V}}$ such that $\phi(\widetilde{\mathbf{V}}) = n-1$. If $m$ is the total number of rows in $\widetilde{\mathbf{V}}$, then $\widetilde{\mathbf{V}}$ will be an element of $\mathcal{A}^{n \to m}_{\cap}$. We decompose $\widetilde{\mathbf{V}}$ into a convex combination of extremal points of $\mathcal{A}^{n \to m}_{\cap}$ as $\widetilde{\mathbf{V}} = \sum_\lambda p_\lambda \widetilde{\mathbf{V}}_\lambda$. By the convexity of $\phi$, it follows that $\phi(\widetilde{\mathbf{V}}_\lambda) = n-1$, and we can therefore apply Proposition 10 below to the channels $\widetilde{\mathbf{V}}_\lambda$ to conclude that they are extreme points of $\mathcal{C}^{n \to m}_{n-1}$. Consequently, each $\widetilde{\mathbf{V}}_\lambda$ has only one nonzero element per column. Let $R$ denote the coarse-graining map such that $\mathbf{V} = R\widetilde{\mathbf{V}}$, so that

$$\mathbf{V} = R\widetilde{\mathbf{V}} = \sum_\lambda p_\lambda R\widetilde{\mathbf{V}}_\lambda. \quad \text{(F9)}$$

However, by the assumption that $\mathbf{V}$ is extremal, this is only possible if $R\widetilde{\mathbf{V}}_\lambda$ is the same for every $\lambda$. As a result, any two $\widetilde{\mathbf{V}}_\lambda$ and $\widetilde{\mathbf{V}}_{\lambda'}$ can differ only in rows that coarse-grain into the same rows under $R$. From this it follows that $\mathbf{V}$ can have no more than one nonzero element per column and $\mathrm{rank}(\mathbf{V}) \leq n-1$.
Hence, we have shown that the extreme points of $\mathcal{A}^{n \to n'}_{\cap}$ are indeed extreme points of the signaling polytope $\mathcal{C}^{n \to n'}_{n-1}$. To complete the proof of Lemma 5, we establish the case when $\phi(\mathbf{V}) = n-1$, as referenced above. We begin by proving the partial result provided by Proposition 9, and then use this result to prove Proposition 10.

Proposition 9. If $\mathbf{V}$ is an extreme point of $\mathcal{A}^{n \to n'}_{\cap}$ satisfying $\phi(\mathbf{V}) = n-1$, then each column of $\mathbf{V}$ must have at least one unique row maximizer or it has only one nonzero element.

Proof. Suppose on the contrary that some column $x$ has more than one nonzero element yet no unique row maximizer. Let $S_x \subset [n']$ be the set of rows for which column $x$ contains a row maximizer. Since only one row maximizer per row contributes to the ML sum, and the elements of column $x$ sum to one, we can satisfy $\phi(\mathbf{V}) = n-1$ only if both of the following conditions hold: (i) each row $y$ in $S_x$ has only two nonzero elements, $V(y|x)$ and $V(y|x_y)$ for some column $x_y \neq x$; (ii) every other nonzero element in $\mathbf{V}$ outside of column $x$ and the rows in $S_x$ is a unique row maximizer. With this structure, we introduce three cases of valid perturbations.

Case (a): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1, y_2 \notin S_x$. Then $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$ is a valid perturbation. Indeed, even if we consider $y_1$ or $y_2$ as ambiguous rows, there is at most one other element in each of these rows (property (i) above), and so this perturbation would not violate any of the inequalities in Eq. (F3).

Case (b): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1 \in S_x$ and $y_2 \notin S_x$. Then $V(y_1|x) = V(y_1|x_{y_1})$ for some other column $x_{y_1} \neq x$. By normalization, there will be another element $V(y_3|x_{y_1})$ in column $x_{y_1}$ that by property (ii) is a unique row maximizer. Hence, we introduce the perturbations

$$V(y_1|x) \to V(y_1|x) \pm \epsilon, \qquad V(y_1|x_{y_1}) \to V(y_1|x_{y_1}) \pm \epsilon,$$
$$V(y_2|x) \to V(y_2|x) \mp \epsilon, \qquad V(y_3|x_{y_1}) \to V(y_3|x_{y_1}) \mp \epsilon. \quad \text{(F10)}$$

Here, the first pair of perturbations shifts both row maximizers of row $y_1$ together, while the second pair restores normalization in columns $x$ and $x_{y_1}$. By properties (i) and (ii), these perturbations do not increase the ML sum, nor are they able to violate any of the other inequalities in Eq. (F3).

Case (c): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1, y_2 \in S_x$. Then $V(y_1|x) = V(y_1|x_{y_1})$ and $V(y_2|x) = V(y_2|x_{y_2})$ for some other columns $x_{y_1}, x_{y_2} \neq x$ (with possibly $x_{y_1} = x_{y_2}$). By normalization, there will be elements $V(y_3|x_{y_1})$ and $V(y_4|x_{y_2})$ in columns $x_{y_1}$ and $x_{y_2}$, respectively, that are unique row maximizers (again by property (ii)). Note this requires that $y_1, y_2, y_3, y_4$ are all distinct rows. Hence, we introduce the perturbations

$$V(y_1|x) \to V(y_1|x) \pm \epsilon, \qquad V(y_1|x_{y_1}) \to V(y_1|x_{y_1}) \pm \epsilon,$$
$$V(y_2|x) \to V(y_2|x) \mp \epsilon, \qquad V(y_2|x_{y_2}) \to V(y_2|x_{y_2}) \mp \epsilon,$$
$$V(y_3|x_{y_1}) \to V(y_3|x_{y_1}) \mp \epsilon, \qquad V(y_4|x_{y_2}) \to V(y_4|x_{y_2}) \pm \epsilon. \quad \text{(F11)}$$

Normalization is preserved under these perturbations, and all the inequalities in Eq. (F3) remain satisfied. As we have shown valid perturbations in all three cases under the assumption that some column has non-extremal elements with no unique row maximizer, the proposition follows.

Proposition 10. If $\mathbf{V}$ is an extreme point of $\mathcal{A}^{n \to n'}_{\cap}$ satisfying $\phi(\mathbf{V}) = n-1$, then $\mathbf{V}$ is an extreme point of $\mathcal{C}^{n \to n'}_{n-1}$.

Proof. Suppose that $\mathbf{V}$ has some column $x_1$ containing more than one nonzero element (if no such column can be found, then the proposition is proven). Let $V(y_1|x_1) \in (0,1)$ denote a unique row maximizer, which is assured to exist by Proposition 9. We again proceed by considering two cases.

Case (a): Column $x_1$ contains only one row maximizer $V(y_1|x_1)$, and all other elements in the column are not row maximizers. Then there must exist another column $x'$ that also contains at least two nonzero elements. Indeed, if on the contrary all other columns had only one nonzero element each, then it would be impossible to have $\phi(\mathbf{V}) = n-1$. If $x'$ contains only row maximizers, then proceed to case (b) with $x'$ in place of $x_1$. Otherwise, $x'$ does not only contain row maximizers; rather, it has a unique row maximizer $V(y_2|x')$ in row $y_2$ and a nonzero element $V(y_3|x')$ in row $y_3$ that is not a row maximizer. Thus, we can introduce the valid perturbations

$$V(y_1|x_1) \to V(y_1|x_1) \pm \epsilon, \qquad V(y_2|x') \to V(y_2|x') \mp \epsilon,$$
$$V(y_4|x_1) \to V(y_4|x_1) \mp \epsilon, \qquad V(y_3|x') \to V(y_3|x') \pm \epsilon, \quad \text{(F12)}$$

where $V(y_4|x_1)$ denotes another nonzero element in column $x_1$ (with possibly $y_4 = y_2$ and/or $y_4 = y_3$). It can be verified that all inequalities in Eq. (F3) are preserved under these perturbations.

Case (b): Column $x_1$ contains only row maximizers, with $V(y_2|x_1)$ being another one in addition to $V(y_1|x_1)$. If $V(y_2|x_1)$ is a unique row maximizer, then valid perturbations can be made to both $V(y_1|x_1)$ and $V(y_2|x_1)$. On the other hand, suppose that $V(y_2|x_1)$ is a non-unique row maximizer, and let $V(y_2|x_2) = V(y_2|x_1)$ be another row maximizer in column $x_2$. There can be no other nonzero elements in row $y_2$. Indeed, if there were another column, say $x_3$, such that $V(y_2|x_3) > 0$, then we would have

$$\frac{1}{2}\|r_{y_2}\|_1 \geq \frac{1}{2}\Big(V(y_2|x_1) + V(y_2|x_2) + V(y_2|x_3)\Big) > V(y_2|x_1) = \|r_{y_2}\|_\infty, \quad \text{(F13)}$$

and so

$$\langle \mathbf{G}^{n,n'}_{n'-1,n-1}, \mathbf{V} \rangle > \phi(\mathbf{V}) = n-1, \quad \text{(F14)}$$

where the one ambiguous row in $\mathbf{G}^{n,n'}_{n'-1,n-1}$ is $y_2$. Hence, the only nonzero elements in row $y_2$ are $V(y_2|x_1)$ and $V(y_2|x_2)$.
Let $V(y_3|x_2)$ be a unique row maximizer in column $x_2$. We must be able to find another column $x_3$ with more than one nonzero element, one of which is a unique row maximizer and the other of which is a non-unique row maximizer. For if this were not the case, then any other column in $\mathbf{V}$ would either have a unique row maximizer equaling one, or it would have at least two elements, one being a unique row maximizer and the others not being row maximizers. However, the latter possibility was covered in case (a) and was shown to be impossible for an extremal $\mathbf{V}$. For the former, if all the other $n-2$ columns besides $x$ and $x_2$ contain unique row maximizers equaling one, then they would collectively contribute an amount of $n-2$ to $\phi(\mathbf{V})$; since the row maximizers in column $x$ sum to one and $V(y_2|x_2)$ is a row maximizer in column $x_2$, we would have $\phi(\mathbf{V}) \geq (n-2) + 1 + V(y_2|x_2) > n-1$. Hence, there must exist another column $x_3$ with a non-unique row maximizer $V(y_4|x_3)$ that is shared with column $x_4$ (which may be equivalent to either $x$ or $x_2$). Letting $V(y_5|x_3)$ and $V(y_6|x_4)$ denote unique row maximizers in columns $x_3$ and $x_4$, respectively, we can perform the valid perturbations
\[
\begin{aligned}
V(y_1|x) &\to V(y_1|x) \pm \epsilon, &\qquad V(y_2|x) &\to V(y_2|x) \mp \epsilon,\\
V(y_2|x_2) &\to V(y_2|x_2) \mp \epsilon, &\qquad V(y_3|x_2) &\to V(y_3|x_2) \pm \epsilon,\\
V(y_4|x_3) &\to V(y_4|x_3) \pm \epsilon, &\qquad V(y_5|x_3) &\to V(y_5|x_3) \mp \epsilon,\\
V(y_6|x_4) &\to V(y_6|x_4) \mp \epsilon, &\qquad V(y_4|x_4) &\to V(y_4|x_4) \pm \epsilon.
\end{aligned}
\tag{F15}
\]
Note that $y_1, y_2, y_3, y_4, y_5, y_6$ are all distinct rows since each row in $\mathbf{V}$ can have at most one pair of non-unique row maximizers, while rows $y_1, y_3, y_5, y_6$ contain unique row maximizers. This assures that the perturbations do not violate the inequalities in (F3).

As cases (a) and (b) exhaust all possibilities, we see that $\mathbf{V}$ can only have one nonzero element per column. From this the conclusion of Proposition 10 follows.

This completes the proof of Lemma 5.

Appendix G: Proof of Theorem 2

In this section we analyze the $\mathcal{C}^{n\to 4}_2$ signaling polytope to prove Theorem 2.
To begin, we define the polyhedron of channels
\[
\mathcal{C}(\mathbf{G},\gamma) := \Big\{ \mathbf{P} \in \mathcal{P}^{n\to n'} \;\Big|\; \langle \mathbf{G}, \mathbf{P} \rangle = \sum_{x=1}^{n} \sum_{y=1}^{n'} G_{y,x} P(y|x) \leq \gamma \Big\}
\tag{G1}
\]
for any Bell inequality $(\mathbf{G},\gamma)$ with $\mathbf{G} \in \mathbb{R}^{n' \times n}$ and $\gamma \in \mathbb{R}$. Since $\mathcal{C}^{n\to n'}_d$ is a convex polytope, there exists a finite number of polyhedra $\{\mathcal{C}(\mathbf{G}_m,\gamma_m)\}_{m=1}^{r}$ such that
\[
\mathcal{C}^{n\to n'}_d = \bigcap_{m=1}^{r} \mathcal{C}(\mathbf{G}_m, \gamma_m).
\tag{G2}
\]

Remark. Without loss of generality, we can assume that the matrices $\mathbf{G}_m$ contain non-negative elements. Indeed, if $G_{y,x} < 0$ is the smallest element in column $x$ of $\mathbf{G}_m$, then we replace each element in column $x$ as $G_{y',x} \to G_{y',x} + |G_{y,x}|$ and shift $\gamma \to \gamma + |G_{y,x}|$. Hence the smallest element in column $x$ of $\mathbf{G}_m$ becomes $G_{y,x} = 0$.

The proof of Theorem 2 is a consequence of Lemmas 6 and 7 below and our numerical results for the $\mathcal{C}^{n\to 4}_2$ signaling polytope [32] (see Fig. 3). First, by Lemma 6 we can reduce any Bell inequality $(\mathbf{G},\gamma)$ bounding $\mathcal{C}^{n\to 4}_2$ to a new Bell inequality $(\hat{\mathbf{G}},\hat{\gamma})$ having at most 2 nonzero elements in each column. The reduced inequality $(\hat{\mathbf{G}},\hat{\gamma})$ satisfies $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma}) \subset \mathcal{C}(\mathbf{G},\gamma)$ and thus bounds $\mathcal{C}^{n\to 4}_2$ more tightly than $(\mathbf{G},\gamma)$. Next, we use Lemma 7 to show that for any integer $n$ a tight Bell inequality of $\mathcal{C}^{n\to 4}_2$ has at most six nonzero columns. The presence of all-zero columns implies that such an inequality is simply an input lifting of a tight Bell inequality of $\mathcal{C}^{6\to 4}_2$. Therefore, the complete set of tight Bell inequalities bounding $\mathcal{C}^{n\to 4}_2$ is the set of all input liftings and permutations of the generator facets of $\mathcal{C}^{6\to 4}_2$ shown in Fig. 3.

Lemma 6. If $\mathcal{C}^{n\to n'}_d \subset \mathcal{C}(\mathbf{G},\gamma)$, then there exists a polyhedron $\mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$ with $\hat{\mathbf{G}}$ having at most $(n'-d)$ nonzero elements in each column and satisfying
\[
\mathcal{C}^{n\to n'}_d \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma}) \subset \mathcal{C}(\mathbf{G},\gamma).
\tag{G3}
\]

Proof. Suppose $\mathcal{C}^{n\to n'}_d \subset \mathcal{C}(\mathbf{G},\gamma)$ and consider an arbitrary $x \in [n]$.
For convenience, let us relabel the elements of the $x$th column of $\mathbf{G}$ in non-decreasing order, i.e., $G_{y,x} \leq G_{y+1,x}$. Every vertex $\mathbf{V}$ of $\mathcal{C}^{n\to n'}_d$ will satisfy
\[
\gamma \geq \sum_{x',y} G_{y,x'} V(y|x') = \sum_{y} G_{y,x} V(y|x) + \sum_{x'\neq x,\, y} G_{y,x'} V(y|x') = \sum_{y} G_{y,x} V(y|x) + f(\mathbf{G},\mathbf{V},x),
\tag{G4}
\]
where $f(\mathbf{G},\mathbf{V},x) := \sum_{x'\neq x,\, y} G_{y,x'} V(y|x')$. A key observation is that
\[
\gamma \geq G_{d,x} + f(\mathbf{G},\mathbf{V},x) \quad \text{for every vertex } \mathbf{V} \text{ of } \mathcal{C}^{n\to n'}_d.
\tag{G5}
\]
We prove this observation using Eq. (G4). First consider any vertex $\mathbf{V}$ such that $V(y|x) = \delta_{d',y}$ with $d' \geq d$. Then Eq. (G4) shows that $\gamma \geq G_{d',x} + f(\mathbf{G},\mathbf{V},x) \geq G_{d,x} + f(\mathbf{G},\mathbf{V},x)$, since we have labeled the elements in non-decreasing order. On the other hand, consider a vertex $\mathbf{V}$ for which $V(y|x) = \delta_{d',y}$ with $d' < d$. Since vertices can be formed with $d$ nonzero rows, we can choose another vertex $\mathbf{V}'$ that is identical to $\mathbf{V}$ in all columns $x' \neq x$, and yet for column $x$ it satisfies $V'(y|x) = \delta_{d'',y}$ with $d \leq d''$. Hence applying Eq. (G4) to vertex $\mathbf{V}'$ yields
\[
\gamma \geq G_{d'',x} + f(\mathbf{G},\mathbf{V}',x) \geq G_{d,x} + f(\mathbf{G},\mathbf{V}',x) = G_{d,x} + f(\mathbf{G},\mathbf{V},x),
\tag{G6}
\]
where the last equality follows from the fact that $\mathbf{V}$ and $\mathbf{V}'$ only differ in column $x$.

Having established Eq. (G5), we next form a new matrix $\hat{\mathbf{G}}$, which is obtained from $\mathbf{G}$ by replacing its $x$th column with
\[
(\hat{G}_{y,x})_{y}^{T} := (\overbrace{0, \cdots, 0}^{d},\; G_{d+1,x} - G_{d,x},\; \cdots,\; G_{n',x} - G_{d,x})^{T}.
\tag{G7}
\]
Letting $\hat{\gamma} = \gamma - G_{d,x}$, for any vertex $\mathbf{V}$ we have
\[
\sum_{x',y} \hat{G}_{y,x'} V(y|x') = \sum_{y} \hat{G}_{y,x} V(y|x) + f(\mathbf{G},\mathbf{V},x)
= \begin{cases} f(\mathbf{G},\mathbf{V},x) & \text{if } V(y|x) = \delta_{d',y} \text{ with } d' \leq d,\\ G_{d',x} - G_{d,x} + f(\mathbf{G},\mathbf{V},x) & \text{if } V(y|x) = \delta_{d',y} \text{ with } d' > d, \end{cases}
\;\leq\; \hat{\gamma},
\tag{G8}
\]
where the last inequality follows from Eq. (G5) (in the first case) and Eq. (G4) (in the second case). Hence, we have that $\mathcal{C}^{n\to n'}_d \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$. Conversely, if $\mathbf{P} \in \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$, then
\[
\begin{aligned}
\gamma - G_{d,x} &\geq \sum_{x',y} \hat{G}_{y,x'} P(y|x') = \sum_{y} \hat{G}_{y,x} P(y|x) + \sum_{x'\neq x,\,y} \hat{G}_{y,x'} P(y|x')\\
&= \sum_{y=d+1}^{n'} (G_{y,x} - G_{d,x}) P(y|x) + \sum_{x'\neq x,\,y} \hat{G}_{y,x'} P(y|x')\\
&= -G_{d,x}\Big(1 - \sum_{y=1}^{d} P(y|x)\Big) + \sum_{y=d+1}^{n'} G_{y,x} P(y|x) + \sum_{x'\neq x,\,y} \hat{G}_{y,x'} P(y|x')\\
&\geq -G_{d,x} + \sum_{y=1}^{n'} G_{y,x} P(y|x) + \sum_{x'\neq x,\,y} \hat{G}_{y,x'} P(y|x')\\
&= -G_{d,x} + \sum_{x',y} G_{y,x'} P(y|x').
\end{aligned}
\tag{G9}
\]
Therefore, $\mathbf{P} \in \mathcal{C}(\mathbf{G},\gamma)$, and so $\mathcal{C}^{n\to n'}_d \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma}) \subset \mathcal{C}(\mathbf{G},\gamma)$. Note that if $\mathbf{G}$ has only non-negative elements then so will $\hat{\mathbf{G}}$.

Lemma 7. For any finite number of inputs $n$,
\[
\mathcal{C}^{n\to 4}_2 = \bigcap_{m=1}^{s} \mathcal{C}(\mathbf{G}_m, \gamma_m)
\tag{G10}
\]
with each $\mathbf{G}_m$ having at most six nonzero columns.

Proof. As a consequence of Lemma 6, we can always find a complete set of polyhedra $\{\mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m)\}_{m=1}^{s}$ such that $\mathcal{C}^{n\to 4}_2 = \bigcap_{m=1}^{s} \mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m)$, with each $\hat{\mathbf{G}}_m$ having no more than two positive elements in each column and the rest being zero. Our goal is to show that the number of nonzero columns can be reduced to six. The key steps in our reduction are given by the following two propositions.

Proposition 11.
Consider the matrices
\[
\hat{\mathbf{G}} = \begin{pmatrix} a & b & \cdot & \cdots \\ c & d & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \end{pmatrix}, \qquad
\hat{\mathbf{G}}' = \begin{pmatrix} a-c & b+c & \cdot & \cdots \\ \cdot & d+c & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \end{pmatrix}, \qquad a \geq c \geq 0,
\tag{G11}
\]
which differ only in the first two columns. Then $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$ iff $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}',\hat{\gamma})$.

Proof. Every vertex $\mathbf{V}$ of $\mathcal{C}^{n\to 4}_2$ will have support in only two rows. If $\mathbf{V}$ has support in the first two rows, then its upper left corner will have one of the forms $\left(\begin{smallmatrix} 1&0\\0&1 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&1\\1&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 1&1\\0&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&0\\1&1 \end{smallmatrix}\right)$. In each of these cases, $\langle \hat{\mathbf{G}}, \mathbf{V} \rangle \leq \hat{\gamma} \Leftrightarrow \langle \hat{\mathbf{G}}', \mathbf{V} \rangle \leq \hat{\gamma}$. The other possibility is that $\mathbf{V}$ has support in only one of the first two rows. This leads to upper left corners of the form $\left(\begin{smallmatrix} 1&0\\0&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&1\\0&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 1&1\\0&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&0\\1&0 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&0\\0&1 \end{smallmatrix}\right)$, $\left(\begin{smallmatrix} 0&0\\1&1 \end{smallmatrix}\right)$. Suppose now that $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$. If a vertex $\mathbf{V}$ of $\mathcal{C}^{n\to 4}_2$ has form $\left(\begin{smallmatrix} 0&0\\1&0 \end{smallmatrix}\right)$ in the upper left corner, non-negativity of $c$ implies that $\langle \hat{\mathbf{G}}', \mathbf{V} \rangle \leq \hat{\gamma}$. A somewhat less trivial case is any vertex $\mathbf{V}$ having form $\left(\begin{smallmatrix} 0&1\\0&0 \end{smallmatrix}\right)$ in the upper left corner. Here we need to use the fact that there exists a vertex $\tilde{\mathbf{V}}$ with $\left(\begin{smallmatrix} 1&1\\0&0 \end{smallmatrix}\right)$ in the upper left corner that is identical to $\mathbf{V}$ in all other columns. Hence we have
\[
\hat{\gamma} \geq \langle \hat{\mathbf{G}}, \tilde{\mathbf{V}} \rangle = a + b + \kappa \;\Rightarrow\; \langle \hat{\mathbf{G}}', \mathbf{V} \rangle = b + c + \kappa \leq a + b + \kappa \leq \hat{\gamma},
\tag{G12}
\]
where $\kappa$ is the contribution of the other columns to the inner product, and we have used the assumption that $a \geq c$. Similar reasoning shows that $\langle \hat{\mathbf{G}}', \mathbf{V} \rangle \leq \hat{\gamma}$ for all other vertices $\mathbf{V}$. Conversely, by an analogous case-by-case consideration, we can establish that $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}',\hat{\gamma})$ implies $\langle \hat{\mathbf{G}}, \mathbf{V} \rangle \leq \hat{\gamma}$ for all vertices $\mathbf{V}$ of $\mathcal{C}^{n\to 4}_2$.

Proposition 12. Consider the matrices
\[
\hat{\mathbf{G}} = \begin{pmatrix} a & b & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \end{pmatrix}, \qquad
\hat{\mathbf{G}}' = \begin{pmatrix} a+b & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \end{pmatrix}, \qquad
\hat{\mathbf{G}}'' = \begin{pmatrix} \cdot & a+b & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \\ \cdot & \cdot & \cdot & \cdots \end{pmatrix},
\tag{G13}
\]
which differ only in the first two columns. Then $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$ iff $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}',\hat{\gamma}) \cap \mathcal{C}(\hat{\mathbf{G}}'',\hat{\gamma})$.

Proof.
This proof considers the vertices of $\mathcal{C}^{n\to 4}_2$ and applies the same reasoning as the proof of Proposition 11.

Continuing with the proof of Lemma 7, suppose that $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m)$ with each column of $\hat{\mathbf{G}}_m$ having no more than two nonzero rows. We can group the columns into six groups according to which two rows are zero (it may be that a column has more than two zeros, in which case we just select one group to place it in). By repeatedly applying Proposition 11, we can replace $\hat{\mathbf{G}}_m$ with a matrix $\hat{\mathbf{G}}'_m$ such that each group has at most one column with two nonzero elements; the rest of the columns in that group have at most one nonzero element. We then repeatedly apply Proposition 12 to remove multiple columns with the same single nonzero row. In the end, we arrive at the following:
\[
\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m) \;\Leftrightarrow\; \mathcal{C}^{n\to 4}_2 \subset \bigcap_{j} \mathcal{C}(\hat{\mathbf{G}}_{m,j},\hat{\gamma}_m),
\tag{G14}
\]
where each $\hat{\mathbf{G}}_{m,j}$ has at most ten nonzero columns corresponding to the different ways that no more than two nonzero elements can occupy a column. That is, up to a permutation of columns, each $\hat{\mathbf{G}}_{m,j}$ will have the form
\[
\hat{\mathbf{G}}_{m,j} = \begin{pmatrix}
a_1 & b_1 & c_1 & \cdot & \cdot & \cdot & g & \cdot & \cdot & \cdot & \cdots\\
a_2 & \cdot & \cdot & d_1 & e_1 & \cdot & \cdot & h & \cdot & \cdot & \cdots\\
\cdot & b_2 & \cdot & d_2 & \cdot & f_1 & \cdot & \cdot & i & \cdot & \cdots\\
\cdot & \cdot & c_2 & \cdot & e_2 & f_2 & \cdot & \cdot & \cdot & j & \cdots
\end{pmatrix}.
\tag{G15}
\]
The final step is to remove the block of diagonal elements $[g,h,i,j]$. To do this, observe that we can absorb any of these diagonal elements into an earlier column, provided that the corresponding row contains the largest element in that column. For example, if $f_2 \geq f_1$, then we can replace $\hat{\mathbf{G}}_{m,j}$ with
\[
\hat{\mathbf{G}}'_{m,j} = \begin{pmatrix}
a_1 & b_1 & c_1 & \cdot & \cdot & \cdot & g & \cdot & \cdot & \cdots\\
a_2 & \cdot & \cdot & d_1 & e_1 & \cdot & \cdot & h & \cdot & \cdots\\
\cdot & b_2 & \cdot & d_2 & \cdot & f_1 & \cdot & \cdot & i & \cdots\\
\cdot & \cdot & c_2 & \cdot & e_2 & f_2 + j & \cdot & \cdot & \cdot & \cdots
\end{pmatrix},
\tag{G16}
\]
and we can easily see that $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}_{m,j},\hat{\gamma}_m)$ iff $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}'_{m,j},\hat{\gamma}_m)$. By considering the maximum element in each of the first six columns, we can perform this replacement for at least three of the four elements $[g,h,i,j]$. If we can do this for all four elements, then the proof is complete.
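As a numerical aside, the two-column reduction of Proposition 11 can be spot-checked by enumerating the $2\times 2$ upper-corner forms of two-row vertices. The weights below are arbitrary example values satisfying $a \geq c \geq 0$; this is an illustrative sketch, not part of the proof:

```python
# Spot check of Proposition 11: on vertices supported in the first two rows,
# the inner products against G-hat and G-hat-prime coincide exactly, so the
# corresponding Bell inequalities agree on these vertices.
a, b, c, d = 5.0, 2.0, 3.0, 1.0           # example weights with a >= c >= 0
G_hat = [[a, b], [c, d]]                  # first two columns of G-hat
G_hat_p = [[a - c, b + c], [0.0, d + c]]  # first two columns of G-hat-prime

def corner_score(G, corner):
    """Inner product restricted to the 2x2 upper-left corner."""
    return sum(G[i][j] * corner[i][j] for i in range(2) for j in range(2))

# Corner forms of vertices supported on both of the first two rows.
two_row_corners = [((1, 0), (0, 1)), ((0, 1), (1, 0)),
                   ((1, 1), (0, 0)), ((0, 0), (1, 1))]

for V in two_row_corners:
    assert corner_score(G_hat, V) == corner_score(G_hat_p, V)
```

The remaining six one-row corner forms are not invariant, which is why the proof handles them with the auxiliary vertex argument of Eq. (G12) rather than by direct equality.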
On the other hand, if we can only remove three of these elements, then we will obtain a matrix $\hat{\mathbf{G}}''_{m,j}$ of the form (up to row/column permutations)
\[
\hat{\mathbf{G}}''_{m,j} = \begin{pmatrix}
a_1 & b_1 & c_1 & \cdot & \cdot & \cdot & g & \cdots\\
a_2 & \cdot & \cdot & d_1 & e_1 & \cdot & \cdot & \cdots\\
\cdot & b_2 & \cdot & d_2 & \cdot & f_1 & \cdot & \cdots\\
\cdot & \cdot & c_2 & \cdot & e_2 & f_2 & \cdot & \cdots
\end{pmatrix}
\tag{G17}
\]
with $a_1$, $b_1$, $c_1$ not having the largest values in their respective columns. In this case, we construct the matrix
\[
\hat{\mathbf{G}}'''_{m,j} = \begin{pmatrix}
a_1 + g & b_1 + g & c_1 + g & \cdot & \cdot & \cdot & \cdots\\
a_2 + g & \cdot & \cdot & d_1 & e_1 & \cdot & \cdots\\
\cdot & b_2 + g & \cdot & d_2 & \cdot & f_1 & \cdots\\
\cdot & \cdot & c_2 + g & \cdot & e_2 & f_2 & \cdots
\end{pmatrix},
\tag{G18}
\]
from which it can be verified that $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}''_{m,j},\hat{\gamma}_m)$ iff $\mathcal{C}^{n\to 4}_2 \subset \mathcal{C}(\hat{\mathbf{G}}'''_{m,j},\hat{\gamma}_m + 2g)$.

Appendix H: Proof of Theorem 3

In this section we provide two propositions that support the proof of Theorem 3. Recall that a $d$-dimensional partial replacer channel is a quantum channel having the form
\[
\mathcal{R}_\mu(X) = \mu X + (1-\mu)\operatorname{Tr}[X]\sigma,
\tag{H1}
\]
where $1 \geq \mu \geq 0$, $\sigma$ is some fixed density matrix, and $X$ is a quantum state on a $d$-dimensional Hilbert space. Note that the partial erasure channel $\mathcal{E}_\mu$ corresponds to $\sigma$ being an erasure flag $|E\rangle\langle E|$, where $|E\rangle$ is orthogonal to $\{|1\rangle, \cdots, |d\rangle\}$. We first show that the lower bound $\kappa(\mathcal{R}_\mu) \geq \lceil \mu d + (1-\mu) \rceil$ (see Eq. (15)) is not improved by any choice of states $\{\rho_x\}_x$, POVM $\{\Pi_y\}_y$, or ambiguous guessing game $\mathbf{G}^{n,n'}_{k,d}$ with $k = n'$.

Proposition 13. The maximum likelihood score for any classical channel $\mathbf{P}_{\mathcal{R}_\mu}$ generated using a partial replacer channel $\mathcal{R}_\mu$ is bounded as
\[
\langle \mathbf{G}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle \leq \mu d + (1-\mu),
\tag{H2}
\]
where $\mathbf{G}_{\mathrm{ML}}$ is any maximum likelihood facet satisfying Proposition 2(i).

Proof. In this proof, we first consider the unlifted maximum likelihood game $\mathbf{G}^{n'}_{\mathrm{ML}} = \mathbb{I}_{n'}$, where $n = n'$ (see Appendix D 2), and then generalize across all input/output liftings taking $\mathbf{G}^{n'}_{\mathrm{ML}} \to \mathbf{G}_{\mathrm{ML}} \in \mathbb{R}^{m' \times m}$ where $m', m \geq n'$.
To begin, we maximize $\langle \mathbf{G}^{n'}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle$ over the quantum states $\{\rho_x\}_x$ and POVM $\{\Pi_y\}_y$:
\[
\begin{aligned}
\max \langle \mathbf{G}^{n'}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle
&= \max_{\{\rho_x\}_x, \{\Pi_y\}_y} \sum_{x=y} \operatorname{Tr}\big[\Pi_y \mathcal{R}_\mu(\rho_x)\big] &\text{(H3)}\\
&= \max_{\{\rho_x\}_x, \{\Pi_y\}_y} \sum_{x=y} \mu \operatorname{Tr}[\Pi_y \rho_x] + (1-\mu)\operatorname{Tr}[\Pi_y \sigma] &\text{(H4)}\\
&\leq \max_{\{\Pi_y\}_y} \sum_{y} \mu \operatorname{Tr}[\Pi_y] + (1-\mu)\operatorname{Tr}[\Pi_y \sigma] &\text{(H5)}\\
&= \mu d + (1-\mu), &\text{(H6)}
\end{aligned}
\]
where line (H5) uses the fact that $\operatorname{Tr}[\Pi_y \rho_x] \leq \operatorname{Tr}[\Pi_y]$ for any choice of $\Pi_y$ and $\rho_x$, while line (H6) results from $\sum_y \operatorname{Tr}[\Pi_y] = d$ and $\sum_y \operatorname{Tr}[\Pi_y \sigma] = \operatorname{Tr}[\sigma] = 1$. A simple example that achieves this bound is the scenario where Alice sends orthogonal states $\{|x\rangle\langle x|\}_{x=1}^{d}$ and Bob measures with a similar POVM $\{|y\rangle\langle y|\}_{y=1}^{d}$; then
\[
\begin{aligned}
\langle \mathbf{G}^{d}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle
&= \sum_{y=x=1}^{d} \operatorname{Tr}\big[|y\rangle\langle y| \mathcal{R}_\mu\big(|x\rangle\langle x|\big)\big] &\text{(H7)}\\
&= \mu \sum_{y=1}^{d} \operatorname{Tr}\big[|y\rangle\langle y|y\rangle\langle y|\big] + (1-\mu)\sum_{y=1}^{d} \operatorname{Tr}\big[|y\rangle\langle y|\sigma\big] &\text{(H8)}\\
&= \mu d + (1-\mu). &\text{(H9)}
\end{aligned}
\]
In general, the upper bound is achieved whenever $\Pi_y \rho_x = \Pi_y$ for all $x \in [n]$ and $y \in [n']$. Note that this requires $\operatorname{rank}(\Pi_y) = \operatorname{rank}(\rho_x) = 1$ and $\Pi_y \parallel \rho_x$.

To extend the bound $\langle \mathbf{G}^{n'}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle \leq \mu d + (1-\mu)$ to all liftings of $\mathbf{G}^{n'}_{\mathrm{ML}}$, we make two observations. First, note that the input lifting taking $\mathbf{G}^{n'}_{\mathrm{ML}} \to \mathbf{G}''_{\mathrm{ML}} \in \mathbb{R}^{n' \times m}$ contains $(m - n)$ all-zero columns. These all-zero columns of $\mathbf{G}''_{\mathrm{ML}}$ do not contribute to the inner product $\langle \mathbf{G}''_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle$ and, therefore, cannot increase the inner product beyond $\mu d + (1-\mu)$.
Second, observe that the output lifting taking $\mathbf{G}^{n'}_{\mathrm{ML}} \to \mathbf{G}'_{\mathrm{ML}} \in \mathbb{R}^{(n'+1) \times n}$ requires a new POVM $\{\Pi'_y\}_{y=1}^{n'+1}$ satisfying $\sum_{y=1}^{n'+1} \Pi'_y = \mathbb{I}_d$. Furthermore, one column $x$ of $\mathbf{G}'_{\mathrm{ML}}$ has two nonzero elements in rows $y$ and $y'$, where $G'_{y,x} = G'_{y',x} = 1$. In this case, two POVM elements $\Pi'_y$ and $\Pi'_{y'}$ are both optimized against the state $\rho_x$. However, the constraint $\operatorname{Tr}[\Pi'_{y'} \rho_x] + \operatorname{Tr}[\Pi'_y \rho_x] \leq 1$ holds for any state $\rho_x$ and POVM. Therefore, the inner product satisfies $\langle \mathbf{G}'_{\mathrm{ML}}, \mathbf{P}_{\mathcal{R}_\mu} \rangle \leq \mu d + (1-\mu)$. The argument applied for the output lifting holds in general where one or more columns $x$ contain at least two nonzero elements. Thus, the upper bound in Eq. (H6) holds for any input/output lifting taking $\mathbf{G}^{n'}_{\mathrm{ML}} \to \mathbf{G}_{\mathrm{ML}} \in \mathbb{R}^{m' \times m}$ where $\min\{m, m'\} \geq n'$. This concludes the proof.

The upper bound on the maximum likelihood score from Proposition 13 serves as a lower bound on the signaling dimension of the partial replacer channel, $\kappa(\mathcal{R}_\mu)$. This follows from the fact that if $\mathbf{P}_{\mathcal{R}_\mu} \notin \mathcal{C}^{n\to n'}_r$, then $\kappa(\mathcal{R}_\mu) > r$. Furthermore, the integer nature of the signaling dimension implies that $\kappa(\mathcal{R}_\mu) \geq \lceil \mu d + (1-\mu) \rceil$. We now turn to certifying the signaling dimension of the partial erasure channel.

Proposition 14. The signaling dimension of a $d$-dimensional partial erasure channel is
\[
\kappa(\mathcal{E}_\mu) = \min\{d, \lceil \mu d + 1 \rceil\}.
\tag{H10}
\]

Proof. Let the classical channel $\mathbf{P}_{\mathcal{E}_\mu}$ be induced by the partial erasure channel $\mathcal{E}_\mu$ via Eq. (1) for any collection of quantum states $\{\rho_x\}_x$ and POVM $\{\Pi_y\}_y$. The transition probabilities are then expressed as
\[
P_{\mathcal{E}_\mu}(y|x) = \mu P_{\mathrm{id}_d}(y|x) + (1-\mu) P_{|E\rangle}(y),
\tag{H11}
\]
where $P_{\mathrm{id}_d}(y|x) = \operatorname{Tr}[\Pi_y \rho_x]$ and $P_{|E\rangle}(y) = \operatorname{Tr}[\Pi_y |E\rangle\langle E|]$.
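As a numerical sanity check of Eq. (H11), the basis-state scenario used below can be instantiated directly. The following sketch assumes NumPy, with $d = 3$ and $\mu = 0.4$ as purely illustrative values:

```python
import numpy as np

d, mu = 3, 0.4
dim = d + 1  # input space extended by the erasure flag |E> = |d+1>

def erasure(X):
    """E_mu(X) = mu*X + (1 - mu)*Tr[X] |E><E| on the extended space."""
    flag = np.zeros((dim, dim))
    flag[d, d] = 1.0
    return mu * X + (1 - mu) * np.trace(X) * flag

# Alice sends basis states |x><x|; Bob measures the (d+1)-outcome basis POVM.
P = np.zeros((dim, d))
for x in range(d):
    rho = np.zeros((dim, dim)); rho[x, x] = 1.0
    for y in range(dim):
        proj = np.zeros((dim, dim)); proj[y, y] = 1.0
        P[y, x] = np.trace(proj @ erasure(rho)).real

# Block structure: mu * identity on the first d rows, 1 - mu in the erasure row.
assert np.allclose(P[:d, :], mu * np.eye(d))
assert np.allclose(P[d, :], 1 - mu)

# The d diagonal entries plus a single erasure-row entry reproduce the
# maximum likelihood score mu*d + (1 - mu) of Proposition 13.
assert abs(np.trace(P[:d, :]) + P[d, 0] - (mu * d + (1 - mu))) < 1e-12
```

The block structure checked here is exactly the matrix form derived analytically in Eqs. (H12)-(H14) below.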
Since the simulation protocol for partial replacer channels can faithfully simulate $\mathbf{P}_{\mathcal{E}_\mu}$, the upper bound $\kappa(\mathcal{E}_\mu) \leq \lceil \mu d + 1 \rceil$ holds (see the proof of Theorem 3 in the main text). Therefore, $\min\{d, \lceil \mu d + 1 \rceil\} \geq \kappa(\mathcal{E}_\mu)$. To establish a lower bound on $\kappa(\mathcal{E}_\mu)$, we consider the channel $\mathbf{P}_{\mathcal{E}_\mu} \in \mathcal{P}^{d\to(d+1)}$ generated by the scenario where Alice sends the computational basis states $\{|x\rangle\langle x|\}_{x=1}^{d}$ and Bob measures with the POVM $\{|y\rangle\langle y|\}_{y=1}^{d+1}$ where $|d+1\rangle = |E\rangle$:
\[
\begin{aligned}
\mathbf{P}_{\mathcal{E}_\mu} &= \sum_{x=1}^{d}\sum_{y=1}^{d+1} \operatorname{Tr}\big[|y\rangle\langle y| \mathcal{E}_\mu(|x\rangle\langle x|)\big]\, |y\rangle\langle x| &\text{(H12)}\\
&= \sum_{x=1}^{d}\sum_{y=1}^{d+1} \Big(\mu \operatorname{Tr}\big[|y\rangle\langle y|x\rangle\langle x|\big] + (1-\mu)\operatorname{Tr}\big[|y\rangle\langle y|E\rangle\langle E|\big]\Big) |y\rangle\langle x| &\text{(H13)}\\
&= \mu \sum_{x=1}^{d} |x\rangle\langle x| + (1-\mu)\sum_{x=1}^{d} |E\rangle\langle x|. &\text{(H14)}
\end{aligned}
\]
As demonstrated in Proposition 13, $\mathbf{P}_{\mathcal{E}_\mu}$ achieves the maximum likelihood upper bound for partial replacer channels, $\langle \mathbf{G}_{\mathrm{ML}}, \mathbf{P}_{\mathcal{E}_\mu} \rangle = \mu d + (1-\mu)$. In fact, this bound also holds for non-orthogonal quantum states $\{\rho_x\}_{x\in[n]}$ where $n > d$. To improve the lower bound on $\kappa(\mathcal{E}_\mu)$ beyond Proposition 13, we consider the ambiguous polytope $\mathcal{A}^{(n'-1)\to n'}_{(n'-1),r}$ with ambiguous guessing facets $\mathbf{G}^{n',r}_?$ that are tight Bell inequalities of $\mathcal{C}^{n\to n'}_r$ (see Appendix D 3). Our goal is to find the smallest integer $r$ such that $\mathbf{P}_{\mathcal{E}_\mu} \in \mathcal{A}^{(n'-1)\to n'}_{(n'-1),r}$, that is, such that $\langle \mathbf{G}^{n',r}_?, \mathbf{P}_{\mathcal{E}_\mu} \rangle \leq r(n'-r)$ is satisfied. Consider the erasure channel $\mathbf{P}_{\mathcal{E}_\mu} \in \mathcal{P}^{d\to(d+1)}$ described by Eq. (H14). We find that the inequality
\[
r(n'-r) \geq \langle \mathbf{G}^{n',r}_?, \mathbf{P}_{\mathcal{E}_\mu} \rangle = (n'-r)\mu d + (1-\mu)(n'-1)
\tag{H15}
\]
is violated whenever $\mathbf{P}_{\mathcal{E}_\mu} \notin \mathcal{A}^{(n'-1)\to n'}_{(n'-1),r}$, for $n'-2 \geq r \geq 2$. Note that in our example $n' = d+1$; however, this procedure holds for any $n' > d$. Rearranging inequality (H15) into the form
\[
0 \geq r^2 - r(\mu d + n') + \mu d n' + (1-\mu)(n'-1)
\tag{H16}
\]
allows us to find the values of $r$ for which inequality (H15) is satisfied by solving for the zeros $r_\pm$ of the quadratic on the RHS of Eq. (H16),
\[
r_\pm = \frac{1}{2}(\mu d + n') \pm \frac{1}{2}\sqrt{(n' - \mu d)^2 - 4(1-\mu)(n'-1)}.
\tag{H17}
\]
Since the parabola of Eq. (H16) is concave up, all integer values of $r \in [r_-, r_+]$ satisfy inequality (H15). Furthermore, the smallest integer for which the inequality is satisfied is $r = \lceil r_- \rceil$. Therefore, the signaling dimension is bounded as
\[
\kappa(\mathcal{E}_\mu) \geq r = \Big\lceil \frac{1}{2}(\mu d + n') - \frac{1}{2}\sqrt{(n' - \mu d)^2 - 4(1-\mu)(n'-1)} \Big\rceil.
\tag{H18}
\]
The value of $r$ in Eq. (H18) satisfies the facet inequality (H15) for all allowed values of $n'$, $\mu$, and $d$. Note that $n'$ is a free parameter which we can choose as any integer $n' \geq d+1$. In our example, $\mathbf{P}_{\mathcal{E}_\mu}$ has $n' = d+1$, which obtains the lower bound $\kappa(\mathcal{E}_\mu) \geq \lceil \mu d + 1 \rceil$. To see this, we substitute $n' = d+1$ into Eq. (H18) and perform some algebra (assuming $d(1-\mu) \geq 1$, so that the square root below evaluates to $d(1-\mu) - 1$):
\[
\begin{aligned}
r &= \Big\lceil \tfrac{1}{2}\big(\mu d + d + 1\big) - \tfrac{1}{2}\sqrt{\big(d(1-\mu) + 1\big)^2 - 4d(1-\mu)} \Big\rceil
= \Big\lceil \tfrac{1}{2}\big(\mu d + d + 1\big) - \tfrac{1}{2}\sqrt{\big(d(1-\mu) - 1\big)^2} \Big\rceil &\text{(H19)}\\
&= \Big\lceil \tfrac{1}{2}\big(\mu d + d + 1\big) - \tfrac{1}{2}\big(d(1-\mu) - 1\big) \Big\rceil &\text{(H20)}\\
&= \Big\lceil \tfrac{1}{2}(\mu d + 1) + \tfrac{1}{2}(\mu d + 1) \Big\rceil &\text{(H21)}\\
&= \lceil \mu d + 1 \rceil. &\text{(H22)}
\end{aligned}
\]
Hence $\kappa(\mathcal{E}_\mu) \geq r = \lceil \mu d + 1 \rceil$. Additionally, substituting $n' > d+1$ into Eq. (H18) results in a necessarily smaller value of $r$; therefore $n' = d+1$ is a maximum.

It is important to note that the lower bound $\kappa(\mathcal{E}_\mu) \geq \lceil \mu d + 1 \rceil$ only holds for $r \leq n'-2$ since $\mathbf{G}^{n',r}_?$ is not a facet of the signaling polytopes $\mathcal{C}^{(n'-1)\to n'}_r$ with $r > n'-2$. Therefore, we must consider the edge case where $r = n'-2 = n-1$, that is, the case where the trivial upper bound of Eq. (3) is obtained. From Theorem 1 Condition (ii) we know that $\mathcal{C}^{n\to n'}_{n-1} = \bigcap_{k=n}^{n'} \mathcal{A}^{n\to n'}_{k,n-1}$. It follows for the edge case $r = n-1$ that if a channel $\mathbf{P}_{\mathcal{E}_\mu} \notin \mathcal{A}^{(n'-1)\to n'}_{(n'-1),n-1}$, then $\kappa^{n\to n'}(\mathcal{E}_\mu) = \min\{n, n'\} = n'-1$, and $\kappa(\mathcal{E}_\mu)$ is proven to be tight with the upper bound. To illustrate this case, we consider inequality (H16) and substitute $r = n'-2$:
\[
\begin{aligned}
0 &\geq (n'^2 - 4n' + 4) - (n'-2)(\mu d + n') + \mu d n' + (1-\mu)(n'-1) &\text{(H23)}\\
&= -4n' + 4 + 2\mu d + 2n' + (1-\mu)n' - (1-\mu). &\text{(H24)}
\end{aligned}
\]
Next, we substitute $d = n'-1$:
\[
\begin{aligned}
0 &\geq -4n' + 4 + 2\mu(n'-1) + 2n' + (1-\mu)n' - (1-\mu) &\text{(H25)}\\
&= n'(\mu - 1) + 3 - \mu &\text{(H26)}\\
&= 3 - n' + \mu(n'-1). &\text{(H27)}
\end{aligned}
\]
Rearranging inequality (H27), we find that it is satisfied iff $\frac{n'-3}{n'-1} \geq \mu$. Therefore, when $\mu > \frac{n'-3}{n'-1}$, inequality (H15) is violated and, by Theorem 1(ii), we certify that $\kappa(\mathcal{E}_\mu) = d$. Considering this edge case, we arrive at the conclusion that $\kappa(\mathcal{E}_\mu) \geq \min\{d, \lceil \mu d + 1 \rceil\}$, which matches the upper bound $\min\{d, \lceil \mu d + 1 \rceil\} \geq \kappa(\mathcal{E}_\mu)$. That is, the signaling dimension of the erasure channel is bounded tightly from above and below, from which it follows that $\kappa(\mathcal{E}_\mu) = \min\{d, \lceil \mu d + 1 \rceil\}$.
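The closed-form results above lend themselves to a quick numerical check. The sketch below (illustrative values only) evaluates the root $r_-$ of Eq. (H17) at $n' = d+1$ and confirms the simplification of Eqs. (H19)-(H22) together with the final formula (H10):

```python
import math

def r_minus(mu, d, n_prime):
    """Smaller root r_- of the quadratic in Eq. (H16), per Eq. (H17)."""
    disc = (n_prime - mu * d) ** 2 - 4 * (1 - mu) * (n_prime - 1)
    return 0.5 * (mu * d + n_prime) - 0.5 * math.sqrt(disc)

def kappa_erasure(mu, d):
    """Signaling dimension of the d-dimensional partial erasure channel, Eq. (H10)."""
    return min(d, math.ceil(mu * d + 1))

# For n' = d + 1, Eqs. (H19)-(H22) collapse r_- to mu*d + 1 whenever
# d*(1 - mu) >= 1, i.e. below the edge case handled by Eqs. (H23)-(H27).
for d in range(2, 9):
    for k in range(0, 16):
        mu = k / 16  # exact binary fractions avoid spurious rounding
        if d * (1 - mu) >= 1:
            assert abs(r_minus(mu, d, d + 1) - (mu * d + 1)) < 1e-9

print(kappa_erasure(0.5, 4), kappa_erasure(1.0, 4))  # prints: 3 4
```

At $\mu = 1$ the check recovers the noiseless case $\kappa = d$, while smaller $\mu$ reproduces the communication savings certified by Proposition 14.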