Relating the Resource Theories of Entanglement and Quantum Coherence

Eric Chitambar and Min-Hsiu Hsieh
Department of Physics and Astronomy, Southern Illinois University, Carbondale, Illinois 62901, USA
Centre for Quantum Computation & Intelligent Systems (QCIS), Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), NSW 2007, Australia

Quantum coherence and quantum entanglement represent two fundamental features of non-classical systems that can each be characterized within an operational resource theory. In this paper, we unify the resource theories of entanglement and coherence by studying their combined behavior in the operational setting of local incoherent operations and classical communication (LIOCC). Specifically, we analyze the coherence and entanglement trade-offs in the tasks of state formation and resource distillation. For pure states we identify the minimum coherence-entanglement resources needed to generate a given state, and we introduce a new LIOCC monotone that completely characterizes a state’s optimal rate of bipartite coherence distillation. This result allows us to precisely quantify the difference in operational powers between global incoherent operations, LIOCC, and local incoherent operations without classical communication. Finally, a bipartite mixed state is shown to have distillable entanglement if and only if entanglement can be distilled by LIOCC, and we strengthen the well-known Horodecki criterion for distillability.

The ability of quantum systems to exist in “superposition states” reveals the wave-like nature of matter and represents a strong departure from classical physics. Systems in such superposition states are often said to possess quantum coherence.
There has recently been much interest in constructing a resource theory of quantum coherence [1–11], in part because of recent experimental and numerical findings that suggest quantum coherence alone can enhance or impact physical dynamics in biology [12–15], transport theory [2, 16, 17], and thermodynamics [18, 19].

In a standard resource-theoretic treatment of quantum coherence, the free (or “incoherent”) states are those that are diagonal in some fixed reference (or “incoherent”) basis. Different classes of allowed (or “incoherent”) operations have been proposed in the literature [1, 3, 9–11] (see also [20, 21] for comparative studies of these approaches); however, an essential requirement is that the incoherent operations act invariantly on the set of diagonal density matrices. Incoherent operations can then be seen as one of the most basic generalizations of classical operations (i.e. stochastic maps) since their action on diagonal states can always be simulated by classical processing. Note also that most experimental setups will have a natural basis to work in, and arbitrary unitary time evolutions might be physically difficult to implement. In these settings, there are practical advantages to identifying “diagonal preserving” operations as being “free” relative to coherent-generating ones.

One does not need to look far to find an important connection between incoherent operations and quantum entanglement, the latter being one of the most important resources in quantum information processing [22]. Consider the task of entanglement generation. This procedure is usually modeled by bringing together two or more quantum systems initially in a product state $\rho \otimes \sigma$ and then applying an entangling joint operation. However, using only incoherent operations, this will not be possible unless either $\rho$ or $\sigma$ already possesses coherence.
The reason is that when $\rho \otimes \sigma$ is an incoherent bipartite state, any incoherent operation acting on both systems will leave the joint state incoherent (and hence unentangled). On the other hand, if the joint state is $|+\rangle|0\rangle$, with $|\pm\rangle = \sqrt{1/2}\,(|0\rangle \pm |1\rangle)$, then an application of CNOT yields the entangled state $\sqrt{1/2}\,(|00\rangle + |11\rangle)$. This example reveals that coherence, or at least coherent-generating operations, is a prerequisite for producing entanglement. In fact, as Streltsov et al. have shown [23], every coherent state can be used for the generation of entanglement in a manner similar to this example.

Notice that the transformation $|+\rangle|0\rangle \to \sqrt{1/2}\,(|00\rangle + |11\rangle)$ requires performing an entanglement-generating incoherent operation. To capture both coherence and entanglement in a common resource-theoretic framework, one must modify the scenario by adopting the “distant lab” perspective in which two or more parties share a quantum system but they are spatially separated from one another [22, 24]. In this setting, entanglement cannot be generated between the parties and it becomes another resource in play. When the constraint of locality is added to the incoherent framework, the allowable operations for Alice and Bob are then local incoherent operations and classical communication (LIOCC). The hybrid coherence-entanglement theory described here is similar in spirit to previous work on the locality-restricted resource theories of purity [25–28] and asymmetry [29]. We admittedly do not point to a specific biological or thermodynamic process as motivation for studying LIOCC, although one could envision potential physical applications in certain coherence-enhanced transport networks where the nodes interact through classical signaling.
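The CNOT example above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original paper: the helper names (`cnot`, `kron`, `is_product`) are hypothetical, states are plain lists of four amplitudes in the order |00>, |01>, |10>, |11>, and the product-state test uses the fact that a two-qubit pure state is a product state exactly when its 2 x 2 coefficient matrix has zero determinant.

```python
import math

def kron(u, v):
    """Tensor product of two single-qubit amplitude lists."""
    return [a * b for a in u for b in v]

def cnot(state):
    """CNOT with the first qubit as control: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def is_product(state, tol=1e-12):
    """A two-qubit pure state is a product state iff a00*a11 - a01*a10 = 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

s = math.sqrt(0.5)
plus, zero, one = [s, s], [1.0, 0.0], [0.0, 1.0]

incoherent_in = kron(one, zero)   # |1>|0>: an incoherent basis state
coherent_in = kron(plus, zero)    # |+>|0>: coherent on the first qubit

# CNOT permutes basis states, so an incoherent input stays an incoherent product state.
print(cnot(incoherent_in), is_product(cnot(incoherent_in)))
# With a coherent input the same gate produces (|00> + |11>)/sqrt(2), which is entangled.
print(cnot(coherent_in), is_product(cnot(coherent_in)))
```

The gate is the same in both runs; only the coherence of the input decides whether entanglement appears.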
Rather, we promote LIOCC as the natural setting to explore the interplay between coherence and entanglement as resource primitives in quantum information theory. For example, how much local coherence and shared entanglement do Alice ($A$) and Bob ($B$) need to prepare a particular bipartite state $\rho^{AB}$ using LIOCC (Fig. 1 (a))? Conversely, how much coherence and entanglement can be distilled from a given state $\rho^{AB}$ using LIOCC (Fig. 1 (b))? The latter task can also be seen as a type of collaborative randomness distillation, where Alice and Bob work together to generate local sources of genuine randomness for each other [6].

Figure 1. (a) An LIOCC formation protocol asymptotically generates an arbitrary state $\rho^{AB}$ from an initial supply of local coherent bits ($\Phi_A$/$\Phi_B$) and shared entanglement bits ($\Phi_{A'B'}$). (b) An LIOCC distillation protocol performs the reverse transformation.

Our main results are the following. (1) We completely characterize the achievable coherence-entanglement rate region for the task of asymptotically generating some pure state $|\Psi\rangle^{AB}$ (Theorem 1). (2) We introduce a new LIOCC monotone that combines both coherence and entanglement measures (Theorem 4), and we show it quantifies the optimal rate at which Alice and Bob can simultaneously distill local coherence from a pure state. (3) We identify an achievable rate region for the coherence-entanglement distillation of a pure state and show optimality at almost all corner points (Theorem 5). (4) In analogy to Refs. [25–28], we introduce and compute for pure states the nonlocal coherence deficit and the LIOCC coherence deficit (Eqns. (8)–(9)). (5) We show that LIOCC operations alone are sufficient to decide whether entanglement can be distilled from a mixed state using general LOCC.

Let us begin by briefly describing the theory of bipartite coherence in more detail.
Assigned to both Alice and Bob’s system is a particular basis called their incoherent basis. We denote Alice’s incoherent basis by $\{|x\rangle_A\}_{x=0}^{d_A-1}$ and Bob’s incoherent basis by $\{|y\rangle_B\}_{y=0}^{d_B-1}$, so that the incoherent basis for their joint system $\mathcal{H}^A \otimes \mathcal{H}^B$ is $\{|x\rangle_A|y\rangle_B\}_{x,y=0}^{d_A-1,\,d_B-1}$. Then any bipartite state belongs to the set of incoherent states $\mathcal{I}$ iff it has the form

$\sigma^{AB} = \sum_{xy} p_{xy}\, |x\rangle\langle x|^A \otimes |y\rangle\langle y|^B.$ (1)

Following the framework of Baumgratz et al. [3], a local incoherent operation for Alice is given by a complete set of Kraus operators $\{K_\alpha\}_\alpha$ such that $(K_\alpha \otimes I^B)\rho^{AB}(K_\alpha \otimes I^B)^\dagger / \mathrm{tr}[(K_\alpha^\dagger K_\alpha \otimes I^B)\rho^{AB}] \in \mathcal{I}$ for all $\rho^{AB} \in \mathcal{I}$. If ever she introduces a local ancilla system $\mathcal{H}^{A'}$, the incoherent basis for this additional system is labeled in the same way, $\{|x\rangle_{A'}\}_{x=0}^{d_{A'}-1}$. Analogous statements characterize the notion of incoherent operations on Bob’s system. In the LIOCC setting, Alice and Bob take turns performing local incoherent operations and sharing their measurement data over a classical communication channel.

The canonical resource states in the bipartite LIOCC framework are the maximally coherent bits (CoBits), $|\Phi_A\rangle := \sqrt{1/2}\,(|0\rangle_A + |1\rangle_A)$ and $|\Phi_B\rangle := \sqrt{1/2}\,(|0\rangle_B + |1\rangle_B)$ for Alice and Bob’s systems respectively [3], as well as the entangled state $|\Phi_{AB}\rangle := \sqrt{1/2}\,(|00\rangle + |11\rangle)$, which we will call the maximally coherent entangled bit (eCoBit). Notice that unlike entanglement theory, only those bipartite states related to $|\Phi_{AB}\rangle$ by an incoherent local unitary transformation can be regarded as equivalent to $|\Phi_{AB}\rangle$.
For example, as we will see below, one eCoBit cannot be incoherently transformed into the state $\sqrt{1/2}\,(|0+\rangle + |1-\rangle)$, even asymptotically.

We now describe the primary tasks studied in this paper, which can be seen as the resource-theoretic tasks recently analyzed by Winter and Yang in Ref. [7] but now with additional locality constraints. All of the detailed proofs can be found in the Supplemental Material, and here we just present the results. Let us begin with the problem of asymptotic state formation shown in Fig. 1 (a). A triple $(R_A, R_B, E^{co})$ is an achievable coherence-entanglement formation triple for the state $\rho^{AB}$ if for every $\epsilon > 0$ there exists an LIOCC protocol $\mathcal{L}$ and integer $n$ such that

$\mathcal{L}\left(\Phi_A^{\otimes \lceil n(R_A+\epsilon) \rceil} \otimes \Phi_B^{\otimes \lceil n(R_B+\epsilon) \rceil} \otimes \Phi_{A'B'}^{\otimes \lceil n(E^{co}+\epsilon) \rceil}\right) \overset{\epsilon}{\approx} \rho^{\otimes n}.$

Dual to the task of formation is resource distillation, as depicted in Fig. 1 (b). A triple $(R_A, R_B, E^{co})$ is an achievable coherence-entanglement distillation triple for $\rho^{AB}$ if for every $\epsilon > 0$ there exists an LIOCC protocol $\mathcal{L}$ and integer $n$ such that

$\mathcal{L}(\rho^{\otimes n}) \overset{\epsilon}{\approx} \Phi_A^{\otimes \lfloor n(R_A-\epsilon) \rfloor} \otimes \Phi_B^{\otimes \lfloor n(R_B-\epsilon) \rfloor} \otimes \Phi_{AB}^{\otimes \lfloor n(E^{co}-\epsilon) \rfloor}.$

As we are dealing with asymptotic transformations, we should expect the optimal rate triples to be given by entropic quantities. Recall that for a bipartite state $\omega^{AB}$, the von Neumann entropy of, say, Alice’s reduced state $\omega^A$ is given by $S(A)_\omega = -\mathrm{tr}[\omega^A \log \omega^A]$. The quantum mutual information of $\omega^{AB}$ takes the form $I(A:B)_\omega := S(A)_\omega - S(A|B)_\omega$, where $S(A|B)_\omega := S(AB)_\omega - S(B)_\omega$. For a pure state $|\Psi\rangle^{AB}$, the entropy of entanglement $E(\Psi) := S(A)_\Psi = S(B)_\Psi$ is the unique measure of entanglement in the asymptotic regime [30], and it can be generalized to mixed states as the entanglement of formation $E_F(\rho)$ [31].
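As a quick worked instance of these definitions, the entropic quantities can be evaluated for the eCoBit $|\Phi_{AB}\rangle$. This is an illustrative calculation rather than a result from the paper, and it assumes logarithms are taken to base 2 so that entropies are measured in bits.

```latex
\begin{align*}
\Phi^{A} &= \mathrm{tr}_B\, |\Phi_{AB}\rangle\langle\Phi_{AB}|
          = \tfrac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right), \\
S(A)_{\Phi} &= S(B)_{\Phi} = 1, \qquad S(AB)_{\Phi} = 0 \quad \text{(the joint state is pure)}, \\
S(A|B)_{\Phi} &= S(AB)_{\Phi} - S(B)_{\Phi} = -1, \\
I(A:B)_{\Phi} &= S(A)_{\Phi} - S(A|B)_{\Phi} = 2, \\
E(\Phi_{AB}) &= S(A)_{\Phi} = 1.
\end{align*}
```

The negative conditional entropy and the mutual information of 2 bits are the usual signatures of a maximally entangled two-qubit state.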
We will also be interested in these entropic quantities after sending our state $\omega^{AB}$ through the completely dephasing channel, $\Delta(\omega) := \sum_{xy} |xy\rangle\langle xy|\,\omega\,|xy\rangle\langle xy|$. It will be convenient to think of $\Delta(\omega)$ as encoding random variables $XY$ having joint distribution $p(x,y) = \langle xy|\Delta(\omega)|xy\rangle$. For this reason, we follow standard convention and replace the labels $(A,B) \to (X,Y)$ when discussing a dephased state.

Our first main result completely characterizes the achievable rate region for the LIOCC formation of bipartite pure states.

Theorem 1.

For a pure state $|\Psi\rangle^{AB}$ the following triples are achievable coherence-entanglement formation rates

$(R_A, R_B, E^{co}) = \left(0,\; S(Y|X)_{\Delta(\Psi)},\; S(X)_{\Delta(\Psi)}\right)$ (2)
$(R_A, R_B, E^{co}) = \left(S(X)_{\Delta(\Psi)},\; S(Y|X)_{\Delta(\Psi)},\; E(\Psi)\right)$ (3)
$(R_A, R_B, E^{co}) = \left(0,\; 0,\; S(XY)_{\Delta(\Psi)}\right)$ (4)

as well as the points obtained by interchanging $A \leftrightarrow B$ in Eqns. (2)–(4). Moreover, these points are optimal in the sense that any achievable rate triple must satisfy (i) $E^{co} \ge E(\Psi)$, (ii) $R_A + R_B \ge S(XY)_{\Delta(\Psi)}$, (iii) $R_B + E^{co} \ge S(XY)_{\Delta(\Psi)}$.

For a mixed state $\rho^{AB}$, a formation protocol can be constructed that achieves the average rates for any ensemble $\{p_k, |\varphi_k\rangle^{AB}\}$ such that $\rho = \sum_k p_k |\varphi_k\rangle\langle\varphi_k|$ [31]. For instance, one can consider an ensemble whose average bipartite coherence attains the coherence of formation $C_F$ for $\rho$; i.e. it is an ensemble $\{p_k, |\varphi_k\rangle^{AB}\}$ for $\rho$ that minimizes $\sum_k p_k S(XY)_{\Delta(\varphi_k)}$ [6, 7]. Then for a mixed state $\rho$, the coherence rate sum $R_A + R_B$ of Eq. (3) can attain the coherence of formation $C_F(\rho)$. In the global setting where Alice and Bob are allowed to perform joint operations across system $AB$, it has been shown that $C_F(\rho)$ quantifies the optimal coherence consumption rate for generating $\rho$ using global incoherent operations [7]. Our result then intuitively says that in the restricted LIOCC setting, the same coherence rate is sufficient to generate $\rho$; however, Alice and Bob now need additional entanglement at a rate $\sum_k p_k E(\varphi_k)$, where the ensemble $\{p_k, |\varphi_k\rangle^{AB}\}$ minimizes the average coherence of $\rho$.

The proof of Theorem 1 uses two lemmas that may be of independent interest. The first generalizes a result presented in Ref. [3], and the second is an incoherent version of Nielsen’s Majorization Theorem [32].

Lemma 2.

An arbitrary $d \times d$ unitary operator $U$ can be performed on a system using incoherent operations and $\lceil \log d \rceil$ CoBits.

Lemma 3.

Suppose $|\psi\rangle^{AB}$ and $|\phi\rangle^{AB}$ have reduced density matrices that are diagonal in the incoherent bases for both parties and both states. Then $|\psi\rangle \to |\phi\rangle$ by LIOCC iff the squared Schmidt coefficients of $|\phi\rangle$ majorize those of $|\psi\rangle$.

Next, we introduce a new LIOCC monotone and provide its operational interpretation. To do so, we recall the recently studied task of assisted coherence distillation, which involves one party helping another distill as much coherence as possible through general quantum operations performed on the helper side and incoherent operations performed on the distillation side [33]. For a given state $\rho^{AB}$, the optimal asymptotic rate of coherence distillation on Bob’s side when Alice helps is denoted by $C_a^{A|B}(\rho^{AB})$. When the roles are switched, the optimal asymptotic rate is denoted by $C_a^{B|A}(\rho^{AB})$. It was shown in Ref. [33] that for a pure state $|\Psi\rangle^{AB}$, $C_a^{A|B}(\Psi) = S(Y)_{\Delta(\Psi)}$ and $C_a^{B|A}(\Psi) = S(X)_{\Delta(\Psi)}$. With these quantities in hand, we define for a bipartite pure state $|\Psi\rangle^{AB}$ the function

$C_L(\Psi) := C_a^{A|B}(\Psi) + C_a^{B|A}(\Psi) - E(\Psi) = S(X)_{\Delta(\Psi)} + S(Y)_{\Delta(\Psi)} - E(\Psi).$ (5)

Its extension to mixed states can be defined by a convex roof optimization [34]: $C_L(\rho^{AB}) = \inf_{\{p_k, |\varphi_k\rangle^{AB}\}} \sum_k p_k C_L(\varphi_k^{AB})$, where the infimum is taken over ensembles for which $\rho^{AB} = \sum_k p_k |\varphi_k\rangle\langle\varphi_k|$.

Theorem 4.

The function $C_L$ is an LIOCC monotone.

We note that this is the first monotone of its kind since it behaves monotonically under LIOCC, but not under general LOCC or even under LQICC, the latter being an operational class in which only one of the parties is required to perform incoherent operations (as opposed to LIOCC where both parties must perform incoherent operations) [33].

Using the monotonicity of $C_L$, we are able to derive tight upper bounds on coherence distillation rates.

Theorem 5.

For a pure state $|\Psi\rangle^{AB}$ the following triples are achievable coherence-entanglement distillation rates

$(R_A, R_B, E^{co}) = \left(S(X)_{\Delta(\Psi)} - E(\Psi),\; S(Y)_{\Delta(\Psi)},\; 0\right)$ (6)
$(R_A, R_B, E^{co}) = \left(0,\; S(Y|X)_{\Delta(\Psi)},\; I(X:Y)_{\Delta(\Psi)}\right),$ (7)

as well as the points obtained by interchanging $A \leftrightarrow B$ in Eqns. (6) and (7). Moreover, these points are optimal in the sense that any achievable rate triple must satisfy (i) $R_A + R_B \le C_L(\Psi)$ and (ii) $R_B + E^{co} \le S(Y)_{\Delta(\Psi)}$.

This theorem endows $C_L$ with the operational meaning of quantifying how much local coherence can be simultaneously distilled from a pure state. For a state $|\Psi\rangle$ the maximum rate at which Alice can help Bob distill coherence is $C_a^{A|B}$, while the maximum rate at which Bob can help Alice is $C_a^{B|A}$. Evidently, they cannot both simultaneously help each other at these optimal rates. Instead, they are bounded away from simultaneous optimality at a rate equaling their shared entanglement.

The precise range of achievable distillation triples $(R_A, R_B, E^{co}_{\max})$, where $E^{co}_{\max}$ is the maximum eCoBit distillation rate, is still unknown. While we are able to prove that $E^{co}_{\max}$ is the regularized version of $I(X:Y)_{\Delta(\Psi)}$ optimized over all LIOCC protocols, we have no single-letter expression for this rate, nor do we know the achievable local coherence rates for optimal protocols.

A natural question is whether $E^{co}_{\max}(\Psi) = E(\Psi)$. While this question remains open, we can show that $E(\Psi)$ is achievable if the Schmidt basis of the final state need not be incoherent. More precisely, we say a number $R$ is an achievable LIOCC entanglement distillation rate if for every $\epsilon > 0$, there exists an LIOCC protocol $\mathcal{L}$ acting on $n$ copies of $\Psi$ such that $\mathcal{L}(\Psi^{\otimes n}) \overset{\epsilon}{\approx} \Lambda_d$, where $\Lambda_d$ is a $d \otimes d$ maximally entangled pure state (i.e. $\Lambda^A = \Lambda^B = I/d$) with $\frac{1}{n}\log d > R - \epsilon$. The largest achievable distillation rate will be denoted by $E_D^{LIOCC}(\Psi)$.

Theorem 6. $E_D^{LIOCC}(\Psi) = E(\Psi)$.

It is interesting to compare the coherence distillation rates using incoherent operations under different types of locality constraints. In Refs. [25–28], similar comparisons were made in terms of purity (or work-information) extraction. Let $C_D^{Global}$, $C_D^{LIOCC}$, and $C_D^{LIO}$ denote the optimal rate sum $R_A + R_B$ of local coherence distillation using global incoherent operations, LIOCC, and local incoherent operations (with no classical communication), respectively. In complete analogy to [25–28], we define the nonlocal coherence deficit of a bipartite state $\rho^{AB}$ as $\delta(\rho^{AB}) = C_D^{Global}(\rho^{AB}) - C_D^{LIOCC}(\rho^{AB})$ and the LIOCC coherence deficit as $\delta_c(\rho^{AB}) = C_D^{LIOCC}(\rho^{AB}) - C_D^{LIO}(\rho^{AB})$. Intuitively, the quantity $\delta(\rho^{AB})$ quantifies the coherence in a state that can only be accessed using nonlocal incoherent operations. Likewise, $\delta_c(\rho^{AB})$ gives the coherence in $\rho^{AB}$ that requires classical communication to be obtained. The results of Winter and Yang imply that $C_D^{Global}(\Psi) = S(XY)_{\Delta(\Psi)}$ and $C_D^{LIO}(\Psi) = S(X)_{\Delta(\Psi)} + S(Y)_{\Delta(\Psi)} - 2E(\Psi)$ for a bipartite pure state $|\Psi\rangle^{AB}$ [35]. Combined with Theorem 5, we can compute the two coherence deficits for pure states:

$\delta(\Psi) = E(\Psi) - I(X:Y)_{\Delta(\Psi)}$ (8)
$\delta_c(\Psi) = E(\Psi).$ (9)

It is curious that the entanglement $E(\Psi)$ quantifies the coherence gain unlocked by classical communication.
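The pure-state formulas for the three distillation rates and the two deficits can be evaluated numerically. The sketch below is illustrative only and is restricted to two-qubit pure states: the function name `pure_state_rates` and the dictionary keys are hypothetical, entropies are assumed to be in bits (logarithm base 2), and the entropy of entanglement is obtained from the closed-form eigenvalues of a 2 x 2 reduced state rather than from a general Schmidt decomposition.

```python
import math

def h(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pure_state_rates(amps):
    """amps[x][y] is the amplitude of |x>_A|y>_B in a normalized two-qubit pure state."""
    p = [[abs(a) ** 2 for a in row] for row in amps]   # dephased distribution p(x, y)
    s_x = h([sum(row) for row in p])
    s_y = h([p[0][y] + p[1][y] for y in range(2)])
    s_xy = h([q for row in p for q in row])
    # Eigenvalues of Alice's reduced state are (1 +/- gap)/2 with gap^2 = 1 - 4|det|^2.
    det = amps[0][0] * amps[1][1] - amps[0][1] * amps[1][0]
    gap = math.sqrt(max(0.0, 1.0 - 4.0 * abs(det) ** 2))
    ent = h([(1 + gap) / 2, (1 - gap) / 2])            # E(Psi)
    c_global = s_xy                                    # global incoherent operations
    c_liocc = s_x + s_y - ent                          # C_L(Psi), Eq. (5)
    c_lio = s_x + s_y - 2 * ent                        # no classical communication
    return {
        "E": ent,
        "I(X:Y)": s_x + s_y - s_xy,
        "C_global": c_global,
        "C_LIOCC": c_liocc,
        "C_LIO": c_lio,
        "delta": c_global - c_liocc,                   # Eq. (8)
        "delta_c": c_liocc - c_lio,                    # Eq. (9)
    }

s = math.sqrt(0.5)
ecobit = [[s, 0.0], [0.0, s]]            # (|00> + |11>)/sqrt(2)
rotated = [[0.5, 0.5], [0.5, -0.5]]      # (|0+> + |1->)/sqrt(2)
for name, amps in [("eCoBit", ecobit), ("rotated", rotated)]:
    print(name, {k: round(v, 6) for k, v in pure_state_rates(amps).items()})
```

For the eCoBit the dephased variables are perfectly correlated, so the nonlocal deficit vanishes while the LIOCC deficit equals one bit. The second state has the same entanglement but twice the dephased entropy $S(XY)$, which is consistent with the earlier remark that one eCoBit cannot be incoherently transformed into it.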
But note that a similar phenomenon exists in the resource theory of purity. Namely, the quantum deficit and the classical deficit of purity theory measure the analogous differences in local purity distillation by so-called “closed operations” (CO), and for a pure state $\Psi$ both are equal to $E(\Psi)$ [25, 26]. For the task of distilling CoBits, every protocol using incoherent operations can be seen as one using closed operations by accounting for all ancilla systems at the start of the protocol [36]. However, closed operations allow for arbitrary unitary rotations, which are forbidden in coherence theory. The term $I(X:Y)_{\Delta(\Psi)}$ in $\delta(\Psi)$ identifies precisely the basis dependence in coherence theory and shows how this decreases the nonlocal coherence deficit $\delta(\Psi)$ relative to the quantum deficit of purity theory. On the other hand, there is evidently no basis dependency in the LIOCC coherence deficit $\delta_c(\Psi)$, and it is equal to the classical deficit of purity theory.

Although our distillation results so far have only applied to pure states, we can deduce a very general result concerning the distillability of mixed states.

Theorem 7.

A mixed state $\rho^{AB}$ has (LOCC) distillable entanglement if and only if entanglement can be distilled using LIOCC.

The proof of this theorem is actually quite simple and uses the fact that an arbitrary quantum operation can be simulated using incoherent operations and CoBits (Lemma 2). In Ref. [33] it was shown how local coherence can always be distilled for both Alice and Bob from multiple copies of every entangled state using LIOCC. Hence, given a sufficiently large number of copies of any distillable entangled state $\rho^{AB}$, Alice and Bob first distill sufficient local coherence using LIOCC, and then they simulate the LOCC protocol which distills entanglement.

As shown in Ref. [37], a state $\rho$ has distillable entanglement iff for some $k$ there exist rank-two operators $A$ and $B$ such that the (unnormalized) state $(A \otimes B)\rho^{\otimes k}(A \otimes B)^\dagger$ is entangled. By Theorem 7 and following the same argumentation of Ref. [37], we can further require that $A$ and $B$ are incoherent operators; that is, they have the form $A = |0\rangle\langle\alpha_0| + |1\rangle\langle\alpha_1|$ and $B = |0\rangle\langle\beta_0| + |1\rangle\langle\beta_1|$, where $\Delta(\alpha_0) := \Delta(|\alpha_0\rangle\langle\alpha_0|)$ is orthogonal to $\Delta(\alpha_1) := \Delta(|\alpha_1\rangle\langle\alpha_1|)$, and likewise $\Delta(\beta_0) := \Delta(|\beta_0\rangle\langle\beta_0|)$ is orthogonal to $\Delta(\beta_1) := \Delta(|\beta_1\rangle\langle\beta_1|)$. We are thus able to add an additional condition to the distillability criterion of Ref. [37].

Corollary 8.

A bipartite state $\rho$ has distillable entanglement iff for any pair of orthonormal local bases $\mathcal{B}_A = \{|x\rangle_A\}$ and $\mathcal{B}_B = \{|y\rangle_B\}$ there exists some $k$ and projectors $P_A = |\alpha_0\rangle\langle\alpha_0| + |\alpha_1\rangle\langle\alpha_1|$ and $P_B = |\beta_0\rangle\langle\beta_0| + |\beta_1\rangle\langle\beta_1|$ such that
1. $(P_A \otimes P_B)\rho^{\otimes k}(P_A \otimes P_B)$ is entangled,
2. $\mathrm{tr}[\Delta_A(\alpha_0)\Delta_A(\alpha_1)] = \mathrm{tr}[\Delta_B(\beta_0)\Delta_B(\beta_1)] = 0$,
where $\Delta_Z$ is the completely dephasing map in the basis $\mathcal{B}_Z^{\otimes k}$.

Conclusion: In this letter, we have investigated the relationship between entanglement and coherence in the framework of local incoherent operations and classical communication. The findings of this study suggest that entanglement and coherence are indeed closely linked resources. For instance, Theorem 5 shows that the entanglement of a state plays a crucial role in limiting the amount of coherence that can be distilled from a state, a result highly reminiscent of the complementarity between local and nonlocal information studied in Ref. [27]. In a similar spirit, Theorem 7 shows that entanglement distillability can be studied through the lens of coherence theory. This latter result seems somewhat remarkable since, despite coherence being a basis-dependent resource, its resource-theoretic analysis can be used to draw conclusions about entanglement, a basis-independent resource. Future work will be conducted to see whether the strengthened distillability criterion of Corollary 8 can be useful in the long-standing search for NPT bound entanglement.

Finally, we would like to comment on the particular type of incoherent operations studied in this letter. As noted in the introduction, there have been various proposals for the “free” class of operations in a resource theory of coherence. This letter has adopted the incoherent operations (IO) of Baumgratz et al. [3], where each Kraus operator in a measurement just needs to be incoherence-preserving.
While the class IO has drawbacks in terms of formulating a full physically consistent resource theory of coherence [11, 20], it nevertheless seems unlikely that the results of this letter would remain true if other operational classes were considered. For example, the strictly incoherent operations (SIO) proposed by Yadin et al. are unable to convert one eCoBit into a CoBit [11]. Thus, we believe that the interesting connections between IO coherence theory and entanglement demonstrated in this letter make a positive case for why IO is important in quantum information theory, independent of any other motivation. In fact, one could even put coherence aside and view LIOCC as just being a simplified subset of LOCC. As we have shown here, nontrivial conclusions about entanglement can indeed be drawn by studying LOCC from “the inside.” This approach is somewhat dual to the standard practice of studying LOCC using more general separable operations (SEP), the chain of inclusions being LIOCC ⊂ LOCC ⊂ SEP. Interesting future work would be to consider more general connections between coherence non-generating and entanglement non-generating operations.

During preparation of this manuscript, we learned of work by Streltsov and co-authors who have also initiated a study into local incoherent operations and classical communication [38].

Acknowledgments

We thank Alex Streltsov for fruitful exchanges on the topic of coherence distillation. EC is supported by the National Science Foundation (NSF) Early CAREER Award No. 1352326. MH is supported by an ARC Future Fellowship under Grant FT140100574.

[1] J. Åberg, (2006), arXiv:quant-ph / , 033007 (2014).
[3] T. Baumgratz, M. Cramer, and M. B. Plenio, Phys. Rev. Lett. , 140401 (2014).
[4] T. R. Bromley, M. Cianciaruso, and G. Adesso, Phys. Rev. Lett. , 210401 (2015).
[5] K. Korzekwa, M. Lostaglio, J. Oppenheim, and D. Jennings, (2015), arXiv:1506.07875.
[6] X. Yuan, H. Zhou, Z. Cao, and X. Ma, Phys. Rev. A , 022124 (2015).
[7] A. Winter and D. Yang, Phys. Rev. Lett. , 120404 (2016).
[8] U. Singh, M. N. Bera, A. Misra, and A. K. Pati, (2015), arXiv:1506.08186.
[9] I. Marvian, R. W. Spekkens, and P. Zanardi, (2015), arXiv:1510.06474.
[10] A. Streltsov, (2015), arXiv:1511.08346.
[11] B. Yadin, J. Ma, D. Girolami, M. Gu, and V. Vedral, (2015), arXiv:1512.02085.
[12] S. Lloyd, J. Phys.: Conf. Series , 012037 (2011).
[13] C.-M. Li, N. Lambert, Y.-N. Chen, G.-Y. Chen, and F. Nori, Sci. Rep. (2012), 10.1038/srep00885.
[14] S. F. Huelga and M. B. Plenio, Contemp. Phys. , 181 (2013).
[15] N. Lambert, Y.-N. Chen, Y.-C. Cheng, C.-M. Li, G.-Y. Chen, and F. Nori, Nature Physics , 10 (2013).
[16] P. Rebentrost, M. Mohseni, and A. Aspuru-Guzik, J. Phys. Chem. B , 9942 (2009).
[17] B. Witt and F. Mintert, New J. Phys. , 093020 (2013).
[18] M. Lostaglio, D. Jennings, and T. Rudolph, Nature Communications , 6383 (2015).
[19] V. Narasimhachar and G. Gour, Nature Communications , 7689 (2015).
[20] E. Chitambar and G. Gour, (2016), arXiv:1602.06969.
[21] I. Marvian and R. W. Spekkens, (2016), arXiv:1602.08049.
[22] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. , 865 (2009).
[23] A. Streltsov, U. Singh, H. S. Dhar, M. N. Bera, and G. Adesso, Phys. Rev. Lett. , 020403 (2015).
[24] M. B. Plenio and S. Virmani, Quant. Inf. Comput. , 1 (2007), quant-ph / , 180402 (2002).
[26] M. Horodecki, K. Horodecki, P. Horodecki, R. Horodecki, J. Oppenheim, A. Sen(De), and U. Sen, Phys. Rev. Lett. , 100402 (2003).
[27] J. Oppenheim, K. Horodecki, M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Rev. A , 022307 (2003).
[28] M. Horodecki, P. Horodecki, R. Horodecki, J. Oppenheim, A. Sen(De), U. Sen, and B. Synak-Radtke, Phys. Rev. A , 062307 (2005).
[29] The resource theory of asymmetry [39–43] under locality constraints has been studied in Refs. [44, 45]. Note that the precise connection between coherence and asymmetry is rather subtle, since in the latter one can allow for decoherence-free subspaces when taking tensor products. For instance, if $U(1)$ is the local symmetry for Alice, then when considering two copies of her system, the coherence-generating operation $|01\rangle_{A_1A_2} \to 1/\sqrt{2}\,(|01\rangle_{A_1A_2} + |10\rangle_{A_1A_2})$ is allowed. See [20] and [21] for more details.
[30] S. Popescu and D. Rohrlich, Phys. Rev. A , R3319 (1997).
[31] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Wootters, Phys. Rev. A , 3824 (1996).
[32] M. A. Nielsen, Phys. Rev. Lett. , 436 (1999).
[33] E. Chitambar, A. Streltsov, S. Rana, M. N. Bera, G. Adesso, and M. Lewenstein, Phys. Rev. Lett. , 070402 (2016).
[34] G. Vidal, J. Mod. Opt. , 355 (2000).
[35] A. Winter, in Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on (2005) pp. 2270–2274.
[36] With sufficient ancilla systems, every incoherent operation can be modeled using a projective measurement on a larger system and classical processing [20]. However, since the target states are pure, one does not need to classically process measurement outcomes or discard any subsystems in a successful distillation protocol, and thus the whole procedure can be done on a “closed” system [28].
[37] M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Rev. Lett. , 5239 (1998), quant-ph / , 555 (2007).
[40] G. Gour and R. W. Spekkens, New Journal of Physics , 033023 (2008).
[41] G. Gour, I. Marvian, and R. W. Spekkens, Phys. Rev. A , 012307 (2009).
[42] I. Marvian and R. W. Spekkens, New Journal of Physics , 033001 (2013).
[43] I. Marvian and R. W. Spekkens, Nature Communications , 3821 (2014).
[44] J. A. Vaccaro, F. Anselmi, H. M. Wiseman, and K. Jacobs, Phys. Rev. A , 032114 (2008).
[45] G. A. White, J. A. Vaccaro, and H. M. Wiseman, Phys. Rev. A , 032109 (2009).
[46] C. Fuchs and J. van de Graaf, Information Theory, IEEE Transactions on , 1216 (1999).
[47] M. Fannes, Communications in Mathematical Physics , 291 (1973).
[48] K. M. Audenaert, J. Phys. A , 8127 (2007).
[49] A. Winter, IEEE Transactions on Information Theory , 2481 (1999).
[50] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems (Cambridge University Press, Cambridge, UK, 2011).
[51] I. Devetak and A. Winter, IEEE Trans. Inf. Theory , 3183 (2004).
[52] I. Devetak and A. Winter, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science , 207 (2005).
[53] A. S. Holevo, IEEE Trans. Inf. Theory , 269 (1998).
[54] B. Schumacher and M. D. Westmoreland, Phys. Rev. A , 131 (1997).
[55] A. Uhlmann, Rep. Math. Phys. , 273 (1976).
[56] R. Jozsa, Journal of Modern Optics , 2315 (1994).
[57] J. A. Smolin, F. Verstraete, and A. Winter, Phys. Rev. A , 052317 (2005).
[58] R. Bhatia, Matrix Analysis (Springer, 1996).
[59] B. Synak-Radtke and M. Horodecki, Journal of Physics A: Mathematical and General , L423 (2006).

Supplemental Material

PRELIMINARIES

Distance Measures

The distance measure used in this paper is based on the trace norm, which for an operator $A$ is $\|A\|_1 := \mathrm{tr}|A| = \mathrm{tr}\sqrt{A^\dagger A}$. The trace distance for two states $\rho$ and $\sigma$ is $D_{tr}(\rho,\sigma) = \frac{1}{2}\|\rho - \sigma\|_1$, and we will write $\rho \overset{\epsilon}{\approx} \sigma$ to indicate $D_{tr}(\rho,\sigma) \le \epsilon$. The fidelity of two states, given by $F(\rho,\sigma) = \mathrm{tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$, can be related to the trace distance by [46]

$1 - F(\rho,\sigma) \le D_{tr}(\rho,\sigma) \le \sqrt{1 - F(\rho,\sigma)^2}.$

When $\sigma$ is pure, the lower bound can be improved:

$1 - F(\rho,|\varphi\rangle\langle\varphi|)^2 \le D_{tr}(\rho,|\varphi\rangle\langle\varphi|).$ (10)

We reference the Fannes-Audenaert inequality, which provides a bound on the entropy difference in terms of the trace distance.

Lemma 1 (Fannes-Audenaert Inequality [47, 48]). For density matrices $\rho$ and $\sigma$ acting on a $d$-dimensional space,

$|S(\rho) - S(\sigma)| \le D_{tr}(\rho,\sigma)\log(d-1) + h(D_{tr}(\rho,\sigma)),$ (11)

where $h(x) = -x\log x - (1-x)\log(1-x)$.

Finally, we will need Winter’s gentle measurement lemma.

Lemma 2 (Gentle Measurements [49]). For $\rho \ge 0$ and $\mathrm{tr}\,\rho \le 1$, suppose $0 \le X \le I$ and $\mathrm{tr}\,\rho X \ge 1 - \epsilon$. Then $\|\rho - \sqrt{X}\rho\sqrt{X}\|_1 \le \sqrt{8\epsilon}$.

Types, Typical Sequences, Channel Coding

In what follows, we assume that random variables $X, Y, \cdots$ take on values $x, y, \cdots$ from sets $\mathcal{X}, \mathcal{Y}, \cdots$. Probability distributions will be denoted by $p$ or $q$. For $n$ independent and identically distributed (i.i.d.) events each with outcome distribution $p$, the distribution over the sequence of events is denoted by $p^n$. See Refs. [50, Chapter 2] and [51] for a comprehensive presentation of the following concepts.

The type of a sequence $x^n \in \mathcal{X}^n$ is the distribution $p_{x^n}$ over $\mathcal{X}$ defined by

$p_{x^n}(a) := \frac{1}{n} N(a|x^n) \quad \forall a \in \mathcal{X},$

where $N(a|x^n)$ is the number of occurrences of the symbol $a \in \mathcal{X}$ in the sequence $x^n$. For a given distribution $p$, the collection of all sequences having type $p$ is called the type class of $p$ and is denoted by $T^n_p$. A distribution $p$ is said to be an empirical type (for some $n \in \mathbb{N}$) if $T^n_p$ is nonempty.

A sequence $x^n$ is said to be $\delta$-typical (or just typical) w.r.t. distribution $p$ if

$\left|\frac{1}{n} N(a|x^n) - p(a)\right| < \delta \quad \forall a \in \mathcal{X}.$

The set of all $\delta$-typical sequences will be denoted by $T^n_{[p]\delta}$. Note that the set $T^n_{[p]\delta}$ is the union of empirical type classes, and hence we will say that a distribution $q$ is typical w.r.t. $p$ if it is an empirical type with $T^n_q \subset T^n_{[p]\delta}$.

Three standard properties of typicality are the following. First, for $n$ i.i.d. samples of $X$ according to distribution $p$,

$p^n\left(T^n_{[p]\delta}\right) := \Pr[x^n \in T^n_{[p]\delta}] \ge 1 - \epsilon$ (12)

for any $\epsilon, \delta > 0$ and $n$ sufficiently large. Second, let $X$ be a random variable having distribution $p$ and entropy $H(X) := -\sum_{a \in \mathcal{X}} p(a)\log p(a)$. Then the size of $T^n_{[p]\delta}$ can be related to $H(X)$ as

$\left|\frac{1}{n}\log|T^n_{[p]\delta}| - H(X)\right| \le \epsilon$ (13)

for any $\epsilon, \delta > 0$ and $n$ sufficiently large.
Third, if $q \in T^n_{[p]\delta}$ then (for $\delta < (2|\mathcal{X}|)^{-1}$)
$$(n+1)^{-|\mathcal{X}|}\, 2^{n(H(X) - \tau(\delta))} \leq |T^n_q| \leq 2^{n(H(X) + \tau(\delta))}, \tag{14}$$
where $\tau(\delta) = -\delta |\mathcal{X}| \log \delta$. Note that $\tau(\delta)$ is a function increasing in $\delta$ for the range $0 < \delta < (2|\mathcal{X}|)^{-1}$. Eq. (14) follows from an application of Lemma 1 to the inequality
$$(n+1)^{-|\mathcal{X}|}\, 2^{nH(X)} \leq |T^n_p| \leq 2^{nH(X)}. \tag{15}$$

Moving to the quantum setting, for a quantum system $\mathcal{H}$, we will assume throughout that the computational basis $\{|x\rangle\}$ is the incoherent basis. For an empirical type $p$, the corresponding type projector acting on $\mathcal{H}^{\otimes n}$ is given by
$$\Pi_p = \sum_{x^n \in T^n_p} |x^n\rangle\langle x^n|.$$
We restrict attention to CQ channels $W_{CQ}: |x\rangle\langle x|^X \to \rho^B_x$, which map each element $|x\rangle$ to a density matrix $\rho^B_x$ acting on $\mathcal{H}$. Note that CQ channels generalize classical channels. If we are given a set of transition probabilities $p(y|x)$ characterizing a classical channel, then the corresponding CQ channel is $|x\rangle\langle x|^X \mapsto \rho^Y_x$, where $\rho^Y_x = \sum_{y \in \mathcal{Y}} p(y|x) |y\rangle\langle y|$. We will refer to channels of this form as CC channels $W_{CC}$.

When a distribution $p$ is given over $\mathcal{X}$, we associate a quantum ensemble $\{p(x), \rho^B_x\}_{x \in \mathcal{X}}$ with the channel $W_{CQ}$, as well as a classical-quantum state $\rho^{XB}$:
$$\rho^{XB} = \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|^X \otimes \rho^B_x.$$
The mutual information of the classical-quantum state $\rho^{XB}$ is the so-called Holevo quantity and is denoted by $I(X:B) = S\left(\sum_x p(x) \rho^B_x\right) - \sum_x p(x) S(\rho^B_x)$.
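As an illustration of the classical notions introduced above, the following minimal sketch computes the type of a sequence, tests $\delta$-typicality, and checks the type-class bound of Eq. (15) for one small example. The function names (`type_of`, `is_typical`, `type_class_size`) and the example sequence are hypothetical choices made for this illustration only; logarithms are taken to base 2.

```python
from collections import Counter
from math import factorial, log2

def type_of(seq, alphabet):
    """Empirical distribution p_{x^n}(a) = N(a|x^n) / n over the alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}

def is_typical(seq, p, delta):
    """delta-typicality: |N(a|x^n)/n - p(a)| < delta for every symbol a."""
    q = type_of(seq, p)
    return all(abs(q[a] - p[a]) < delta for a in p)

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def type_class_size(q, n):
    """|T^n_q| as a multinomial coefficient; q must be an empirical type for n."""
    size = factorial(n)
    for v in q.values():
        size //= factorial(round(v * n))
    return size

# Example (an assumption for illustration): a length-20 binary sequence.
seq = "aabab" * 4
n = len(seq)
q = type_of(seq, "ab")
size = type_class_size(q, n)

# Both sides of Eq. (15) for the type q.
upper = 2 ** (n * entropy(q))
lower = (n + 1) ** (-len(q)) * upper
```

Here `lower <= size <= upper` holds, and the same sequence is typical with respect to the uniform distribution for $\delta = 0.15$ but not for $\delta = 0.05$.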
For CC channels, the associated joint state is fully incoherent:
$$\rho^{XY} = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x,y)\, |x\rangle\langle x|^X \otimes |y\rangle\langle y|^Y,$$
with mutual information $I(X:Y) = H(X) + H(Y) - H(XY)$.

Recall that an $(n, \epsilon)$ code of size $C$ for a CQ channel $W_{CQ}: |x\rangle\langle x|^X \to \rho^B_x$ is a sequence of codewords $(U^{(c)})_{c=1}^{C}$ with $U^{(c)} \in \mathcal{X}^n$ and a POVM $\{D_c\}_{c=1}^{C}$ acting on $\mathcal{H}^{\otimes n}$ such that
$$\frac{1}{C} \sum_{c=1}^{C} \operatorname{tr}[\rho^n_{U^{(c)}} D_c] > 1 - \epsilon, \tag{16}$$
where if $U^{(c)} = x^n$, then $\rho^n_{U^{(c)}} := \rho^B_{x_1} \otimes \cdots \otimes \rho^B_{x_n}$, a state on $\mathcal{H}^{\otimes n}$.

Coding Theorems

We now introduce the information-theoretic machinery that provides the foundation for our coding schemes. The followingis adopted from the work of Devetak and Winter in Ref. [52]. For an empirical type p over X , let ( U ( lc ) ) be an i.i.d. sequenceof random variables obtained by sampling from T np uniformly, with l = , · · · , L and c = , · · · , C . We consider the followingevents: • (cid:15) -evenness : For all x n ∈ T np , (1 − (cid:15) ) LC | T np | ≤ (cid:88) lc U ( lc ) ( x n ) ≤ (1 + (cid:15) ) LC | T np | , (17)where U ( lc ) is the indicator function for whether U ( lc ) = x n . • { C l } Ll = are ( n , (cid:15) ) codes : For every l = , · · · , L the codebook C l : = ( U ( lc ) ) Cc = forms an ( n , (cid:15) ) code for the channel W CQ : | x (cid:105)(cid:104) x | X → ρ Bx . Lemma 3 ([52–54]) . Consider a CQ channel W CQ : | x (cid:105)(cid:104) x | X → ρ Bx and a random variable X with a distribution given by someempirical type p. Let ( U ( lc ) ) be an i.i.d. sampling from T np with l = , · · · , L, c = , · · · , C, and C l = ( U ( lc ) ) Cc = . For every δ, (cid:15) > and n su ﬃ ciently large, Pr { (cid:15) -evenness } ≥ − |X| n − LC (cid:15) | T np | , C ≤ n ( I ( X : B ) − δ ) ⇒ Pr { A fraction − (cid:15) of the { C l } Ll = are ( n , (cid:15) ) channel codes } ≥ − − L (cid:15) . (18) Corollary 4.

Consider a CQ channel W CQ : | x (cid:105)(cid:104) x | X → ρ Bx and let X be a distribution given by some empirical type p. Forsu ﬃ ciently large n and δ < I ( X : B ) , there exists a partition of T np such that a fraction − (cid:15) of the sequences in T np belong to an ( n , (cid:15) ) channel code C , · · · , C L , where L = (cid:100) n ( H ( X ) − I ( X : B ) + δ ) (cid:101) and each C l consists of C = (cid:98) n ( I ( X : B ) − δ ) (cid:99) codewords.Proof. Since by the choices of L and C , LC | T np | ≥ n δ + − n ( H ( X | B ) + δ ) → ∞ , then by Lemma 3 an i.i.d. sequence ( U ( lc ) ) will satisfyPr { (cid:15) -evenness } → { A fraction 1 − (cid:15) of the C l are ( n , (cid:15) ) channel codes } → n → ∞ . Thus for su ﬃ ciently large n , there must exist families of ( n , (cid:15) ) codes ( U ( lc ) ) ⊂ T np for W CQ that cover T np with afraction 1 − (cid:15) of these codes being ( n , (cid:15) ) channel codes. The union of these codes will consist of (1 − (cid:15) ) LC codewords, includingmultiplicities. But by (cid:15) -evenness, the number of distinct codewords in this union will be at least (1 − (cid:15) ) LC (1 + (cid:15) ) LC / | T np | > (1 − (cid:15) ) | T np | . Indeed, (cid:15) -evenness guarantees that each individual x n has multiplicity no more than (1 + (cid:15) ) LC | T np | among all the codebooks. Hence, thenumber of distinct codewords is at least (1 − (cid:15) ) LC (1 + (cid:15) ) LC / | T np | , and so at least a fraction (1 − (cid:15) ) of the elements of T np are codewords for an( n , (cid:15) ) code. (cid:3) With Corollary 4, we are able to almost entirely cover each type class T np by ( n , (cid:15) ) channel codes having a constant rate C . Wewill also be interested in decomposing T np into “obfuscation” sets. The following covering lemma is presented in [7]. Lemma 5.

Consider a CQ channel W CQ : | x (cid:105)(cid:104) x | X → ρ Bx on a d-dimensional Hilbert space and a random variable X with adistribution given by some empirical type p. Let { Ω s } Ss = be obtained by a uniform sampling without replacement from the set { ρ nx n : x n ∈ T np } . Deﬁne the average state σ ( p ) : = | T np | (cid:88) x n ∈ T np ρ x n . Then for every (cid:15), δ ∈ (0 , and n su ﬃ ciently large, Pr  (cid:107) S S (cid:88) s = Ω s − σ ( p ) (cid:107) ≥ (cid:15)  ≤ d n exp (cid:32) − S t n (cid:15)

288 ln 2 (cid:33) , (19) where t = − ( I ( X : B ) + δ ) . We will say that a collection of states { Ω s } Ss = corresponding to S distinct sequences from T np is “good” if (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) S (cid:80) Ss = Ω s − σ ( p ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < (cid:15) . Figure 2. Code Depiction. Our code involves ﬁrst decomposing the typical set T n [ p ] δ into its typical type classes T np t , where p ( x ) is distributiongiven by | Ψ (cid:105) AB = (cid:80) x ∈X (cid:112) p ( x ) | x (cid:105) A | ψ x (cid:105) B for Alice’s incoherent basis | x (cid:105) A . Each type class is further decomposed in three di ﬀ erent ways. Theﬁrst involves a partitioning into obfuscation sets S m for which Bob’s average state is roughly the same when restricting to these sets. The othertwo decompositions involve partitioning T np t into codebooks C l and C l for the channels | x (cid:105)(cid:104) x | → ∆ ( | ψ x (cid:105)(cid:104) ψ x | ) and | x (cid:105)(cid:104) x | → | ψ x (cid:105)(cid:104) ψ x | respectively. Corollary 6 ([7]) . Let X be a random variable with distribution given by some empirical type p. For n su ﬃ ciently large, thereexists a partitioning of T np consisting of M subsets { S m } Mm = each of size S = (cid:100) n ( I ( X : B ) + δ ) (cid:101) (plus a remainder) such that a fraction (1 − (cid:15) ) of the subsets are good sets.Proof. Consider a random partition of T np into M blocks each of size S . Note that each block is equivalently obtained by auniform sampling of S elements from T np without replacement. Let E m be the random variable for which E m = m th block is a good set and E m = n can be taken su ﬃ ciently large so that the expectation of E m is greater than 1 − (cid:15) . Therefore for a random partition of T np , the expected number of good sets across all blocks is given by (cid:104) (cid:80) Mm = E m (cid:105) = (cid:80) Mm = (cid:104) E m (cid:105) > M (1 − (cid:15) ). 
Hence, there must exist at least one partition with $M(1 - \epsilon)$ of the blocks being good. $\square$

CODE STRUCTURE

We combine Corollaries 4 and 6 to obtain the basic structure of both our distillation and formation codes. A diagram is provided in Fig. 2.

Let $|\Psi\rangle_{AB}$ be an arbitrary bipartite state with
$$|\Psi\rangle_{AB} = \sum_{x \in \mathcal{X}} \sqrt{p(x)}\, |x\rangle_A |\psi_x\rangle_B, \tag{20}$$
where $\{|x\rangle_A\}$ and $\{|y\rangle_B\}$ denote the preferred bases with respect to which the incoherent operations are defined for Alice and Bob, respectively, and $|\psi_x\rangle_B = \sum_{y \in \mathcal{Y}} e^{i\theta_{y|x}} \sqrt{p(y|x)}\, |y\rangle_B$ are normalized but not necessarily orthogonal states of Bob. Let $W_{CQ}$ and $W_{CC}$ be the CQ and CC channels given by
$$W_{CQ}: |x\rangle\langle x|^X \to \psi^B_x \equiv |\psi_x\rangle\langle\psi_x|^B, \qquad W_{CC}: |x\rangle\langle x|^X \to \Delta(\psi^B_x) = \sum_{y \in \mathcal{Y}} p(y|x)\, |y\rangle\langle y|^Y.$$
Let $X$ be the random variable taking on values from $\mathcal{X}$ according to the distribution $p(x)$ in Eq. (20). In other words, $p(x)$ describes the distribution of outcomes when measuring $\Delta(\Psi^A)$ in the incoherent basis. For a fixed $n$, the set of typical sequences $T^n_{[p]\delta}$ is the union of typical types. We will denote the typical types by $p_t$, for $t = 1, 2, \cdots, T$, and the random variable associated with $p_t$ will be denoted by $X_t$. Note that $T \leq (n+1)^{|\mathcal{X}|}$.

We will be interested in four different $n$-copy decompositions of $|\Psi\rangle_{AB}$, where in all cases we assume that $n$ is being taken sufficiently large.

Decomposition 1:

The first decomposition is based on the coherence distillation protocol presented in Ref. [7]. It involves forming good sets $S_m$ in the sense of Corollary 6 and w.r.t. the CQ channel $W_{CQ}$. For each typical type $p_t$, consider a partitioning of $T^n_{p_t}$ according to Corollary 6. Then for each $x^n \in T^n_{[p]\delta}$, we can relabel $x^n \to (t, m, s)$ where $p_t$ is the typical type for which $x^n \in T^n_{p_t}$; $m$ is the block number within $T^n_{p_t}$ to which $x^n$ belongs (with $m = 0$ denoting the remainder); and $s$ is the order of $x^n$ in the $m$th block. For a fixed $t$, the range of $m$ and $s$ is $m = 1, \cdots, M_t$ and $s = 1, \cdots, S_t$, where
$$M_t = \lfloor |T^n_{p_t}| / S_t \rfloor, \qquad S_t = \lceil 2^{n(I(X_t:B)_{W_{CQ}} + \delta)} \rceil. \tag{21}$$
The bit rates of $S_t$ and $M_t$ satisfy
$$\left| \frac{1}{n} \log S_t - E(\Psi) \right| \leq O(\tau(\delta)), \tag{22}$$
$$\left| \frac{1}{n} \log M_t - [S(A)_{\Delta(\Psi)} - E(\Psi)] \right| \leq O\left(\tau(\delta), \frac{\log n}{n}\right). \tag{23}$$
The first line follows from Fannes' Inequality and the fact that $I(X:B)_\Psi = E(\Psi)$, while the second can be seen from $\frac{1}{n} \log |T^n_{p_t}| \approx H(X)$ and $\frac{1}{n} \log S_t \approx I(X:B)_\Psi$. Since all but a vanishingly small fraction of $x^n$ belong to $T^n_{[p]\delta}$, we can thus write
$$|\Psi\rangle^{\otimes n} = \sum_{x^n} \sqrt{p^n(x^n)}\, |x^n\rangle |\psi_{x^n}\rangle \overset{\epsilon}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t}} \sum_{m=1}^{M_t} |m\rangle_A \frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} |s\rangle_A |\psi_{tms}\rangle_B, \tag{24}$$
where $q(t)$ is the probability of typical type class $T^n_{p_t}$ (conditioned on the event $x^n \in T^n_{[p]\delta}$).
Note that for a sequence $x^n$ labeled by $(t, m, s)$ we have that $S(\Delta(\psi_{x^n})) = S(\Delta(\psi_{tms})) = \sum_{x \in \mathcal{X}} N(x | x^n \in T^n_{p_t})\, S(\Delta(\psi_x))$, with the RHS being independent of $s$ and $m$. Since $S(Y|X)_{\Delta(\Psi)} = \sum_{x \in \mathcal{X}} p(x) S(\Delta(\psi_x))$, the following bound is obtained,
$$\left| \frac{1}{n} S(\Delta(\psi_{x^n})) - S(Y|X)_{\Delta(\Psi)} \right| \leq \delta \sum_{x \in \mathcal{X}} S(\Delta(\psi_x)), \tag{25}$$
which again follows from $\delta$-typicality. For each typical type $p_t$, we now further restrict the sum over $m$ to only those values for which $S_m$ are good sets. By Corollary 6 there are $M_t(1 - \epsilon)$ such sets. For these values of $m$, we have that
$$\frac{1}{S_t} \sum_{s=1}^{S_t} \psi_{tms} \overset{\epsilon}{\approx} \frac{1}{|T^n_{p_t}|} \sum_{x^n \in T^n_{p_t}} \psi_{x^n}, \tag{26}$$
with the RHS being independent of $m$. As Uhlmann's Theorem states that $F(\rho_1, \rho_2) = \max |\langle \varphi_1 | \varphi_2 \rangle|$, where the maximization is taken over all purifications of $\rho_1$ and $\rho_2$ respectively [55, 56], the previous equation implies for each pair $(t, m)$ the existence of a unitary $U_{tm}$ acting on $A$ such that
$$\frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} |s\rangle_A |\psi_{tms}\rangle_B \overset{O(\epsilon)}{\approx} \frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} (U_{tm} |s\rangle_A) |\psi_{t m_0 s}\rangle_B, \tag{27}$$
where $m_0 \in \{1, \cdots, M_t(1 - \epsilon)\}$ is some fixed number. We thus continue Eq. (24) by restricting the sum over $m$ to just good values and replacing the sum over $s$ with Eq. (27):
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t(1-\epsilon)}} \sum_{m=1}^{M_t(1-\epsilon)} |m\rangle_A \frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} \left( U_{tm} |s\rangle_A \right) |\psi_{t m_0 s}\rangle_B. \tag{28}$$

Decompositions 2 and 3:

The second and third decompositions are built from $(n, \epsilon)$ codes for the channels $W_{CC}$ and $W_{CQ}$ respectively. The structure of the decompositions is based on the entanglement-assisted and GHZ distillation schemes of Ref. [57].

First we turn to $W_{CC}$. For every typical type $p_t$, consider a partitioning of $T^n_{p_t}$ according to Corollary 4. Then each $x^n \in T^n_{[p]\delta}$ can be relabeled $x^n \to (t, l, c)$ where $p_t$ is the typical type for which $x^n \in T^n_{p_t}$; $l$ is the code $C_l$ within $T^n_{p_t}$ to which $x^n$ belongs; and $c$ is the order of $x^n$ in the $l$th code. For a fixed $t$, the range of $l$ and $c$ is $l = 1, \cdots, L_t$ and $c = 1, \cdots, C_t$, where
$$L_t = \lfloor 2^{n(H(X_t) - I(X_t:Y)_{W_{CC}} + \delta)} \rfloor, \qquad C_t = \lceil 2^{n(I(X_t:Y)_{W_{CC}} - \delta)} \rceil. \tag{29}$$
The bit rates of $L_t$ and $C_t$ satisfy
$$\left| \frac{1}{n} \log L_t - [H(X) - I(X:Y)_{\Delta(\Psi)}] \right| \leq O\left(\tau(\delta), \frac{\log n}{n}\right), \tag{30}$$
$$\left| \frac{1}{n} \log C_t - I(X:Y)_{\Delta(\Psi)} \right| \leq O(\tau(\delta)). \tag{31}$$
By discarding non-typical sequences and the fraction $3\epsilon$ of $x^n$ not belonging to an $(n, \epsilon)$ code, we obtain the approximation
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{L_t(1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} |l\rangle_A \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A |\psi_{tlc}\rangle_B. \tag{32}$$
For every $(t, l)$ define the state $|\chi_{tl}\rangle = \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A |\psi_{tlc}\rangle_B$. By $\epsilon$-decodability of the channel $W_{CC}$ there exists a family of decoding POVMs $(D^{(tl)}_c)_{c=1}^{C_t}$ such that
$$\frac{1}{C_t} \sum_{c=1}^{C_t} \operatorname{tr}[\Delta(\psi_{tlc}) D^{(tl)}_c] > 1 - \epsilon.$$
Note that $\operatorname{tr}[\Delta(\psi_{tlc}) D^{(tl)}_c] = \operatorname{tr}[\Delta(\psi_{tlc}) \Delta(D^{(tl)}_c)]$, and so without loss of generality, we can assume that the $D^{(tl)}_c$ are diagonal in the incoherent basis.

We consider a dilation of the $(t, l)$th POVM. To do so, introduce the isometries $Y: B \to BB$ and $W_{tl}: B \to BB$ as
$$Y = \sum_{y^n \in \mathcal{Y}^n} |y^n\rangle\langle y^n|_B \otimes |y^n\rangle_B, \qquad W_{tl} = \sum_{c=1}^{C_t} \sqrt{D^{(tl)}_c} \otimes |c\rangle_B. \tag{33}$$
Crucially, both $Y$ and $W_{tl}$ represent incoherent operations. Define the state
$$|\hat{\chi}_{tl}\rangle = (I_A \otimes W_{tl} Y) |\chi_{tl}\rangle = \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A \sum_{y^n \in \mathcal{Y}^n} \sum_{c'=1}^{C_t} \langle y^n | \psi_{tlc} \rangle \sqrt{D^{(tl)}_{c'}}\, |y^n\rangle_B |y^n\rangle_B |c'\rangle_B. \tag{34}$$
We want to show that this state is $\epsilon$-close to the state
$$|\hat{\hat{\chi}}_{tl}\rangle = \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A \sum_{y^n \in \mathcal{Y}^n} \langle y^n | \psi_{tlc} \rangle\, |y^n\rangle_B |y^n\rangle_B |c\rangle_B, \tag{35}$$
which would imply that Bob can coherently decode $|c\rangle$ from $|\chi_{tl}\rangle$ without disturbing the state much. To this end, first note that $|\hat{\chi}_{tl}\rangle \overset{O(\epsilon)}{\approx} (I_A \otimes \sqrt{X_{tl}}) |\hat{\hat{\chi}}_{tl}\rangle$ where
$$X_{tl} = \sum_{c=1}^{C_t} D^{(tl)}_c \otimes I_B \otimes |c\rangle\langle c|_B.$$
The approximation $\overset{O(\epsilon)}{\approx}$ here can be seen from the fact that
$$\langle \hat{\chi}_{tl} | (I_A \otimes \sqrt{X_{tl}}) | \hat{\hat{\chi}}_{tl} \rangle = \frac{1}{C_t} \sum_{c=1}^{C_t} \operatorname{tr}[\Delta(\psi_{tlc}) D^{(tl)}_c] > 1 - \epsilon.$$
Then applying Lemma 2 to $\operatorname{tr}[(I_A \otimes X_{tl}) \hat{\hat{\chi}}_{tl}] > 1 - \epsilon$, we can conclude that $(I_A \otimes \sqrt{X_{tl}}) |\hat{\hat{\chi}}_{tl}\rangle \overset{O(\epsilon)}{\approx} |\hat{\hat{\chi}}_{tl}\rangle$. Therefore, $|\hat{\chi}_{tl}\rangle \overset{O(\epsilon)}{\approx} |\hat{\hat{\chi}}_{tl}\rangle$ and so
$$\begin{aligned} |\Psi\rangle^{\otimes n} &\overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{L_t(1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} |l\rangle_A \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A \sum_{y^n \in \mathcal{Y}^n} \langle y^n | \psi_{tlc} \rangle\, W^\dagger_{tl} \left( |y^n\rangle_B |c\rangle_B \right) \\ &= \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{L_t(1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} |l\rangle_A \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A\, W^\dagger_{tl} \left( |\psi_{tlc}\rangle_B |c\rangle_B \right) \\ &= \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{L_t(1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} |l\rangle_A \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |c\rangle_A\, W^\dagger_{tl} \left( \Pi_{tlc} |\psi_{t l_0 c_0}\rangle_B |c\rangle_B \right), \end{aligned} \tag{36}$$
where $\Pi_{tlc}$ permutes $|\psi_{t l_0 c_0}\rangle$ into $|\psi_{tlc}\rangle$, for some fixed $l_0 \in 1, \cdots, L_t$ and $c_0 \in 1, \cdots, C_t$. Recall that for each type $t$, each $|\psi_{tlc}\rangle$ is a sequence $|\psi_{tlc}\rangle = |\psi_{x_1}\rangle |\psi_{x_2}\rangle \cdots |\psi_{x_n}\rangle$, and these sequences are related to one another through a permutation of the $|\psi_{x_i}\rangle$.

We now repeat an analogous decomposition for the CQ channel $W_{CQ}$. Since this will involve a different covering of the type classes we use a different labeling $x^n \to (t, \bar{l}, \bar{c})$. By the same arguments as above, the decomposition takes the form
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{\bar{L}_t(1-\epsilon)}} \sum_{\bar{l}=1}^{\bar{L}_t(1-\epsilon)} |\bar{l}\rangle_A \frac{1}{\sqrt{\bar{C}_t}} \sum_{\bar{c}=1}^{\bar{C}_t} |\bar{c}\rangle_A\, \bar{W}^\dagger_{t\bar{l}} \left( \bar{\Pi}_{t\bar{l}\bar{c}} |\psi_{t \bar{l}_0 \bar{c}_0}\rangle_B |\bar{c}\rangle_B \right). \tag{37}$$
Here, like before, $\bar{W}_{t\bar{l}}$ is an isometry for the $(t, \bar{l})$th decoding POVM of $W_{CQ}$ as in Eq. (33). However $\bar{W}_{t\bar{l}}$ will in general not be incoherent.

Decomposition 4:

The fourth decomposition is a hybrid of decompositions 1 and 2. It begins with Eq. (24) and the fact that the sum over $m$ includes $M_t(1 - \epsilon)$ good sets in the sense that
$$\frac{1}{S_t} \sum_{s=1}^{S_t} \psi_{tms} \overset{\epsilon}{\approx} \frac{1}{|T^n_{p_t}|} \sum_{x^n \in T^n_{p_t}} \psi_{x^n} \tag{38}$$
for these good values of $m$. In this decomposition, we now replace the RHS by $W_{CC}$ channel codes. That is, we use Corollary 4 to write
$$\frac{1}{S_t} \sum_{s=1}^{S_t} \psi_{tms} \overset{\epsilon}{\approx} \frac{1}{|T^n_{p_t}|} \sum_{x^n \in T^n_{p_t}} \psi_{x^n} = \frac{1}{L_t C_t} \sum_{l=1}^{L_t} \sum_{c=1}^{C_t} \psi_{tlc}.$$
Uhlmann's Theorem again implies that for each good value of $m$ there exists a right orthogonal matrix $V_{tm}: AA \to A$ (with $V_{tm} V^\dagger_{tm} = I_A$) such that
$$\frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} |s\rangle_A |\psi_{tms}\rangle_B \overset{O(\epsilon)}{\approx} \frac{1}{\sqrt{L_t C_t}} \sum_{l=1}^{L_t} \sum_{c=1}^{C_t} \left( V_{tm} |lc\rangle_{AA} \right) |\psi_{tlc}\rangle_B. \tag{39}$$
Hence by restricting to good values of $m$ and $(n, \epsilon)$ channel codes, we have the decomposition
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t(1-\epsilon)}} \sum_{m=1}^{M_t(1-\epsilon)} |m\rangle_A \frac{1}{\sqrt{L_t C_t (1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} \sum_{c=1}^{C_t} \left( V_{tm} |lc\rangle_{AA} \right) |\psi_{tlc}\rangle_B. \tag{40}$$
Finally, similar to the construction in decomposition 2, decoding isometries $W_{tl}$ exist for Bob so that the state can be expressed as
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t(1-\epsilon)}} \sum_{m=1}^{M_t(1-\epsilon)} |m\rangle_A \frac{1}{\sqrt{L_t C_t (1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} \sum_{c=1}^{C_t} \left( V_{tm} |lc\rangle_{AA} \right) W^\dagger_{tl} \left( \Pi_{tlc} |\psi_{t l_0 c_0}\rangle_B |c\rangle_B \right), \tag{41}$$
for some fixed $(l_0, c_0)$.
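To make the block sizes in the four decompositions more concrete, the following sketch evaluates the entropic quantities that govern their bit rates for one small example. It is only an illustration under stated assumptions: Bob's system is taken to be a qubit (so the spectrum of his reduced state has a closed form), the example state $(|0\rangle|0\rangle + |1\rangle|+\rangle)/\sqrt{2}$ is an arbitrary choice, and the names `shannon` and `rate_summary` are hypothetical. Barred quantities refer to the CQ-channel codes of decomposition 3.

```python
from math import log2, sqrt

def shannon(probs):
    """Shannon entropy in bits, ignoring (numerically) zero entries."""
    return -sum(q * log2(q) for q in probs if q > 1e-15)

def rate_summary(p, psi):
    """p[x]: Alice's distribution; psi[x] = (a, b): amplitudes of |psi_x> on Bob's qubit."""
    S_X = shannon(p)
    # S(Y|X) of the dephased state: average entropy of |<y|psi_x>|^2.
    S_Y_given_X = sum(px * shannon([abs(a) ** 2, abs(b) ** 2])
                      for px, (a, b) in zip(p, psi))
    # Bob's dephased marginal p(y).
    p_y = [sum(px * abs(v[y]) ** 2 for px, v in zip(p, psi)) for y in (0, 1)]
    S_Y = shannon(p_y)
    # E(Psi) = S(rho_B), rho_B = sum_x p(x)|psi_x><psi_x| (2x2, unit trace).
    coh = sum(px * a * complex(b).conjugate() for px, (a, b) in zip(p, psi))
    det = p_y[0] * p_y[1] - abs(coh) ** 2
    gap = sqrt(max(0.0, 1.0 - 4.0 * det))
    E = shannon([(1 + gap) / 2, (1 - gap) / 2])
    I_XY = S_Y - S_Y_given_X
    return {
        "S_t": E,                 # rate of obfuscation-set size
        "M_t": S_X - E,           # rate of the number of obfuscation sets
        "L_t": S_X - I_XY,        # equals S(X|Y) for the dephased state
        "C_t": I_XY,              # CC-channel code rate
        "L_bar_t": S_X - E,       # CQ-channel analogue of L_t
        "C_bar_t": E,             # CQ-channel code rate
        "S_Y_given_X": S_Y_given_X,
    }

r = rate_summary([0.5, 0.5], [(1.0, 0.0), (1 / sqrt(2), 1 / sqrt(2))])
```

In each partition the two rates add up to $S(X)_{\Delta(\Psi)}$, reflecting that the type class is split into a number of blocks times a block size, and the CC-channel code rate does not exceed the CQ-channel code rate.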
Summary of Code Construction:
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t(1-\epsilon)}} \sum_{m=1}^{M_t(1-\epsilon)} |m\rangle_A \frac{1}{\sqrt{S_t}} \sum_{s=1}^{S_t} \left( U_{tm} |s\rangle_A \right) |\psi_{t m_0 s}\rangle_B, \tag{42a}$$
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{L_t C_t (1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} \sum_{c=1}^{C_t} |l\rangle_A |c\rangle_A\, W^\dagger_{tl} \left( \Pi_{tlc} |\psi_{t l_0 c_0}\rangle_B |c\rangle_B \right), \tag{42b}$$
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{\bar{L}_t \bar{C}_t (1-\epsilon)}} \sum_{\bar{l}=1}^{\bar{L}_t(1-\epsilon)} \sum_{\bar{c}=1}^{\bar{C}_t} |\bar{l}\rangle_A |\bar{c}\rangle_A\, \bar{W}^\dagger_{t\bar{l}} \left( \bar{\Pi}_{t\bar{l}\bar{c}} |\psi_{t \bar{l}_0 \bar{c}_0}\rangle_B |\bar{c}\rangle_B \right), \tag{42c}$$
$$|\Psi\rangle^{\otimes n} \overset{O(\epsilon)}{\approx} \sum_{t=1}^{T} \sqrt{q(t)}\, |t\rangle_A \frac{1}{\sqrt{M_t(1-\epsilon)}} \sum_{m=1}^{M_t(1-\epsilon)} |m\rangle_A \frac{1}{\sqrt{L_t C_t (1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} \sum_{c=1}^{C_t} \left( V_{tm} |lc\rangle_{AA} \right) W^\dagger_{tl} \left( \Pi_{tlc} |\psi_{t l_0 c_0}\rangle_B |c\rangle_B \right), \tag{42d}$$
with bit rates
$$\frac{1}{n} \log T \leq O\left(\frac{\log n}{n}\right), \tag{43}$$
$$\left| \frac{1}{n} \log S_t - E(\Psi) \right| \leq O(\tau(\delta)), \tag{44}$$
$$\left| \frac{1}{n} \log M_t - [S(X)_{\Delta(\Psi)} - E(\Psi)] \right| \leq O\left(\tau(\delta), \frac{\log n}{n}\right), \tag{45}$$
$$\left| \frac{1}{n} \log L_t - S(X|Y)_{\Delta(\Psi)} \right| \leq O\left(\tau(\delta), \frac{\log n}{n}\right), \tag{46}$$
$$\left| \frac{1}{n} \log C_t - I(X:Y)_{\Delta(\Psi)} \right| \leq O(\tau(\delta)), \tag{47}$$
$$\left| \frac{1}{n} S(\Delta(\psi_{x^n})) - S(Y|X)_{\Delta(\Psi)} \right| \leq O(\delta), \tag{48}$$
$$\left| \frac{1}{n} \log \bar{L}_t - [S(X)_{\Delta(\Psi)} - E(\Psi)] \right| \leq O\left(\tau(\delta), \frac{\log n}{n}\right), \tag{49}$$
$$\left| \frac{1}{n} \log \bar{C}_t - E(\Psi) \right| \leq O(\tau(\delta)). \tag{50}$$

PROOFS OF MAIN TEXT THEOREMS / LEMMAS AND EXPANDED DISCUSSION

Proof of Theorem 1

Theorem 1.

For a pure state $|\Psi\rangle_{AB}$ the following triples are achievable coherence-entanglement formation rates
$$(R_A, R_B, E^{co}) = \left( 0,\ S(Y|X)_{\Delta(\Psi)},\ S(X)_{\Delta(\Psi)} \right), \tag{51}$$
$$(R_A, R_B, E^{co}) = \left( S(X)_{\Delta(\Psi)},\ S(Y|X)_{\Delta(\Psi)},\ E(\Psi) \right), \tag{52}$$
$$(R_A, R_B, E^{co}) = \left( 0,\ 0,\ S(XY)_{\Delta(\Psi)} \right), \tag{53}$$
as well as the points obtained by interchanging $X \leftrightarrow Y$ in Eqns. (51) – (53). Moreover, these points are optimal in the sense that any achievable rate triple must satisfy
$$E^{co} \geq E(\Psi), \qquad R_A + R_B \geq S(XY)_{\Delta(\Psi)}, \qquad R_B + E^{co} \geq S(XY)_{\Delta(\Psi)}. \tag{54}$$

Example 1.

2. This state has entanglement E( Ψ ) = S ( A ) Ψ < E co = E( Ψ ) must have coherence rates satisfying R B ≥ S ( AB ) ∆ ( Ψ ) − E( Ψ ) =

1. However, if Alice and Bob would rather use more eCoBits than CoBits, they can reduce Bob’slocal coherence rate. Namely, rate (51) gives R A = R B = S ( Y | X ) ∆ ( Ψ ) < Proof.

The lower bounds of Eq. (54) follow from the coherence cost rates using global operations (Theorem 3 of [7] as well as [6] and Theorem 5 below), and the fact that one eCoBit can be converted into one CoBit using LIOCC. Indeed, $|\Phi_{AB}\rangle \to |\Phi_B\rangle$ when Alice performs the incoherent measurement with Kraus operators $\{|0\rangle\langle +|, |1\rangle\langle -|\}$, and then Bob performs $\sigma_Z$ iff Alice obtains outcome $|1\rangle$.

Moving to achievability, we first prove Eq. (51). The protocol is based on decomposition (42b). Alice and Bob share $\log E_n$ eCoBits expressed as
$$\frac{1}{\sqrt{E_n}} \sum_{t=1}^{T} \sum_{l=1}^{L_t(1-\epsilon)} \sum_{c=1}^{C_t} |tt\rangle_{A_1B_1} |ll\rangle_{A_2B_2} |cc\rangle_{A_3B_3}, \tag{55}$$
where $E_n = \sum_{t=1}^{T} L_t(1-\epsilon) C_t$ is the number of terms in the superposition, while Bob has an additional $n[S(Y|X)_{\Delta(\Psi)} + O(\delta)]$ CoBits. In the first step, Alice and Bob deterministically transform their state into
$$\sum_{t=1}^{T} \sqrt{q(t)}\, \frac{1}{\sqrt{L_t(1-\epsilon)}} \sum_{l=1}^{L_t(1-\epsilon)} \frac{1}{\sqrt{C_t}} \sum_{c=1}^{C_t} |tt\rangle_{A_1B_1} |ll\rangle_{A_2B_2} |cc\rangle_{A_3B_3},$$
where $q(t)$ is given in Eq. (42b). This can always be done via the majorization criterion of Lemma 3 below. Using his local CoBits, Bob first prepares $|\psi_{t l_0 c_0}\rangle_B$ on ancillary system $B$, which can be done to arbitrary precision for sufficiently large $n$ (Theorem 3 of [7] as well as [6] and Remark 2 below). Next he performs the incoherent unitary $W^\dagger_{tl} \Pi_{tlc}$ conditioned on $|t\rangle_{B_1} |l\rangle_{B_2} |c\rangle_{B_3}$. More precisely, $\Pi_{tlc}$ is performed on system $B$ conditioned on $B_1 B_2 B_3$, and $W^\dagger_{tl}$ is performed on $B B_3$ conditioned on $B_1 B_2$. Finally, he decouples his registers $B_1 B_2$.
To accomplish this, he performs a generalized incoherent measurement { K t , l = | (cid:105)(cid:104) γ t , l | B B } T , L t (1 − (cid:15) ) t , l = where | γ t , l (cid:105) = (cid:80) Tt (cid:48) = (cid:80) L t (1 − (cid:15) ) l (cid:48) = e i π (cid:32) ( t − t (cid:48) − T + ( l − l (cid:48) − L t (1 − (cid:15) ) (cid:33) | t (cid:48) (cid:105) | l (cid:48) (cid:105) . For outcome K t , l , Bob announces theresult to Alice, and she performs the incoherent unitary U tl = T (cid:88) t (cid:48) = L t (1 − (cid:15) ) (cid:88) l (cid:48) = e − i π (cid:32) ( t − t (cid:48) − T + ( l − l (cid:48) − L t (1 − (cid:15) ) (cid:33) | t (cid:105)(cid:104) t (cid:48) | A | l (cid:105)(cid:104) l (cid:48) | A . (56)The desired state is thus obtained: T (cid:88) t = (cid:112) q ( t ) | t (cid:105) A √ L t (1 − (cid:15) ) L t (1 − (cid:15) ) (cid:88) l = | l (cid:105) A √ C t C t (cid:88) c = | c (cid:105) A W † tl (cid:16) Π tlc | ψ tl c (cid:105) B | c (cid:105) B (cid:17) . Asymptotically, the consumption rates of entanglement and coherence approach Eq. (51).Now we prove achievability of Eq. (52). The protocol is based on decomposition (42a). Alice and Bob share log E (cid:48) n eCoBitsexpressed as 1 (cid:112) E (cid:48) n T (cid:88) t = S t (cid:88) s = | tt (cid:105) A B | ss (cid:105) A B , (57)where E (cid:48) n = (cid:81) Tt = S t , while Alice has an additional n [ S ( X ) ∆ ( Ψ ) + O ( τ ( δ ) , log nn )] CoBits and Bob has n [ S ( Y | X ) ∆ ( Ψ ) + O ( δ )] CoBits.In the ﬁrst step of the protocol, Alice and Bob again deterministically transform their entanglement into T (cid:88) t = (cid:112) q ( t ) | tt (cid:105) A B √ S t S t (cid:88) s = | ss (cid:105) A B using LIOCC. Using local CoBits, Bob prepares | ψ (cid:105) Btm s , conditioned on | ts (cid:105) B B . After this, he decouples his registers B B usinga measurement described above, and Alice performs a suitable incoherent unitary similar to Eq. (56). 
At this point, Alice andBob share T (cid:88) t = (cid:112) q ( t ) | t (cid:105) A √ S t S t (cid:88) s = | s (cid:105) A | ψ tm s (cid:105) B . Next, Alice splits her coherence into two parts | κ (cid:105) | κ (cid:105) , where | κ i (cid:105) = √ κ i (cid:80) κ i x = | x (cid:105) for κ = n [ S ( X ) ∆ ( Ψ ) − E( Ψ ) + O ( τ ( δ ) , log nn )] and κ = n [E( Ψ ) + O ( τ ( δ ) , log nn )]. Using | κ (cid:105) , she implements a rotation | (cid:105) → √ M t (1 − (cid:15) ) (cid:80) M t m = | m (cid:105) , conditioned on | t (cid:105) . Using | κ (cid:105) , she5implements another unitary rotation | s (cid:105) → U tm | s (cid:105) conditioned on | t (cid:105) | m (cid:105) , for s = , · · · , S t . The desired state is thus obtained: T (cid:88) t = (cid:112) q ( t ) | t (cid:105) A √ M t (1 − (cid:15) ) M t (1 − (cid:15) ) (cid:88) m = | m (cid:105) A √ S t S t (cid:88) s = (cid:16) U tm | s (cid:105) A (cid:17) | ψ tm s (cid:105) B . Asymptotically, the consumption rates of entanglement and coherence approach Eq. (52).Finally, the achievability of Eq. (53) follows from Eq. (51) and the fact that every eCoBit can be deterministically transformedinto a CoBit for Bob using LIOCC. (cid:3)
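The eCoBit-to-CoBit conversion used in the proof above can be verified directly. The sketch below assumes the eCoBit is $(|00\rangle + |11\rangle)/\sqrt{2}$ with real amplitudes, applies Alice's Kraus operators $|0\rangle\langle +|$ and $|1\rangle\langle -|$, and lets Bob apply $\sigma_Z$ only for outcome 1. The function name `convert` and the list-based state representation are choices made for this illustration.

```python
from math import sqrt

R = 1 / sqrt(2)

# Amplitudes state[a][b] of the eCoBit (|00> + |11>)/sqrt(2).
phi_AB = [[R, 0.0], [0.0, R]]

# Alice's Kraus operators are |0><+| (outcome 0) and |1><-| (outcome 1);
# only the bra part acts on her half of the shared state.
bras = {0: (R, R), 1: (R, -R)}

def convert(state):
    """Return (probability, Bob's normalized post-correction state) per outcome."""
    results = []
    for outcome, bra in bras.items():
        # Bob's unnormalized conditional state: sum_a <bra|a> state[a][b].
        bob = [sum(bra[a] * state[a][b] for a in (0, 1)) for b in (0, 1)]
        prob = sum(x * x for x in bob)
        bob = [x / sqrt(prob) for x in bob]
        if outcome == 1:
            bob[1] = -bob[1]  # Bob's sigma_Z correction
        results.append((prob, bob))
    return results

results = convert(phi_AB)
```

Both outcomes occur with probability 1/2 and leave Bob with the CoBit $(|0\rangle + |1\rangle)/\sqrt{2}$, so the conversion is deterministic.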

Proof of Lemmas 2 and 3

Lemma 2.

An arbitrary $d \times d$ unitary operator $U$ can be performed on a system using incoherent operations and $\lceil \log d \rceil$ CoBits.
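The statement can be checked numerically for small $d$. The sketch below follows the teleportation-style construction of the proof that follows, under the assumption that the $d$-level coherent resource has already been converted into the maximally correlated state $\frac{1}{\sqrt{d}}\sum_n |n\rangle|n\rangle$ on two ancillas. It projects onto the generalized Bell state $|b_{jk}\rangle$ after $U$ and then applies the inverse of $W^*_{jk}$, namely $W^T_{jk}$, as the incoherent correction. The helper names (`W`, `bell_state`, `teleport_unitary`) and the test unitary (a discrete Fourier transform) are illustrative choices.

```python
import cmath
from math import sqrt

def W(j, k, d):
    """Generalized Pauli W_jk = sum_l tau^(l*j) |k+l><l| with tau = exp(2*pi*i/d)."""
    tau = cmath.exp(2j * cmath.pi / d)
    M = [[0j] * d for _ in range(d)]
    for l in range(d):
        M[(k + l) % d][l] = tau ** (l * j)
    return M

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def bell_state(j, k, d):
    """Amplitudes b[m][n] of |b_jk> = (I x W_jk)|Phi> on |m>|n>."""
    Wjk = W(j, k, d)
    return [[Wjk[n][m] / sqrt(d) for n in range(d)] for m in range(d)]

def teleport_unitary(U, psi, j, k):
    """Outcome (j,k) of the measurement M_jk = |jk>(<b_jk| U x I), then correction."""
    d = len(psi)
    Upsi = matvec(U, psi)  # U is bundled into the Kraus operator M_jk
    b = bell_state(j, k, d)
    # Unnormalized state left on the third system.
    r = [sum(b[m][n].conjugate() * Upsi[m] for m in range(d)) / sqrt(d)
         for n in range(d)]
    prob = sum(abs(x) ** 2 for x in r)
    corrected = matvec(transpose(W(j, k, d)), r)  # W^T undoes W^*
    return prob, [x / sqrt(prob) for x in corrected]

d = 3
omega = cmath.exp(2j * cmath.pi / d)
U = [[omega ** (r * c) / sqrt(d) for c in range(d)] for r in range(d)]
psi = [0.6, 0.8j, 0.0]
target = matvec(U, psi)
outcomes = {(j, k): teleport_unitary(U, psi, j, k)
            for j in range(d) for k in range(d)}
```

Every outcome $(j,k)$ occurs with probability $1/d^2$ and, after the correction, leaves exactly $U|\psi\rangle$ on the third system.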

Proof.

Let us introduce an orthonormal basis $\{|b_{jk}\rangle\}_{j,k=0}^{d-1}$ for a $d \otimes d$ bipartite system $SS'$ consisting of maximally entangled states
$$|b_{jk}\rangle_{SS'} = I \otimes W_{jk} |\Phi^{(d)}_{SS'}\rangle, \tag{58}$$
where
$$|\Phi^{(d)}_{SS'}\rangle = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle_S \otimes |i\rangle_{S'}, \qquad W_{jk} = \sum_{l=0}^{d-1} \tau^{lj} |k + l\rangle\langle l|,$$
with $\tau = e^{2\pi i/d}$ and addition taken modulo $d$. The $\{|b_{jk}\rangle\}$ generalize the Bell basis to higher dimensions. Note that each unitary $W_{jk}$ is an incoherent operation.

Suppose now that we wish to perform an arbitrary $d \times d$ unitary $U$ on some state $|\psi\rangle$. We can accomplish this incoherently using $\lceil \log d \rceil$ CoBits through the following procedure. First, let $|\psi\rangle$ belong to system $S$, and express $|\psi\rangle_S = [\psi] |\Phi^{(d)}_S\rangle$, where $[\psi]$ is a $d \times d$ complex matrix and $|\Phi^{(d)}_S\rangle = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle_S$. We next introduce $\lceil \log d \rceil$ CoBits $|\Phi_{S'}\rangle^{\otimes \lceil \log d \rceil}$ on system $S'$. This is deterministically transformed into $|\Phi^{(d)}_{S'}\rangle = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle_{S'}$, which can always be accomplished since $|\Phi^{(d)}_{S'}\rangle$ majorizes $|\Phi_{S'}\rangle^{\otimes \lceil \log d \rceil}$ (see Theorem 1 of [7] as well as Ref. [20]). An additional system $S''$ is then introduced in state $|0\rangle_{S''}$ and an entangling incoherent operation is performed to obtain $|\Phi^{(d)}_{S'}\rangle |0\rangle \to |\Phi^{(d)}_{S'S''}\rangle$. Thus at this point the state across all three systems is
$$|\psi\rangle_S |\Phi^{(d)}_{S'S''}\rangle = \frac{1}{d} \sum_{i,j=0}^{d-1} \left( [\psi] |i\rangle_S \right) \otimes |j\rangle_{S'} \otimes |j\rangle_{S''}. \tag{59}$$
An incoherent measurement $\{M_{jk}\}_{j,k=0}^{d-1}$ is then performed on systems $SS'$ given by
$$M_{jk} = |jk\rangle \left( \langle b_{jk}| U_S \otimes I_{S'} \right) = |jk\rangle\langle \Phi^{(d)}_{SS'}|\, I \otimes U^T W^\dagger_{jk}. \tag{60}$$
From Eq. (59), we see that outcome $jk$ generates the (unnormalized) post-measurement state
$$\begin{aligned} \frac{1}{d} \sum_{m,n=0}^{d-1} |jk\rangle\langle \Phi^{(d)}_{SS'}|\, [\psi] \otimes U^T W^\dagger_{jk}\, |mn\rangle_{SS'} \otimes |n\rangle_{S''} &= \frac{1}{d} \sum_{m,n=0}^{d-1} |jk\rangle\langle \Phi^{(d)}_{SS'}|\, I \otimes [\psi]^T U^T W^\dagger_{jk}\, |mn\rangle_{SS'} \otimes |n\rangle_{S''} \\ &= \frac{1}{d} |jk\rangle \otimes \frac{1}{\sqrt{d}} \sum_{m=0}^{d-1} W^*_{jk} U [\psi] |m\rangle_{S''} \\ &= \frac{1}{d} |jk\rangle \otimes W^*_{jk} U |\psi\rangle_{S''}. \end{aligned} \tag{61}$$
Therefore, after applying the (incoherent) rotation $W^T_{jk}$, which is the inverse of $W^*_{jk}$, on system $S''$, the state $U|\psi\rangle$ is obtained. $\square$

Remark 1.

Lemma 2 can easily be extended to performing controlled unitaries of the form $\sum_{r=1}^{R} |r\rangle\langle r|_C \otimes U_r$, where $C$ is the control system of dimension $R$. When each $U_r$ acts on a $d$-dimensional system, the number of CoBits needed to perform this operation is given by $\lceil \log d \rceil$. Indeed, the above protocol is repeated with a replacement in Eq. (60) of an incoherent measurement $\{M_{jk}\}_{j,k=0}^{d-1}$ performed on systems $CSS'$ given by
$$M_{jk} = \sum_{r=1}^{R} |r\rangle\langle r|_C \otimes |jk\rangle \left( \langle b_{jk}| U^S_r \otimes I_{S'} \right). \tag{62}$$

Remark 2.

Lemma 2 offers an alternative protocol for coherence dilution in pure states [7]. For an arbitrary pure state $|\psi\rangle = \sum_{x=1}^{d} \sqrt{p(x)}\, e^{i\theta_x} |x\rangle$, we can transform $|+\rangle^{\otimes n} \to\ \overset{\epsilon}{\approx} |\psi\rangle^{\otimes \lfloor nR \rfloor}$ at any rate $R < 1/S(X)_{\Delta(\psi)}$ as $n \to \infty$. To see this we consider the $n$-copy decomposition of $|\psi\rangle$ into its typical and atypical parts [7]:
$$|\psi\rangle^{\otimes n} = \sqrt{p^n(T^n_{[p]\delta})}\, |\text{typical}\rangle + \sqrt{1 - p^n(T^n_{[p]\delta})}\, |\text{atypical}\rangle. \tag{63}$$
Since $|T^n_{[p]\delta}| \leq 2^{n[S(X)_{\Delta(\psi)} + \delta]}$ and $p^n(T^n_{[p]\delta}) \to 1$ for every $\delta > 0$ as $n \to \infty$, dilution is achieved by performing a unitary operation that rotates the CoBit register to $|\text{typical}\rangle$. Since $|\text{typical}\rangle$ is an element of a $|T^n_{[p]\delta}|$-dimensional space, Lemma 2 implies that such a unitary can be implemented by incoherent operations at a coherence consumption rate arbitrarily close to $S(X)_{\Delta(\psi)}$.

Lemma 3.

Let $|\psi\rangle^{AB}$ and $|\varphi\rangle^{AB}$ be two bipartite pure states with squared Schmidt coefficients $\vec{\tau}(\psi)$ and $\vec{\tau}(\varphi)$, respectively. Suppose that Alice and Bob's incoherent bases are Schmidt bases for both $|\psi\rangle^{AB}$ and $|\varphi\rangle^{AB}$, and suppose that $\vec{\tau}(\varphi)$ majorizes $\vec{\tau}(\psi)$ (i.e. $\vec{\tau}(\psi)\prec\vec{\tau}(\varphi)$). Then there exists an LIOCC protocol transforming $|\psi\rangle^{AB}\to|\varphi\rangle^{AB}$ with probability one.

Proof.

Recall that a probability distribution $\vec{y} = (y_1,\cdots,y_n)$ majorizes another distribution $\vec{x} = (x_1,\cdots,x_n)$ if $\sum_{j=1}^{k}y_j^{\downarrow} \ge \sum_{j=1}^{k}x_j^{\downarrow}$ for all $k = 1,\cdots,n$, where the $y_j^{\downarrow}$ are the components of $\vec{y}$ in non-increasing order, and likewise for $x_j^{\downarrow}$. Without loss of generality, suppose that both $|\psi\rangle^{AB}$ and $|\varphi\rangle^{AB}$ are maximally correlated (i.e. have the form $|\psi\rangle = \sum_i\sqrt{\psi_i}\,|ii\rangle$ and $|\varphi\rangle = \sum_i\sqrt{\varphi_i}\,|ii\rangle$). Pad $\vec{\tau}(\psi)$ with enough zeros so that $\vec{\tau}(\psi)$ and $\vec{\tau}(\varphi)$ are real vectors of equal length. Since $\vec{\tau}(\psi)\prec\vec{\tau}(\varphi)$, there exists a doubly stochastic matrix $D$ such that $\vec{\tau}(\psi) = D\,\vec{\tau}(\varphi)$ [58]. Birkhoff's Theorem assures that $D = \sum_\alpha p_\alpha\Pi_\alpha$, where the $p_\alpha$ form a probability distribution and the $\Pi_\alpha$ are permutation matrices. Then define the operators

$$M_\alpha := \sqrt{p_\alpha}\;\Pi_\alpha^{\dagger}\bullet S,$$

where the elements of $S$ are given by $[[S]]_{ij} = \sqrt{\varphi_i}/\sqrt{\psi_j}$ and "$\bullet$" denotes the Hadamard product. Recall that the Hadamard product of two matrices $A$ and $B$ is the matrix $A\bullet B$ with elements $[[A\bullet B]]_{ij} = [[A]]_{ij}[[B]]_{ij}$. Note that each $M_\alpha$ is an incoherent operator. By construction $M_\alpha\otimes\Pi_\alpha|\psi\rangle\propto|\varphi\rangle$ for every $\alpha$, and the relation $\vec{\tau}(\psi) = \sum_\alpha p_\alpha\Pi_\alpha\vec{\tau}(\varphi)$ readily implies that $\sum_\alpha M_\alpha^{\dagger}M_\alpha = I$. Hence, the protocol consists of Alice performing the incoherent measurement $\{M_\alpha\}_\alpha$, announcing her result, and then Bob performing the permutation $\Pi_\alpha$. □

Proof of Theorem 4

Theorem 4.

The function $C_L$ is an LIOCC monotone.

Proof.

By the convex roof construction, it suffices to prove monotonicity for pure state transformations [34]. To do so, we first introduce two relative entropy quantities for a general density matrix $\rho^{S}$ on system $S$ and a bipartite state $\rho^{AB}$ on joint system $AB$: $C_r(\rho^{S}) = \min_{\sigma^{S}\in\mathcal{I}}S(\rho^{S}\|\sigma^{S})$ [3] and $C_r^{A|B}(\rho^{AB}) = \min_{\sigma^{AB}\in\mathcal{QI}}S(\rho^{AB}\|\sigma^{AB})$ [33], where $\mathcal{I}$ is the set of incoherent states for system $S$ and $\mathcal{QI}$ is the set of quantum-incoherent states for system $AB$. For a pure state $|\phi\rangle^{AB}$ with reduced density matrices $\phi^{A}$ and $\phi^{B}$, these quantities reduce to $C_r(\phi^{A}) = S(A)_{\Delta(\phi)} - E(\phi)$, $C_r(\phi^{B}) = S(B)_{\Delta(\phi)} - E(\phi)$, and $C_r^{A|B}(\phi^{AB}) = S(B)_{\Delta(\phi)}$. Furthermore, it was shown in Ref. [33] that $C_r^{A|B} = C_a^{A|B}$ for pure states, where $C_a^{A|B}(\rho^{AB})$ is the optimal asymptotic rate of coherence distillation on Bob's side when Alice helps. Collecting these observations, we therefore obtain

$$C_L(\phi^{AB}) = C_r^{A|B}(\phi^{AB}) + C_r(\phi^{A}) = C_r^{B|A}(\phi^{AB}) + C_r(\phi^{B}). \tag{64}$$

Now suppose that in the first round of the protocol, Alice makes a local measurement on the joint state $|\Psi\rangle^{AB}$ that generates an ensemble of pure state transformations $|\Psi\rangle^{AB}\to\{|\omega_k\rangle^{AB}, p_k\}$. Since $C_r^{A|B}$ is an LIOCC monotone, we have $C_r^{A|B}(\Psi^{AB}) \ge \sum_k p_k C_r^{A|B}(\omega_k^{AB})$, and likewise, because $C_r$ (for Alice's system) is a monotone under Alice's incoherent operation, we have $C_r(\Psi^{A}) \ge \sum_k p_k C_r(\omega_k^{A})$. Hence $C_L(\Psi^{AB}) \ge \sum_k p_k C_L(\omega_k^{AB})$. When Bob measures in the next round, we repeat the same argument on each $\omega_k^{AB}$ and use the fact that $C_L(\omega_k^{AB}) = C_r^{B|A}(\omega_k^{AB}) + C_r(\omega_k^{B})$. By iteration, $C_L$ behaves monotonically for all rounds of the protocol, and the theorem is proven. □

Proof of Theorem 5

Theorem 5.

For a pure state $|\Psi\rangle^{AB}$ the following triples are achievable coherence-entanglement distillation rates:

$$(R_A, R_B, E^{co}) = \big(S(X)_{\Delta(\Psi)} - E(\Psi),\; S(Y)_{\Delta(\Psi)},\; 0\big) \tag{65}$$

$$(R_A, R_B, E^{co}) = \big(0,\; S(Y|X)_{\Delta(\Psi)},\; I(X:Y)_{\Delta(\Psi)}\big), \tag{66}$$

as well as the points obtained by interchanging $A\leftrightarrow B$ in Eqns. (65) and (66). Moreover, these points are optimal in the sense that any achievable rate triple must satisfy

$$R_A + R_B \le C_L(\Psi) = S(X)_{\Delta(\Psi)} + S(Y)_{\Delta(\Psi)} - E(\Psi), \qquad R_B + E^{co} \le S(Y)_{\Delta(\Psi)}. \tag{67}$$

Proof.

The upper bound $R_A + R_B \le C_L(\Psi)$ follows from monotonicity of $C_L$ under LIOCC and the fact that $C_L$ is asymptotically continuous. The latter property holds because $C_L$ is defined on pure states in terms of relative entropy measures and extended to mixed states using a convex roof (see [59]). The upper bound $R_B + E^{co} \le S(Y)_{\Delta(\Psi)}$ follows from the first and, again, the fact that one eCoBit can be transformed into one local CoBit using LIOCC.

Moving to achievability, we first prove the rate triple given in Eq. (65). The protocol is based on decomposition (42d). Starting from $|\Psi\rangle^{\otimes n}$ expressed in this form, Alice first measures the typical type encoded in register $A$ and (with high probability) announces the result $t$. This leaves them with the state

$$\frac{1}{\sqrt{M_t(1-\epsilon)}}\sum_{m=1}^{M_t(1-\epsilon)}|m\rangle^{A}\;\frac{1}{\sqrt{L_tC_t(1-\epsilon)}}\sum_{l=1}^{L_t(1-\epsilon)}\sum_{c=1}^{C_t}\Big(V_{tm}|lc\rangle^{AA}\Big)\,W_{tl}^{\dagger}\Big(\Pi_{tlc}|\psi^{tl}_{c}\rangle^{B}|c\rangle^{B}\Big).$$

Alice next performs the incoherent measurement $\{K_{lc}\}_{l,c=1}^{L_t(1-\epsilon),\,C_t}$ on her two registers $AA$, where

$$K_{lc} = |l\rangle\langle l\gamma_c|\,V_{lm}^{\dagger} \tag{68}$$

and $|\gamma_c\rangle = \sum_{c'=1}^{C_t}e^{2\pi i\,(c-1)(c'-1)/C_t}\,|c'\rangle$. She announces the result $(l,c)$, and then Bob performs the incoherent operation $W_{tl}^{\dagger}(\Pi^{B}_{tlc}\otimes I^{B})$ on his two registers $BB$, followed by the error-correction unitary $U_c = \sum_{c'=1}^{C_t}e^{-2\pi i\,(c-1)(c'-1)/C_t}\,|c'\rangle\langle c'|$ performed on the register holding $|c\rangle^{B}$. The output state is

$$\frac{1}{\sqrt{M_t(1-\epsilon)}}\sum_{m=1}^{M_t(1-\epsilon)}|m\rangle^{A}\sum_{c=1}^{C_t}|c\rangle^{B}|\psi^{tl}_{c}\rangle^{B}. \tag{69}$$

Bob can further distill $|\psi^{tl}_{c}\rangle\to|+\rangle^{\otimes rn}$ with $r\to S(Y|X)_{\Delta(\Psi)}$ as $n\to\infty$. The total rates of coherence distillation thus approach Eq. (65).

We next turn to the achievability of rate triple Eq. (66). It is based on decomposition (42b). Alice first does a type measurement and with high probability will generate the post-measurement state

$$\frac{1}{\sqrt{L_tC_t(1-\epsilon)}}\sum_{l=1}^{L_t(1-\epsilon)}\sum_{c=1}^{C_t}|lc\rangle^{AA}\,W_{tl}^{\dagger}\Big(\Pi_{tlc}|\psi^{tl}_{c}\rangle^{B}|c\rangle^{B}\Big).$$

Alice then measures the code block $l$ on register $A$ and communicates the result to Bob. He then performs the incoherent unitary $W_{tl}$ with the permutation $\Pi_{tlc}^{-1}$ conditioned on $|c\rangle^{B}$. This generates the state

$$\frac{1}{\sqrt{C_t}}\sum_{c=1}^{C_t}|c\rangle^{A}|c\rangle^{B}|\psi^{tl}_{c}\rangle^{B}, \tag{70}$$

which asymptotically approaches the desired rates of Eq. (66). □

Remark 3.

As noted in the main text, the optimal rate at which eCoBits can be distilled from a pure state using LIOCC is still unknown. Rate triple (66) gives a rate of $I(X:Y)_{\Delta(\Psi)}$ with an additional coherence output rate of $S(Y|X)_{\Delta(\Psi)}$. However, this point is not optimal in terms of the eCoBit rate. The reason is that the quantity $I(X:Y)_{\Delta(\Psi)}$ can be increased by LIOCC. As an example of this effect, consider the state

$$|\Psi\rangle^{AB} = \frac{1}{\sqrt{6}}\Big(|0\rangle\otimes(|0\rangle+|1\rangle+|2\rangle) + |1\rangle\otimes(|0\rangle-|1\rangle+|2\rangle)\Big),$$

for which $I(X:Y)_{\Delta(\Psi)} = 0$. However, when Bob performs the incoherent measurement described by Kraus operators $\{K_0 = |0\rangle\langle+| + |1\rangle\langle 2|,\; K_1 = |0\rangle\langle-|\}$, with $|\pm\rangle = (|0\rangle\pm|1\rangle)/\sqrt{2}$, correlations are generated by measuring in the incoherent bases after Bob obtains outcome $K_0$. Hence, this state has a nonzero eCoBit distillation rate.

We will now show that optimizing the mutual information $I(X:Y)_{\Delta(\Psi)}$ over all LIOCC protocols yields the maximum eCoBit distillation rate $E^{co}_{max}$.

Lemma.

For a pure state $|\Psi\rangle^{AB}$, the optimal distillation rate of $|\Phi_{AB}\rangle$ is given by

$$E^{co}_{D}(\Psi) = \lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}\sum_m p(m)\,I(X:Y)_{\Delta(\Psi_m)}, \tag{71}$$

where the supremum is taken over all LIOCC protocols that generate the multi-outcome transformation $|\Psi\rangle^{\otimes n}\to\{p(m), |\Psi_m\rangle\}_{m=1}^{s}$.

Proof.

First let us prove sufficiency. Consider any LIOCC protocol $\mathcal{L}$ that generates the pure state transformation $\Psi^{\otimes n}\to\{p(m),\Psi_m\}_{m=1}^{s}$. Fix arbitrary $\epsilon,\delta > 0$. We consider $t$ blocks of $\Psi^{\otimes n}$ and perform $\mathcal{L}$ on each of the blocks. This is a standard technique used in quantum Shannon theory, and is often called "double blocking". For $t$ sufficiently large, with probability $> 1-\epsilon$ the state obtained is $\bigotimes_{m=1}^{s}\Psi_m^{\otimes tN_m}$ with $|N_m - p(m)| < \delta$. This follows from the definition of $\delta$-typicality and Eq. (12). On each $\Psi_m^{\otimes tN_m}$ Alice and Bob perform the distillation protocol of Theorem 5, thus generating the state $\sigma_m$, where $\|\sigma_m - \Phi_{AB}^{\otimes\lfloor tN_m(R_m-\epsilon)\rfloor}\| < \epsilon$ and $R_m = I(X:Y)_{\Delta(\Psi_m)}$. Hence in total we have the transformation $\Psi^{\otimes nt}\to\bigotimes_{m=1}^{s}\sigma_m$, where $\|\bigotimes_{m=1}^{s}\sigma_m - \Phi_{AB}^{\otimes\sum_m\lfloor tN_m(R_m-\epsilon)\rfloor}\| < s\epsilon$, from which we compute the rate

$$\frac{1}{nt}\sum_m\lfloor tN_m(R_m-\epsilon)\rfloor \ge \frac{1}{n}\sum_m p(m)\,I(X:Y)_{\Delta(\Psi_m)} - O\!\left(\delta+\epsilon+\frac{s}{n}\right), \tag{72}$$

where the additional terms come from $N_m > p(m)-\delta$ and the removal of $\lfloor\cdot\rfloor$.

We now turn to the converse. Consider any LIOCC distillation protocol transforming $\Psi^{\otimes n}\to\sum_m p(m)\rho_m$ such that $F\big(\sum_m p(m)\rho_m, \Phi_{AB}^{\otimes nR}\big) \ge 1-\epsilon$ (where $\rho_m^{AB}$ need not be pure). Hence,

$$(1-\epsilon)^2 \le \sum_m p(m)\,F\big(\rho_m,\Phi_{AB}^{\otimes nR}\big) \le \sum_m p(m)\,F\big(\Delta(\rho_m),\Delta(\Phi_{AB}^{\otimes nR})\big). \tag{73}$$

Using Fannes' Inequality (Lemma 1), monotonicity of the trace norm under CPTP maps, and the relation $F(\rho,\sigma) \le 1-\tfrac{1}{4}\|\rho-\sigma\|^2$, it is straightforward to show that

$$F\big(\Delta(\rho_m),\Delta(\Phi_{AB}^{\otimes nR})\big) \le 1-\frac{1}{64}\left(\frac{|I(X:Y)_{\Delta(\rho_m)}-nR| - O(1)}{n\log d_Ad_B}\right)^{2},$$

where $O(1)$ denotes an additive constant independent of $n$ and we have also used the fact that $I(X:Y)_{\Delta(\Phi_{AB}^{\otimes nR})} = nR$. Combining with Eq. (73) gives

$$1-(1-\epsilon)^2 \ge \sum_m p(m)\,\frac{1}{64}\left(\frac{|nR - I(X:Y)_{\Delta(\rho_m)}| - O(1)}{n\log d_Ad_B}\right)^{2} \ge \frac{1}{64}\left(\frac{|nR - \sum_m p(m)\,I(X:Y)_{\Delta(\rho_m)}| - O(1)}{n\log d_Ad_B}\right)^{2}.$$

Therefore, we obtain

$$\frac{1}{n}\sum_m p(m)\,I(X:Y)_{\Delta(\rho_m)} \ge R - \log(d_Ad_B)\,\big(64[1-(1-\epsilon)^2]\big)^{1/2} - O\!\left(\frac{1}{n}\right).$$

This completes the proof. □
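
As a numerical companion to Remark 3 and Eq. (71), the following Python sketch evaluates the incoherent-basis mutual information $I(X:Y)_{\Delta}$ of the example state before and after Bob's measurement. It assumes the example is read as $|\Psi\rangle = \frac{1}{\sqrt{6}}\big(|0\rangle(|0\rangle+|1\rangle+|2\rangle) + |1\rangle(|0\rangle-|1\rangle+|2\rangle)\big)$ with $K_0 = |0\rangle\langle+| + |1\rangle\langle 2|$ and $|+\rangle = (|0\rangle+|1\rangle)/\sqrt{2}$; the helper names are illustrative and are not taken from the paper.

```python
import math

def apply_on_bob(kraus, amps):
    """Apply the Kraus matrix kraus[y_out][y_in] to Bob's index of amps[x][y]."""
    return [[sum(k_row[y] * a_row[y] for y in range(len(a_row)))
             for k_row in kraus] for a_row in amps]

def dephase(amps):
    """Return (squared norm, normalized diagonal distribution p(x, y))."""
    weights = [[abs(a) ** 2 for a in row] for row in amps]
    total = sum(sum(row) for row in weights)
    return total, [[w / total for w in row] for row in weights]

def mutual_information(p):
    """Classical mutual information I(X:Y), in bits, of a joint table p[x][y]."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    return sum(pxy * math.log2(pxy / (px[x] * py[y]))
               for x, row in enumerate(p)
               for y, pxy in enumerate(row) if pxy > 0)

# Assumed reading of the example state: psi[x][y], x = Alice, y = Bob.
a = 1 / math.sqrt(6)
psi = [[a, a, a],
       [a, -a, a]]

# Assumed Kraus operator K_0 = |0><+| + |1><2|.
s = 1 / math.sqrt(2)
K0 = [[s, s, 0.0],
      [0.0, 0.0, 1.0]]

_, p_before = dephase(psi)
prob_k0, p_after = dephase(apply_on_bob(K0, psi))

i_before = mutual_information(p_before)
i_after = mutual_information(p_after)
print("I(X:Y) before measurement:", i_before)
print("probability of outcome K_0:", prob_k0)
print("I(X:Y) after outcome K_0:", i_after)
```

Under these assumptions the mutual information vanishes (up to rounding) before the measurement and is roughly 0.31 bits after outcome $K_0$, which occurs with probability 2/3. The other outcome leaves a product state, so the average in Eq. (71) for this single-copy protocol is about 0.21 bits.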

Proof of Theorem 6

Theorem 6.

$$E_D^{LIOCC}(\Psi) = E(\Psi). \tag{74}$$

Proof.

The protocol is based on decomposition (42c). Quite simply, Alice measures the typical type $|t\rangle^{A}$ and codebook $|l\rangle^{A}$. With high probability the post-measurement state will take the form

$$\frac{1}{\sqrt{C_t}}\sum_{c=1}^{C_t}|c\rangle^{A}\,W_{tl}^{\dagger}\Big(\Pi_{tlc}|\psi^{tl}_{c}\rangle^{B}|c\rangle^{B}\Big). \tag{75}$$

This is a maximally entangled state whose size approaches the desired value, $C_t\to E(\Psi)$ as $n\to\infty$. □

Proof of Theorem 7

Theorem 7.

A mixed state $\rho^{AB}$ has distillable entanglement if and only if entanglement can be distilled using LIOCC.

Proof.

Note that an arbitrary quantum operation can be accomplished by unitary operations and incoherent projective measurements. Thus, if $\mathcal{L}$ is a general LOCC operation such that $\mathcal{L}(\rho^{\otimes n})\approx\Phi_{AB}$, then, because of Lemma 2, there exists an LIOCC operation $\mathcal{L}_I$ consuming some finite amount of local coherence that transforms $\mathcal{L}_I(\rho^{\otimes n}) = \mathcal{L}(\rho^{\otimes n})\approx\Phi_{AB}$. Therefore, to asymptotically distill entanglement from $\rho$ by LIOCC, it suffices for Alice and Bob to first have a sufficient amount of local coherence. Theorem 2 of Ref. [33] implies that local coherence for either Alice or Bob can be distilled from $\rho^{AB}$ using LIOCC whenever $\rho^{AB}$ is entangled (see Remark below). Hence, Alice and Bob first use $n_A$ copies of $\rho^{AB}$ to distill a sufficient amount of local coherence for Alice and an additional $n_B$ copies to distill sufficient coherence for Bob. They can then implement $\mathcal{L}_I$ on $\rho^{\otimes n}$ with high precision, thus generating a close approximation of $\Phi_{AB}$ using LIOCC operations and $n_A+n_B+n$ copies of $\rho$. □

Remark 4.

Ref. [33] deals with a more general setting in which the assisting party can perform arbitrary quantum operations. However, the projective POVM described in Theorem 2 of [33] can be implemented incoherently. Indeed, if, say, Alice performs any projective measurement $\{|b_k\rangle\langle b_k|\}_{k=0}^{d-1}$ with the $|b_k\rangle$ being orthonormal, Bob's post-measurement state will be the same as if Alice were to instead perform the incoherent projective measurement $\{|k\rangle\langle b_k|\}_{k=0}^{d-1}$
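
The observation in Remark 4 can be checked numerically. The sketch below, in plain Python, uses an arbitrarily chosen two-qubit pure state and orthonormal basis $\{|b_k\rangle\}$ (both are illustrative assumptions, not taken from Ref. [33]) and compares Bob's unnormalized post-measurement state under the projective operators $|b_k\rangle\langle b_k|$ with that under the incoherent operators $|k\rangle\langle b_k|$.

```python
import math

def bob_state(kraus, amps):
    """Bob's unnormalized post-measurement density matrix.

    kraus[i][a] acts on Alice's index of the pure-state amplitudes amps[a][b];
    Alice's output register is traced out afterwards.
    """
    d_b = len(amps[0])
    out = [[sum(k_row[a] * amps[a][b] for a in range(len(amps)))
            for b in range(d_b)] for k_row in kraus]
    return [[sum(row[b] * complex(row[c]).conjugate() for row in out)
             for c in range(d_b)] for b in range(d_b)]

def ket_bra(ket, bra):
    """Matrix |ket><bra| for vectors given as lists of amplitudes."""
    return [[k * complex(b).conjugate() for b in bra] for k in ket]

r = 1 / math.sqrt(2)
basis_b = [[r, r * 1j], [r, -r * 1j]]   # assumed orthonormal basis |b_k>
basis_inc = [[1.0, 0.0], [0.0, 1.0]]    # incoherent basis |k>
psi = [[0.6, 0.0], [0.48j, 0.64]]       # assumed normalized amplitudes amps[a][b]

max_gap = 0.0
for k in range(2):
    projective = bob_state(ket_bra(basis_b[k], basis_b[k]), psi)
    incoherent = bob_state(ket_bra(basis_inc[k], basis_b[k]), psi)
    for row_p, row_i in zip(projective, incoherent):
        for x, y in zip(row_p, row_i):
            max_gap = max(max_gap, abs(x - y))

print("largest entrywise difference:", max_gap)
```

The two measurements differ only in the label Alice's register carries after the measurement, so the difference should vanish up to floating-point rounding for every outcome $k$.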