Optimal Compression for Identically Prepared Qubit States

Yuxiang Yang,^1 Giulio Chiribella,^{1,2} and Masahito Hayashi^{3,4}

^1 Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
^2 Canadian Institute for Advanced Research, CIFAR Program in Quantum Information Science, Toronto, Ontario M5G 1Z8, Canada
^3 Graduate School of Mathematics, Nagoya University, Nagoya, Japan
^4 Centre for Quantum Technologies, National University of Singapore, Singapore
We establish the ultimate limits to the compression of sequences of identically prepared qubits. The limits are determined by Holevo's information quantity and are attained through use of the optimal universal cloning machine, which finds here a novel application to quantum Shannon theory.
Introduction.
A fundamental feature distinguishing quantum states from classical probability distributions is the freedom in the choice of basis, which can be used to encode information even when the spectrum of the state is fixed. States with fixed spectrum can be used, for instance, as indicators of spatial directions [1, 2], probes for frequency estimation [3, 4], or even pieces of cryptocurrency [5]. Because of Holevo's bound [6], the basis information cannot be extracted from a single quantum particle, but becomes accessible when multiple copies of the same quantum state are available. Suppose that a sender wants to transmit to a receiver the information contained in a sequence of n identically prepared particles. In this scenario, an important question is how to minimize the amount of quantum bits (qubits) used in the transmission, subject to the requirement that the initial n-particle state can be approximately rebuilt at the receiver's end.

The compression of identically prepared states has been theoretically studied [7] and experimentally implemented [8] in the pure state case. For mixed states, two of us proposed a protocol [9] that compresses states with fixed spectrum and variable basis. The protocol encodes n identically prepared qubits into a memory of (3/2) log n qubits, which is proven to be the smallest memory size when the decoder is bound by the conservation of the total angular momentum. Whether lifting the angular momentum constraint allows for further compression has remained an open problem so far. Moreover, little is known in the case where no prior information is available on the spectrum.
Finding the optimal compression protocol for general quantum states is important for applications (where the spectrum may be unknown) and for the foundations of quantum theory, because it provides a characterization of the different information content of quantum states and classical probability distributions.

In this Letter we identify the optimal compression protocols for sequences of identically prepared qubits. We first consider states with known spectrum, devising a compression protocol that stores a sequence of n qubits into a memory of log n qubits, the ultimate limit set by Holevo's information quantity [6]. The memory reduction from (3/2) log n to log n qubits is accomplished through a novel application of the optimal universal cloning machine [10-12], here used to modulate the values of the total angular momentum. On average, the modulation is of size √n and its logarithm, (1/2) log n, is exactly the amount of memory saved by our protocol, compared to the optimal protocol with angular-momentum-preserving decoder [9]. We then address a new compression scenario where no prior information about the state is given. For this scenario, called full-model compression, we devise a protocol that uses a hybrid memory of log n qubits and (1/2) log n classical bits. The protocol is optimal; in fact, no further compression can be achieved even if the hybrid memory is replaced by a fully quantum memory. The main result of the Letter is summarized by the following theorem:

Theorem 1. A sequence of n identically prepared qubit states can be optimally compressed into log n qubits if the spectrum is known, and into log n qubits plus (1/2) log n classical bits if the spectrum is unknown.

Comparing the two protocols, we identify log n qubits as the amount of information contained in the choice of basis and (1/2) log n bits as the information contained in the spectrum. This interpretation is consistent with the fact that (1/2) log n is the number of bits needed to faithfully compress n independent samples of a classical probability distribution over the binary set {0, 1} [13].

Compression protocol for known spectrum.
Consider the compression of n qubits, independently prepared in the state ρ_g = g ρ g†, where ρ = p |0⟩⟨0| + (1 − p) |1⟩⟨1| is a fixed density matrix and g ∈ SU(2) is a variable unitary matrix implementing a change of basis. Without loss of generality, we assume p ≥ 1/2 (the case p < 1/2 is analogous). The state of the n qubits can be written in the block diagonal form

  ρ_g^{⊗n} = ⊕_{J=0}^{n/2} q_J ( ρ_{g,J} ⊗ I_{m_J}/m_J ),   (1)

where the equality holds up to a global unitary transformation, known as the Schur transform and efficiently implementable on a quantum computer [15]. In Eq. (1), J is the quantum number of the total angular momentum [16], q_J is a probability distribution, ρ_{g,J} is a density matrix with support in an irreducible space R_J, and I_{m_J} is the identity matrix on an m_J-dimensional multiplicity space M_J [14]. The state ρ_{g,J} can be expressed in the Gibbs form [17]

  ρ_{g,J} = e^{−β H_{g,J}} / Tr[e^{−β H_{g,J}}],  β = 2 tanh^{−1}(2p − 1),
  H_{g,J} = U_{g,J} ( Σ_{m=−J}^{J} (−m) |J, m⟩⟨J, m| ) U†_{g,J},   (2)

where {|J, m⟩}_{m=−J}^{J} are the eigenstates of the z component of the angular momentum operator and U_{g,J} is the unitary matrix representing the change of basis g in the irreducible space R_J.

We now show how to optimally compress the states ρ_g^{⊗n}. In general, a compression protocol consists of two components: the encoder, which stores the input state into a memory, and the decoder, which attempts to reconstruct the input state from the state of the memory. The encoder and the decoder are both represented by completely positive trace-preserving linear maps (also known as quantum channels) [18]. Therefore, a quantum compression protocol is specified by a couple (E, D), consisting of the encoding and the decoding channel, respectively. The performance of the protocol is determined by the tradeoff between two quantities: the memory size, quantified by the dimension d_enc of the memory's Hilbert space, and the compression error, measured by the worst-case trace distance between the initial state and the state recovered from the memory,

  ε = max_{g ∈ SU(2)} ‖ D∘E(ρ_g^{⊗n}) − ρ_g^{⊗n} ‖,   (3)

with ‖A‖ := Tr √(A†A). The key issue is to minimize the memory size, while guaranteeing that the compression error vanishes in the large n limit.

The optimal protocol is based on two ingredients. The first is the concentration of the probability distribution q_J in Eq. (1). Explicitly, the probability is given by [9]

  q_J = (2J + 1)/(2J̄) [ B(n/2 + J + 1) − B(n/2 − J) ],   (4)

where B(k) is the binomial distribution with n + 1 trials and probability p, and J̄ := (p − 1/2)(n + 1) is close to the average value ⟨J⟩ = Σ_J J q_J.
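Equation (4) can be cross-checked against the direct Schur-Weyl expression q_J = m_J Σ_{m=−J}^{J} p^{n/2+m}(1−p)^{n/2−m}, with multiplicity m_J = C(n, n/2−J) − C(n, n/2−J−1). The following sketch is our own check (the parameters n and p are chosen arbitrarily); it verifies that the two expressions agree and that q_J concentrates around J̄:

```python
from math import comb

def q_direct(n, p, J):
    # multiplicity of the spin-J block times the sum of its weights
    mJ = comb(n, n//2 - J) - (comb(n, n//2 - J - 1) if J < n//2 else 0)
    return mJ * sum(p**(n//2 + m) * (1 - p)**(n//2 - m) for m in range(-J, J + 1))

def q_closed(n, p, J):
    # Eq. (4): q_J = (2J+1)/(2 Jbar) [B(n/2+J+1) - B(n/2-J)],
    # with B(k) the binomial pmf of n+1 trials and probability p
    B = lambda k: comb(n + 1, k) * p**k * (1 - p)**(n + 1 - k)
    return (2*J + 1) / (2 * (p - 0.5) * (n + 1)) * (B(n//2 + J + 1) - B(n//2 - J))

n, p = 100, 0.8
qs = [q_direct(n, p, J) for J in range(n//2 + 1)]
assert abs(sum(qs) - 1) < 1e-9          # q_J is a probability distribution
assert all(abs(q_direct(n, p, J) - q_closed(n, p, J)) < 1e-9 for J in range(n//2 + 1))
# concentration: almost all the mass sits within a few sqrt(n) of Jbar
Jbar = (p - 0.5) * (n + 1)
assert sum(q for J, q in enumerate(qs) if abs(J - Jbar) <= 5 * n**0.5) > 0.999
```

The same check run at several values of n also makes the √n width of the distribution visible, which is the scale that drives the rest of the argument.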
From the above expression it is clear that the values of J with |J − J̄| ≫ √n have exponentially small probability in the large n limit. As a result, the performance of a compression protocol depends only on its action on the subspaces R_J ⊗ M_J that satisfy the condition |J − J̄| = O(√n).

FIG. 1. Optimal compression for known spectrum and completely unknown basis. The encoder collects information from subspaces with different angular momenta and concentrates it into a system with angular momentum J̄. The decoder spreads the information back, modulating the angular momentum by √n units on average.

The second ingredient of our compression protocol is a remarkable property of the optimal universal cloning machine (UCM) [11, 12]. Mathematically, the UCM is described by a map transforming (operators supported in) the symmetric subspace of 2J qubits into (operators supported in) the symmetric subspace of 2K qubits. Here we allow J to be larger than K, in which case the "cloning" process just consists in getting rid of 2(J − K) qubits. With this convention, the cloning channel is

  C_{J→K}(ρ) = (2J+1)/(2K+1) P_K (ρ ⊗ P_{K−J}) P_K   for J ≤ K,
  C_{J→K}(ρ) = Tr_{2(J−K)}[ρ]   for J > K,   (5)

where P_x is the projector on the symmetric subspace of 2x qubits and Tr_x denotes the partial trace over the first x qubits. The key to our compression protocol is to regard the Gibbs states in Eq. (2) as states on the symmetric subspace of 2J qubits and to observe that the UCM has the following property, derived in the Appendix:

Lemma 1. The universal cloning channel C_{J→K} transforms the Gibbs state ρ_{g,J} into the Gibbs state ρ_{g,K} with error

  (1/2) ‖ C_{J→K}(ρ_{g,J}) − ρ_{g,K} ‖ ≤ δ^{1−s} + O(δ),   (6)

where s > 0 is an arbitrary constant and δ := |J − K|/J.

This result establishes a bridge between the cloning of pure states and the compression of mixed states. Leveraging on Lemma 1 and on the concentration of the probability distribution {q_J}, we devise the following protocol:

• Encoder.
Perform the Schur transform. Then, measure the quantum number J with the nondemolition measurement that preserves the quantum information in each subspace R_J ⊗ M_J. Discard the multiplicity register and apply the cloning channel C_{J→J̄} to the remaining state ρ_{g,J}. Store the output state C_{J→J̄}(ρ_{g,J}) into a quantum memory of dimension d_enc = 2J̄ + 1.

• Decoder. Pick a value K at random with probability q_K and apply the cloning channel C_{J̄→K} to the quantum memory. Append a multiplicity register in the maximally mixed state I_{m_K}/m_K. Finally, perform the inverse of the Schur transform.

The protocol, illustrated in Fig. 1, is mathematically described by the channels

  E(ρ) = Σ_{J=0}^{n/2} C_{J→J̄}[ Tr_{M_J}(Π_J ρ Π_J) ],
  D(ρ) = ⊕_{K=0}^{n/2} q_K [ C_{J̄→K}(ρ) ⊗ I_{m_K}/m_K ],   (7)

where Π_J is the projector on R_J ⊗ M_J and Tr_{M_J} denotes the partial trace over M_J.

The above protocol requires a memory of log(2J̄ + 1) = log n + O(1) qubits. On the other hand, the error is arbitrarily small for large n: this is because the states ρ_{g,J} with |J − J̄| ≫ √n have negligible probability according to Eq. (4), while the states ρ_{g,J} with |J − J̄| = O(√n) can be faithfully encoded in the state ρ_{g,J̄}, thanks to Lemma 1 (see the Appendix for more details).

Optimality of the protocol with known spectrum.
Our protocol uses the minimum memory size compatible with the requirement of vanishing error. The argument goes as follows. For a generic ensemble E = {ρ_x, p_x}, a measure of the information content is provided by Holevo's information [6]

  χ(E) = H( Σ_x p_x ρ_x ) − Σ_x p_x H(ρ_x),   (8)

where H(ρ) = −Tr[ρ log ρ] is the von Neumann entropy. When the ensemble E is faithfully stored in a quantum memory, the memory should be large enough to accommodate the Holevo information of E. Since a memory of dimension d_enc can have at most a Holevo information of log d_enc [6], one has the bound log d_enc ≥ χ(E). For ε > 0, an approximate version of the bound is [19]

  log d_enc ≥ χ(E) − ε log d_E − µ(ε),   (9)

where d_E is the effective dimension, defined as the rank of the average state ρ_E := Σ_x p_x ρ_x, and µ(ε) := −ε ln ε. Equation (9) sets a lower bound on the memory size, valid for arbitrary ensembles. However, the bound may not be tight. Notably, the bound is not tight for the ensembles considered in our paper. The reason is the dimension-dependent term log d_E, which can be arbitrarily large: in our case, we have d_E = 2^n for p ≠ 0, 1.

To overcome this problem, we use the notion of sufficient statistics [20]: an ensemble E′ = {ρ′_x, p_x} is called a sufficient statistics for the ensemble E = {ρ_x, p_x} if the states of E can be encoded into states of E′ and decoded with zero error. Since the encoding is reversible, the ensembles E and E′ have the same Holevo information, namely χ(E′) = χ(E). Moreover, the number of qubits needed to encode the original ensemble E up to error ε is equal to the number of qubits needed to encode the ensemble E′, up to the same error (see the Appendix for more detail). Using these facts, we can improve the bound (9), obtaining

  log d_enc ≥ χ(E) − ε log d_E^{min} − µ(ε),   (10)

where d_E^{min} is the minimum of d_{E′} over all ensembles E′ that are sufficient statistics for E. We call Eq. (10) the Holevo bound for compression.

Let us apply the bound to the ensemble E = {ρ_g^{⊗n}, dg}, where dg represents the uniform distribution over all changes of basis. For this ensemble, explicit calculation (provided in the Appendix) yields

  χ(E) = log n + O(1).   (11)

A sufficient statistics for E is provided by the ensemble E′ = {ρ′_g, dg} with ρ′_g := ⊕_{J=0}^{n/2} q_J ρ_{g,J}, obtained by getting rid of the multiplicity spaces in Eq. (1).
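The scaling (11) can be made concrete with the block decomposition (1): averaging ρ_{g,J} over g turns each block into the maximally mixed state on R_J (Schur's lemma), so the Holevo information reduces to χ(E) = Σ_J q_J [log(2J+1) − H(ρ_J)], with H(ρ_J) the entropy of the Gibbs state (2). The following numerical sketch is our own illustration of this reduction (not taken from the paper); it shows χ growing by roughly one bit per doubling of n, i.e. χ = log n + O(1):

```python
from math import comb, log2

def blocks(n, p):
    # (q_J, dim R_J, Gibbs entropy H(rho_J)) for each total spin J
    out = []
    for J in range(n//2 + 1):
        mJ = comb(n, n//2 - J) - (comb(n, n//2 - J - 1) if J < n//2 else 0)
        w = [p**(n//2 + m) * (1 - p)**(n//2 - m) for m in range(-J, J + 1)]
        spec = [x / sum(w) for x in w]        # spectrum of the Gibbs state rho_J
        H = -sum(x * log2(x) for x in spec if x > 0)
        out.append((mJ * sum(w), 2*J + 1, H))
    return out

def chi(n, p):
    # chi(E) = sum_J q_J [ log(2J+1) - H(rho_J) ]
    return sum(q * (log2(d) - H) for q, d, H in blocks(n, p))

for n in (20, 40, 80, 160):
    print(n, round(chi(n, 0.8), 3))   # grows by about one bit per doubling of n
```

The O(1) offset is essentially the entropy of the Gibbs distribution at fixed β, which is independent of n.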
The ensemble E′ has effective dimension

  d_{E′} = Σ_{J=0}^{n/2} (2J + 1) = (n/2 + 1)^2,   (12)

which has been proven to be the minimum over all sufficient statistics [9, 21]. Inserting Eqs. (11) and (12) into Eq. (10) we obtain the bound

  log d_enc ≥ (1 − 2ε) log n + 4ε − µ(ε) + O(1).   (13)

When ε is asymptotically small, the leading term is log n, the number of qubits used by our protocol. Hence, we conclude that the protocol is optimal and that the Holevo bound for compression is tight for the ensemble E.

Compression protocol for arbitrary qubit states.
Let us now turn to the full-model compression. A simple protocol for compressing arbitrary states is to measure the magnitude of the total angular momentum, to store the outcome J in a classical memory and the state ρ_{g,J} in a quantum memory. Since J can take any value between 0 and n/2, this protocol requires ⌈log(n/2 + 1)⌉ classical bits. Moreover, since ρ_{g,J} has support in a (2J + 1)-dimensional space, the protocol requires ⌈log(n + 1)⌉ qubits in the worst case scenario. At first sight, it seems difficult to do any better: One cannot use less than log n qubits, because the input state could consist of n copies of a random pure state, and no protocol can compress such a state in less than log n qubits [9]. On the other hand, J can take n/2 + 1 distinct values, which would seem to require about log n classical bits. Despite these facts, we now show that the amount of classical bits can be cut down by half with asymptotically negligible error.

FIG. 2. Optimal full-model compression. The encoder disassembles an arbitrary sequence of n identically prepared qubits into a classical part ((1/2) log n bits) and a quantum part (log n qubits). The decoder recombines these two pieces of information, approximately retrieving the initial state of the sequence.

The key idea is that the decoder need not have full information about J: thanks to Lemma 1, two states ρ_{g,J} and ρ_{g,K} with |J − K| = O(√n) are approximately interconvertible. Motivated by this fact, we partition the values of J into disjoint intervals L_1, ..., L_t of size O(√n). Instead of encoding the measurement outcome J, we compute the index i such that J ∈ L_i and store it in a classical memory. Since the index i can take O(√n) values, the size of the memory is (1/2) log n, instead of log n. The details of the protocol are as follows:

• Encoder.
Perform the Schur transform. Then, measure the quantum number J with the nondemolition measurement that preserves the quantum information in each subspace R_J ⊗ M_J. Find the index i(J) such that J ∈ L_{i(J)}. Discard the multiplicity register and send the remaining state ρ_{g,J} to the input of the quantum channel C_{J→f(J)}, where f(J) is the median of the subset L_{i(J)}. Store the output state C_{J→f(J)}(ρ_{g,J}) in a quantum memory and the index i(J) in a classical memory.

• Decoder. Read the value of i(J) from the classical memory. For a given value of i(J), pick a random value K in the subset L_{i(J)} and apply the channel C_{f(K)→K} to the quantum memory. Then, append the multiplicity register in the maximally mixed state I_{m_K}/m_K. Finally, perform the inverse of the Schur transform.

The protocol is illustrated in Fig. 2. The explicit expression of the channels E and D, as well as the proof that the error vanishes in the large n limit, can be found in the Appendix. Here we emphasize a few points: First, it is convenient to choose one interval, say L_t, to contain only the value J = n/2. In this way, the protocol acts as the identity on the symmetric subspace and pure states are compressed without error. Second, random sampling in the decoder is essential for achieving vanishing error. This fact is illustrated in Fig. 3, which shows that sampling yields a well-behaved interpolation of the spectral distribution in Eq. (4), while the lack of sampling leads to a poor approximation. Third, comparing the full-model compression with the fixed-spectrum compression leads us to identify (1/2) log n bits as the amount of memory needed to store the information about the spectrum. This interpretation is consistent with the fact that (1/2) log n bits is the size of the smallest classical memory needed to faithfully store n samples of a generic probability distribution over the set {0, 1} [13].

FIG. 3. Spectral distributions of the output states with and without sampling. A comparison of the spectral distributions of the following states: the original state ρ_g^{⊗n} (black, solid line), the output state of the optimal protocol (red, dashed line), and the output state of a protocol with the same encoder of the optimal protocol and a decoder without sampling (blue, dashed line).

Optimality for the full-model compression:
The optimality of the full-model protocol can be proven with the same techniques used for fixed spectrum. In fact, an even stronger result holds: replacing the hybrid memory with a fully quantum memory does not improve the compression, because (3/2) log n qubits is the minimum memory size allowed by the Holevo bound for compression. The details are provided in the Appendix.

Conclusion:
In this Letter we showed how to compress identically prepared qubits into the smallest possible memory. The key technique is the use of universal cloning to convert Gibbs states of different angular momentum. Converting Gibbs states is a novel application of quantum cloning [22-24] and may inspire further applications in the resource theory of quantum thermodynamics, both in the free [25] and in the size-restricted case [26]. Extending our results, it is also interesting to investigate the relation between cloning and compression for other families of states, such as phase [27, 28] and mirror-phase [29] covariant states, and mixed states of arbitrary finite dimensional systems [9]. The recent implementations of various quantum cloning machines [30-33] suggest that prototypes of optimal compression may be experimentally demonstrated in the near future.

We acknowledge the referees of this Letter for useful suggestions that helped improve the presentation. G.C. is supported by the Canadian Institute for Advanced Research (CIFAR), by the Hong Kong Research Grant Council through Grant No. 17326616, by the National Science Foundation of China through Grant No. 11675136, and by the HKU Seed Funding for Basic Research. Y.Y. is supported by a Hong Kong and China Gas Scholarship. M.H. is partially supported by a MEXT Grant-in-Aid for Scientific Research (A) No. 23246071 and the Okawa Research Grant. The Centre for Quantum Technologies is a Research Centre of Excellence funded by the Ministry of Education and the National Research Foundation of Singapore. This work was completed during the "Hong Kong Workshop on Quantum Information and Foundations," organized with support from the Foundational Questions Institute (FQXi-MGA-1502).

[1] E. Bagan, M. A. Ballester, R. D. Gill, A. Monras, and R. Muñoz Tapia, Physical Review A, 032301 (2006).
[2] G. Chiribella, G. M. D'Ariano, C. Macchiavello, P. Perinotti, and F. Buscemi, Physical Review A, 012315 (2007).
[3] S. F. Huelga, C. Macchiavello, T. Pellizzari, A. K. Ekert, M. Plenio, and J. Cirac, Physical Review Letters, 3865 (1997).
[4] A. Smirne, J. Kołodyński, S. F. Huelga, and R. Demkowicz-Dobrzański, Physical Review Letters, 120801 (2016).
[5] F. Pastawski, N. Y. Yao, L. Jiang, M. D. Lukin, and J. I. Cirac, Proceedings of the National Academy of Sciences, 16079 (2012).
[6] A. S. Holevo, Problemy Peredachi Informatsii, 3 (1973).
[7] M. Plesch and V. Bužek, Physical Review A, 032317 (2010).
[8] L. A. Rozema, D. H. Mahler, A. Hayat, P. S. Turner, and A. M. Steinberg, Physical Review Letters, 160504 (2014).
[9] Y. Yang, G. Chiribella, and D. Ebler, Physical Review Letters, 080501 (2016).
[10] V. Bužek and M. Hillery, Physical Review A, 1844 (1996).
[11] N. Gisin and S. Massar, Physical Review Letters, 2153 (1997).
[12] R. F. Werner, Physical Review A, 1827 (1998).
[13] B. S. Clarke and A. R. Barron, IEEE Transactions on Information Theory, 453 (1990).
[14] W. Fulton and J. Harris, Representation Theory, Vol. 129 (Springer Science & Business Media, New York, 1991).
[15] D. Bacon, I. L. Chuang, and A. W. Harrow, Physical Review Letters, 170502 (2006).
[16] For concreteness, here we assume n to be even and J to be integer, but all the arguments hold also for odd n and semi-integer J.
[17] J. Cirac, A. Ekert, and C. Macchiavello, Physical Review Letters, 4344 (1999).
[18] A. S. Holevo, Statistical Structure of Quantum Theory, Vol. 67 (Springer Science & Business Media, Berlin, 2001).
[19] M. Wilde, Quantum Information Theory (Cambridge University Press, Cambridge, England, 2013), Chap. 18.
[20] D. Petz, Communications in Mathematical Physics, 123 (1986).
[21] M. Koashi and N. Imoto, Physical Review Letters, 017902 (2001).
[22] V. Scarani, S. Iblisdir, N. Gisin, and A. Acín, Reviews of Modern Physics, 1225 (2005).
[23] N. J. Cerf and J. Fiurášek, Progress in Optics, 455 (2006).
[24] H. Fan, Y.-N. Wang, L. Jing, J.-D. Yue, H.-D. Shi, Y.-L. Zhang, and L.-Z. Mu, Physics Reports, 241 (2014).
[25] F. G. S. L. Brandão, M. Horodecki, J. Oppenheim, J. M. Renes, and R. W. Spekkens, Physical Review Letters, 250404 (2013).
[26] H. Tajima and M. Hayashi, arXiv preprint arXiv:1405.6457 (2014).
[27] D. Bruß, M. Cinchetti, G. Mauro D'Ariano, and C. Macchiavello, Physical Review A, 012302 (2000).
[28] F. Buscemi, G. M. D'Ariano, C. Macchiavello, and P. Perinotti, Physical Review A, 042309 (2006).
[29] K. Bartkiewicz, A. Miranowicz, and Ş. K. Özdemir, Physical Review A, 032306 (2009).
[30] E. Nagali, D. Giovannini, L. Marrucci, S. Slussarenko, E. Santamato, and F. Sciarrino, Physical Review Letters, 073602 (2010).
[31] H. Chen, D. Lu, B. Chong, G. Qin, X. Zhou, X. Peng, and J. Du, Physical Review Letters, 180404 (2011).
[32] K. Bartkiewicz, K. Lemr, A. Černoch, J. Soubusta, and A. Miranowicz, Physical Review Letters, 173601 (2013).
[33] W.-B. Wang, C. Zu, L. He, W.-G. Zhang, and L.-M. Duan, Scientific Reports (2015).
[34] M. Hayashi, Communications in Mathematical Physics, 171 (2010).

Proof of Lemma 1.
In this section, we show that the universal cloning channel C_{J→K} transforms the Gibbs state ρ_{g,J} into an approximation of the Gibbs state ρ_{g,K}, which becomes accurate when |J − K|/J is small. Specifically, we show that the error satisfies the bound

  (1/2) ‖ C_{J→K}(ρ_{g,J}) − ρ_{g,K} ‖ ≤ δ^{1−s} [1 + O(δ^s)],  δ := |J − K|/J,   (14)

valid for arbitrary g ∈ SU(2) and arbitrary s > 0. Since the cloning channel is covariant under the action of SU(2) and ρ_{g,J} = U_{g,J} ρ_{e,J} U†_{g,J}, one has

  ‖ C_{J→K}(ρ_{g,J}) − ρ_{g,K} ‖ = ‖ C_{J→K}(ρ_{e,J}) − ρ_{e,K} ‖  ∀ g ∈ SU(2),   (15)

where e is the identity element of SU(2). Hence, it is enough to show the bound

  (1/2) ‖ C_{J→K}(ρ_J) − ρ_K ‖ ≤ δ^{1−s} [1 + O(δ^s)],   (16)

with ρ_J := ρ_{e,J} and ρ_K := ρ_{e,K}. To prove this bound, we use the expansion

  ρ_J = (N_J)^{−1} Σ_{m=−J}^{J} p^{J+m} (1 − p)^{J−m} |J, m⟩⟨J, m|,   (17)

where N_J is the normalization constant, given by

  N_J = Σ_{j=−J}^{J} p^{J+j} (1 − p)^{J−j} = p^{2J+1} [1 − ((1−p)/p)^{2J+1}] / (2p − 1).   (18)

In the following we will analyze the cases J ≤ K and J > K separately.
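As a quick self-contained check (ours, not part of the original derivation), the geometric sum in Eq. (18) can be verified in exact rational arithmetic:

```python
from fractions import Fraction

def N_sum(J, p):
    # normalization constant of Eq. (17), as a finite sum
    return sum(p**(J + m) * (1 - p)**(J - m) for m in range(-J, J + 1))

def N_closed(J, p):
    # closed form of Eq. (18): N_J = (p^(2J+1) - (1-p)^(2J+1)) / (2p - 1)
    return (p**(2*J + 1) - (1 - p)**(2*J + 1)) / (2*p - 1)

p = Fraction(3, 4)       # any p > 1/2 works; Fraction keeps the comparison exact
assert all(N_sum(J, p) == N_closed(J, p) for J in range(1, 30))
```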
The J ≤ K case. We begin by checking the action of C_{J→K} on the projectors |J, m⟩⟨J, m|. For J ≤ K we have

  C_{J→K}(|J, m⟩⟨J, m|) = (2J+1)/(2K+1) P_K ( |J, m⟩⟨J, m| ⊗ P_{K−J} ) P_K
   = (2J+1)/(2K+1) Σ_k C(2(K−J), K−J+k−m) C(2J, J+m) C(2K, K+k)^{−1} |K, k⟩⟨K, k|,

where C(a, b) denotes the binomial coefficient. Note that we have the equality

  ⟨K, K+m−J| C_{J→K}(|J, m⟩⟨J, m|) |K, K+m−J⟩ = (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m).

Therefore, we can express C_{J→K}(|J, m⟩⟨J, m|) as

  C_{J→K}(|J, m⟩⟨J, m|) = (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m) |K, K+m−J⟩⟨K, K+m−J|
   + [ 1 − (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m) ] σ_{J,K},

where σ_{J,K} is a suitable quantum state. Combining the above equation with Eq. (17), we obtain

  C_{J→K}(ρ_J) = Σ_{m=−J}^{J} (p^{J+m}(1−p)^{J−m}/N_J) { (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m) |K, K+m−J⟩⟨K, K+m−J|
   + [ 1 − (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m) ] σ_{J,K} }.

Now, we focus on the entries with m ∈ [J − ⌊δ^{−s}⌋, J], where s > 0, rewriting the above expression as

  C_{J→K}(ρ_J) = Σ_{m=J−⌊δ^{−s}⌋}^{J} (p^{J+m}(1−p)^{J−m}/N_J) (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m) |K, K+m−J⟩⟨K, K+m−J| + µ_{J,K},

where µ_{J,K} is a positive operator with trace

  Tr[µ_{J,K}] = 1 − Σ_{m=J−⌊δ^{−s}⌋}^{J} (p^{J+m}(1−p)^{J−m}/N_J) (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m).

Next, substituting J − m with k, we have

  C_{J→K}(ρ_J) = Σ_{k=0}^{⌊δ^{−s}⌋} (p^{2J−k}(1−p)^k/N_J) (2J+1)/(2K+1) · C(2J, k)/C(2K, k) |K, K−k⟩⟨K, K−k| + µ_{J,K}.
Using the expression (17) for ρ_K, we bound the error as

  (1/2) ‖ C_{J→K}(ρ_J) − ρ_K ‖
   = (1/2) ‖ Σ_{k=0}^{⌊δ^{−s}⌋} (1−p)^k [ (p^{2J−k}/N_J)(2J+1)/(2K+1) · C(2J, k)/C(2K, k) − p^{2K−k}/N_K ] |K, K−k⟩⟨K, K−k|
   + µ_{J,K} − Σ_{k=⌊δ^{−s}⌋+1}^{2K} (p^{2K−k}(1−p)^k/N_K) |K, K−k⟩⟨K, K−k| ‖
   ≤ (1/2) Σ_{k=0}^{⌊δ^{−s}⌋} (1−p)^k p^{−k} | (p^{2J}/N_J)(2J+1)/(2K+1) · C(2J, k)/C(2K, k) − p^{2K}/N_K |
   + (1/2) Tr[µ_{J,K}] + (1/2) Σ_{k=⌊δ^{−s}⌋+1}^{2K} p^{2K−k}(1−p)^k/N_K
   ≤ p/(2(2p−1)) max_{k ∈ [0, ⌊δ^{−s}⌋]} | (p^{2J}/N_J)(2J+1)/(2K+1) · C(2J, k)/C(2K, k) − p^{2K}/N_K |
   + (1/2) Tr[µ_{J,K}] + ((1−p)/p)^{⌊δ^{−s}⌋}.

Since p > 1/2 and s > 0, it is obvious that the third term in the last inequality vanishes exponentially in J, and we need only to show that the first term and the second term also vanish as J grows.

For the first error term, we have the following expansion:

  | (p^{2J}/N_J)(2J+1)/(2K+1) · C(2J, k)/C(2K, k) − p^{2K}/N_K |
   = (p^{2K}/N_K) | p^{2(J−K)} (2J+1)/(2K+1) · (N_K/N_J) · C(2J, k)/C(2K, k) − 1 |
   = (p^{2K}/N_K) | (2J+1)/(2K+1) · [1 − ((1−p)/p)^{2K+1}]/[1 − ((1−p)/p)^{2J+1}] · C(2J, k)/C(2K, k) − 1 |
   = (p^{2K}/N_K) | (2J+1)/(2K+1) · [1 − ((1−p)/p)^{2K+1}]/[1 − ((1−p)/p)^{2J+1}] · e^{k ln(J/K) + (2K−k+1) ln(1 − k/(2K)) − (2J−k+1) ln(1 − k/(2J)) + O(J^{−1})} − 1 |,

the third line coming from Eq. (18). Recalling that δ = (K − J)/J, it is straightforward to verify that

  (2J+1)/(2K+1) = 1 − δ + O(δ^2),
  e^{k ln(J/K)} = 1 − kδ + O(k^2 δ^2),
  e^{(2K−k+1) ln[1 − k/(2K)] − (2J−k+1) ln[1 − k/(2J)]} = 1 − k^2 δ/(2K) + O(k^2 δ J^{−1}).

Substituting the above equations into the expression of the first error term, we have

  max_{k ∈ [0, ⌊δ^{−s}⌋]} | (p^{2J}/N_J)(2J+1)/(2K+1) · C(2J, k)/C(2K, k) − p^{2K}/N_K |
   = (p^{2K}/N_K) max_{k ∈ [0, ⌊δ^{−s}⌋]} | −δ − kδ − k^2 δ/(2K) + O(k^2 δ J^{−1}) + O(k^2 δ^2) + O(J^{−1}) |
   ≤ (p^{2K}/N_K) [ δ^{1−s} + O(δ) + O(J^{−1}) ]
   = ((2p−1)/p) δ^{1−s} [1 + O(δ^s)].
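The bound can also be probed numerically: since C_{J→K} maps diagonal states to diagonal states, the trace distance in (16) reduces to half the sum of absolute differences of the two spectra, and one can watch it shrink as δ = (K − J)/J decreases. A rough numerical sketch (our own, J ≤ K branch only, with p and the sizes chosen arbitrarily):

```python
from math import comb

def gibbs(a, p):
    # spectrum of rho_J in Eq. (17), a = 2J, indexed by mp = J + m
    w = [p**mp * (1 - p)**(a - mp) for mp in range(a + 1)]
    s = sum(w)
    return [x / s for x in w]

def clone_up(a, b, spec):
    # diagonal action of C_{J->K} for J <= K (a = 2J <= b = 2K)
    out = [0.0] * (b + 1)
    for mp, pr in enumerate(spec):
        for kp in range(mp, mp + b - a + 1):
            out[kp] += pr * (a + 1) / (b + 1) * comb(b - a, kp - mp) * (comb(a, mp) / comb(b, kp))
    return out

def tdist(x, y):
    return 0.5 * sum(abs(u - v) for u, v in zip(x, y))

p = 0.8
errs = [tdist(clone_up(a, a + 10, gibbs(a, p)), gibbs(a + 10, p))
        for a in (100, 200, 400)]       # delta = 0.1, 0.05, 0.025
assert errs[0] > errs[1] > errs[2]      # the error shrinks with delta, as Lemma 1 predicts
```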
For the second error term, we have

  Tr[µ_{J,K}] = 1 − (N_J)^{−1} Σ_{m=J−⌊δ^{−s}⌋}^{J} p^{J+m}(1−p)^{J−m} (2J+1)/(2K+1) · C(2J, J−m)/C(2K, J−m)
   ≤ 1 − (N_J)^{−1} Σ_{m=J−⌊δ^{−s}⌋}^{J} p^{J+m}(1−p)^{J−m} (2J+1)/(2K+1) min_{m′ ∈ [J−⌊δ^{−s}⌋, J]} C(2J, J−m′)/C(2K, J−m′)
   ≤ 1 − (1 − δ^{1−s}) (N_J)^{−1} Σ_{m=J−⌊δ^{−s}⌋}^{J} p^{J+m}(1−p)^{J−m} (2J+1)/(2K+1)
   = 1 − (1 − δ^{1−s}) · [1 − ((1−p)/p)^{⌊δ^{−s}⌋+1}]/[1 − ((1−p)/p)^{2J+1}] · (2J+1)/(2K+1)
   ≤ δ^{1−s} + 2δ,

where the second inequality uses C(2J, k)/C(2K, k) ≥ (J/K)^k ≥ 1 − δ^{1−s} for k ≤ ⌊δ^{−s}⌋; the bound vanishes as δ shrinks. Finally, combining the above calculations, the error of the conversion can be bounded as

  (1/2) ‖ C_{J→K}(ρ_J) − ρ_K ‖ ≤ δ^{1−s} [1 + O(δ^s)]

for any s > 0. Since s can be chosen to be arbitrarily small, the leading order of the error is close to δ.

The J > K case.
In this case, the action of C J → K on the projectors | J, m (cid:105)(cid:104)
J, m | is C J → K ( | J, m (cid:105)(cid:104)
J, m | ) = (cid:88) k (cid:18) J − KJ − K + m − k (cid:19)(cid:18) KK + k (cid:19)(cid:18) JJ + m (cid:19) − | K, k (cid:105)(cid:104)
K, k | . Notice that (cid:104)
K, K + m − J |C J → K ( | J, m (cid:105)(cid:104)
J, m | ) | K, K + m − J (cid:105) = (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) . Therefore, we can express C J → K ( | J, m (cid:105)(cid:104)
J, m | ) as C J → K ( | J, m (cid:105)(cid:104)
J, m | ) = (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) | K, K + m − J (cid:105)(cid:104) K, K + m − J | + (cid:34) − (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) (cid:35) σ J,K where σ J,K is a suitable quantum state. Combining the above equation with Eq. (17), we have C J → K ( ρ J ) = ( N J ) − J (cid:88) m = − J p J + m (1 − p ) J − m (cid:40) (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) | K, K + m − J (cid:105)(cid:104) K, K + m − J | + (cid:34) − (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) (cid:35) σ J,K (cid:41) . Again, we focus on the entries with m ∈ [ J − (cid:98) δ − s (cid:99) , J ] for a parameter s > C J → K ( ρ J ) = ( N J ) − J (cid:88) m = J −(cid:98) δ − s (cid:99) p J + m (1 − p ) J − m (cid:0) KJ − m (cid:1)(cid:0) JJ − m (cid:1) | K, K + m − J (cid:105)(cid:104) K, K + m − J | + µ J,K . Here µ J,K is a positive operator with trace Tr[ µ J,K ] = 1 − ( N J ) − (cid:80) Jm = J −(cid:98) δ − s (cid:99) p J + m (1 − p ) J − m ( KJ − m )( JJ − m ) . Next, substi-tuting J − m with k , we have C J → K ( ρ J ) = ( N J ) − (cid:98) δ − s (cid:99) (cid:88) k =0 p J − k (1 − p ) k (cid:0) Kk (cid:1)(cid:0) Jk (cid:1) | K, K − k (cid:105)(cid:104) K, K − k | + µ J,K . Using Eq. 
(17) for $\rho_K$, we bound the error as
\[
\frac12\bigl\|\mathcal C_{J\to K}(\rho_J)-\rho_K\bigr\|
=\frac12\left\|\sum_{k=0}^{\lfloor\delta^{-s}\rfloor}(1-p)^k\left[\frac{p^{2J-k}}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-\frac{p^{2K-k}}{N_K}\right]|K,K-k\rangle\langle K,K-k|+\mu_{J,K}-\sum_{k=\lfloor\delta^{-s}\rfloor+1}^{2K}\frac{p^{2K-k}(1-p)^k}{N_K}|K,K-k\rangle\langle K,K-k|\right\|
\]
\[
\le \frac12\sum_{k=0}^{\lfloor\delta^{-s}\rfloor}(1-p)^k p^{-k}\left|\frac{p^{2J}}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-\frac{p^{2K}}{N_K}\right|+\frac12\operatorname{Tr}[\mu_{J,K}]+\frac12\sum_{k=\lfloor\delta^{-s}\rfloor+1}^{2K}\frac{p^{2K-k}(1-p)^k}{N_K}
\le \frac{p}{2p-1}\max_{k\in[0,\lfloor\delta^{-s}\rfloor]}\left|\frac{p^{2J}}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-\frac{p^{2K}}{N_K}\right|+\frac12\operatorname{Tr}[\mu_{J,K}]+\left(\frac{1-p}{p}\right)^{\lfloor\delta^{-s}\rfloor}.
\]
Since $p>1/2$ and $s>0$, the third term in the last inequality vanishes exponentially in $J$, and we need only show that the first and second terms also vanish as $J$ grows.

For the first error term, we have the following expansion, since $k\ll J$:
\[
\left|\frac{p^{2J}}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-\frac{p^{2K}}{N_K}\right|
=\frac{p^{2K}}{N_K}\left|\frac{p^{2(J-K)}N_K}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-1\right|
=\frac{p^{2K}}{N_K}\left|\frac{1-\bigl(\tfrac{1-p}{p}\bigr)^{2K+1}}{1-\bigl(\tfrac{1-p}{p}\bigr)^{2J+1}}\,\frac{\binom{2K}{k}}{\binom{2J}{k}}-1\right|
=\frac{p^{2K}}{N_K}\left|\frac{1-\bigl(\tfrac{1-p}{p}\bigr)^{2K+1}}{1-\bigl(\tfrac{1-p}{p}\bigr)^{2J+1}}\,e^{\,k\ln(K/J)+(2J-k+1)\ln[1-k/(2J)]-(2K-k+1)\ln[1-k/(2K)]+O(J^{-1})}-1\right| .
\]
Recalling that $\delta=(J-K)/J$, it is straightforward to verify that
\[
e^{\,k\ln(K/J)}=1-k\delta+O(k^2\delta^2),\qquad
e^{\,(2J-k+1)\ln[1-k/(2J)]-(2K-k+1)\ln[1-k/(2K)]}=1-\frac{k^2\delta}{2J}+O\bigl(k\delta J^{-1}\bigr).
\]
Substituting the above equations into the expression of the first error term, we have
\[
\max_{k\in[0,\lfloor\delta^{-s}\rfloor]}\left|\frac{p^{2J}}{N_J}\frac{\binom{2K}{k}}{\binom{2J}{k}}-\frac{p^{2K}}{N_K}\right|
=\frac{p^{2K}}{N_K}\max_{k\in[0,\lfloor\delta^{-s}\rfloor]}\left|-k\delta-\frac{k^2\delta}{2J}+O\bigl(k\delta J^{-1}\bigr)+O(J^{-1})\right|
\le \frac{p^{2K}}{N_K}\bigl[\delta^{1-s}+O(J^{-1})\bigr]=\frac{2p-1}{p}\,\delta^{1-s}\bigl[1+O(\delta^{s})\bigr].
\]
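Two combinatorial facts used in the derivation above—that the diagonal coefficients of $\mathcal C_{J\to K}$ sum to one (a Vandermonde identity) and that the entry retained at $k=K+m-J$ equals $\binom{2K}{J-m}/\binom{2J}{J-m}$—can be verified numerically. A minimal Python sketch (function names are ours; integer spins for simplicity):

```python
from math import comb

def c(n, r):
    # binomial coefficient, extended by zero outside its natural range
    return comb(n, r) if 0 <= r <= n else 0

def cloner_coeffs(J, K, m):
    """Diagonal coefficients of C_{J->K}(|J,m><J,m|) in the basis |K,k>,
    for J > K, as in the expression for the action of the channel above."""
    norm = comb(2 * J, J + m)
    return {k: c(2 * (J - K), (J - K) + m - k) * c(2 * K, K + k) / norm
            for k in range(-K, K + 1)}

J, K = 12, 9
for m in range(-J, J + 1):
    w = cloner_coeffs(J, K, m)
    assert abs(sum(w.values()) - 1) < 1e-12        # Vandermonde: weights sum to 1
    if J - m <= 2 * K:                             # shifted entry |K, K+m-J> exists
        assert abs(w[K + m - J] - c(2 * K, J - m) / comb(2 * J, J - m)) < 1e-12
print("cloner coefficient checks passed")
```

The first assertion is the Vandermonde identity $\sum_k\binom{2(J-K)}{J-K+m-k}\binom{2K}{K+k}=\binom{2J}{J+m}$; the second reproduces the diagonal entry kept in the error analysis.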
For the second error term, we have
\[
\operatorname{Tr}[\mu_{J,K}]=1-(N_J)^{-1}\sum_{m=J-\lfloor\delta^{-s}\rfloor}^{J}p^{J+m}(1-p)^{J-m}\,\frac{\binom{2K}{J-m}}{\binom{2J}{J-m}}
\le 1-(N_J)^{-1}\sum_{m=J-\lfloor\delta^{-s}\rfloor}^{J}p^{J+m}(1-p)^{J-m}\,\min_{m'\in[J-\lfloor\delta^{-s}\rfloor,\,J]}\frac{\binom{2K}{J-m'}}{\binom{2J}{J-m'}}
\]
\[
\le 1-(N_J)^{-1}\sum_{m=J-\lfloor\delta^{-s}\rfloor}^{J}p^{J+m}(1-p)^{J-m}\,\bigl[1-O(\delta^{1-s})\bigr]
=1-\frac{1-\bigl(\tfrac{1-p}{p}\bigr)^{\lfloor\delta^{-s}\rfloor+1}}{1-\bigl(\tfrac{1-p}{p}\bigr)^{2J+1}}\,\bigl[1-O(\delta^{1-s})\bigr]
\le \left(\frac{1-p}{p}\right)^{\lfloor\delta^{-s}\rfloor+1}+O(\delta^{1-s}),
\]
whose first term vanishes exponentially fast as $J$ grows. Finally, combining the above calculations, the error of the conversion can be bounded as
\[
\frac12\bigl\|\mathcal C_{J\to K}(\rho_J)-\rho_K\bigr\|\le 2\,\delta^{1-s}\bigl[1+O(\delta^{s})\bigr] \qquad (19)
\]
for any $s>0$.

Precision analysis for known spectrum.
The compression protocol for known spectrum is characterized by the couple $(\mathcal E,\mathcal D)$, where the encoding channel is
\[
\mathcal E(\rho)=\sum_{J=0}^{n/2}\mathcal C_{J\to\bar J}\bigl[\operatorname{Tr}_{\mathcal M_J}(\Pi_J\rho\Pi_J)\bigr],
\]
where $\Pi_J$ is the projector on $\mathcal R_J\otimes\mathcal M_J$ and $\operatorname{Tr}_{\mathcal M_J}$ is the partial trace over $\mathcal M_J$. The decoding channel is
\[
\mathcal D(\sigma)=\bigoplus_{K=0}^{n/2}q_K\left[\mathcal C_{\bar J\to K}(\sigma)\otimes\frac{I_{m_K}}{m_K}\right].
\]
It is then straightforward to check that, when the input state is $\rho_g^{\otimes n}$, the output state of the protocol will be
\[
\mathcal D\circ\mathcal E\bigl(\rho_g^{\otimes n}\bigr)=\bigoplus_{K}q_K\left[\sum_{J}q_J\,\bigl(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J}\bigr)(\rho_{g,J})\otimes\frac{I_{m_K}}{m_K}\right].
\]
Now we evaluate the performance of the protocol. The error can be expressed and bounded as follows:
\[
\epsilon=\max_{g\in\mathrm{SU}(2)}\frac12\bigl\|\mathcal D\circ\mathcal E\bigl(\rho_g^{\otimes n}\bigr)-\rho_g^{\otimes n}\bigr\|
=\max_{g}\frac12\sum_{K}q_K\left\|\rho_{g,K}-\sum_{J}q_J\,\bigl(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J}\bigr)(\rho_{g,J})\right\|
\le \frac12\sum_{J,K}q_Jq_K\bigl\|\rho_{K}-\bigl(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J}\bigr)(\rho_{J})\bigr\|,
\]
having used the covariance of the universal cloning channel. Now, recall that, for large $n$, the distribution $\{q_J\}$ is peaked around $\bar J$. Using this fact, we can define the set
\[
S:=\bigl[\bar J-n^{1/2+s},\ \bar J+n^{1/2+s}\bigr] \qquad (20)
\]
for some positive parameter $s$ to be specified later, so that $\lim_{n\to\infty}\sum_{J\notin S}q_J=0$.
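The concentration of $\{q_J\}$ around $\bar J$ invoked here can be checked numerically. A minimal Python sketch (function names are ours), assuming $n$ even and computing $q_J$ from the standard Schur–Weyl multiplicity $m_J=(2J+1)\binom{n+1}{n/2+J+1}/(n+1)$; the closed form of Eq. (22) below is verified as well:

```python
from math import comb

def q_weights(n, p):
    """q_J = Tr[Pi_J rho^{(x)n}] for a qubit state with spectrum {p, 1-p}.
    Uses the Schur-Weyl multiplicity m_J = (2J+1) C(n+1, n/2+J+1) / (n+1).
    n is assumed even, so J runs over the integers 0, ..., n/2."""
    q = {}
    for J in range(n // 2 + 1):
        m_J = (2 * J + 1) * comb(n + 1, n // 2 + J + 1) / (n + 1)
        q[J] = m_J * sum(p ** (n // 2 + m) * (1 - p) ** (n // 2 - m)
                         for m in range(-J, J + 1))
    return q

n, p = 100, 0.8
q = q_weights(n, p)
Jbar = (p - 0.5) * (n + 1)
B = lambda k: comb(n + 1, k) * p ** k * (1 - p) ** (n + 1 - k)

assert abs(sum(q.values()) - 1) < 1e-9                     # {q_J} is a distribution
for J in q:                                                # closed form of Eq. (22)
    assert abs(q[J] - (2 * J + 1) / (2 * Jbar) * (B(n // 2 + J + 1) - B(n // 2 - J))) < 1e-9
mass = sum(w for J, w in q.items() if abs(J - Jbar) <= 3 * n ** 0.5)
assert mass > 0.99                                         # concentration around Jbar
print("q_J checks passed")
```

The assertions confirm that the spin distribution is normalized, matches the closed form used in the text, and carries almost all of its weight within an $O(\sqrt n)$ window around $\bar J$.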
Then, we continue bounding the error as
\[
\epsilon\le \sum_{J\notin S,\,K}q_Jq_K+\sum_{K\notin S,\,J}q_Jq_K+\frac12\sum_{J,K\in S}q_Jq_K\bigl\|\rho_K-\bigl(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J}\bigr)(\rho_J)\bigr\|
=2\sum_{J\notin S}q_J+\frac12\sum_{J,K\in S}q_Jq_K\bigl\|\rho_K-\bigl(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J}\bigr)(\rho_J)\bigr\|
\le 2\sum_{J\notin S}q_J+\max_{J,K\in S}\bigl\|\rho_K-\mathcal C_{J\to K}(\rho_J)\bigr\|, \qquad (21)
\]
where the last inequality comes from the bound
\[
\bigl\|\rho_K-(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J})(\rho_J)\bigr\|
\le \bigl\|\rho_K-\mathcal C_{\bar J\to K}(\rho_{\bar J})\bigr\|+\bigl\|\mathcal C_{\bar J\to K}(\rho_{\bar J})-(\mathcal C_{\bar J\to K}\circ\mathcal C_{J\to\bar J})(\rho_J)\bigr\|
\le \bigl\|\rho_K-\mathcal C_{\bar J\to K}(\rho_{\bar J})\bigr\|+\bigl\|\rho_{\bar J}-\mathcal C_{J\to\bar J}(\rho_J)\bigr\|
\le 2\max_{J,K\in S}\bigl\|\rho_K-\mathcal C_{J\to K}(\rho_J)\bigr\| .
\]
Now, we show that both terms in Eq. (21) vanish in the large $n$ limit. To handle the first term, we use the explicit expression of $q_J$ [9], whose derivation is provided here for the convenience of the reader:
\[
q_J=\operatorname{Tr}\bigl[\Pi_J\rho_g^{\otimes n}\bigr]=m_J\sum_{m=-J}^{J}p^{n/2+m}(1-p)^{n/2-m}
=\frac{(2J+1)\bigl[p^{n/2+J+1}(1-p)^{n/2-J}-p^{n/2-J}(1-p)^{n/2+J+1}\bigr]}{(2p-1)(n+1)}\binom{n+1}{n/2+J+1},
\]
having used the expression of the multiplicity $m_J=(2J+1)\binom{n+1}{n/2+J+1}/(n+1)$. Rearranging the terms we get
\[
q_J=\frac{2J+1}{2\bar J}\left[B\Bigl(\frac n2+J+1\Bigr)-B\Bigl(\frac n2-J\Bigr)\right] \qquad (22)
\]
where $B(k)=p^{k}(1-p)^{n+1-k}\binom{n+1}{k}$ and $\bar J=(p-\tfrac12)(n+1)$. Using Eq.
(22), we then have
\[
\sum_{J\notin S}q_J=1-\sum_{J=\bar J-n^{1/2+s}}^{\bar J+n^{1/2+s}}\frac{2J+1}{2\bar J}\left[B\Bigl(\frac n2+J+1\Bigr)-B\Bigl(\frac n2-J\Bigr)\right]
\le 1-\frac{2(\bar J-n^{1/2+s})+1}{2\bar J}\sum_{J=\bar J-n^{1/2+s}}^{\bar J+n^{1/2+s}}B\Bigl(\frac n2+J+1\Bigr)+\frac{2(\bar J+n^{1/2+s})+1}{2\bar J}\sum_{J=\bar J-n^{1/2+s}}^{\bar J+n^{1/2+s}}B\Bigl(\frac n2-J\Bigr)
\]
\[
\le 1-\frac{2(\bar J-n^{1/2+s})+1}{2\bar J}\bigl[1-2\exp(-2n^{2s})\bigr]+2\exp\Bigl[-2\Bigl(p-\frac12\Bigr)^{2}n\Bigr]
\le \frac{2n^{1/2+s}}{\bar J}+2\exp\bigl(-2n^{2s}\bigr)+2\exp\Bigl[-2\Bigl(p-\frac12\Bigr)^{2}n\Bigr], \qquad (23)
\]
having used Hoeffding's inequality in the second last step. From the above inequalities, it is clear that the tail term vanishes for every $s\in(0,1/2)$, and can therefore be made smaller than any positive threshold for large enough $n$. On the other hand, we notice that $J\approx K$ for any $J,K\in S$, and thus the second error term also vanishes: substituting $\delta\le 2n^{1/2+s}/\bar J$ into Eq. (16), we get
\[
\max_{J,K\in S}\bigl\|\rho_{g,K}-\mathcal C_{J\to K}(\rho_{g,J})\bigr\|\le n^{-\frac{1-s}{2}+s'}+O\bigl(n^{-\frac{1-s}{2}}\bigr)\qquad\forall\, s'>0. \qquad (24)
\]
Summarizing, from Eq. (23) and Eq. (24) we have shown that $\epsilon\le O\bigl(n^{-\frac12+s}\bigr)$ for arbitrarily small $s>0$.

Elementary properties of sufficient statistics
Here we complete the argument given in the main text, showing that if $\mathcal E'$ is a sufficient statistics for $\mathcal E$, then i) $\mathcal E$ and $\mathcal E'$ have the same Holevo information, and ii) $\mathcal E$ can be stored in a memory of $q$ qubits with error $\epsilon$ if and only if $\mathcal E'$ can be stored in a memory of the same size, with the same error.

By definition, the fact that $\mathcal E'$ is a sufficient statistics for $\mathcal E$ means that there exist channels $(\mathcal R_1,\mathcal R_2)$ that reversibly map every state $\rho_x\in\mathcal E$ to the corresponding state $\rho'_x\in\mathcal E'$; in formula,
\[
\mathcal R_1(\rho_x)=\rho'_x\quad\text{and}\quad\mathcal R_2(\rho'_x)=\rho_x \qquad (25)
\]
for every possible $x$. Using the above relation, it is easy to show that every compression protocol for the ensemble $\mathcal E$—say, $(\mathcal E,\mathcal D)$—can be turned into a compression protocol for the ensemble $\mathcal E'$—call it $(\mathcal E',\mathcal D')$—by defining $\mathcal E':=\mathcal E\circ\mathcal R_2$ and $\mathcal D':=\mathcal R_1\circ\mathcal D$. Likewise, every compression protocol for $\mathcal E'$ can be turned into a compression protocol for $\mathcal E$ by defining
\[
\mathcal E:=\mathcal E'\circ\mathcal R_1\quad\text{and}\quad\mathcal D:=\mathcal R_2\circ\mathcal D'. \qquad (26)
\]
Hence, the ensembles $\mathcal E$ and $\mathcal E'$ can be compressed in the same quantum memory with the same amount of error. Moreover, Eqs. (25) and the monotonicity of Holevo's information imply the relations $\chi(\mathcal E')\le\chi(\mathcal E)$ and $\chi(\mathcal E)\le\chi(\mathcal E')$, whence $\chi(\mathcal E')=\chi(\mathcal E)$.

Optimality of the compression protocol for known spectrum.
In this section we present the complete proof of the optimality of our protocol for compressing qubit states with known spectrum. We choose the sufficient statistics $\mathcal E'=\{\bigoplus_J q_J\rho_{g,J},\ g\in\mathrm{SU}(2)\}$, which has effective dimension $d_{\mathcal E'}=(n/2+1)^2$. Recall from the Letter the following bound:
\[
\log d_{\rm enc}\ge\chi(\mathcal E)-4\epsilon\log n+4\epsilon-2\mu(\epsilon)+O(1). \qquad (27)
\]
Next, explicit calculation shows that the Holevo information of the ensemble $\mathcal E$ can be expressed as
\[
\chi(\mathcal E)=-nH(\rho_g)+H(\{q_J\})+\sum_Jq_J\bigl[\log(2J+1)+\log m_J\bigr]. \qquad (28)
\]
From a previous work [see Eqs. (7), (10) and (11) of [34]], we know that
\[
\sum_Jq_J\bigl[\log(2J+1)+\log m_J\bigr]=\frac12\log n+nH(\rho_g)+O(1). \qquad (29)
\]
For the entropy of the probability distribution $\{q_J\}$, we first notice that, by definition [cf. Eq. (22)],
\[
H(\{q_J\})=-\sum_Jq_J\left\{\log\frac{2J+1}{2\bar J}+\log B\Bigl(\frac n2+J+1\Bigr)+\log\left[1-\frac{B(n/2-J)}{B(n/2+J+1)}\right]\right\}. \qquad (30)
\]
Next, we calculate the three terms in Eq. (30) separately. Notice from Eq. (6) of [34] that asymptotically the first term is
\[
-\sum_Jq_J\log\frac{2J+1}{2\bar J}=\log(2\bar J)-\sum_Jq_J\log(2J+1)
=\log(n+1)+\log(2p-1)-\log(2p-1)-\log n+o(1)=o(1), \qquad (31)
\]
which vanishes with the growth of $n$. By explicitly expanding the binomial distribution, the second term can be calculated as
\[
-\sum_Jq_J\log B\Bigl(\frac n2+J+1\Bigr)
=-\log p\sum_Jq_J\Bigl(\frac n2+J+1\Bigr)-\log(1-p)\sum_Jq_J\Bigl(\frac n2-J\Bigr)-\sum_Jq_J\log\binom{n+1}{n/2+J+1}
\]
\[
=nH(\{p,1-p\})-\sum_Jq_J\log\binom{n+1}{n/2+J+1}+O(1)
=nH(\{p,1-p\})-nH(\{p,1-p\})+\frac12\log n+O(1)
=\frac12\log n+O(1), \qquad (32)
\]
having used Eq. (11) of [34] in the second last step. Finally, the last term in Eq. (30) can be evaluated as
\[
-\sum_Jq_J\log\left[1-\frac{B(n/2-J)}{B(n/2+J+1)}\right]=-\sum_Jq_J\log\left[1-\left(\frac{1-p}{p}\right)^{2J+1}\right]=O\!\left[\left(\frac{1-p}{p}\right)^{2\bar J}\right]=o(1). \qquad (33)
\]
Substituting Eqs. (31), (32) and (33) into Eq. (30), we immediately get
\[
H(\{q_J\})=\frac12\log n+O(1). \qquad (34)
\]
Substituting Eqs. (28), (29) and (34) into Eq. (27), we bound the memory size as
\[
\log d_{\rm enc}(\mathcal E)\ge\log n-4\epsilon\log n+4\epsilon-2\mu(\epsilon)+O(1). \qquad (35)
\]
When $\epsilon$ is vanishing, the leading order in the bound (35) is $\log n$. We thus conclude that our protocol for the known-spectrum compression is asymptotically optimal.

Precision analysis for the full model compression
Let us first recall the details of the compression protocol. The protocol uses a partition of the set $\{0,1,\dots,n/2\}$ into $t=O(\sqrt n)$ intervals $L_1,\dots,L_t$, defined as follows:
\[
L_m=\bigl\{(m-1)\lfloor r\sqrt n\rfloor,\dots,m\lfloor r\sqrt n\rfloor-1\bigr\},\quad m=1,\dots,t-1,\qquad L_t=\{n/2\},
\]
where $r$ is a parameter, chosen so that $\lfloor r\sqrt n\rfloor\times(t-1)=n/2$. We denote by
\[
{\rm Med}=\bigl\{\lfloor r\sqrt n\rfloor/2,\ 3\lfloor r\sqrt n\rfloor/2,\ \dots\bigr\}
\]
the collection of the medians of these intervals. In the encoder, we measure the total spin using the POVM $\{\Pi_J\}_J$ and store the index $i(J)$ such that $J\in L_{i(J)}$. For convenience, we define a map $f$ which takes any $J\in\{0,\dots,n/2\}$ to the median of the interval containing $J$, formally defined as
\[
f:\ J\mapsto J_{\rm med}\in{\rm Med}\quad{\rm s.t.}\quad J_{\rm med}\in L_{i(J)}.
\]
The encoding channel is
\[
\mathcal E(\rho):=\sum_{J=0}^{n/2}\mathcal C_{J\to f(J)}\bigl(\operatorname{Tr}_{\mathcal M_J}[\Pi_J\rho\Pi_J]\bigr)\otimes|i(J)\rangle\langle i(J)| .
\]
The decoding channel is
\[
\mathcal D\left(\sum_i\sigma_i\otimes|i\rangle\langle i|\right):=\bigoplus_{K\in L_i}\frac{1}{|L_i|}\left[\mathcal C_{f(K)\to K}(\sigma_i)\otimes\frac{I_{m_K}}{m_K}\right].
\]
Note that pure states are compressed with zero error. Indeed, when the state $\rho_g$ is pure ($p=1$ or $p=0$), the state $\rho_g^{\otimes n}$ is contained in the symmetric subspace, with $J=n/2$. By the definition of $\mathcal E$ and $\mathcal D$, we have $\mathcal D\circ\mathcal E(\rho_{n/2})=\rho_{n/2}$ for every state $\rho_{n/2}$ with support in the symmetric subspace.

Let us focus now on the mixed state case ($0<p<1$). In this case,
\[
(\mathcal D\circ\mathcal E)(\rho_g^{\otimes n})=\bigoplus_{J=0}^{n/2}\sum_{K\in L_{i(J)}}\frac{q_K}{|L_{i(J)}|}\,\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_{g,K})\otimes\frac{I_{m_J}}{m_J}.
\]
Since, by covariance, the encoder and the decoder fare equally well on all input states, the error of the protocol can be written as
\[
\epsilon=\max_g\frac12\bigl\|(\mathcal D\circ\mathcal E)(\rho_g^{\otimes n})-\rho_g^{\otimes n}\bigr\|
=\frac12\left\|\bigoplus_{J=0}^{n/2}\sum_{K\in L_{i(J)}}\frac{q_K}{|L_{i(J)}|}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)\otimes\frac{I_{m_J}}{m_J}-\bigoplus_{J=0}^{n/2}q_J\left(\rho_J\otimes\frac{I_{m_J}}{m_J}\right)\right\|
=\frac12\sum_J\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{|L_{i(J)}|}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-q_J\rho_J\right\| .
\]
To further bound the error, we shall use the concentration property of the distribution $\{q_J\}$. Explicitly, we define a set $S$ as
\[
S=\bigl\{\lfloor\bar J-cr\sqrt n\rfloor,\dots,\lfloor\bar J+cr\sqrt n\rfloor\bigr\}
\]
with a parameter $c>0$ controlling the size $|S|$; $c$ will be chosen large enough that $\lim_{n\to\infty}\sum_{J\notin S}q_J=0$, as shown later. Separating the tail error term from the rest, we get
\[
\epsilon=\frac12\sum_{J\in S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{|L_{i(J)}|}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-q_J\rho_J\right\|+\frac12\sum_{J\notin S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{|L_{i(J)}|}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-q_J\rho_J\right\| .
\]
We further split the error within $S$ into two terms: the first error term is the imprecision of the adapter, while the second error term is the error of the interpolation.
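The bookkeeping of the partition—the interval index $i(J)$, the median map $f$, and the resulting memory cost—can be sketched directly. A toy Python sketch (names are ours), choosing $n$ and $r$ such that $\lfloor r\sqrt n\rfloor$ divides $n/2$, as the protocol assumes:

```python
from math import floor, sqrt, ceil, log2

def make_partition(n, r):
    """Intervals L_1..L_t covering {0,...,n/2}, with L_t = {n/2} a singleton
    so that pure states (J = n/2) are stored exactly.
    Assumes w = floor(r*sqrt(n)) divides n/2, as in the text."""
    w = floor(r * sqrt(n))
    assert (n // 2) % w == 0, "choose n, r so that w divides n/2"
    t = n // 2 // w + 1
    def i(J):                       # 1-based interval index with J in L_{i(J)}
        return t if J == n // 2 else J // w + 1
    def f(J):                       # median of the interval containing J
        return n // 2 if J == n // 2 else (i(J) - 1) * w + w / 2
    return w, t, i, f

n, r = 200, 0.72                    # gives w = 10 and t = 11 intervals
w, t, i, f = make_partition(n, r)
for J in range(n // 2 + 1):
    assert J == n // 2 or (i(J) - 1) * w <= J <= i(J) * w - 1   # J lies in L_{i(J)}
    assert abs(f(J) - J) <= w / 2                               # median is w/2-close to J
assert f(n // 2) == n // 2                                      # pure-state block kept exactly
# memory accounting: one spin block of dimension O(n) plus the interval index
print(ceil(log2(n + 1)), "qubits,", ceil(log2(t)), "classical bits")
```

With $t=O(\sqrt n)$ intervals, the classical index costs about $\frac12\log_2 n$ bits, while the retained spin block costs about $\log_2 n$ qubits, matching the memory budget quoted for the full-model protocol.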
Precisely, we have
\[
\epsilon\le\epsilon_1+\epsilon_2+\epsilon_3 \qquad (36)
\]
\[
\epsilon_1=\frac12\sum_{J\in S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\rho_J\right\| \qquad (37)
\]
\[
\epsilon_2=\frac12\sum_{J\in S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\rho_J-q_J\rho_J\right\| \qquad (38)
\]
\[
\epsilon_3=\frac12\sum_{J\notin S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-q_J\rho_J\right\| . \qquad (39)
\]
Now, we show the details of bounding each of these three error terms. First, the error term $\epsilon_1$, namely the imprecision of the adapter, can be upper bounded as
\[
\epsilon_1\le\frac12\sum_{J\in S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\bigl[\mathcal C_{f(J)\to J}\circ\mathcal C_{K\to f(J)}(\rho_K)-\mathcal C_{f(J)\to J}(\rho_{f(J)})\bigr]\right\|+\frac12\sum_{J\in S}\left\|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\bigl[\mathcal C_{f(J)\to J}(\rho_{f(J)})-\rho_J\bigr]\right\|
\]
\[
\le\frac12\sum_{J=0}^{n/2}\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}\left\{\max_{J\in S}\bigl\|\mathcal C_{f(J)\to J}(\rho_{f(J)})-\rho_J\bigr\|+\max_{J\in S}\bigl\|\mathcal C_{J\to f(J)}(\rho_J)-\rho_{f(J)}\bigr\|\right\}
=\frac12\left\{\max_{J\in S}\bigl\|\mathcal C_{f(J)\to J}(\rho_{f(J)})-\rho_J\bigr\|+\max_{J\in S}\bigl\|\mathcal C_{J\to f(J)}(\rho_J)-\rho_{f(J)}\bigr\|\right\}
\]
\[
\le\max_{J\in S}\max_{K\in L_{i(J)}}\bigl\|\mathcal C_{J\to K}(\rho_J)-\rho_K\bigr\|\le\left(\frac{r}{\sqrt n}\right)^{1-s}+O\!\left(\frac{r}{\sqrt n}\right)\qquad\forall\,s>0, \qquad (40)
\]
having used Eq. (16) in the last step. Second, the error term $\epsilon_2$, namely the error of the interpolation, can be upper bounded as
\[
\epsilon_2=\frac12\sum_{J\in S}\left|\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}-q_J\right|
\le\frac12\left(\sum_{J\in S}q_J\right)\max_{J\in S}\max_{K\in L_{i(J)}}\left|\frac{q_K}{q_J}-1\right|
\le\frac12\max_{J\in S}\max_{K\in L_{i(J)}}\left|\frac{q_K}{q_J}-1\right| . \qquad (41)
\]
Now, by Eq. (22) we have
\[
\frac{q_K}{q_J}=\frac{2K+1}{2J+1}\cdot\frac{B\bigl(\frac n2+K+1\bigr)-B\bigl(\frac n2-K\bigr)}{B\bigl(\frac n2+J+1\bigr)-B\bigl(\frac n2-J\bigr)} .
\]
We further notice that, by the De Moivre–Laplace theorem, the binomial $B(k)$ can be approximated by a Gaussian for $J\in S$ and for large $n$. Precisely, we have
\[
B\Bigl(\frac n2+J+1\Bigr)=\frac{1}{\sqrt{2\pi np(1-p)}}\exp\left[-\frac{(J-\bar J)^2}{2np(1-p)}\right]\left[1+O\!\left(\frac{1}{\sqrt n}\right)\right].
\]
Moreover, noticing that the term $B\bigl(\frac n2-J\bigr)$ is exponentially small compared to $B\bigl(\frac n2+J+1\bigr)$, we have
\[
\frac{q_K}{q_J}\ge\frac{\bar J-(c+1)r\sqrt n}{\bar J-cr\sqrt n}\left\{1-\frac{cr^2}{p(1-p)}+O(c^2r^4)+O\!\left(\frac{1}{\sqrt n}\right)\right\} \qquad (42)
\]
\[
\frac{q_K}{q_J}\le\frac{\bar J-cr\sqrt n}{\bar J-(c+1)r\sqrt n}\left\{1+\frac{cr^2}{p(1-p)}+O(c^2r^4)+O\!\left(\frac{1}{\sqrt n}\right)\right\} . \qquad (43)
\]
Substituting (42) and (43) into (41), we have
\[
\epsilon_2\le\frac{cr^2}{p(1-p)}+O\!\left(\frac{r}{\sqrt n}\right)+O(c^2r^4). \qquad (44)
\]
At last, the error term $\epsilon_3$, namely the tail term, can be upper bounded as
\[
\epsilon_3\le\frac12\sum_{J\notin S}\sum_{K\in L_{i(J)}}\frac{q_K}{\lfloor r\sqrt n\rfloor}+\frac12\sum_{J\notin S}q_J
\le 1-\sum_{J=\bar J-(c-1)r\sqrt n}^{\bar J+(c-1)r\sqrt n}q_J
\le 4\exp\bigl[-2(c-1)^2r^2\bigr]. \qquad (45)
\]
Finally, substituting Eqs. (40), (44) and (45) into (36), we have
\[
\epsilon\le\left(\frac{r}{\sqrt n}\right)^{1-s}+\frac{cr^2}{p(1-p)}+4\exp\bigl[-2(c-1)^2r^2\bigr]+O\!\left(\frac{r}{\sqrt n}\right)+O(c^2r^4)\qquad\forall\,s>0. \qquad (46)
\]
To make the error small for small $r$ and large $n$, we can choose $c=r^{-1-\delta}$ for a small constant $\delta>0$. In this case, for small enough $r$, the error bound reduces to
\[
\epsilon\le\left(\frac{r}{\sqrt n}\right)^{1-s}+\frac{r^{1-\delta}}{p(1-p)}+4\exp\bigl(-r^{-2\delta}\bigr)+O\!\left(\frac{r}{\sqrt n}\right)+O(r^{2-2\delta})\qquad\forall\,s>0.
\]
Recall that we are dealing with the mixed state case, where $1/2<p<1$. We can choose, for instance, $r=1/\log n$ to make the above error bound vanish with $n$. In conclusion, we have shown that, for any state $\rho_g$ and any error threshold $\epsilon_0>0$, one can choose $r$ and $n_0$ so that the error of the compression is smaller than $\epsilon_0$ for all $n>n_0$.

Optimality for the full-model compression.
In this section, we prove that the full-model protocol is optimal when no prior information on the qubit state is available. A protocol for the full model should have vanishing error on any possible input ensemble of $n$ identically prepared qubit states. In particular, it should have vanishing error on the ensemble [34]
\[
\mathcal U=\bigl\{\rho^{\otimes n},\ g\in\mathrm{SU}(2),\ p\sim f(p)\bigr\},
\]
where $f(p)$ is the probability density given by
\[
f(p):=e^{c(p)}\Bigl/\int{\rm d}p'\,e^{c(p')},\qquad
c(p):=2\log(2p-1)-\bigl[(4p-1)\log p+(4p-3)\log(1-p)\bigr]/(4p-2).
\]
Explicitly, we show that every protocol that compresses $\mathcal U$ with vanishing error requires a total memory size of at least $(3/2)\log n$ qubits.

As in the known-spectrum case discussed in the main text, we use the bound
\[
\log d_{\rm enc}(\mathcal U)\ge\chi(\mathcal U)-2\bigl[\epsilon\log d^{\min}_{\mathcal U}+\mu(\epsilon)\bigr] \qquad (47)
\]
where $d^{\min}_{\mathcal U}$ is the minimum of the effective dimension $d_{\mathcal U'}$ over all ensembles $\mathcal U'$ that are sufficient statistics for $\mathcal U$. We pick the sufficient statistics $\mathcal U'$ defined by
\[
\mathcal U'=\left\{\bigoplus_Jq_J\rho_{g,J},\ g\in\mathrm{SU}(2),\ p\sim f(p)\right\}.
\]
The effective dimension of the ensemble $\mathcal U'$ is $d_{\mathcal U'}=(n/2+1)^2$. Now, Theorem 1 of [34] states that
\[
\chi(\mathcal U)=\frac32\log n+O(1). \qquad (48)
\]
Combining Eq. (47) with Eq. (48), we achieve the following lower bound on the memory size:
\[
\log d_{\rm enc}(\mathcal U)\ge\frac32\log n-4\epsilon\log n+4\epsilon-2\mu(\epsilon)+O(1). \qquad (49)
\]
For large $n$ and vanishing $\epsilon$, the leading order of the above bound is $(3/2)\log n$, as stated in the main text.

Eq. (49) states that, if a protocol uses a fully quantum memory, the minimum number of qubits needed to compress a completely unknown state is $(3/2)\log n$. Since quantum memory is a stronger resource than classical memory, this result implies that every protocol using $q$ qubits and $c$ classical bits to compress $n$ copies with vanishing error must satisfy the bound $q+c\ge(3/2)\log n$. Our protocol saturates the bound, as it uses $\log n$ qubits and $(1/2)\log n$ classical bits.

A natural question is whether the number of qubits in our protocol can be further reduced. The answer is negative, due to the following argument: a compression protocol for the full model should also compress with vanishing error the ensemble $\mathcal P=\{\phi_g^{\otimes n},\ g\in\mathrm{SU}(2)\}$, where $\phi_g$ is the generic pure state $\phi_g=g|0\rangle\langle0|g^\dagger$. In order to compress the ensemble $\mathcal P$, one needs a memory of $\log n$ qubits [9]. Hence, our compression protocol uses i) the minimum amount of qubits, and ii) the minimum total amount of memory.