One-Shot Classical-Quantum Capacity and Hypothesis Testing
Ligong Wang ∗ Research Laboratory of Electronics, MIT, Cambridge, MA, USA
Renato Renner † Institute for Theoretical Physics, ETH Zurich, Switzerland
The one-shot classical capacity of a quantum channel quantifies the amount of classical information that can be transmitted through a single use of the channel such that the error probability is below a certain threshold. In this work, we show that this capacity is well approximated by a relative-entropy-type measure defined via hypothesis testing. Combined with a quantum version of Stein's Lemma, our results give a conceptually simple proof of the well-known Holevo-Schumacher-Westmoreland Theorem for the capacity of memoryless channels. More generally, we obtain tight capacity formulas for arbitrary (not necessarily memoryless) channels.
PACS numbers: 89.70.-a,89.70.Kn,89.70.Cf
In Information Theory, a channel models a physical device that takes an input and generates an output. One may, for instance, think of a communication channel (such as an optical fiber) that connects a sender (who provides the input) with a receiver (who obtains an output, which may deviate from the input). Another example is a memory device, such as a hard drive, where the input consists of the data written into the device, and where the output is the (generally noisy) data that is retrieved from the device at a later point in time.

A central question studied in Information Theory is whether, and how, a channel can be used to transmit data reliably in spite of the channel noise. This is usually achieved by coding, where an encoder prepares the channel input by adding redundancy to the data to be transmitted, and where a decoder reconstructs the data from the noisy channel output.

Here we focus on the case of classical-quantum channel coding, where the data to be transmitted reliably are classical. No assumptions are made about the channel that is used to achieve this task, i.e., the inputs and outputs may be arbitrary quantum states. However, since the quantum-mechanical structure of the input space is irrelevant for the encoding of classical data, it can be represented by a (classical) set $\mathcal{X}$. For any input $x \in \mathcal{X}$, the channel produces an output, specified by a density operator $\rho_x$ on a Hilbert space $B$. For our purposes, it is therefore sufficient to characterize a channel by a mapping $x \mapsto \rho_x$ from a set $\mathcal{X}$ to a set of density operators.

Classical-quantum channel coding has been studied extensively in a scenario where the channel can be used arbitrarily many times. The channel coding theorem for stationary memoryless classical-quantum channels, established by Holevo [1] and Schumacher and Westmoreland [2], provides an explicit formula (see (11)) for the rate at which data can be transmitted under the assumption that each use of the channel is independent of the previous uses.
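As a toy illustration (our own sketch, not taken from the Letter), consider a hypothetical binary classical-quantum channel mapping $x = 0$ to the qubit state $|0\rangle$ and $x = 1$ to $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. For pure output states, the Holevo quantity appearing in the memoryless capacity formula (11) reduces to the von Neumann entropy of the average output state, which for a $2 \times 2$ density matrix has closed-form eigenvalues:

```python
import math

def qubit_entropy(rho):
    """von Neumann entropy (in bits) of a 2x2 density matrix
    [[a, b], [conj(b), c]], using its closed-form eigenvalues."""
    a, b, c = rho[0][0].real, rho[0][1], rho[1][1].real
    mean = (a + c) / 2.0
    dev = math.sqrt(((a - c) / 2.0) ** 2 + abs(b) ** 2)
    return -sum(l * math.log2(l) for l in (mean + dev, mean - dev) if l > 1e-12)

# Hypothetical channel: 0 -> |0><0|, 1 -> |+><+|, used with P_X = (1/2, 1/2).
rho0 = [[1.0, 0.0], [0.0, 0.0]]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
avg = [[(rho0[i][j] + rho_plus[i][j]) / 2.0 for j in range(2)] for i in range(2)]

# For pure outputs, chi = S(avg) - sum_x P_X(x) S(rho_x) = S(avg),
# since each S(rho_x) vanishes.
chi = qubit_entropy(avg)
print(chi)  # about 0.6009 bits per channel use
```

The value agrees with the known expression $H_b\big((1 + 1/\sqrt{2})/2\big) \approx 0.6009$ for this pair of pure states; the function name `qubit_entropy` and the channel itself are illustrative choices, not objects defined in the Letter.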
More general channel coding theorems that do not rely on this independence assumption have been developed in later work by Hayashi and Nagaoka [3] and by Kretschmann and Werner [4]. These results are asymptotic, i.e., they refer to a limit where the number of channel uses tends to infinity while the probability of error is required to approach zero.

Here we consider a scenario where a given quantum channel is used only once and derive tight bounds on the number of classical bits that can be transmitted with a given average error probability $\epsilon$, in the following referred to as the $\epsilon$-one-shot classical-quantum capacity. This one-shot approach provides a high level of generality, as nothing needs to be assumed about the structure of the channel [20]. (Note that any situation in which a channel is used repeatedly can be equivalently described as one single use of a larger channel.) In particular, our bounds on the channel capacities imply the aforementioned Holevo-Schumacher-Westmoreland Theorem for the capacity of memoryless channels, as well as the generalizations by Hayashi and Nagaoka. On the other hand, our work generalizes similar one-shot results for classical channels [5-7]. Despite their generality, the bounds as well as their proofs are remarkably simple. We hope that our approach may therefore also be of pedagogical value.

Our derivation is based on the idea, already exploited in previous works (see, e.g., [3, 8-10]), of relating the problem of channel coding to hypothesis testing. Here, we use hypothesis testing directly to define a relative-entropy-type quantity, denoted $D_H^\epsilon(\cdot\|\cdot)$ (see (1)).
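For commuting (diagonal) states, the optimization defining $D_H^\epsilon$ in (1) reduces to a classical Neyman-Pearson problem: accept outcomes in decreasing likelihood-ratio order until the acceptance probability under $\rho$ reaches $1 - \epsilon$. The following minimal sketch (our own illustration; the function name is hypothetical) computes the resulting quantity for probability vectors:

```python
import math

def classical_dh(p, q, eps):
    """Hypothesis testing relative entropy D_H^eps(p||q) in bits, for
    probability vectors p and q with q fully supported and eps < 1.
    Builds the optimal test greedily: include outcomes in decreasing
    likelihood-ratio order until the acceptance probability under p
    reaches 1 - eps; the marginal outcome is included fractionally."""
    order = sorted(range(len(p)), key=lambda x: p[x] / q[x], reverse=True)
    need = 1.0 - eps   # acceptance probability still required under p
    beta = 0.0         # acceptance probability accumulated under q
    for x in order:
        if need <= 0.0:
            break
        frac = min(1.0, need / p[x]) if p[x] > 0 else 0.0
        beta += frac * q[x]
        need -= frac * p[x]
    return -math.log2(beta)

# Distinguishing a biased coin from a fair one with type-I error at most 0.1:
print(classical_dh([0.9, 0.1], [0.5, 0.5], 0.1))  # exactly 1.0 bit
```

On i.i.d. product distributions, this quantity divided by the number of copies tends to the relative entropy $D(p\|q)$, in line with the Quantum Stein's Lemma restated below.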
Our main result asserts that the one-shot channel capacity is well approximated by $D_H^\epsilon(\cdot\|\cdot)$ (Theorem 1).

We note that one-shot capacity bounds that are very similar to ours have been implicitly used in the information-spectrum approach to classical-quantum channel coding by Hayashi and Nagaoka [3, 11].

The remainder of this Letter is structured as follows. We briefly describe hypothesis testing and state a few properties of the quantity $D_H^\epsilon(\cdot\|\cdot)$. We then state and prove our main result, which provides upper and lower bounds on the $\epsilon$-one-shot classical-quantum capacity in terms of $D_H^\epsilon(\cdot\|\cdot)$. Finally, we show how the known asymptotic bounds (for arbitrarily many channel uses) can be obtained from Theorem 1.

Hypothesis Testing and $D_H^\epsilon(\cdot\|\cdot)$.— Hypothesis testing is the task of distinguishing two possible states of a system, $\rho$ and $\sigma$. A strategy for this task is specified by a Positive Operator Valued Measure (POVM) with two elements, $Q$ and $I - Q$, corresponding to the two possible values for the guess. The probability that the strategy produces a correct guess on input $\rho$ is given by $\mathrm{tr}[Q\rho]$, and the probability that it produces a wrong guess on input $\sigma$ is $\mathrm{tr}[Q\sigma]$. We define the hypothesis testing relative entropy $D_H^\epsilon(\rho\|\sigma)$ as

  $D_H^\epsilon(\rho\|\sigma) := -\log \inf_{Q:\, 0 \le Q \le I,\ \mathrm{tr}[Q\rho] \ge 1-\epsilon} \mathrm{tr}[Q\sigma]$.   (1)

Note that the infimum in (1) is a semidefinite program, so $D_H^\epsilon(\rho\|\sigma)$ can be evaluated efficiently.

As its name suggests, $D_H^\epsilon(\rho\|\sigma)$ can be understood as a relative entropy. In particular, for $\epsilon = 0$, it is equal to Rényi's relative entropy of order $0$, $D_0(\rho\|\sigma) = -\log \mathrm{tr}[\rho^0 \sigma]$, where $\rho^0$ denotes the projector onto the support of $\rho$. For $\epsilon > 0$, it corresponds to a "smoothed" variant of the relative Rényi entropy of order $0$ used by Buscemi and Datta [12] for characterizing the quantum capacity of channels [21]. $D_H^\epsilon(\rho\|\sigma)$ has the following properties, all of which hold for all $\rho$, $\sigma$ and $\epsilon \in [0, 1)$:

1. Positivity: $D_H^\epsilon(\rho\|\sigma) \ge 0$, with equality if $\rho = \sigma$ and $\epsilon = 0$.

2. Data Processing Inequality (DPI): for any Completely Positive Map (CPM) $\mathcal{E}$, $D_H^\epsilon(\rho\|\sigma) \ge D_H^\epsilon(\mathcal{E}(\rho)\|\mathcal{E}(\sigma))$.

3. Let $D(\cdot\|\cdot)$ denote the usual quantum relative entropy; then

  $D_H^\epsilon(\rho\|\sigma) \le \big(D(\rho\|\sigma) + H_b(\epsilon)\big)/(1-\epsilon)$,   (2)

where $H_b(\cdot)$ is the binary entropy function.

Positivity follows immediately from the definition. To prove the DPI, consider any POVM to distinguish $\mathcal{E}(\rho)$ from $\mathcal{E}(\sigma)$. We can then construct a new POVM to distinguish $\rho$ from $\sigma$ by preceding the given POVM with the CPM $\mathcal{E}$. This new POVM clearly gives the same error probabilities (in distinguishing $\rho$ and $\sigma$) as the original POVM (in distinguishing $\mathcal{E}(\rho)$ and $\mathcal{E}(\sigma)$). The DPI then follows because an optimization over all possible strategies for distinguishing $\rho$ and $\sigma$ can only decrease the failure probability.

To prove (2), first see that it holds when $D(\rho\|\sigma)$ is replaced by $D(P_\rho\|P_\sigma)$, where $P_\rho$ is the distribution of the outcomes of the optimal POVM performed on $\rho$, namely $(1-\epsilon, \epsilon)$, and similarly for $P_\sigma$, which is $(2^{-D_H^\epsilon(\rho\|\sigma)}, 1 - 2^{-D_H^\epsilon(\rho\|\sigma)})$. This can be shown by directly computing $D(P_\rho\|P_\sigma)$. Then (2) follows because $D(\cdot\|\cdot)$ satisfies the DPI, so $D(\rho\|\sigma) \ge D(P_\rho\|P_\sigma)$.

A further connection between $D_H^\epsilon(\cdot\|\cdot)$ and $D(\cdot\|\cdot)$ is the Quantum Stein's Lemma [8, 13], which we restate as follows.

Lemma 1 (Quantum Stein's Lemma). For any two states $\rho$ and $\sigma$ on a Hilbert space and for any $\epsilon \in (0, 1)$,

  $\lim_{n\to\infty} \frac{1}{n} D_H^\epsilon(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma)$.

Statement and Proof of the Main Result.—
Before stating our main result, we introduce some general terminology. The encoder is specified by a list of inputs $\{x_i\}$, $i \in \{1, \ldots, m\}$, called a codebook of size $m$. The decoder applies a corresponding decoding POVM, which acts on $B$ and has $m$ elements. A decoding error occurs if the output of the decoding POVM is not equal to the index $i$ of the input $x_i$ fed into the channel. An $(m, \epsilon)$-code consists of a codebook of size $m$ and a corresponding decoding POVM such that, when the message is chosen uniformly, the average probability of a decoding error is at most $\epsilon$ [22].

The main result of this Letter is the following theorem.

Theorem 1. The $\epsilon$-one-shot classical-quantum capacity of a channel $x \mapsto \rho_x$, i.e., the largest number $R$ for which a $(2^R, \epsilon)$-code exists, satisfies

  $\sup_{P_X} D_H^\epsilon(\pi_{AB} \| \pi_A \otimes \pi_B) \;\ge\; R \;\ge\; \sup_{P_X} D_H^{\epsilon'}(\pi_{AB} \| \pi_A \otimes \pi_B) - \log \frac{4\epsilon}{(\epsilon - \epsilon')^2}$   (3)

for any $\epsilon' \in (0, \epsilon)$, where $\pi_{AB}$ is the joint state of the input and output for an input chosen according to the distribution $P_X$, i.e.,

  $\pi_{AB} := \sum_{x \in \mathcal{X}} P_X(x)\, |x\rangle\langle x|_A \otimes \rho_x^B$,

for any representation of the inputs $x$ in terms of orthonormal vectors $|x\rangle_A$ on a Hilbert space $A$, and where $\pi_A$ and $\pi_B$ are the corresponding marginals.

Remark 1. In an earlier version of this Letter (which appeared in Physical Review Letters), the right-hand side (RHS) of (3) is given by its special case where $\epsilon' = \epsilon/2$:

  $\sup_{P_X} D_H^{\epsilon/2}(\pi_{AB} \| \pi_A \otimes \pi_B) - \log\frac{1}{\epsilon} - 4$.   (4)

For practical scenarios where $\epsilon$ is close to zero, the difference between (4) and the RHS of (3) is usually small. However, allowing an arbitrary $\epsilon'$, we can use (3) to derive the $\epsilon$-capacity of a channel, while fixing $\epsilon' = \epsilon/2$ we cannot.

The proof of Theorem 1 is divided into two parts, one for the first inequality (referred to as the converse) and the other for the second inequality (the achievability). We start with the converse, which asserts that, if a $(2^R, \epsilon)$-code exists, then

  $R \le \sup_{P_X} D_H^\epsilon(\pi_{AB} \| \pi_A \otimes \pi_B)$.   (5)

Proof of Theorem 1—Converse Part.
By definition, it is sufficient to prove (5) for a uniform distribution on the $x$'s used in the codebook, so we can focus on states $\pi_{AB}$ of the form

  $\pi_{AB} = 2^{-R} \sum_{i=1}^{2^R} |x_i\rangle\langle x_i| \otimes \rho_{x_i}$.

Note that the decoding POVM combined with the inverse of the encoding map (which is classical) can be viewed as a CPM. This CPM maps $\pi_{AB}$ to the (classical) state $P_{MM'}$ denoting the joint distribution of the transmitted message $M$ and the decoder's guess $M'$. Similarly, it maps $\pi_A \otimes \pi_B$ to $P_M \otimes P_{M'}$. Hence, it follows from the DPI for $D_H^\epsilon(\rho\|\sigma)$ that

  $D_H^\epsilon(P_{MM'} \| P_M \otimes P_{M'}) \le D_H^\epsilon(\pi_{AB} \| \pi_A \otimes \pi_B)$.

It thus remains to prove

  $R \le D_H^\epsilon(P_{MM'} \| P_M \otimes P_{M'})$.   (6)

For this, we consider a (possibly suboptimal) strategy to distinguish between $P_{MM'}$ and $P_M \otimes P_{M'}$. The strategy guesses $P_{MM'}$ if $M = M'$, and guesses $P_M \otimes P_{M'}$ otherwise. Using this distinguishing strategy, the probability of guessing $P_M \otimes P_{M'}$ on state $P_{MM'}$ is exactly the probability that $M \ne M'$ computed from $P_{MM'}$, namely, the average probability of a decoding error, and is thus not larger than $\epsilon$ by assumption. Furthermore, the probability of guessing $P_{MM'}$ on state $P_M \otimes P_{M'}$ is given by

  $\sum_{i=1}^{2^R} P_M(i) \cdot P_{M'}(i) = 2^{-R} \sum_{i=1}^{2^R} P_{M'}(i) = 2^{-R}$.

This implies (6).

We proceed with the achievability part of Theorem 1. We show that, for any $\epsilon' \in (0, \epsilon)$ and any $c > 0$ satisfying $(1+c)\epsilon' < \epsilon$, there exists a $(2^R, \epsilon)$-code with

  $R \ge \sup_{P_X} D_H^{\epsilon'}(\pi_{AB} \| \pi_A \otimes \pi_B) - \log \frac{2 + c + c^{-1}}{\epsilon - (1+c)\epsilon'}$.   (7)

Optimized over $c$, this bound implies the second inequality of (3); the optimal choice is $c = (\epsilon - \epsilon')/(\epsilon + \epsilon')$, for which the logarithm in (7) equals $\log\big(4\epsilon/(\epsilon - \epsilon')^2\big)$.

The main technique we need for proving (7) is the following lemma by Hayashi and Nagaoka [3, Lemma 2]:

Lemma 2. For any positive real $c$ and any operators $0 \le S \le I$ and $T \ge 0$, we have

  $I - (S+T)^{-1/2}\, S\, (S+T)^{-1/2} \le (1+c)(I - S) + (2 + c + c^{-1})\, T$.

Proof of Theorem 1—Achievability Part.
Fix $\epsilon'$, $c$, and $P_X$ as above. We are going to show that there exists a $(2^R, \epsilon)$-code such that

  $\epsilon \le (1+c)\epsilon' + (2 + c + c^{-1})\, 2^{R - D_H^{\epsilon'}(\pi_{AB} \| \pi_A \otimes \pi_B)}$,

which immediately implies (7). Let $Q$ be an operator acting on $AB$ such that $0 \le Q \le I$ and $\mathrm{tr}[Q \pi_{AB}] \ge 1 - \epsilon'$. By definition, it suffices to prove that there exists a codebook and a decoding POVM with error probability

  $\epsilon \le (1+c)\epsilon' + (2 + c + c^{-1})\, 2^R\, \mathrm{tr}\big[Q (\pi_A \otimes \pi_B)\big]$.   (8)

We generate a codebook by choosing its codewords $x_j$ at random, each independently according to the distribution $P_X$. Furthermore, we define the corresponding decoding POVM by its elements

  $E_i = \Big(\sum_{j=1}^{2^R} A_{x_j}\Big)^{-1/2} A_{x_i} \Big(\sum_{j=1}^{2^R} A_{x_j}\Big)^{-1/2}$, where $A_x := \mathrm{tr}_A\big[\big(|x\rangle\langle x|_A \otimes I_B\big) Q\big]$.

For a specific codebook $\{x_j\}$ and the transmitted codeword $x_i$, the probability of error is given by

  $\Pr(\mathrm{error} \mid x_i, \{x_j\}) = \mathrm{tr}[(I - E_i)\rho_{x_i}]$.

We now use Lemma 2 with $S = A_{x_i}$ and $T = \sum_{j \ne i} A_{x_j}$ to bound this by

  $\Pr(\mathrm{error} \mid x_i, \{x_j\}) \le (1+c)\big(1 - \mathrm{tr}[A_{x_i}\rho_{x_i}]\big) + (2 + c + c^{-1}) \sum_{j \ne i} \mathrm{tr}[A_{x_j}\rho_{x_i}]$.

Averaging over all codebooks, but keeping the transmitted codeword $x_i$ fixed, we find

  $\Pr(\mathrm{error} \mid x_i) \le (1+c)\big(1 - \mathrm{tr}[A_{x_i}\rho_{x_i}]\big) + (2 + c + c^{-1})(2^R - 1)\, \mathrm{tr}\Big[\sum_{x' \in \mathcal{X}} P_X(x') A_{x'}\, \rho_{x_i}\Big]$.

Averaging now in addition over the transmitted codeword $x_i$, we obtain the upper bound

  $\Pr(\mathrm{error}) \le (1+c)\Big(1 - \sum_x P_X(x)\,\mathrm{tr}[A_x \rho_x]\Big) + (2 + c + c^{-1})\, 2^R\, \mathrm{tr}\Big[\sum_{x'} P_X(x') A_{x'} \sum_x P_X(x) \rho_x\Big]$.   (9)

Note that

  $\sum_x P_X(x)\,\mathrm{tr}[A_x \rho_x] = \sum_x P_X(x)\,\mathrm{tr}\big[Q\, |x\rangle\langle x|_A \otimes \rho_x^B\big] = \mathrm{tr}[Q \pi_{AB}] \ge 1 - \epsilon'$

and

  $\mathrm{tr}\Big[\sum_{x'} P_X(x') A_{x'} \sum_x P_X(x) \rho_x\Big] = \sum_{x'} P_X(x')\,\mathrm{tr}\Big[Q\, |x'\rangle\langle x'| \otimes \sum_x P_X(x)\rho_x\Big] = \mathrm{tr}\big[Q(\pi_A \otimes \pi_B)\big]$.

Inserting these expressions into (9), we find that the upper bound (8) holds for the probability of error averaged over the class of codebooks we generated. Thus there must exist at least one codebook whose error probability $\epsilon$ satisfies (8).

Asymptotic Analysis.—
Theorem 1 applies to the transmission of a message in a single use of the channel. Obviously, a channel that can be used $n$ times can always be modeled as one big single-use channel. We can thus retrieve the known expressions for the (usual) capacity of channels, i.e., the average number of bits that can be transmitted per channel use in the limit where the channel is used arbitrarily often and the error $\epsilon$ approaches $0$. Most generally, a channel that can be used an arbitrary number of times is characterized by a sequence of mappings $x^n \mapsto \rho_{x^n}$, $n \in \{1, 2, \ldots\}$, where $x^n \in \mathcal{X}_n$ represents an input state over $n$ channel uses [23], and where $\rho_{x^n}$ is a density operator on $B^{\otimes n}$. Note that such a channel need not have any structure such as "causality" as defined in [4]. From Theorem 1 it immediately follows that the capacity of any channel is given by

  $C = \lim_{\epsilon \downarrow 0} \lim_{n \to \infty} \frac{1}{n} \sup_{P_{X^n}} D_H^\epsilon(\pi_{A_n B^{\otimes n}} \| \pi_{A_n} \otimes \pi_{B^{\otimes n}})$,   (10)

where $A_n$ denotes the Hilbert space spanned by orthonormal states $|x^n\rangle$ for all $x^n \in \mathcal{X}_n$. This expression is equivalent to [3, Theorem 1] [24]. We can also derive similar results for the optimistic capacity and the $\epsilon$-capacity; see [14].

Now consider a memoryless channel whose behavior in each use is independent of the previous uses. The capacity $C$ of such a channel is given by the well-known Holevo-Schumacher-Westmoreland Theorem [1, 2]:

  $C = \lim_{k \to \infty} \frac{1}{k} \sup_{P_{X^k}} D(\pi_{A_k B^{\otimes k}} \| \pi_{A_k} \otimes \pi_{B^{\otimes k}})$.   (11)

Note that $D(\pi_{A_k B^{\otimes k}} \| \pi_{A_k} \otimes \pi_{B^{\otimes k}})$ may equivalently be written as a mutual information $I(A_k; B^{\otimes k})$. This theorem can be proved easily using (10).

Proof of (11). To show achievability, i.e., that $C$ is lower-bounded by the RHS of (11), we restrict the supremum in (10) to product distributions on $k$-use states, so that the joint state $\pi_{A_n B^{\otimes n}}$ looks like $(\pi_{A_k B^{\otimes k}})^{\otimes (n/k)}$ [25]. We then let $n$ tend to infinity and apply Lemma 1 to obtain that, for any $k$,

  $C \ge \frac{1}{k} \sup_{P_{X^k}} D(\pi_{A_k B^{\otimes k}} \| \pi_{A_k} \otimes \pi_{B^{\otimes k}})$.   (12)

This concludes the proof of the achievability part. The converse, i.e., that $C$ is upper-bounded by the RHS of (11), follows immediately from (10) and (2).

To conclude, it may be interesting to compare Theorem 1 to other recently derived bounds on the one-shot capacity of classical-quantum channels [15-17]. The bounds of [15, 16] are different from ours in that they are not known to coincide asymptotically for arbitrary channels. In [17], it has been shown that the one-shot classical-quantum capacity $R$ of a channel can be approximated (up to additive terms of the order $\log(1/\epsilon)$) by

  $R \approx \max_{P_X} \big[ H_{\min}^\epsilon(A)_{\pi_A} - H_{\max}^\epsilon(A|B)_{\pi_{AB}} \big]$,

where $H_{\min}^\epsilon$ and $H_{\max}^\epsilon$ denote the smooth min- and max-entropies, which have recently been shown to be the relevant quantities for characterizing a number of information-theoretic tasks (see, e.g., [18] for definitions and properties). Combined with our result, this suggests that there is a deeper and more general relation between hypothesis testing and smooth entropies (and, therefore, the associated operational quantities). Exploring this link is left as an open question for future work.

Acknowledgments.—
The authors thank Masahito Hayashi and Marco Tomamichel for their comments on an earlier version of this Letter. L.W. acknowledges support from the US Air Force Office of Scientific Research (grant No. FA9550-11-1-0183) and the National Science Foundation (grant No. CCF-1017772). R.R. acknowledges support from the Swiss National Science Foundation (grant Nos. 200021-119868, 200020-135048, and the NCCR "QSIT") and the European Research Council (ERC) (grant No. 258932).

∗ Electronic address: [email protected]
† Electronic address: [email protected]

[1] A. S. Holevo, IEEE Trans. Inform. Theory, , 269 (1998).
[2] B. Schumacher and M. D. Westmoreland, Phys. Rev. A, , 131 (1997).
[3] M. Hayashi and H. Nagaoka, IEEE Trans. Inform. Theory, , 1753 (2003).
[4] D. Kretschmann and R. F. Werner, Phys. Rev. A, , 062323 (2005).
[5] R. Renner, S. Wolf, and J. Wullschleger, in Proc. IEEE Int. Symp. Inform. Theory (Seattle, Washington, USA, 2006).
[6] L. Wang, R. Colbeck, and R. Renner, in Proc. IEEE Int. Symp. Inform. Theory (Seoul, Korea, 2009), pp. 1804-1808.
[7] Y. Polyanskiy, H. V. Poor, and S. Verdú, in Proc. IEEE Int. Symp. Inform. Theory (Toronto, Canada, 2008).
[8] T. Ogawa and H. Nagaoka, IEEE Trans. Inform. Theory, , 2428 (2000).
[9] T. Ogawa and H. Nagaoka, in Proc. IEEE Int. Symp. Inform. Theory (Lausanne, Switzerland, 2002).
[10] M. Hayashi, Phys. Rev. A, , 062301 (2007).
[11] M. Hayashi, Quantum Information: An Introduction, Springer (2006).
[12] F. Buscemi and N. Datta, IEEE Trans. Inform. Theory, , 1447 (2010).
[13] F. Hiai and D. Petz, Comm. Math. Phys., , 99 (1991).
[14] L. Wang, Information-Theoretic Aspects of Optical Communications, Ph.D. dissertation, volume 6 of the ETH Series in Information Theory and its Applications, ETH Zurich (2011), editor: Amos Lapidoth.
[15] M. Mosonyi and N. Datta, J. Math. Phys., , 072104 (2009).
[16] M. Mosonyi and F. Hiai, IEEE Trans. Inform. Theory, , 2474 (2011).
[17] J. Renes and R. Renner, IEEE Trans. Inform. Theory, , 7377 (2011).
[18] M. Tomamichel, R. Colbeck, and R. Renner, IEEE Trans. Inform. Theory, , 5840 (2009).
[19] N. Datta, IEEE Trans. Inform. Theory, , 2816 (2009).
[20] In particular, in contrast to previous work, there is no need to define channels as sequences of mappings.
[21] It is also similar to a relative-entropy-type quantity used in [19], although the precise relation to this quantity is not known.
[22] It is well known that, in single-user scenarios, the (asymptotic) capacity does not depend on whether the average (over uniformly chosen messages) or the maximum probability of error is considered. In the one-shot case, one can construct a code that has maximum probability of error not larger than $2\epsilon$ from a code with average probability of error $\epsilon$, thereby sacrificing one bit.
[23] Here $\mathcal{X}_n$ cannot be replaced by $\mathcal{X}^{\times n}$, which only represents product states.
[24] While the expressions used in [3] look completely different, one can (non-operationally) prove their equivalence. See [14].
[25] As $n$ tends to infinity, the problem that $n$ might not be divisible by $k$ has a vanishing effect on the rate.