Polar codes for classical-quantum channels

Mark M. Wilde and Saikat Guha

Abstract: Holevo, Schumacher, and Westmoreland's coding theorem guarantees the existence of codes that are capacity-achieving for the task of sending classical data over a channel with classical inputs and quantum outputs. Although they demonstrated the existence of such codes, their proof does not provide an explicit construction of codes for this task. The aim of the present paper is to fill this gap by constructing near-explicit "polar" codes that are capacity-achieving. The codes exploit the channel polarization phenomenon observed by Arikan for the case of classical channels. Channel polarization is an effect in which one can synthesize a set of channels, by "channel combining" and "channel splitting," in which a fraction of the synthesized channels are perfect for data transmission while the other fraction are completely useless for data transmission, with the good fraction equal to the capacity of the channel. The channel polarization effect then leads to a simple scheme for data transmission: send the information bits through the perfect channels and "frozen" bits through the useless ones. The main technical contributions of the present paper are threefold. First, we leverage several known results from the quantum information literature to demonstrate that the channel polarization effect occurs for channels with classical inputs and quantum outputs. We then construct linear polar codes based on this effect, and the encoding complexity is $O(N \log N)$, where $N$ is the blocklength of the code. We also demonstrate that a quantum successive cancellation decoder works well, in the sense that the word error rate decays exponentially with the blocklength of the code. For this last result, we exploit Sen's recent "non-commutative union bound" that holds for a sequence of projectors applied to a quantum state.

I. INTRODUCTION

Shannon's fundamental contribution was to establish the capacity of a noisy channel as the highest rate at which a sender can reliably transmit data to a receiver [1]. His method of proof exploited the probabilistic method and was thus non-constructive. Ever since Shannon's contribution, researchers have attempted to construct error-correcting codes that can reach the capacity of a given channel. Some of the most successful schemes for error correction are turbo codes and low-density parity-check codes [2], with numerical results demonstrating that these codes perform well for a variety of channels. In spite of the success of these codes, there is no proof that they are capacity achieving for channels other than the erasure channel [3].

Recently, Arikan constructed polar codes and proved that they are capacity achieving for a wide variety of channels [4]. Polar codes exploit the phenomenon of channel polarization, in which a simple, recursive encoding synthesizes a set of channels that polarize, in the sense that a fraction of them become perfect for transmission while the other fraction are completely noisy and thus useless for transmission. The fraction of the channels that become perfect for transmission is equal to the capacity of the channel. In addition, the complexity of both the encoding and decoding scales as $O(N \log N)$, where $N$ is the blocklength of the code. Arikan developed polar codes after studying how the techniques of channel combining and channel splitting affect the rate and reliability of a channel [5]. Arikan and others have now extended the methods of polar coding to many different settings, including arbitrary discrete memoryless channels [6], source coding [7], lossy source coding [3], [8], and the multiple access channel with two senders and one receiver [9].

[Author affiliations: Mark M. Wilde is with the School of Computer Science, McGill University, Montreal, Quebec H3A 2A7, Canada. Saikat Guha is with the Quantum Information Processing Group, Raytheon BBN Technologies, Cambridge, Massachusetts, USA 02138.]

All of the above results are important for determining both the limits on data transmission and methods for achieving these limits on classical channels. The description of a classical channel $p_{Y|X}$ arises from modeling the signaling alphabet, the physical transmission medium, and the receiver measurement. If we are interested in accurately evaluating and reaching the true data-transmission limits of the physical channels, with an unspecified receiver measurement, and whose information carriers require a quantum-mechanical description, then it becomes necessary to invoke the laws of quantum mechanics. Examples of such channels include deep-space optical channels and ultra-low-temperature quantum-noise-limited RF channels. Achieving the classical communication capacity for such (quantum) channels often requires making collective measurements at the receiver, an action for which no classical description or implementation exists.
The quantum-mechanical approach to information theory [10], [11] is not merely a formality or technicality: encoding classical information with quantum states and decoding with collective measurements on the channel outputs [12], [13] can dramatically improve data transmission rates, for example if the sender and receiver are operating in a low-power regime for a pure-loss optical channel (which is a practically relevant regime for long-haul free-space terrestrial and deep-space optical communication) [14], [15]. Also, encoding with entangled inputs to the channels can increase capacity for certain channels [16], a superadditive effect which simply does not occur for classical channels.

The proof of one of the most important theorems of quantum information theory is due to Holevo [12], Schumacher, and Westmoreland [13] (HSW). They showed that the Holevo information of a quantum channel is an achievable rate for classical communication over it. Their proof of the HSW theorem bears some similarities with Shannon's technique (including the use of random coding), but their main contribution was the construction of a quantum measurement at the receiving end that allows for reliable decoding at the Holevo information rate. Since the proof of the HSW theorem, several researchers have improved the proof's error analysis [17], and others have demonstrated different techniques for achieving the Holevo information [18], [19], [20], [21], [22]. Very recently, Giovannetti et al. proved that a sequential decoding approach can achieve the Holevo information [23]. The sequential decoding approach has the receiver ask, through a series of dichotomic quantum measurements, whether the output of the channel was the first codeword, the second codeword, etc. (this approach is similar in spirit to a classical "jointly-typical" decoder [24]).
As long as the rate of the code is less than the Holevo information, this sequential decoder will correctly identify the transmitted codeword with asymptotically negligible error probability. Sen recently simplified the error analysis of this sequential decoding approach (rather significantly) by introducing a "non-commutative union bound" in order to bound the error probability of quantum sequential decoding [25].

In spite of the large amount of effort placed on proving that the Holevo information is achievable, there has been relatively little work on devising explicit codes that approach the Holevo information rate. The aim of the present paper is to fill this gap by generalizing the polar coding approach to quantum channels. In doing so, we construct the first explicit class of linear codes that approach the Holevo information rate with asymptotically small error probability.

[Footnote on the scarcity of explicit codes: This is likely due to the large amount of effort that the quantum information community has put towards quantum error correction [26], which is important for the task of transmitting quantum bits over a noisy quantum channel or for building a fault-tolerant quantum computer. Also, there might be a general belief that classical coding strategies would extend easily for sending classical information over quantum channels, but this is not the case given that collective measurements on channel outputs are required to achieve the Holevo information rate and the classical strategies do not incorporate these collective measurements.]

The main technical contributions of the present paper are as follows:

1) We characterize rate with the symmetric Holevo information [27], [12], [10], [11] and reliability with the fidelity [28], [29], [10], [11] between channel outputs corresponding to different classical inputs. These parameters generalize the symmetric Shannon capacity and the Bhattacharyya parameter [4], respectively, to the quantum case. We demonstrate that the symmetric Holevo information and the fidelity polarize under a recursive channel transformation similar to Arikan's [4], by exploiting Arikan's proof ideas [4] and several tools from the quantum information literature [30], [31], [32], [10], [33], [11].

2) The second contribution of ours is the generalization of Arikan's successive cancellation decoder [4] to the quantum case. We exploit ideas from quantum hypothesis testing [34], [35], [36], [37], [38] in order to construct the quantum successive cancellation decoder, and we use Sen's recent "non-commutative union bound" [25] in order to demonstrate that the decoder performs reliably in the limit of many channel uses, while achieving the symmetric Holevo information rate.

The complexity of the encoding part of our polar coding scheme is $O(N \log N)$, where $N$ is the blocklength of the code (the argument for this follows directly from Arikan's [4]). However, we have not yet been able to show that the complexity of the decoding part is $O(N \log N)$ (as is the case with Arikan's decoder [4]). Determining how to simplify the complexity of the decoding part is the subject of ongoing research. For now, we should regard our contribution in this paper as a more explicit method for achieving the Holevo information rate (as compared to those from prior work [12], [13], [18], [19], [20], [21]).

One might naively think from a casual glance at our paper that Arikan's results [4] directly apply to our quantum scenario here, but this is not the case. If one were to impose single-symbol detection on the outputs of the quantum channels, such a procedure would induce a classical channel from input to output.
In this case, Arikan's results do apply in that they can attain the Shannon capacity of this induced classical channel.

However, the Shannon capacity of the best single-symbol detection strategy may be far below the Holevo limit [14], [15]. Attaining the Holevo information rate generally requires the receiver to perform collective measurements (physical detection of the quantum state of the entire codeword that may not be realizable by detecting single symbols one at a time). We should stress that what we are doing in this paper is different from a naive application of Arikan's results. First, our polar coding rule depends on a quantum parameter, the fidelity, rather than the Bhattacharyya parameter (a classical parameter). The polar coding rule is then different from Arikan's, and we would thus expect a larger fraction of the channels to be "good" channels than if one were to impose a single-symbol measurement and exploit Arikan's polar coding rule with the Bhattacharyya parameter. Second, the quantum measurements in our quantum successive cancellation decoder are collective measurements performed on all of the channel outputs. Were it not so, then our polar coding scheme would not achieve the Holevo information rate in general.

We organize the rest of the paper as follows. The next section provides an overview of polar coding for classical-quantum channels (channels with classical inputs and quantum outputs). This overview states the main concepts and the important theorems, while saving their proofs for later in the paper. The main concepts include channel combining, channel splitting, channel polarization, rate of polarization, quantum successive cancellation decoding, and polar code performance. Section III gives more detail on how recursive channel combining and splitting lead to transformation of rate and reliability in the direction of polarization. Section IV proves that channel polarization occurs under the transformations given in Section III (the proofs in Section IV are identical to Arikan's [4] because they merely exploit his martingale approach). We prove in Section V that the performance of the polar coding scheme is good, by analyzing the error probability under quantum successive cancellation decoding. We finally conclude in Section VI with a summary and some open questions.

[Footnote on single-symbol detection: For instance, all known conventional optical receivers are single-symbol detectors. They detect each modulated pulse individually, followed by classical postprocessing.]

II. OVERVIEW OF RESULTS

Our setting involves a classical-quantum channel $W$ with a classical input $x$ and a quantum output $\rho_x$:
$$W : x \to \rho_x,$$
where $x \in \{0,1\}$ and $\rho_x$ is a unit trace, positive operator called a density operator. We can associate a probability distribution and a classical label with the states $\rho_0$ and $\rho_1$ by writing the following classical-quantum state [11]:
$$\rho^{XB} \equiv \frac{1}{2} |0\rangle\langle 0|^X \otimes \rho_0^B + \frac{1}{2} |1\rangle\langle 1|^X \otimes \rho_1^B.$$

Two important parameters for characterizing any classical-quantum channel are its rate and reliability. We define the rate in terms of the channel's symmetric Holevo information $I(W)$, where
$$I(W) \equiv I(X;B)_\rho.$$
$I(X;B)_\rho$ is the quantum mutual information of the state $\rho^{XB}$, defined as
$$I(X;B)_\rho \equiv H(X)_\rho + H(B)_\rho - H(XB)_\rho,$$
and the von Neumann entropy $H(\sigma)$ of any density operator $\sigma$ is defined as
$$H(\sigma) \equiv -\mathrm{Tr}\{\sigma \log_2 \sigma\}.$$
(Observe that the von Neumann entropy of $\sigma$ is equal to the Shannon entropy of its eigenvalues.) It is also straightforward to verify that
$$I(W) = H\big((\rho_0^B + \rho_1^B)/2\big) - H(\rho_0^B)/2 - H(\rho_1^B)/2.$$
The symmetric Holevo information is non-negative by concavity of von Neumann entropy, and it can never exceed one if the system $X$ is a classical binary system (as is the case for the classical-quantum state $\rho^{XB}$). Additionally, the symmetric Holevo information is equal to zero if there is no correlation between $X$ and $B$. It is equal to the capacity of the channel $W$ for transmitting classical bits over it if the input prior distribution is restricted to be uniform [12], [13]. It also generalizes the symmetric capacity [4] to the quantum setting given above.

We define the reliability of the channel $W$ as the fidelity between the states $\rho_0$ and $\rho_1$ [28], [29], [10], [11]:
$$F(\rho_0, \rho_1) \equiv \left\| \sqrt{\rho_0} \sqrt{\rho_1} \right\|_1^2,$$
where $\|A\|_1$ is the nuclear norm of the operator $A$:
$$\|A\|_1 = \mathrm{Tr}\left\{ \sqrt{A^\dagger A} \right\}.$$
Let $F(W)$ denote the reliability of the channel $W$:
$$F(W) \equiv F(\rho_0, \rho_1).$$
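As a small numerical illustration of these two parameters (a sketch under stated assumptions, not part of the paper's development), consider the special case in which both channel outputs are pure qubit states, $\rho_x = |\psi_x\rangle\langle\psi_x|$. In that case the fidelity reduces to $|\langle\psi_0|\psi_1\rangle|^2$, and the symmetric Holevo information reduces to the entropy of the average state, whose eigenvalues are $(1 \pm |\langle\psi_0|\psi_1\rangle|)/2$. The function names and the angle parametrization below are hypothetical choices made only for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def pure_state_channel(theta: float):
    """Return (I(W), F(W)) for the channel x -> |psi_x><psi_x|.

    Assumption for this sketch: |psi_0> = (1, 0) and
    |psi_1> = (cos(theta), sin(theta)), so the overlap is |cos(theta)|.
    """
    overlap = abs(math.cos(theta))
    # For pure states, sqrt(rho) = rho, so || sqrt(rho_0) sqrt(rho_1) ||_1 = overlap.
    fidelity = overlap ** 2
    # Pure states have zero entropy, so I(W) = H((rho_0 + rho_1) / 2).
    # The average state has eigenvalues (1 + overlap) / 2 and (1 - overlap) / 2.
    holevo = binary_entropy((1.0 + overlap) / 2.0)
    return holevo, fidelity


if __name__ == "__main__":
    for theta in (0.0, math.pi / 8, math.pi / 4, math.pi / 2):
        holevo, fidelity = pure_state_channel(theta)
        print(f"theta={theta:.4f}  I(W)={holevo:.4f}  F(W)={fidelity:.4f}")
```

Orthogonal outputs (`theta = pi/2`) give a fidelity of essentially zero and a rate of one bit, while identical outputs (`theta = 0`) give a fidelity of one and a rate of zero, matching the extreme cases described in the text.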
The fidelity is equal to a number between zero and one, and it characterizes how "close" two quantum states are to one another. It is equal to zero if and only if there exists a quantum measurement that can perfectly distinguish the states, and it is equal to one if the states are indistinguishable by any measurement [10], [11]. The fidelity generalizes the Bhattacharyya parameter used in the classical setting [4]. Naturally, we would expect the channel $W$ to be perfectly reliable if $F(W) = 0$ and completely unreliable if $F(W) = 1$. The fidelity also serves as a coarse bound on the probability of error in discriminating the states $\rho_0$ and $\rho_1$ [37], [39].

[Footnote on "rate" and "reliability": We are using the same terminology as Arikan [4].]

Fig. 1. The channel $W_2$ synthesized from the first level of recursion. Thick lines denote classical systems while thin lines denote quantum systems (this is our convention for the other figures as well). The depicted gate acting on the channel input is a classical controlled-NOT (CNOT) gate, where the filled-in circle acts on the source bit and the other circle acts on the target bit. Its truth table is $(u_1, u_2) \to (u_1 \oplus u_2, u_2)$.

We would expect the symmetric Holevo information $I(W) \approx 1$ if and only if the channel's fidelity $F(W) \approx 0$, and vice versa: $I(W) \approx 0 \Leftrightarrow F(W) \approx 1$. The following proposition makes this intuition rigorous, and it serves as a generalization of Arikan's first proposition regarding the relationship between rate and reliability. We provide its proof in the appendix.

Proposition 1: For any binary-input classical-quantum channel of the above form, the following bounds hold:
$$I(W) \geq \log_2\left( \frac{2}{1 + \sqrt{F(W)}} \right), \tag{1}$$
$$I(W) \leq \sqrt{1 - F(W)}. \tag{2}$$

A. Channel Polarization

The channel polarization phenomenon occurs after synthesizing a set of $N$ classical-quantum channels $\{W_N^{(i)} : 1 \leq i \leq N\}$ from $N$ independent copies of the classical-quantum channel $W$. The effect is known as "polarization" because a fraction of the channels $W_N^{(i)}$ become perfect for data transmission, in the sense that $I(W_N^{(i)}) \approx 1$ for the channels in this fraction, while the channels in the complementary fraction become completely useless in the sense that $I(W_N^{(i)}) \approx 0$ in the limit as $N$ becomes large. Also, the fraction of channels that do not exhibit polarization vanishes as $N$ becomes large. One can induce the polarization effect by means of channel combining and channel splitting.

1) Channel Combining:

The channel combining phase takes copies of a classical-quantum channel $W$ and builds from them an $N$-fold classical-quantum channel $W_N$ in a recursive way, where $N$ is any power of two: $N = 2^n$, $n \geq 0$. The zeroth level of recursion merely sets $W_1 \equiv W$. The first level of recursion combines two copies of $W_1$ and produces the channel $W_2$, defined as
$$W_2 : u_1 u_2 \to W_2^{B_1 B_2}(u_1, u_2), \tag{3}$$
where
$$W_2^{B_1 B_2}(u_1, u_2) \equiv \rho^{B_1}_{u_1 \oplus u_2} \otimes \rho^{B_2}_{u_2}.$$
Figure 1 depicts this first level of recursion.

[Footnote: One cannot expect to transmit more than one classical bit over a perfect qubit channel due to Holevo's bound [27].]

The second level of recursion takes two copies of $W_2$ and produces the channel $W_4$:
$$W_4 : u_1 u_2 u_3 u_4 \to W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4), \tag{4}$$
where
$$W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4) \equiv W_2^{B_1 B_2}(u_1 \oplus u_2, u_3 \oplus u_4) \otimes W_2^{B_3 B_4}(u_2, u_4),$$
so that
$$W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4) = \rho^{B_1}_{u_1 \oplus u_2 \oplus u_3 \oplus u_4} \otimes \rho^{B_2}_{u_3 \oplus u_4} \otimes \rho^{B_3}_{u_2 \oplus u_4} \otimes \rho^{B_4}_{u_4}.$$
Figure 2 depicts the second level of recursion.

Fig. 2. The second level of recursion in the channel combining phase.

The operation $R_4$ in Figure 2 is a permutation that takes $(u_1, u_2, u_3, u_4) \to (u_1, u_3, u_2, u_4)$. One can then readily check that the mapping from the row vector $u^4$ to the channel inputs $x^4$ is a linear map given by $x^4 = u^4 G_4$ with
$$G_4 \equiv \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

The general recursion at the $n$th level is to take two copies of $W_{N/2}$ and synthesize a channel $W_N$ from them. The first part is to transform the input sequence $u^N$ according to the following rule for all $i \in \{1, \ldots, N/2\}$:
$$s_{2i-1} = u_{2i-1} \oplus u_{2i}, \qquad s_{2i} = u_{2i}.$$
The next part of the transformation is a "reverse shuffle" $R_N$ that performs the transformation
$$(s_1, s_2, s_3, s_4, \ldots, s_{N-1}, s_N) \to (s_1, s_3, \ldots, s_{N-1}, s_2, s_4, \ldots, s_N).$$
The resulting bit sequence is the input to the two copies of $W_{N/2}$.

The overall transformation on the input sequence $u^N$ is a linear transformation given by $x^N = u^N G_N$, where
$$G_N = B_N F^{\otimes n}, \tag{5}$$
$$F \equiv \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},$$
and $B_N$ is a permutation matrix known as a "bit-reversal" operation [4].
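The recursion above can be sketched in a few lines of code. The example below is only an illustration (function names are hypothetical, and indices are 0-based in the code while the text uses 1-based indices): `combine` applies the pairwise XOR step, the reverse shuffle, and then recurses on the two halves, and `generator_matrix` reads off the rows of $G_N$ by feeding in unit vectors. A second function builds $B_N F^{\otimes n}$ directly so that the two constructions can be compared.

```python
def combine(u):
    """Map input bits u (length N = 2^n) to channel inputs x = u G_N (mod 2)."""
    n = len(u)
    if n == 1:
        return list(u)
    # First part: s_{2i-1} = u_{2i-1} XOR u_{2i}, s_{2i} = u_{2i} (1-based in the text).
    s = []
    for i in range(0, n, 2):
        s.extend([u[i] ^ u[i + 1], u[i + 1]])
    # Reverse shuffle R_N: odd-position entries first, then even-position entries.
    shuffled = s[0::2] + s[1::2]
    half = n // 2
    # The two halves are the inputs to the two copies of W_{N/2}.
    return combine(shuffled[:half]) + combine(shuffled[half:])


def generator_matrix(n_levels):
    """Rows of G_N: row i is the image of the i-th unit vector under `combine`."""
    size = 2 ** n_levels
    return [combine([int(j == i) for j in range(size)]) for i in range(size)]


def bit_reversed_kron(n_levels):
    """Build B_N F^{(x) n} directly: bit-reverse the row order of the Kronecker power."""
    f = [[1, 0], [1, 1]]
    power = [[1]]
    for _ in range(n_levels):
        # Kronecker product of the current power with F.
        power = [[a * b for a in row_p for b in row_f] for row_p in power for row_f in f]

    def reverse_bits(i):
        return int(format(i, "0{}b".format(n_levels))[::-1], 2)

    return [power[reverse_bits(i)] for i in range(2 ** n_levels)]


if __name__ == "__main__":
    for row in generator_matrix(2):
        print(row)
    print(generator_matrix(3) == bit_reversed_kron(3))
```

For $N = 4$ the rows produced by `generator_matrix(2)` can be compared entry by entry with the matrix $G_4$ given above.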

2) Channel Splitting:

The channel splitting phase consists of taking the channels $W_N$ induced by the transformation $G_N$ and defining new channels $W_N^{(i)}$ from them. Let $\rho_{u^N}$ denote the output of the channel $W_N$ when inputting the bit sequence $u^N$. We define the $i$th split channel $W_N^{(i)}$ as follows:
$$W_N^{(i)} : u_i \to \rho_{(i),u_i}^{U_1^{i-1} B^N}, \tag{6}$$
where
$$\rho_{(i),u_i}^{U_1^{i-1} B^N} \equiv \sum_{u_1^{i-1}} \frac{1}{2^{i-1}} \left| u_1^{i-1} \right\rangle \left\langle u_1^{i-1} \right|^{U_1^{i-1}} \otimes \overline{\rho}_{u_1^{i}}^{B^N}, \tag{7}$$
$$\overline{\rho}_{u_1^{i}}^{B^N} \equiv \sum_{u_{i+1}^{N}} \frac{1}{2^{N-i}} \rho_{u^N}^{B^N}. \tag{8}$$
We can also write, as an alternate notation,
$$W_N^{(i)}(u_i) = \rho_{(i),u_i}^{U_1^{i-1} B^N}.$$
These channels have the same interpretation as Arikan's split channels [4]: they are the channels induced by a "genie-aided" quantum successive cancellation decoder, in which the $i$th decision measurement estimates $u_i$ given that the channel output $\rho_{u^N}^{B^N}$ is available, after observing the previous bits $u_1^{i-1}$ correctly, and if the distribution over $u_{i+1}^{N}$ is uniform. These split channels arise in our analysis of the error probability for quantum successive cancellation decoding.

3) Channel Polarization:

Our channel polarization theorem below is similar to Arikan's Theorem 1 [4], though ours applies for classical-quantum channels with binary inputs and quantum outputs:

Theorem 2 (Channel Polarization): The classical-quantum channels $W_N^{(i)}$ synthesized from the channel $W^{\otimes N}$ polarize, in the sense that the fraction of indices $i \in \{1, \ldots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to the symmetric Holevo information $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$ for any $\delta \in (0, 1)$ as $N$ goes to infinity through powers of two.

The proof of the above theorem is identical to Arikan's proof with a martingale approach [4]. For completeness, we provide a brief proof in Section IV.
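To make the statement of Theorem 2 concrete, the sketch below tracks polarization numerically in one special case that is an assumption of this example and not a construction from the paper: an erasure-type channel whose two output states commute, $\rho_x = (1-\epsilon)|x\rangle\langle x| + \epsilon|e\rangle\langle e|$. Because the outputs commute, this channel is effectively the classical binary erasure channel, for which $I(W) = 1 - \epsilon$ and Arikan's single-step transformation is known in closed form: the worse channel has erasure probability $2\epsilon - \epsilon^2$ and the better channel has erasure probability $\epsilon^2$. A general classical-quantum channel has no such closed form, so this is only an illustration of the polarization effect.

```python
def polarize_erasure(eps: float, n_levels: int):
    """Erasure probabilities of the 2^n synthesized channels, starting from eps.

    Assumes the commuting (classical erasure) special case, where each
    single-step transformation maps e to (2e - e^2, e^2).
    """
    params = [eps]
    for _ in range(n_levels):
        next_params = []
        for e in params:
            next_params.append(2.0 * e - e * e)  # worse channel
            next_params.append(e * e)            # better channel
        params = next_params
    return params


def polarization_fractions(eps: float, n_levels: int, delta: float):
    """Return (fraction with I > 1 - delta, fraction with I < delta, mean of I)."""
    params = polarize_erasure(eps, n_levels)
    count = len(params)
    info = [1.0 - e for e in params]  # rate of an erasure channel
    good = sum(1 for v in info if v > 1.0 - delta) / count
    bad = sum(1 for v in info if v < delta) / count
    return good, bad, sum(info) / count


if __name__ == "__main__":
    for n_levels in (4, 8, 12, 16):
        good, bad, mean = polarization_fractions(0.5, n_levels, 0.1)
        print(f"n={n_levels:2d}  good={good:.3f}  bad={bad:.3f}  mean I={mean:.3f}")
```

In this special case the average of $I(W_N^{(i)})$ over all indices stays equal to $I(W)$ at every level, while the fractions of nearly perfect and nearly useless channels grow with $n$.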

4) Rate of Polarization:

It is important to characterize the speed with which the polarization phenomenon comes into play for the purpose of proving this paper's polar coding theorem. We exploit the fidelity $F(W_N^{(i)})$ of the split channels in order to characterize the rate of polarization:
$$F(W_N^{(i)}) \equiv F\left( \rho_{(i),0}^{U_1^{i-1} B^N}, \rho_{(i),1}^{U_1^{i-1} B^N} \right). \tag{9}$$
The theorem below exploits the exponential convergence results of Arikan and Telatar [40], which improved upon Arikan's original convergence results [4] (note that we could also use the more general results in Ref. [41]):

Theorem 3 (Rate of Polarization): Given any classical-quantum channel $W$ with $I(W) > 0$, any $R < I(W)$, and any constant $\beta < 1/2$, there exists a sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$ with $|\mathcal{A}_N| \geq NR$ such that
$$\sum_{i \in \mathcal{A}_N} \sqrt{F(W_N^{(i)})} = o\left(2^{-N^\beta}\right).$$
Conversely, suppose that $R > 0$ and $\beta > 1/2$. Then for any sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$ with $|\mathcal{A}_N| \geq NR$, the following result holds:
$$\max\left\{ \sqrt{F(W_N^{(i)})} : i \in \mathcal{A}_N \right\} = \omega\left(2^{-N^\beta}\right).$$

The proof of this theorem exploits our results in Section III and Theorem 1 of Ref. [40].

B. Polar Coding

The idea behind polar coding is to exploit the polarization effect for the construction of a capacity-achieving code. The sender should transmit the information bits only through the split channels $W_N^{(i)}$ for which the reliability parameter $F(W_N^{(i)})$ is close to zero. In doing so, the sender and receiver can achieve the symmetric Holevo information $I(W)$ of the channel $W$.

1) Coset Codes:

Polar codes arise from a special class of codes that Arikan calls "$G_N$-coset codes" [4]. These $G_N$-coset codes are given by the following mapping from the input sequence $u^N$ to the channel input sequence $x^N$:
$$x^N = u^N G_N,$$
where $G_N$ is the encoding matrix defined in (5). Suppose that $\mathcal{A}$ is some subset of $\{1, \ldots, N\}$. Then we can write the above transformation as follows:
$$x^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c), \tag{10}$$
where $G_N(\mathcal{A})$ denotes the submatrix of $G_N$ constructed from the rows of $G_N$ with indices in $\mathcal{A}$ and $\oplus$ denotes vector binary addition.

Suppose that we fix the set $\mathcal{A}$ and the bit sequence $u_{\mathcal{A}^c}$. The mapping in (10) then specifies a transformation from the bit sequence $u_{\mathcal{A}}$ to the channel input sequence $x^N$. This mapping is equivalent to a linear encoding for a code that Arikan calls a $G_N$-coset code, where the sequence $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$ identifies the coset. We can fully specify a coset code by the parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$, where $N$ is the length of the code, $K = |\mathcal{A}|$ is the number of information bits, $\mathcal{A}$ is a set that identifies the indices for the information bits, and $u_{\mathcal{A}^c}$ is the vector of frozen bits. The polar coding rule specifies a way to choose the indices for the information bits based on the channel over which the sender is transmitting data.
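The coset encoding in (10) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, indices are 0-based, the information set used in the example is an arbitrary choice rather than one selected by the polar coding rule, and $G_N$ is built from the closed form $G_N = B_N F^{\otimes n}$ using the fact that entry $(i, j)$ of $F^{\otimes n}$ equals one exactly when the binary digits of $j$ are a subset of those of $i$.

```python
def bit_reverse(i: int, n_bits: int) -> int:
    """Reverse the n_bits-bit binary representation of i."""
    return int(format(i, "0{}b".format(n_bits))[::-1], 2)


def generator_matrix(n_levels: int):
    """G_N = B_N F^{(x) n} with F = [[1, 0], [1, 1]], as a list of rows."""
    size = 2 ** n_levels
    rows = []
    for i in range(size):
        r = bit_reverse(i, n_levels)
        # Entry (r, j) of the Kronecker power is 1 iff the bits of j are contained in r.
        rows.append([1 if (j & ~r) == 0 else 0 for j in range(size)])
    return rows


def coset_encode(u_info, info_set, u_frozen, n_levels: int):
    """Compute x = u_A G_N(A) XOR u_{A^c} G_N(A^c) over GF(2).

    u_info lists the information bits in increasing index order of info_set;
    u_frozen lists the frozen bits in increasing index order of the complement.
    """
    size = 2 ** n_levels
    g = generator_matrix(n_levels)
    info_idx = sorted(info_set)
    frozen_idx = [i for i in range(size) if i not in info_set]
    u = [0] * size
    for bit, idx in zip(u_info, info_idx):
        u[idx] = bit
    for bit, idx in zip(u_frozen, frozen_idx):
        u[idx] = bit
    x_info = [sum(u[i] * g[i][j] for i in info_idx) % 2 for j in range(size)]
    x_frozen = [sum(u[i] * g[i][j] for i in frozen_idx) % 2 for j in range(size)]
    # The frozen part selects the coset; the information part selects the codeword in it.
    return [a ^ b for a, b in zip(x_info, x_frozen)]


if __name__ == "__main__":
    # N = 4, hypothetical information set {1, 3} (0-based), frozen bits all zero.
    print(coset_encode([1, 1], {1, 3}, [0, 0], 2))
    # Same information bits, different frozen vector: a shifted coset.
    print(coset_encode([1, 1], {1, 3}, [1, 0], 2))
```

Changing only the frozen vector shifts every codeword by the same fixed sequence $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$, which is why the frozen bits are said to identify the coset.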

2) A Quantum Successive Cancellation Decoder:

The specification of the quantum successive cancellation decoder is what mainly distinguishes Arikan's polar codes for classical channels from ours developed here for classical-quantum channels. Let us begin with a $G_N$-coset code with parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$. The sender encodes the information bit vector $u_{\mathcal{A}}$ along with the frozen vector $u_{\mathcal{A}^c}$ according to the transformation in (10). The sender then transmits the encoded sequence $x^N$ through the classical-quantum channel, leading to a state $\rho_{x_1} \otimes \cdots \otimes \rho_{x_N}$, which is equivalent to a state $\rho_{u^N}$ up to the transformation $G_N$. It is then the goal of the receiver to perform a sequence of quantum measurements on the state $\rho_{u^N}$ in order to determine the bit sequence $u^N$. We are assuming that the receiver has full knowledge of the frozen vector $u_{\mathcal{A}^c}$ so that he does not make mistakes when decoding these bits.

Corresponding to the split channels $W_N^{(i)}$ in (6) are the following projectors that can attempt to decide whether the input of the $i$th split channel is zero or one:
$$\Pi_{(i),0}^{U_1^{i-1} B^N} \equiv \left\{ \sqrt{\rho_{(i),0}^{U_1^{i-1} B^N}} - \sqrt{\rho_{(i),1}^{U_1^{i-1} B^N}} \geq 0 \right\},$$
$$\Pi_{(i),1}^{U_1^{i-1} B^N} \equiv I - \Pi_{(i),0}^{U_1^{i-1} B^N} = \left\{ \sqrt{\rho_{(i),0}^{U_1^{i-1} B^N}} - \sqrt{\rho_{(i),1}^{U_1^{i-1} B^N}} < 0 \right\},$$
where $\sqrt{A}$ denotes the square root of a positive operator $A$, $\{B \geq 0\}$ denotes the projector onto the positive eigenspace of a Hermitian operator $B$, and $\{B < 0\}$ denotes the projection onto the negative eigenspace of $B$. After some calculations, we can readily see that
$$\Pi_{(i),0}^{U_1^{i-1} B^N} = \sum_{u_1^{i-1}} \left| u_1^{i-1} \right\rangle \left\langle u_1^{i-1} \right|^{U_1^{i-1}} \otimes \Pi_{(i),u_1^{i-1}0}^{B^N}, \tag{11}$$
$$\Pi_{(i),1}^{U_1^{i-1} B^N} = \sum_{u_1^{i-1}} \left| u_1^{i-1} \right\rangle \left\langle u_1^{i-1} \right|^{U_1^{i-1}} \otimes \Pi_{(i),u_1^{i-1}1}^{B^N}, \tag{12}$$
where
$$\Pi_{(i),u_1^{i-1}0}^{B^N} \equiv \left\{ \sqrt{\overline{\rho}_{u_1^{i-1}0}^{B^N}} - \sqrt{\overline{\rho}_{u_1^{i-1}1}^{B^N}} \geq 0 \right\},$$
$$\Pi_{(i),u_1^{i-1}1}^{B^N} \equiv \left\{ \sqrt{\overline{\rho}_{u_1^{i-1}0}^{B^N}} - \sqrt{\overline{\rho}_{u_1^{i-1}1}^{B^N}} < 0 \right\}.$$
The above observations lead to a method for a successive cancellation decoder similar to Arikan's [4], with the following decoding rule:
$$\hat{u}_i = \begin{cases} u_i & \text{if } i \in \mathcal{A}^c, \\ h\left(\hat{u}_1^{i-1}\right) & \text{if } i \in \mathcal{A}, \end{cases}$$
where $h\left(\hat{u}_1^{i-1}\right)$ is the outcome of the following $i$th measurement on the output of the channel (after $i - 1$ measurements have already been performed):
$$\left\{ \Pi_{(i),\hat{u}_1^{i-1}0}^{B^N}, \; \Pi_{(i),\hat{u}_1^{i-1}1}^{B^N} \right\}.$$
We are assuming that the measurement device outputs "0" if the outcome $\Pi_{(i),\hat{u}_1^{i-1}0}^{B^N}$ occurs and it outputs "1" otherwise. (Note that we can set $\Pi_{(i),\hat{u}_1^{i-1}u_i}^{B^N} = I$ if the bit $u_i$ is a frozen bit.) The above sequence of measurements for the whole bit stream $u^N$ corresponds to a positive operator-valued measure (POVM) $\{\Lambda_{u^N}\}$, where
$$\Lambda_{u^N} \equiv \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1}u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N},$$
$$\sum_{u_{\mathcal{A}}} \Lambda_{u^N} = I^{B^N}.$$

The above decoding strategy is suboptimal in two regards. First, the decoder assumes that the future bits are unknown (and random) even if the receiver has full knowledge of the future frozen bits (this suboptimality is similar to the suboptimality of Arikan's decoder [4]). Second, the measurement operators for making a decision are suboptimal as well because we choose them to be projectors onto the positive eigenspace of the difference of the square roots of two density operators. The optimal bitwise decision rule is to choose these operators to be the Helstrom-Holevo projector onto the positive eigenspace of the difference of two density operators [34], [35]. Having our quantum successive cancellation decoder operate in these two different suboptimal ways allows us to analyze its performance easily (though, note that we could just as well have used Helstrom-Holevo measurements to obtain bounds on the error probability). This suboptimality is asymptotically negligible because the symmetric Holevo information is still an achievable rate for data transmission even for the above choice of measurement operators.

3) Polar Code Performance:

The probability of error $P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ for code length $N$, number $K$ of information bits, set $\mathcal{A}$ of information bits, and choice $u_{\mathcal{A}^c}$ for the frozen bits is as follows:
$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) = \frac{1}{2^K} \sum_{u_{\mathcal{A}}} \mathrm{Tr}\left\{ (I - \Lambda_{u^N}) \rho_{u^N} \right\} = 1 - \frac{1}{2^K} \sum_{u_{\mathcal{A}}} \mathrm{Tr}\left\{ \Lambda_{u^N} \rho_{u^N} \right\}$$
$$= 1 - \frac{1}{2^K} \sum_{u_{\mathcal{A}}} \mathrm{Tr}\left\{ \Pi_{(N),u_1^{N-1}u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N} \, \rho_{u^N} \, \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1}u_N}^{B^N} \right\},$$
where we are assuming a particular choice of the bits $u_{\mathcal{A}^c}$ in the sequence of projectors $\Pi_{(N),u_1^{N-1}u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N}$ and the convention mentioned before that $\Pi_{(i),u_1^{i-1}u_i}^{B^N} = I$ if $u_i$ is a frozen bit. We are also assuming that the sender transmits the information sequence $u_{\mathcal{A}}$ with uniform probability $2^{-K}$. The probability of error $P_e(N, K, \mathcal{A})$ averaged over all choices of the frozen bits is then
$$P_e(N, K, \mathcal{A}) = \frac{1}{2^{N-K}} \sum_{u_{\mathcal{A}^c}} P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$$
$$= 1 - \frac{1}{2^N} \sum_{u^N} \mathrm{Tr}\left\{ \Pi_{(N),u_1^{N-1}u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N} \, \rho_{u^N} \, \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1}u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1}u_N}^{B^N} \right\}. \tag{13}$$

One of the main contributions of this paper is the following proposition regarding the average ensemble performance of polar codes with a quantum successive cancellation decoder:

Proposition 4: For any classical-quantum channel $W$ with binary inputs and quantum outputs and any choice of $(N, K, \mathcal{A})$, the following bound holds:
$$P_e(N, K, \mathcal{A}) \leq 2 \sqrt{ \sum_{i \in \mathcal{A}} \frac{1}{2} \sqrt{F(W_N^{(i)})} }.$$
Thus, there exists a frozen vector $u_{\mathcal{A}^c}$ for each $(N, K, \mathcal{A})$ such that
$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \leq 2 \sqrt{ \sum_{i \in \mathcal{A}} \frac{1}{2} \sqrt{F(W_N^{(i)})} }.$$
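The sketch below evaluates the right-hand side of the bound in Proposition 4, taken here in the form $2\sqrt{\sum_{i \in \mathcal{A}} \tfrac{1}{2}\sqrt{F(W_N^{(i)})}}$, for a given information set. Everything about the example channel is an assumption made for illustration: it uses the commuting erasure-type special case, in which the square-root fidelity of each split channel coincides with a classical erasure probability that follows the closed-form recursion $e \mapsto (2e - e^2, e^2)$. The helper `pick_info_set` chooses the $K$ indices with the smallest fidelity, which anticipates the polar coding rule stated next. Function names are hypothetical and indices are 0-based.

```python
import math


def sqrt_fidelities_erasure(eps: float, n_levels: int):
    """sqrt(F(W_N^(i))) for the commuting erasure special case (an assumption).

    Index 2p is the worse child and index 2p + 1 the better child of parent p.
    """
    params = [eps]
    for _ in range(n_levels):
        next_params = []
        for e in params:
            next_params.append(2.0 * e - e * e)  # worse channel
            next_params.append(e * e)            # better channel
        params = next_params
    return params


def pick_info_set(sqrt_f, k: int):
    """Indices of the k split channels with the smallest fidelity, in increasing order."""
    ranked = sorted(range(len(sqrt_f)), key=lambda i: sqrt_f[i])
    return sorted(ranked[:k])


def proposition4_bound(sqrt_f, info_set) -> float:
    """Right-hand side 2 * sqrt(sum over A of (1/2) * sqrt(F(W_N^(i))))."""
    return 2.0 * math.sqrt(sum(0.5 * sqrt_f[i] for i in info_set))


if __name__ == "__main__":
    sqrt_f = sqrt_fidelities_erasure(0.5, 4)   # N = 16
    info_set = pick_info_set(sqrt_f, 4)        # K = 4
    print(info_set, proposition4_bound(sqrt_f, info_set))
```

Any other information set of the same size gives a bound at least as large, which is the motivation for selecting the indices with the smallest fidelity.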

4) Polar Coding Theorem:

Proposition 4 immediately leadsto the deﬁnition of polar codes for classical-quantum channels:

Deﬁnition 5 (Polar Code):

A polar code for W is a G N -coset code with parameters ( N, K, A , u A c ) where the infor-mation set A is such that |A| = K and F ( W ( i ) N ) ≤ F ( W ( j ) N ) for all i ∈ A and j ∈ A c . We can ﬁnally state the polar coding theorem for classical-quantum channels. Consider a classical-quantum channel W and a real number R ≥ . Let P e ( N, R ) = P e ( N, (cid:98) N R (cid:99) , A ) , with the information bit set chosen according to the polarcoding rule in Deﬁnition 5. So P e ( N, A ) is the block errorprobability for polar coding over W with blocklength N , rate R , and quantum successive cancellation decoding averageduniformly over the frozen bits u A c . Theorem 6 (Polar Coding Theorem):

For any classical-quantum channel $W$ with binary inputs and quantum outputs, a fixed $R<I(W)$, and $\beta<1/2$, the block error probability $P_e(N,R)$ satisfies the following bound:

$$
P_e(N,R) = o\big(2^{-\frac{1}{2}N^{\beta}}\big).
$$

The polar coding theorem above follows as a straightforward corollary of Theorem 3 and Proposition 4.

III. RECURSIVE CHANNEL TRANSFORMATIONS

This section delves into more detail regarding recursive channel combining and channel splitting. Recall the channel combining in (3-5) and the channel splitting in (6). These allowed us to take $N$ independent copies of a classical-quantum channel $W^{\otimes N}$ and transform them into the $N$ split channels $W_N^{(1)},\ldots,W_N^{(N)}$. We show here how to break the channel transformation into a series of single-step transformations. Much of the discussion here parallels Arikan's discussion in Sections II and III of Ref. [4].

Fig. 3. The channels $W^-$ and $W^+$ induced from channel combining and channel splitting. The channel $W^-$ with input $u_1$ is induced by selecting the bit $U_2$ uniformly at random, passing both $u_1$ and $U_2$ through the encoder, and then through the two channel uses. The channel $W^+$ with input $u_2$ is induced by selecting $U_1$ uniformly at random, copying it to another bit (via the classical CNOT gate), sending both $U_1$ and $u_2$ through the encoder, and the outputs are the quantum outputs and the bit $U_1$.

We obtain a pair of channels $W^-$ and $W^+$ from two independent copies of a channel $W: x\to\rho_x$ by a single-step transformation if it holds that $W^-: u_1\to\rho^-_{u_1}$, where

$$
\rho^-_{u_1} \equiv \sum_{u_2}\frac{1}{2}\,\rho^{B_1}_{u_1\oplus u_2}\otimes\rho^{B_2}_{u_2}. \tag{14}
$$

Also, it should hold that $W^+: u_2\to\rho^+_{u_2}$, where

$$
\begin{aligned}
\rho^+_{u_2} &\equiv \sum_{u_1}\frac{1}{2}\,|u_1\rangle\langle u_1|^{U_1}\otimes\rho^{B_1}_{u_1\oplus u_2}\otimes\rho^{B_2}_{u_2} \\
&= \Big(\sum_{u_1}\frac{1}{2}\,|u_1\rangle\langle u_1|^{U_1}\otimes\rho^{B_1}_{u_1\oplus u_2}\Big)\otimes\rho^{B_2}_{u_2}.
\end{aligned}
\tag{15}
$$

We use the following notation to denote such a transformation:

$$
(W,W)\to(W^-,W^+).
$$

Additionally, we choose the notation $W^-$ and $W^+$ so that $W^-$ denotes the worse channel and $W^+$ denotes the better channel. Figure 3 depicts the channels $W^-$ and $W^+$.

Thus, from the above, we can write $(W,W)\to(W_2^{(1)},W_2^{(2)})$ because, by the definition in (6), we have

$$
\begin{aligned}
W_2^{(1)}(u_1) &= \sum_{u_2}\frac{1}{2}\,\rho^{B_1}_{u_1\oplus u_2}\otimes\rho^{B_2}_{u_2}, \\
W_2^{(2)}(u_2) &= \sum_{u_1}\frac{1}{2}\,|u_1\rangle\langle u_1|^{U_1}\otimes\rho^{B_1}_{u_1\oplus u_2}\otimes\rho^{B_2}_{u_2}.
\end{aligned}
$$
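The single-step transformation can be made concrete with explicit matrices. The following sketch builds $\rho^-_{u_1}$ and $\rho^+_{u_2}$ from the two output states of a binary-input channel, following the forms of (14) and (15). The helper names (`kron`, `mix`, `w_minus`, `w_plus`) and the example channel with outputs $|0\rangle\langle 0|$ and $|+\rangle\langle +|$ are assumptions made for illustration only; matrices are plain nested lists so the sketch needs nothing beyond the standard library.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def mix(x, y):
    """Equal-weight mixture (1/2) x + (1/2) y of two matrices of the same size."""
    return [[0.5 * (p + q) for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def bit_projector(u):
    """Classical register state |u><u| for a bit u."""
    return [[1.0 if i == j == u else 0.0 for j in range(2)] for i in range(2)]


def w_minus(rho, u1):
    """rho^-_{u1} = sum_{u2} (1/2) rho_{u1 xor u2} (x) rho_{u2}, as in (14)."""
    return mix(kron(rho[u1 ^ 0], rho[0]), kron(rho[u1 ^ 1], rho[1]))


def w_plus(rho, u2):
    """rho^+_{u2} = sum_{u1} (1/2) |u1><u1| (x) rho_{u1 xor u2} (x) rho_{u2}, as in (15)."""
    return mix(kron(bit_projector(0), kron(rho[0 ^ u2], rho[u2])),
               kron(bit_projector(1), kron(rho[1 ^ u2], rho[u2])))


# Hypothetical example channel: x = 0 -> |0><0|, x = 1 -> |+><+|.
RHO = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

minus_state = w_minus(RHO, 0)  # 4 x 4 matrix on systems B1 B2
plus_state = w_plus(RHO, 1)    # 8 x 8 matrix on systems U1 B1 B2
print(len(minus_state), len(plus_state), trace(minus_state), trace(plus_state))
```

Both outputs have unit trace, and the output of $W^+$ carries the extra classical register $U_1$, which is why its dimension is twice that of the output of $W^-$.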
We can actually write more generally

$$
(W_N^{(i)},W_N^{(i)})\to(W_{2N}^{(2i-1)},W_{2N}^{(2i)}), \tag{16}
$$

which follows as a corollary to the following proposition:

Proposition 7:

For any $n\ge 0$, $N=2^n$, and $1\le i\le N$, it holds that

$$
W_{2N}^{(2i-1)}(u_{2i-1}) = \sum_{u_{2i}}\frac{1}{2}\,W_N^{(i)}(u_{2i-1}\oplus u_{2i})\otimes W_N^{(i)}(u_{2i}), \tag{17}
$$

$$
W_{2N}^{(2i)}(u_{2i}) = \frac{1}{2}\,W_N^{(i)}(u_{2i-1}\oplus u_{2i})\otimes W_N^{(i)}(u_{2i}), \tag{18}
$$

with $W_N^{(i)}$ defined in (6).

Proof:

The proof of the above proposition is similar to the proof of Arikan's Proposition 3 [4].

We can justify the relationship in (16) by observing that (17) and (18) have the same form as (14) and (15) with the following substitutions:

$$
W\leftarrow W_N^{(i)},\qquad W^+\leftarrow W_{2N}^{(2i)},\qquad W^-\leftarrow W_{2N}^{(2i-1)},\qquad u_1\leftarrow u_{2i-1},\qquad u_2\leftarrow u_{2i}.
$$

A. Transformation of Rate and Reliability

This section considers how both the rate $I(W_N^{(i)})$ and reliability $F(W_N^{(i)})$ evolve under the general transformation in (16). All proofs of the results in this section appear in the appendix.

Proposition 8:

Suppose that $(W,W)\to(W^-,W^+)$ for some channels satisfying (14-15). Then the following rate conservation and polarizing relations hold:

$$
I(W^-)+I(W^+) = 2I(W), \tag{19}
$$

$$
I(W^-)\le I(W^+). \tag{20}
$$

We can conclude from the above two relations that

$$
I(W^-)\le I(W)\le I(W^+).
$$

The following proposition states how the reliability evolves under the channel transformation:

Proposition 9:

Suppose $(W,W)\to(W^-,W^+)$ for some channels satisfying (14-15). Then

$$
\sqrt{F(W^+)} = F(W), \tag{21}
$$

$$
\sqrt{F(W^-)} \le 2\sqrt{F(W)}-F(W), \tag{22}
$$

$$
F(W^-)\ge F(W)\ge F(W^+). \tag{23}
$$

By combining (21) with (22), we observe that the reliability only improves under a single-step transformation:

$$
\sqrt{F(W^-)}+\sqrt{F(W^+)}\le 2\sqrt{F(W)}.
$$

The above propositions for the single-step transformation lead us to the following proposition in the general case:

Proposition 10:

For any classical-quantum channel $W$, $N=2^n$, $n\ge 0$, and $1\le i\le N$, the local transformation in (16) preserves rate and improves reliability in the following sense:

$$
I(W_{2N}^{(2i-1)})+I(W_{2N}^{(2i)}) = 2I(W_N^{(i)}), \tag{24}
$$

$$
\sqrt{F(W_{2N}^{(2i-1)})}+\sqrt{F(W_{2N}^{(2i)})}\le 2\sqrt{F(W_N^{(i)})}. \tag{25}
$$

Channel splitting moves rate and reliability "away from the center":

$$
I(W_{2N}^{(2i-1)})\le I(W_N^{(i)})\le I(W_{2N}^{(2i)}),
$$

$$
\sqrt{F(W_{2N}^{(2i-1)})}\ge\sqrt{F(W_N^{(i)})}\ge\sqrt{F(W_{2N}^{(2i)})}.
$$

The reliability terms satisfy

$$
\sqrt{F(W_{2N}^{(2i)})} = F(W_N^{(i)}), \tag{26}
$$

$$
\sqrt{F(W_{2N}^{(2i-1)})}\le 2\sqrt{F(W_N^{(i)})}-F(W_N^{(i)}), \tag{27}
$$

and the cumulative rate and reliability satisfy

$$
\sum_{i=1}^{N} I(W_N^{(i)}) = N\,I(W), \tag{28}
$$

$$
\sum_{i=1}^{N}\sqrt{F(W_N^{(i)})}\le N\sqrt{F(W)}. \tag{29}
$$

The above proposition follows directly from Propositions 7, 8, and 9. The relations in (28) and (29) follow from applying (24) and (25) repeatedly.

IV. CHANNEL POLARIZATION

We are now in a position to prove Theorem 2 on channel polarization. The idea behind the proof of this theorem is identical to Arikan's proof of his Theorem 1 in Ref. [4]: with the relationships in Propositions 8 and 9 already established, we can readily exploit the martingale proof technique. Thus, we only provide a brief summary of the proof of Theorem 2 by following the presentation in Chapter 2 of Ref. [3].

Consider the channel $W_N^{(i)}$. Let $b_1\cdots b_n$ denote an $n$-bit binary expansion of the channel index $i$ and let $W_{(b_1\cdots b_n)} = W_N^{(i)}$. Then we can construct the channel $W_{(b_1\cdots b_k)}$ by combining two copies of $W_{(b_1\cdots b_{k-1})}$ according to (17) if $b_k=0$ or by combining two copies of $W_{(b_1\cdots b_{k-1})}$ according to (18) if $b_k=1$. We repeatedly construct all the way from $b_1$ until $b_n$ with the above rule.

Arikan's idea was to represent the channel construction as a random birth process in order to analyze its limiting behavior. In order to do so, we let $\{B_n : n\ge 1\}$ be a sequence of IID uniform Bernoulli random variables, where we define each over a probability space $(\Omega,\mathcal{F},P)$. Let $\mathcal{F}_0$ denote the trivial $\sigma$-field. Also, let $\{\mathcal{F}_n : n\ge 1\}$ denote the $\sigma$-fields that the random variables $(B_1,\ldots,B_n)$ generate. We also assume that $\mathcal{F}_0\subseteq\mathcal{F}_1\subseteq\cdots\subseteq\mathcal{F}_n$. Let $W_0=W$ and let $\{W_n : n\ge 0\}$ denote a sequence of operator-valued random variables that forms a tree process where $W_{n+1}$ is constructed from two copies of $W_n$ according to (17) if $B_n=0$ and according to (18) if $B_n=1$. The output space of the operator-valued random variable $W_n$ is equal to $\{W_{2^n}^{(i)}\}_{i=1}^{2^n}$. We are not really concerned with the channel process $\{W_n : n\ge 0\}$ but more so with the fidelities $\{F(W_N^{(i)})\}$ and Holevo informations $\{I(W_N^{(i)})\}$. Thus, we can simply analyze the limiting behavior of the two random processes $\{F_n : n\ge 0\}\equiv\{\sqrt{F(W_n)} : n\ge 0\}$ and $\{I_n : n\ge 0\}\equiv\{I(W_n) : n\ge 0\}$.
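The tree construction can be mimicked numerically at the level of upper bounds. The sketch below is not the paper's construction: it only propagates the relations (26) and (27), using the fact that both maps $z\mapsto z^2$ and $z\mapsto 2z-z^2$ are nondecreasing on $[0,1]$, so an upper bound on $\sqrt{F(W_N^{(i)})}$ yields upper bounds for its two descendants. The function name `polarize_upper_bounds` and the starting value 0.5 are assumptions for illustration.

```python
def polarize_upper_bounds(z0, n):
    """Upper bounds on sqrt(F(W_N^(i))) for N = 2**n, listed in index order.

    z0 is an assumed upper bound on sqrt(F(W)) in [0, 1]. Each level maps a
    parent bound z to 2z - z**2 for child 2i-1 (from (27)) and to z**2 for
    child 2i (from (26)).
    """
    bounds = [z0]
    for _ in range(n):
        next_level = []
        for z in bounds:
            next_level.append(2.0 * z - z * z)  # "minus" child, index 2i-1
            next_level.append(z * z)            # "plus" child, index 2i
        bounds = next_level
    return bounds


bounds = polarize_upper_bounds(0.5, 10)  # N = 1024 bounds
near_zero = sum(1 for z in bounds if z < 1e-3) / len(bounds)
mean_bound = sum(bounds) / len(bounds)
print(len(bounds), near_zero, mean_bound)
```

The average of the two child bounds equals the parent bound, so the mean over all $N$ indices stays at the starting value, which mirrors the cumulative relation (29). Because these are only upper bounds, a small value certifies a reliable synthesized channel, but a large value does not certify an unreliable one.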
By the definitions of the random variables $F_n$ and $I_n$, it follows that

$$
\Pr\{I_n\in(a,b)\} = \frac{1}{2^n}\,\big|\{i : I(W_{2^n}^{(i)})\in(a,b)\}\big|,
$$

$$
\Pr\{F_n\in(a,b)\} = \frac{1}{2^n}\,\big|\{i : \sqrt{F(W_{2^n}^{(i)})}\in(a,b)\}\big|.
$$

We then have the following lemma.

Lemma 11:

The sequence $\{(F_n,\mathcal{F}_n) : n\ge 0\}$ is a bounded super-martingale, and the sequence $\{(I_n,\mathcal{F}_n) : n\ge 0\}$ is a bounded martingale.

Proof:

Let $b_1\cdots b_n$ be a particular realization of the random sequence $B_1\cdots B_n$. Then the conditional expectation satisfies

$$
\begin{aligned}
\mathbb{E}\{I_{n+1}\,|\,B_1=b_1,\ldots,B_n=b_n\}
&= \frac{1}{2}I\big(W_{(b_1,\ldots,b_n,0)}\big)+\frac{1}{2}I\big(W_{(b_1,\ldots,b_n,1)}\big) \\
&= I\big(W_{(b_1,\ldots,b_n)}\big) \\
&= I_n,
\end{aligned}
$$

where the second equality follows from the definition of $W_{(b_1,\ldots,b_n,0)}$ and $W_{(b_1,\ldots,b_n,1)}$ and Proposition 10. The proof for $\{F_n\}$ similarly follows from the definitions and Proposition 10. The boundedness condition follows because $0\le I(W),F(W)\le 1$ for any classical-quantum channel $W$ with binary inputs and quantum outputs.

We can now finally prove Theorem 2 regarding channel polarization. Given that $\{I_n\}$ is a bounded martingale and $\{F_n\}$ is a bounded super-martingale, the limits $\lim_{n\to\infty}I_n$ and $\lim_{n\to\infty}F_n$ converge almost surely and in $L^1$ to the random variables $I_\infty$ and $F_\infty$. The convergence implies that $\mathbb{E}\{|F_{n+1}-F_n|\}\to 0$ as $n\to\infty$. By the definition of the process $\{F_n\}$, it holds that $F_{n+1}=F_n^2$ with probability $1/2$, so that

$$
\mathbb{E}\{|F_{n+1}-F_n|\}\ge\frac{1}{2}\,\mathbb{E}\{|F_n(1-F_n)|\}\ge 0.
$$

It then follows that $\mathbb{E}\{|F_n(1-F_n)|\}\to 0$ as $n\to\infty$, which in turn implies that $\mathbb{E}\{|F_\infty(1-F_\infty)|\}=0$. We conclude that $F_\infty\in\{0,1\}$ almost surely. Combining this result with Proposition 1 proves that $I_\infty\in\{0,1\}$ almost surely. Finally, we have that $\Pr\{I_\infty=1\}=\mathbb{E}\{I_\infty\}=\mathbb{E}\{I_0\}=I(W)$ because $I_n$ is a martingale.

V. PERFORMANCE OF POLAR CODING

We can now analyze the performance under the above successive cancellation decoding scheme and provide a proof of Proposition 4. The proof of Theorem 6 readily follows by applying Proposition 4 and Theorem 3.

First recall the following "non-commutative union bound" of Sen (Lemma 3 in Ref. [25]):

$$
1-\mathrm{Tr}\{\Pi_N\cdots\Pi_1\,\rho\,\Pi_1\cdots\Pi_N\}\le 2\sqrt{\sum_{i=1}^{N}\mathrm{Tr}\{(I-\Pi_i)\rho\}}, \tag{30}
$$

which holds for projectors $\Pi_1,\ldots,\Pi_N$ and a density operator $\rho$. We begin by applying the above inequality to $P_e(N,K,\mathcal{A})$ (defined in (13)):

$$
\begin{aligned}
P_e(N,K,\mathcal{A})
&= \frac{1}{2^N}\sum_{u^N}\Big(1-\mathrm{Tr}\Big\{\Pi^{B^N}_{(N),u_1^{N-1}u_N}\cdots\Pi^{B^N}_{(i),u_1^{i-1}u_i}\cdots\Pi^{B^N}_{(1),u_1}\;\rho_{u^N}\;\Pi^{B^N}_{(1),u_1}\cdots\Pi^{B^N}_{(i),u_1^{i-1}u_i}\cdots\Pi^{B^N}_{(N),u_1^{N-1}u_N}\Big\}\Big) \\
&\le \frac{1}{2^N}\sum_{u^N} 2\sqrt{\sum_{i=1}^{N}\mathrm{Tr}\Big\{\Big(I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\rho_{u^N}\Big\}} \\
&= \frac{1}{2^N}\sum_{u^N} 2\sqrt{\sum_{i\in\mathcal{A}}\mathrm{Tr}\Big\{\Big(I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\rho_{u^N}\Big\}} \\
&\le 2\sqrt{\frac{1}{2^N}\sum_{u^N}\sum_{i\in\mathcal{A}}\mathrm{Tr}\Big\{\Big(I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\rho_{u^N}\Big\}},
\end{aligned}
$$

where the second equality follows from our convention that $\Pi^{B^N}_{(i),u_1^{i-1}u_i}=I$ if $u_i$ is a frozen bit and the second inequality follows from concavity of the square root. Continuing, we have

$$
\begin{aligned}
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u^N}\frac{1}{2^N}\mathrm{Tr}\Big\{\hat{\Pi}^{B^N}_{(i),u_1^{i-1}u_i}\,\rho_{u^N}\Big\}} \\
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\sum_{u_i}\frac{1}{2}\sum_{u_{i+1}^N}\frac{1}{2^{N-i}}\mathrm{Tr}\Big\{\hat{\Pi}^{B^N}_{(i),u_1^{i-1}u_i}\,\rho_{u^N}\Big\}} \\
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\sum_{u_i}\frac{1}{2}\mathrm{Tr}\Big\{\hat{\Pi}^{B^N}_{(i),u_1^{i-1}u_i}\sum_{u_{i+1}^N}\frac{1}{2^{N-i}}\,\rho_{u^N}\Big\}},
\end{aligned}
$$

where we define

$$
\hat{\Pi}^{B^N}_{(i),u_1^{i-1}u_i} = I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}.
$$

The first equality follows from exchanging the sums. The second equality follows from expanding the sum and normalization $\sum_{u^N}\frac{1}{2^N}$.
The third equality follows from bringing the sum $\sum_{u_{i+1}^N}\frac{1}{2^{N-i}}$ inside the trace. Continuing,

$$
\begin{aligned}
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\sum_{u_i}\frac{1}{2}\mathrm{Tr}\Big\{\Big(I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\rho^{B^N}_{u_1^i}\Big\}} \\
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u_i}\frac{1}{2}\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\mathrm{Tr}\Big\{\Big(I-\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\rho^{B^N}_{u_1^i}\Big\}} \\
&= 2\Bigg(\sum_{i\in\mathcal{A}}\sum_{u_i}\frac{1}{2}\mathrm{Tr}\Big\{\Big(I-\sum_{u_1^{i-1}}|u_1^{i-1}\rangle\langle u_1^{i-1}|^{U_1^{i-1}}\otimes\Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}|u_1^{i-1}\rangle\langle u_1^{i-1}|^{U_1^{i-1}}\otimes\rho^{B^N}_{u_1^i}\Big\}\Bigg)^{1/2}.
\end{aligned}
$$

The first equality is from the definition in (8). The second equality is from exchanging sums. The third equality is from the fact that

$$
\sum_x p(x)\,\mathrm{Tr}\{A_x\rho_x\} = \mathrm{Tr}\Big\{\Big(\sum_x|x\rangle\langle x|\otimes A_x\Big)\Big(\sum_{x'}p(x')\,|x'\rangle\langle x'|\otimes\rho_{x'}\Big)\Big\}.
$$

Footnote (on the bound in (30)): We say that Sen's bound is a "non-commutative union bound" because it is analogous to the following union bound from probability theory: $\Pr\{(A_1\cap\cdots\cap A_N)^c\}=\Pr\{A_1^c\cup\cdots\cup A_N^c\}\le\sum_{i=1}^{N}\Pr\{A_i^c\}$, where $A_1,\ldots,A_N$ are events. The analogous bound for projector logic would be $\mathrm{Tr}\{(I-\Pi_1\cdots\Pi_N\cdots\Pi_1)\rho\}\le\sum_{i=1}^{N}\mathrm{Tr}\{(I-\Pi_i)\rho\}$, if we think of $\Pi_1\cdots\Pi_N$ as a projector onto the intersection of subspaces. However, the above bound only holds if the projectors $\Pi_1,\ldots,\Pi_N$ are commuting (choosing $\Pi_1=|+\rangle\langle+|$, $\Pi_2=|0\rangle\langle 0|$, and $\rho=|0\rangle\langle 0|$ gives a counterexample). If the projectors are non-commuting, then Sen's bound in (30) is the next best thing and suffices for our purposes here.
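The footnote's counterexample to the naive union bound for non-commuting projectors can be checked by direct computation. The sketch below assumes the reading $\Pi_1=|+\rangle\langle+|$, $\Pi_2=|0\rangle\langle 0|$, $\rho=|0\rangle\langle 0|$ (the digits in the footnote are partly lost in this copy, so this choice is an assumption) and compares the sequential-projection failure probability with both the plain sum and the right-hand side of (30). All helper names are hypothetical.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def complement(p):
    """I - p for a square matrix p."""
    n = len(p)
    return [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)]
            for i in range(n)]


def failure_probability(projectors, rho):
    """1 - Tr{Pi_N ... Pi_1 rho Pi_1 ... Pi_N} for projectors listed as Pi_1..Pi_N."""
    state = rho
    for p in projectors:
        state = matmul(matmul(p, state), p)
    return 1.0 - trace(state)


PLUS = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|
ZERO = [[1.0, 0.0], [0.0, 0.0]]  # |0><0|
RHO = ZERO                       # assumed state for the counterexample

projectors = [PLUS, ZERO]
lhs = failure_probability(projectors, RHO)
union_sum = sum(trace(matmul(complement(p), RHO)) for p in projectors)
sen_rhs = 2.0 * math.sqrt(union_sum)
print(lhs, union_sum, sen_rhs)
```

With these choices the failure probability exceeds the plain sum, so the commuting-style union bound fails, while the right-hand side of (30) still upper-bounds it.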
Continuing,

$$
\begin{aligned}
&= 2\sqrt{\sum_{i\in\mathcal{A}}\sum_{u_i}\frac{1}{2}\mathrm{Tr}\Big\{\Big(I-\Pi^{U_1^{i-1}B^N}_{(i),u_i}\Big)\rho^{U_1^{i-1}B^N}_{(i),u_i}\Big\}} \\
&\le 2\sqrt{\sum_{i\in\mathcal{A}}\frac{1}{2}\sqrt{F(W_N^{(i)})}}.
\end{aligned}
$$

The first equality is from the observations in (11-12) and the definition in (6). The final inequality follows from Lemma 3.2 of Ref. [37] and the definition in (9). This completes the proof of Proposition 4.

We state the proof of Theorem 6 for completeness. Invoking Theorem 3, there exists a sequence of sets $\mathcal{A}_N$ with size $|\mathcal{A}_N|\ge NR$ for any $R<I(W)$ and $\beta<1/2$ such that

$$
\sum_{i\in\mathcal{A}_N}\sqrt{F(W_N^{(i)})} = o\big(2^{-N^{\beta}}\big),
$$

and thus

$$
2\sqrt{\sum_{i\in\mathcal{A}_N}\frac{1}{2}\sqrt{F(W_N^{(i)})}} = o\big(2^{-\frac{1}{2}N^{\beta}}\big).
$$

This bound holds if we choose the set $\mathcal{A}_N$ according to the polar coding rule because this rule minimizes the above sum by definition. Theorem 6 follows by combining Proposition 4 with this fact about the polar coding rule.

VI. CONCLUSION

We have shown how to construct polar codes for channels with classical binary inputs and quantum outputs, and we showed that they can achieve the symmetric Holevo information rate for classical communication. In fact, for a quantum channel with binary pure state outputs, such as a binary-phase-shift-keyed (BPSK) coherent-state optical communication alphabet, the symmetric Holevo information rate is the ultimate channel capacity [15], which is therefore achieved by our polar code [42]. The general idea behind the construction is similar to Arikan's [4], but we required several technical advances in order to demonstrate both channel polarization at the symmetric Holevo information rate and the operation of the quantum successive cancellation decoder. To prove that channel polarization takes hold, we could exploit several results in the quantum information literature [30], [31], [32], [10], [33], [11] and some of Arikan's tools. To prove that the quantum successive cancellation decoder works well, we exploited some ideas from quantum hypothesis testing [34], [35], [36], [37], [38] and Sen's recent "non-commutative union bound" [25]. The result is a near-explicit code construction that achieves the symmetric Holevo information rate for channels with classical inputs and quantum outputs. (When we say "near-explicit," we mean that it still remains open in the quantum case to determine which synthesized channels are good or bad.) Also, several works have now appeared on polar coding for private classical communication and quantum communication [43], [44], [43], [45], [46], [47], most of which use the results developed in this paper.

One of the main open problems going forward from here is to simplify the quantum successive cancellation decoder. Arikan could show how to calculate later estimates by exploiting the results of earlier estimates in an "FFT-like" fashion, and this observation reduced the complexity of the decoding to $O(N\log N)$.
It is not clear to us yet how to reduce the complexity of the quantum successive cancellation decoder because it is not merely a matter of computing formulas, but rather a sequence of physical operations (measurements) that the receiver needs to perform on the channel output systems. If there were some way to perform the measurements on smaller systems and then adaptively perform other measurements based on earlier results, then this would be helpful in demonstrating a reduced complexity.

Another important open question is to devise an efficient construction of the polar codes, something that remains an open problem even for classical polar codes. However, there has been recent work on efficient suboptimal classical polar code constructions [48], which one might try to extend to polar codes for the classical-quantum channel. Finally, extending our code and decoder construction to a classical-quantum channel with a non-binary (M-ary) alphabet remains a good open line for investigation.

VII. ACKNOWLEDGMENTS

MMW acknowledges financial support from the MDEIE (Québec) PSR-SIIRI international collaboration grant. SG was supported by the DARPA Information in a Photon (InPho) program under contract number HR0011-10-C-0159. We thank David Forney, MIT, for suggesting that we try polar codes for the quantum channel. We also thank Emre Telatar, EPFL, for an intuitive tutorial on channel polarization at ISIT 2011.

APPENDIX

Proof of Proposition 1:

The first bound in (1) follows from Holevo's characterization of the quantum cutoff rate (Proposition 1 of Ref. [32]). In particular, Holevo proved that the following inequality holds for all $s\in[0,1]$:

$$
I(X;B)_\omega \ge -\log\mathrm{Tr}\Bigg\{\Bigg(\sum_{x\in\mathcal{X}}p_X(x)\,(\omega_x)^{\frac{1}{1+s}}\Bigg)^{1+s}\Bigg\},
$$

where the entropy on the LHS is with respect to a classical-quantum state

$$
\omega^{XB}\equiv\sum_{x\in\mathcal{X}}p_X(x)\,|x\rangle\langle x|^X\otimes\omega_x^B.
$$

By setting $s=1$, the alphabet $\mathcal{X}=\{0,1\}$, and the distribution $p_X(x)$ to be uniform, we obtain the bound

$$
\begin{aligned}
I(W) &\ge -\log\Bigg(\mathrm{Tr}\Bigg\{\Bigg(\frac{1}{2}\big(\sqrt{\rho_0}+\sqrt{\rho_1}\big)\Bigg)^2\Bigg\}\Bigg) \\
&= -\log\Bigg(\frac{1}{4}\mathrm{Tr}\big\{\rho_0+\sqrt{\rho_0}\sqrt{\rho_1}+\sqrt{\rho_1}\sqrt{\rho_0}+\rho_1\big\}\Bigg) \\
&= -\log\Bigg(\frac{1}{2}\big(1+\mathrm{Tr}\{\sqrt{\rho_0}\sqrt{\rho_1}\}\big)\Bigg) \\
&= \log\Bigg(\frac{2}{1+\mathrm{Tr}\{\sqrt{\rho_0}\sqrt{\rho_1}\}}\Bigg) \\
&\ge \log\Bigg(\frac{2}{1+\sqrt{F(W)}}\Bigg),
\end{aligned}
$$

where the last line follows from

$$
\mathrm{Tr}\{\sqrt{\rho_0}\sqrt{\rho_1}\}\le\mathrm{Tr}\{|\sqrt{\rho_0}\sqrt{\rho_1}|\}=\|\sqrt{\rho_0}\sqrt{\rho_1}\|_1=\sqrt{F(W)}.
$$

The other inequality in (2) follows from (21) in Ref. [33]. In particular, they showed that

$$
I(W)\le H_2\Big(\frac{1}{2}\Big(1-\sqrt{F(W)}\Big)\Big),
$$

where the binary entropy $H_2(x)\equiv -x\log_2 x-(1-x)\log_2(1-x)$. Combining this with the following observation that holds for all $0\le F(W)\le 1$ gives the second inequality:

$$
H_2\Big(\frac{1}{2}\Big(1-\sqrt{F(W)}\Big)\Big)\le\sqrt{1-F(W)}.
$$

Proof of Proposition 8:

These follow from the same line of reasoning as in the proof of Arikan's Proposition 4 [4]. We prove the first equality. Consider the mutual information

$$
\begin{aligned}
I(U_1U_2;B_1B_2) &= I(X_1X_2;B_1B_2) \\
&= I(X_1;B_1)+I(X_2;B_2) \\
&= 2I(W).
\end{aligned}
$$

By the chain rule for quantum mutual information [11], we have

$$
\begin{aligned}
I(U_1U_2;B_1B_2) &= I(U_1;B_1B_2)+I(U_2;B_1B_2U_1) \\
&= I(W^-)+I(W^+).
\end{aligned}
$$

The inequality follows because

$$
\begin{aligned}
I(W^+) &= I(U_2;B_1B_2U_1) \\
&= I(U_2;B_2)+I(U_2;B_1U_1|B_2) \\
&= I(W)+I(U_2;B_1U_1|B_2).
\end{aligned}
$$

Thus, $I(W^+)\ge I(W)$ because $I(U_2;B_1U_1|B_2)\ge 0$ [30], [10], [11]. We then have

$$
2I(W^+)\ge 2I(W) = I(W^-)+I(W^+),
$$

and the inequality follows.

Proof of Proposition 9:

We begin with the first equality. Consider that

$$
\begin{aligned}
\sqrt{F(W^+)} &= \sqrt{F(\rho_0^+,\rho_1^+)} \\
&= \sqrt{F\Bigg(\sum_{u_1}\frac{1}{2}|u_1\rangle\langle u_1|\otimes\rho^{B_1}_{u_1},\ \sum_{u_1}\frac{1}{2}|u_1\oplus 1\rangle\langle u_1\oplus 1|\otimes\rho^{B_1}_{u_1}\Bigg)}\times\sqrt{F\big(\rho_0^{B_2},\rho_1^{B_2}\big)} \\
&= \frac{1}{2}\Bigg(\sqrt{F\big(\rho_0^{B_1},\rho_1^{B_1}\big)}+\sqrt{F\big(\rho_1^{B_1},\rho_0^{B_1}\big)}\Bigg)\sqrt{F\big(\rho_0^{B_2},\rho_1^{B_2}\big)} \\
&= F(\rho_0,\rho_1) \\
&= F(W).
\end{aligned}
$$

The first two equalities follow by definition. The third equality follows from the multiplicativity of fidelity under tensor product states [10], [11]:

$$
F(\rho\otimes\sigma,\tau\otimes\omega) = F(\rho,\tau)\,F(\sigma,\omega).
$$

The fourth equality follows from the following formula that holds for the fidelity of classical-quantum states:

$$
\sqrt{F\Bigg(\sum_x p(x)|x\rangle\langle x|\otimes\rho_x,\ \sum_x p(x)|x\rangle\langle x|\otimes\sigma_x\Bigg)} = \sum_x p(x)\sqrt{F(\rho_x,\sigma_x)}.
$$

We now consider the second inequality. The fidelity also has the following characterization as the minimum Bhattacharyya overlap between distributions induced by a POVM on the states [31], [10], [11]:

$$
F(\rho_0,\rho_1) = \min_{\{\Lambda_m\}}\Bigg(\sum_m\sqrt{\mathrm{Tr}\{\Lambda_m\rho_0\}\,\mathrm{Tr}\{\Lambda_m\rho_1\}}\Bigg)^2.
$$

So

$$
\sqrt{F(W^-)} = \min_{\{\Gamma_m^{B_1B_2}\}}\sum_m\sqrt{\mathrm{Tr}\big\{\Gamma_m^{B_1B_2}\rho_0^-\big\}\,\mathrm{Tr}\big\{\Gamma_m^{B_1B_2}\rho_1^-\big\}}.
$$

Let $\Lambda_m$ denote the POVM that achieves the minimum for $\sqrt{F(W)}$:

$$
\sqrt{F(W)} = \sqrt{F(\rho_0,\rho_1)} = \min_{\{\Lambda_m\}}\sum_m\sqrt{\mathrm{Tr}\{\Lambda_m\rho_0\}\,\mathrm{Tr}\{\Lambda_m\rho_1\}}.
$$

Then the POVM $\{\Lambda_l\otimes\Lambda_m\}$ is a particular POVM that can try to distinguish the states $\rho_0^-$ and $\rho_1^-$.
We then have

$$
\begin{aligned}
\sqrt{F(W^-)} &\le \sum_{l,m}\sqrt{\mathrm{Tr}\big\{(\Lambda_l\otimes\Lambda_m)\,\rho_0^-\big\}\,\mathrm{Tr}\big\{(\Lambda_l\otimes\Lambda_m)\,\rho_1^-\big\}} \\
&= \sum_{l,m}\sqrt{\mathrm{Tr}\Big\{(\Lambda_l\otimes\Lambda_m)\frac{1}{2}\big(\rho_0^{B_1}\otimes\rho_0^{B_2}+\rho_1^{B_1}\otimes\rho_1^{B_2}\big)\Big\}}\times\sqrt{\mathrm{Tr}\Big\{(\Lambda_l\otimes\Lambda_m)\frac{1}{2}\big(\rho_1^{B_1}\otimes\rho_0^{B_2}+\rho_0^{B_1}\otimes\rho_1^{B_2}\big)\Big\}} \\
&= \frac{1}{2}\sum_{l,m}\Big[\Big(\mathrm{Tr}\{\Lambda_l\rho_0^{B_1}\}\mathrm{Tr}\{\Lambda_m\rho_0^{B_2}\}+\mathrm{Tr}\{\Lambda_l\rho_1^{B_1}\}\mathrm{Tr}\{\Lambda_m\rho_1^{B_2}\}\Big)\Big(\mathrm{Tr}\{\Lambda_l\rho_1^{B_1}\}\mathrm{Tr}\{\Lambda_m\rho_0^{B_2}\}+\mathrm{Tr}\{\Lambda_l\rho_0^{B_1}\}\mathrm{Tr}\{\Lambda_m\rho_1^{B_2}\}\Big)\Big]^{1/2}.
\end{aligned}
$$

Making the assignments

$$
\alpha_m\equiv\mathrm{Tr}\{\Lambda_m\rho_0^{B_2}\},\qquad \beta_l\equiv\mathrm{Tr}\{\Lambda_l\rho_0^{B_1}\},\qquad \gamma_l\equiv\mathrm{Tr}\{\Lambda_l\rho_1^{B_1}\},\qquad \delta_m\equiv\mathrm{Tr}\{\Lambda_m\rho_1^{B_2}\},
$$

the above expression is equal to

$$
\frac{1}{2}\sum_{l,m}\sqrt{\beta_l\alpha_m+\gamma_l\delta_m}\,\sqrt{\gamma_l\alpha_m+\beta_l\delta_m}.
$$

We can then exploit Arikan's inequality in Appendix D of Ref. [4] to have

$$
\begin{aligned}
\frac{1}{2}\sum_{l,m}\sqrt{\beta_l\alpha_m+\gamma_l\delta_m}\,\sqrt{\gamma_l\alpha_m+\beta_l\delta_m}
&\le \frac{1}{2}\sum_{l,m}\Big(\sqrt{\beta_l\alpha_m}+\sqrt{\gamma_l\delta_m}\Big)\Big(\sqrt{\gamma_l\alpha_m}+\sqrt{\beta_l\delta_m}\Big)-\sum_{l,m}\sqrt{\beta_l\alpha_m\gamma_l\delta_m} \\
&= \frac{1}{2}\sum_{l,m}\Big(\alpha_m\sqrt{\beta_l\gamma_l}+\gamma_l\sqrt{\delta_m\alpha_m}+\beta_l\sqrt{\alpha_m\delta_m}+\delta_m\sqrt{\gamma_l\beta_l}\Big)-\sum_l\sqrt{\beta_l\gamma_l}\sum_m\sqrt{\alpha_m\delta_m} \\
&= \sum_l\sqrt{\beta_l\gamma_l}+\sum_m\sqrt{\delta_m\alpha_m}-\sum_l\sqrt{\beta_l\gamma_l}\sum_m\sqrt{\alpha_m\delta_m} \\
&= 2\sqrt{F(W)}-F(W).
\end{aligned}
$$
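The chain of inequalities above reduces to a statement about two probability distributions, which is easy to probe numerically. The sketch below takes hypothetical outcome distributions `p0` and `p1` (standing in for $\mathrm{Tr}\{\Lambda_m\rho_0\}$ and $\mathrm{Tr}\{\Lambda_m\rho_1\}$, so that $\beta=\alpha=$ `p0` and $\gamma=\delta=$ `p1`), evaluates the double sum on the left-hand side, and compares it with $2Z-Z^2$ for $Z=\sum_m\sqrt{p_0(m)p_1(m)}$. It is a sanity check on random inputs, not a proof; all names and the random seed are assumptions.

```python
import math
import random


def minus_overlap(p0, p1):
    """(1/2) * sum_{l,m} sqrt(b_l a_m + g_l d_m) * sqrt(g_l a_m + b_l d_m),
    with a = b = p0 and g = d = p1."""
    total = 0.0
    for b_l, g_l in zip(p0, p1):
        for a_m, d_m in zip(p0, p1):
            total += math.sqrt(b_l * a_m + g_l * d_m) * math.sqrt(g_l * a_m + b_l * d_m)
    return 0.5 * total


def bhattacharyya(p0, p1):
    """Bhattacharyya overlap sum_m sqrt(p0(m) * p1(m))."""
    return sum(math.sqrt(x * y) for x, y in zip(p0, p1))


def random_distribution(k, rng):
    weights = [rng.random() for _ in range(k)]
    norm = sum(weights)
    return [w / norm for w in weights]


def largest_violation(trials=200, outcomes=4, seed=0):
    """Largest value of LHS - (2Z - Z^2) seen over random distribution pairs."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        p0 = random_distribution(outcomes, rng)
        p1 = random_distribution(outcomes, rng)
        z = bhattacharyya(p0, p1)
        worst = max(worst, minus_overlap(p0, p1) - (2.0 * z - z * z))
    return worst


print(largest_violation())
```

Identical distributions give equality ($Z=1$ and both sides equal 1), and perfectly distinguishable distributions give zero on both sides.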
The inequality $F(W^-)\ge F(W)$ follows from concavity of fidelity and its multiplicativity under tensor products [10], [11]:

$$
\begin{aligned}
F(W^-) &= F(\rho_0^-,\rho_1^-) \\
&\ge \frac{1}{2}F\big(\rho_0^{B_1}\otimes\rho_0^{B_2},\ \rho_1^{B_1}\otimes\rho_0^{B_2}\big)+\frac{1}{2}F\big(\rho_1^{B_1}\otimes\rho_1^{B_2},\ \rho_0^{B_1}\otimes\rho_1^{B_2}\big) \\
&= \frac{1}{2}F\big(\rho_0^{B_1},\rho_1^{B_1}\big)F\big(\rho_0^{B_2},\rho_0^{B_2}\big)+\frac{1}{2}F\big(\rho_1^{B_1},\rho_0^{B_1}\big)F\big(\rho_1^{B_2},\rho_1^{B_2}\big) \\
&= \frac{1}{2}F\big(\rho_0^{B_1},\rho_1^{B_1}\big)+\frac{1}{2}F\big(\rho_1^{B_1},\rho_0^{B_1}\big) \\
&= F(W).
\end{aligned}
$$

The inequality $F(W)\ge F(W^+)$ follows from the relation $\sqrt{F(W^+)}=F(W)$ and the fact that $0\le F\le 1$.

REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,”

Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[2] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2008.
[3] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, July 2009.
[4] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009, arXiv:0807.3917.
[5] ——, “Channel combining and splitting for cutoff rate improvement,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 628–639, February 2006, arXiv:cs/0508034.
[6] E. Sasoglu, E. Telatar, and E. Arikan, “Polarization for arbitrary discrete memoryless channels,” in Proceedings of the 2009 Information Theory Workshop, Taormina, Sicily, Italy, October 2009, pp. 144–148, arXiv:0908.0302.
[7] E. Arikan, “Source polarization,” in Proceedings of the 2010 IEEE International Symposium on Information Theory, Austin, Texas, USA, June 2010, pp. 899–903, arXiv:1001.3087.
[8] S. B. Korada and R. Urbanke, “Polar codes are optimal for lossy source coding,” IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1751–1768, April 2010, arXiv:0903.0307.
[9] E. Sasoglu, E. Telatar, and E. Yeh, “Polar codes for the two-user multiple-access channel,” June 2010, arXiv:1006.4255.
[10] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information. Cambridge University Press, 2000.
[11] M. M. Wilde, From Classical to Quantum Shannon Theory, June 2011, arXiv:1106.1445.
[12] A. S. Holevo, “The capacity of the quantum channel with general signal states,” IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 269–273, January 1998, arXiv:quant-ph/9611023.
[13] B. Schumacher and M. D. Westmoreland, “Sending classical information via noisy quantum channels,” Physical Review A, vol. 56, no. 1, pp. 131–138, July 1997.
[14] V. Giovannetti, S. Guha, S. Lloyd, L. Maccone, J. H. Shapiro, and H. P. Yuen, “Classical capacity of the lossy bosonic channel: The exact solution,” Physical Review Letters, vol. 92, no. 2, p. 027902, January 2004.
[15] S. Guha, “Structured optical receivers to attain superadditive capacity and the Holevo limit,” Physical Review Letters, vol. 106, p. 240502, June 2011, arXiv:1101.1550.
[16] M. B. Hastings, “Superadditivity of communication capacity using entangled inputs,” Nature Physics, vol. 5, pp. 255–257, April 2009, arXiv:0809.3972.
[17] M. Hayashi and H. Nagaoka, “General formulas for capacity of classical-quantum channels,” IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1753–1768, 2003, arXiv:quant-ph/0206186.
[18] A. S. Holevo, “Coding theorems for quantum channels,” Tamagawa University Research Review, Tech. Rep. 4, 1998, arXiv:quant-ph/9809023.
[19] A. Winter, “Coding theorem and strong converse for quantum channels,” IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2481–2485, 1999.
[20] N. Datta and T. Dorlas, “A quantum version of Feinstein’s theorem and its application to channel coding,” in Proceedings of the IEEE International Symposium on Information Theory, Seattle, Washington, USA, 2006, pp. 441–445.
[21] T. Ogawa and H. Nagaoka, “Making good codes for classical-quantum channel coding via quantum hypothesis testing,” IEEE Transactions on Information Theory, vol. 53, no. 6, pp. 2261–2266, June 2007.
[22] L. Wang and R. Renner, “One-shot classical-quantum capacity and hypothesis testing,” Physical Review Letters, vol. 108, p. 200501, May 2012.
[23] V. Giovannetti, S. Lloyd, and L. Maccone, “Achieving the Holevo bound via sequential measurements,” Physical Review A, vol. 85, p. 012302, January 2012.
[24] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 1991.
[25] P. Sen, “Achieving the Han-Kobayashi inner bound for the quantum interference channel by sequential decoding,” September 2011, arXiv:1109.0802.
[26] S. J. Devitt, K. Nemoto, and W. J. Munro, “Quantum error correction for beginners,” May 2009, arXiv:0905.2794.
[27] A. S. Holevo, “Bounds for the quantity of information transmitted by a quantum communication channel,” Problems of Information Transmission, vol. 9, pp. 177–183, 1973.
[28] A. Uhlmann, “The “transition probability” in the state space of a *-algebra,” Reports on Mathematical Physics, vol. 9, no. 2, pp. 273–279, 1976.
[29] R. Jozsa, “Fidelity for mixed quantum states,” Journal of Modern Optics, vol. 41, no. 12, pp. 2315–2323, 1994.
[30] E. H. Lieb and M. B. Ruskai, “Proof of the strong subadditivity of quantum-mechanical entropy,” Journal of Mathematical Physics, vol. 14, pp. 1938–1941, 1973.
[31] C. A. Fuchs and J. van de Graaf, “Cryptographic distinguishability measures for quantum-mechanical states,” IEEE Transactions on Information Theory, vol. 45, no. 4, pp. 1216–1227, May 1999, arXiv:quant-ph/9712042.
[32] A. S. Holevo, “Reliability function of general classical-quantum channel,” IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2256–2261, September 2000, arXiv:quant-ph/9907087.
[33] W. Roga, M. Fannes, and K. Życzkowski, “Universal bounds for the Holevo quantity, coherent information, and the Jensen-Shannon divergence,” Physical Review Letters, vol. 105, p. 040505, July 2010, arXiv:1004.4782.
[34] C. W. Helstrom, “Quantum detection and estimation theory,” Journal of Statistical Physics, vol. 1, pp. 231–252, 1969. [Online]. Available: http://dx.doi.org/10.1007/BF01007479
[35] A. S. Holevo, “An analog of the theory of statistical decisions in noncommutative theory of probability,” Trudy Moscov Mat. Obsc., vol. 26, pp. 133–149, 1972, English translation: Trans. Moscow Math Soc. 26, 133–149 (1972).
[36] C. W. Helstrom, Quantum Detection and Estimation Theory. New York: Academic, 1976.
[37] M. Hayashi, Quantum Information: An Introduction. Berlin Heidelberg: Springer-Verlag, 2006.
[38] H. Nagaoka and M. Hayashi, “An information-spectrum approach to classical and quantum hypothesis testing for simple hypotheses,” IEEE Transactions on Information Theory, vol. 53, no. 2, pp. 534–549, February 2007, arXiv:quant-ph/0206185.
[39] J. Calsamiglia, R. Muñoz Tapia, L. Masanes, A. Acin, and E. Bagan, “Quantum Chernoff bound as a measure of distinguishability between density matrices: Application to qubit and Gaussian states,” Physical Review A, vol. 77, p. 032311, March 2008, arXiv:0708.2343.
[40] E. Arikan and E. Telatar, “On the rate of channel polarization,” in Proceedings of the 2009 International Symposium on Information Theory, Seoul, Korea, June 2009, pp. 1493–1495, arXiv:0807.3806.
[41] S. B. Korada, E. Sasoglu, and R. Urbanke, “Polar codes: Characterization of exponent, bounds, and constructions,” IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 6253–6264, December 2010, arXiv:0901.0536.
[42] S. Guha and M. M. Wilde, “Polar coding to achieve the Holevo capacity of a pure-loss optical channel,” Proceedings of the 2012 International Symposium on Information Theory, pp. 551–555, July 2012, arXiv:1202.0533.
[43] M. M. Wilde and S. Guha, “Polar codes for degradable quantum channels,” September 2011, arXiv:1109.5346.
[44] J. M. Renes, F. Dupuis, and R. Renner, “Efficient quantum polar coding,” September 2011, arXiv:1109.3195.
[45] M. M. Wilde and J. M. Renes, “Quantum polar codes for arbitrary channels,” Proceedings of the 2012 International Symposium on Information Theory, pp. 339–343, July 2012, arXiv:1201.2906.
[46] ——, “Polar codes for private classical communication,” 2012, arXiv:1203.5794.
[47] Z. Dutton, S. Guha, and M. M. Wilde, “Performance of polar codes for quantum and private classical communication,”