The minimum entropy output of a quantum channel is locally additive
Gilad Gour∗ and Shmuel Friedland†
Institute for Quantum Information Science and Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4
Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, 851 S. Morgan Street, Chicago, IL 60607-7045
We show that the minimum von Neumann entropy output of a quantum channel is locally additive. Hastings' counterexample to the additivity conjecture makes this result quite surprising. In particular, it indicates that the non-additivity of the minimum entropy output is a global effect of quantum channels.
I. INTRODUCTION
One of the most fundamental questions in quantum information concerns the amount of information that can be transmitted reliably through a quantum channel. Despite significant progress in recent years [1, 2, 4, 5, 7, 11, 14, 15, 17, 19–21, 25–27], as pointed out in [4], this question remains surprisingly wide open. The main reason for this is related to the additivity of the classical and quantum capacities of quantum channels to transmit information [15]. Recently, it was shown that both the Holevo expression for the classical capacity [14] and the quantum capacity [27] are not additive in general. The additivity of the Holevo expression for the classical capacity was an open problem for more than a decade and was shown by Shor [26] to be equivalent to three other additivity conjectures; namely, the additivity of entanglement of formation, the strong super-additivity of entanglement of formation, and the additivity of the minimum entropy output of a quantum channel.

In [14] Hastings gave a counterexample to the last of the above additivity conjectures and thereby proved that they are all false. Hastings' counterexamples (see also [5]) exist in very high dimensions, and an estimate of these extremely high dimensions can be found in [11]. Earlier, in [26], Shor pointed out that if the additivity conjectures were true, perhaps the first step towards proving them would be to prove local additivity. We show here that this local additivity conjecture is indeed true, despite the existence of counterexamples to the original additivity conjectures. Our results therefore demonstrate that the counterexamples to the original additivity conjecture exhibit a global effect of quantum channels.

As we pointed out in Appendix B of [10], both the local and global additivity conjectures are false over the real numbers.
This in turn implies that a straightforward argument involving only directional derivatives cannot provide a proof of local additivity in the general complex case. Hence, to show local additivity we make strong use of the complex structure.

In quantum information theory, quantum channels are the natural generalizations of stochastic communication channels in classical information theory. They are described in terms of completely-positive trace-preserving linear maps (CPT maps). A CPT map N : H_{d_in} → H_{d_out} takes the set H_{d_in} of d_in × d_in Hermitian matrices to a subset of the set H_{d_out} of all d_out × d_out Hermitian matrices. Any finite dimensional quantum channel can be characterized in terms of a unitary embedding followed by a partial trace (the Stinespring dilation theorem): for any CPT map N there exists an ancillary space of Hermitian matrices H_E such that

N(ρ) = Tr_E[ U ( ρ ⊗ |0⟩_E⟨0| ) U† ],

where ρ ∈ H_{d_in} and U is a unitary matrix taking states |ψ⟩|0⟩_E, with |ψ⟩ ∈ C^{d_in}, to states in the bipartite space on which H_{d_out} ⊗ H_E acts.

The minimum entropy output of a quantum channel N is defined by

S_min(N) ≡ min_{ρ ∈ H_{d_in,+,1}} S(N(ρ)),

where H_{d_in,+,1} ⊂ H_{d_in} is the set of all d_in × d_in positive semi-definite matrices with trace one (i.e. density matrices), and S(ρ) = −Tr(ρ log ρ) is the von Neumann entropy. Since the von Neumann entropy is concave, it follows that the minimization can be taken over all rank-one matrices ρ = |ψ⟩⟨ψ| in H_{d_in,+,1}.

∗ Electronic address: [email protected]
† Electronic address: [email protected]

For any such rank-one density matrix ρ we can define a bipartite pure state |Ψ⟩ = U|ψ⟩|0⟩_E in the bipartite subspace K ≡ { U|ψ⟩|0⟩_E : |ψ⟩ ∈ C^{d_in} }.
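As a concrete illustration of the definitions above (not part of the original argument), the following sketch estimates S_min numerically for a small channel. The choice of channel (qubit dephasing) and the random-sampling strategy are illustrative assumptions; by concavity of S, sampling pure inputs suffices.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy S(rho) = -Tr(rho log rho) (natural log)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]               # convention: 0 log 0 = 0
    return float(-(p * np.log(p)).sum())

def apply_channel(rho, kraus):
    """CPT map in Kraus form: N(rho) = sum_k K_k rho K_k^*."""
    return sum(K @ rho @ K.conj().T for K in kraus)

# Illustrative channel: qubit dephasing with flip probability p.
p = 0.3
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]

# Estimate S_min(N) by minimizing over sampled pure inputs |psi><psi|.
rng = np.random.default_rng(0)
s_min = np.inf
for _ in range(1000):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    v /= np.linalg.norm(v)
    s_min = min(s_min, entropy(apply_channel(np.outer(v, v.conj()), kraus)))

# The basis state |0> is left invariant by dephasing, so S_min(N) = 0 here.
e0 = entropy(apply_channel(np.diag([1.0, 0.0]), kraus))
print(e0, s_min)
```

The sampled minimum upper-bounds S_min(N); for this dephasing channel the true minimum, 0, is attained on the computational basis states.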
We therefore find that the minimum entropy output of the channel N can be expressed in terms of the entanglement of the bipartite subspace K, defined by

E(K) ≡ min_{|φ⟩∈K, ‖φ‖=1} E(|φ⟩),

where E(|φ⟩) ≡ S(Tr_E(|φ⟩⟨φ|)) is the entropy of entanglement. In [13] it was pointed out that E(K) = 0 unless dim K ≤ (d_out − 1)(dim H_E − 1). Hastings' counterexample implies that there exist subspaces K^(1) ⊂ C^{n₁} ⊗ C^{m₁} and K^(2) ⊂ C^{n₂} ⊗ C^{m₂} such that

E( K^(1) ⊗ K^(2) ) < E( K^(1) ) + E( K^(2) ).

In what follows we will prove the local additivity of the entanglement of subspaces, which is equivalent to the local additivity of the minimum entropy output.

The rest of this paper is organized as follows. In section II we find and simplify the first and second directional derivatives of the von Neumann entropy of entanglement. In section III we prove our main result of local additivity, which is stated in Theorem 5 for the non-singular case. In section IV we prove Theorem 5 for the singular case. We end with a discussion in section V.
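The quantity E(K) can be explored numerically; the sketch below (an illustration with an arbitrarily chosen two-dimensional subspace of C² ⊗ C², spanned by two maximally entangled states) estimates E(K) in the matrix picture used in the next section, where a bipartite state is a coefficient matrix x and E(x) = −Tr(xx∗ log xx∗). Since a two-dimensional subspace of C² ⊗ C² always contains a product state, the sampled minimum should be close to 0.

```python
import numpy as np

def ent_entropy(x):
    """Entropy of entanglement E(x) = -Tr(x x* log x x*): the Schmidt
    coefficients are the squared singular values of the matrix x."""
    p = np.linalg.svd(x, compute_uv=False) ** 2
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())

# Two orthonormal coefficient matrices (Hilbert-Schmidt inner product),
# each representing a maximally entangled state of C^2 (x) C^2.
b1 = np.array([[1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2)
b2 = np.array([[1.0, 0.0], [0.0, -1.0]]) / np.sqrt(2)

# E(K) = min over unit vectors cos(th) b1 + sin(th) e^{i ph} b2;
# sample the parameter space on a grid.
e_min = np.inf
for th in np.linspace(0, np.pi, 200):
    for ph in np.linspace(0, 2 * np.pi, 200, endpoint=False):
        x = np.cos(th) * b1 + np.sin(th) * np.exp(1j * ph) * b2
        e_min = min(e_min, ent_entropy(x))

print(ent_entropy(b1), e_min)  # each basis state has E = log 2, yet E(K) = 0
```

Even though every basis vector of this subspace is maximally entangled, superpositions such as (b1 + b2)/√2 = diag(1, 0) are product states, so E(K) vanishes; this is consistent with the dimension criterion of [13].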
II. LOCAL MINIMUM
Let K ⊂ C^n ⊗ C^m be a subspace of bipartite entangled states. Since the bipartite Hilbert space C^n ⊗ C^m is isomorphic to the Hilbert space C^{n×m} of all n × m complex matrices, we can view any bipartite state |ψ⟩_AB = Σ_{i,j} x_ij |i⟩|j⟩ in K as an n × m matrix x. The reduced density matrix of |ψ⟩_AB is then given by ρ_r ≡ Tr_B |ψ⟩_AB⟨ψ| = xx∗, and the entropy of entanglement of |ψ⟩_AB is given by

E(x) ≡ −Tr( xx∗ log xx∗ ).   (1)

In our notation, instead of using a dagger, we use x∗ to denote the Hermitian conjugate of the matrix x.

If x ∈ K is a local minimum of E in K, then there exists a neighbourhood of x in K such that x is the minimum in that neighbourhood. Any state in the neighbourhood of x can be written as ax + by, where a, b ∈ C and y ∈ K is a matrix orthogonal to x; i.e. Tr(xy∗) = 0. We also assume that the state is normalized, so that |a|² + |b|² = 1. Now, since the function E(x) is independent of a global phase, we can assume that a is a positive real number. We can also assume that b is real, since we can absorb its phase into y (adding a phase to y will not change its orthogonality to x). Thus, any normalized state in the neighbourhood of x can be written as

(x + ty)/√(1 + t²)   with Tr(xy∗) = 0,

where t ≡ b/a is a small real number and y is normalized (i.e. Tr(yy∗) = 1).

Definition 1. (a)
A matrix x ∈ K is said to be a critical point of E(x) in K if

D_y E(x) ≡ (d/dt) E( (x + ty)/√(1 + t²) ) |_{t=0} = 0   for all y ∈ x⊥,

where the notation D_y E(x) indicates that we are taking the directional derivative of E in the direction of y, and x⊥ ⊂ K denotes the subspace of all the matrices y in K for which Tr(xy∗) = 0. (b) A matrix x ∈ K is said to be a non-degenerate local minimum of E(x) in K if it is critical and

D²_y E(x) ≡ (d²/dt²) E( (x + ty)/√(1 + t²) ) |_{t=0} > 0   for all y ∈ x⊥,

where we also allow D²_y E(x) = +∞. Moreover, a critical x ∈ K is said to be degenerate if there exists at least one direction y such that D²_y E(x) = 0.

In order to prove local additivity we will need to calculate the above directional derivatives. This can be done by expressing the logarithm as an integral [28] (see also [22, 23]). However, in this technique all the quantities are expressed by integrals, and some of these integral expressions do not lead to additivity in a transparent way, as the divided difference method does. We therefore apply below a new technique that is based on the divided difference [16, (6.1.17)]. One of the advantages of the divided difference approach is that it enables one to calculate and express all directional derivatives explicitly, with no integrals involved. Before introducing the divided difference approach, we first discuss briefly the affine parametrization.

In our calculations we will assume that x is diagonal (or, equivalently, that the bipartite state x represents is given in its Schmidt form). This assumption follows from the singular value decomposition theorem; namely, we can always find unitary matrices u ∈ C^{n×n} and v ∈ C^{m×m} such that uxv is an n × m diagonal matrix with non-negative real numbers (the singular values of x) on the diagonal. Since E(x) = E(uxv), we can assume without loss of generality that x is a diagonal matrix.

A. The Affine Parametrization
Up to second order in t we have

ρ(t) ≡ (x + ty)(x∗ + ty∗)/(1 + t²) = ( xx∗ + t(xy∗ + yx∗) + t² yy∗ )( 1 − t² + O(t⁴) )
     = xx∗ + t( xy∗ + yx∗ ) + t²( yy∗ − xx∗ ) + O(t³) = ρ + tγ₁ + t²γ₂ + O(t³),   (2)

where ρ = xx∗, γ₁ ≡ xy∗ + yx∗, and γ₂ ≡ yy∗ − xx∗. Note that Tr ρ = 1 and Tr γ₁ = Tr γ₂ = 0, where without loss of generality we assumed Tr(yy∗) = 1, since we can absorb the normalization factor of y into t. We are interested in taking the first and second derivatives of

E( (x + ty)/√(1 + t²) ) = S(ρ(t)) = S( ρ + tγ₁ + t²γ₂ + O(t³) ).

In this section we assume that ρ = xx∗ is an n × n non-singular matrix. Denote σ(t) ≡ ρ + tγ₁. In the next proposition we relate S(ρ(t)) with S(σ(t)).

Proposition 1.
Let ρ(t), σ(t), ρ, γ₁ and γ₂ be as above. Then

S(ρ(t)) = S(σ(t)) − t² Tr[ γ₂ log ρ ] + O(t³).   (3)

Proof.
Since ρ is non-singular, ρ(t) and σ(t) are also non-singular for small enough t. Thus, 0 ≤ I − ρ(t) < I for small t. Using the Taylor expansion

log ρ(t) = log[ I − ( I − ρ(t) ) ] = − Σ_{n=1}^∞ ( I − σ(t) − t²γ₂ )ⁿ / n,

we get

−Tr[ ρ log ρ(t) ] = Σ_{n=1}^∞ (1/n) Tr[ ρ ( I − σ(t) − t²γ₂ )ⁿ ].

Expanding the term in the trace above up to second order in t gives

Tr[ ρ ( I − σ(t) − t²γ₂ )ⁿ ] = Tr[ ρ ( I − σ(t) )ⁿ ] + t² n Tr[ ρ ( I − ρ )^{n−1} γ₂ ] + O(t³).

We therefore have

−Tr[ ρ log ρ(t) ] = −Tr[ ρ log σ(t) ] + t² Σ_{n=1}^∞ Tr[ ρ ( I − ρ )^{n−1} γ₂ ] + O(t³).

Since ρ^{−1} = Σ_{n=1}^∞ ( I − ρ )^{n−1} and Tr(γ₂) = 0, we conclude

Tr[ ρ log ρ(t) ] = Tr[ ρ log σ(t) ] + O(t³).

Thus,

Tr[ ρ(t) log ρ(t) ] = Tr[ σ(t) log σ(t) ] + t² Tr[ γ₂ log ρ ] + O(t³).

This completes the proof.

This simple relation between S(ρ(t)) and S(σ(t)) is very useful, since now we can focus on the Taylor expansion of the simpler function S(σ(t)).

B. The method of divided difference
To calculate the first and second derivatives of S(σ(t)), we first evaluate the Taylor expansion of a complex valued function f : C → C, which we later assume can be extended to act on n × n complex matrices. We will make use of the notion of the divided difference for f, for which we refer the reader to [16, (6.1.17)] for more details. The divided difference for a function f : C → C, given a sequence of distinct complex points α_i ∈ C, i = 0, 1, ..., n, is defined for i = 0, 1 by

△⁰f(α₀) := f(α₀),   (4)
△f(α₀, α₁) ≡ △¹f(α₀, α₁) := ( f(α₀) − f(α₁) ) / ( α₀ − α₁ ),   (5)

and defined inductively by

△^i f(α₀, ..., α_{i−1}, α_i) = ( △^{i−1} f(α₀, ..., α_{i−2}, α_{i−1}) − △^{i−1} f(α₀, ..., α_{i−2}, α_i) ) / ( α_{i−1} − α_i ),   (6)

for i = 2, 3, ..., n. It is well known that △^i f(α₀, ..., α_i) is a symmetric function of α₀, ..., α_i; see, e.g., [16, p. 393]. For points that are not distinct it is defined by an appropriate limit. For example, for x = y we have △f(x, x) = f′(x), and

△²f(x, x, y) = ( f′(x)(x − y) − [ f(x) − f(y) ] ) / (x − y)²,   (7)
△²f(x, x, x) = f′′(x)/2.   (8)

Note that (8) can be obtained from (7) by setting h ≡ y − x → 0 and f(y) = f(x + h) = f(x) + h f′(x) + (h²/2) f′′(x) + O(h³).

Theorem 2.
Let A = diag(α₁, ..., α_n) ∈ C^{n×n} be a diagonal square matrix, and let B = [b_ij] ∈ C^{n×n} be a complex square matrix. Assume that f : C → C satisfies one of the following conditions:

1. f is an analytic function in some domain D ⊂ C which contains α₁, ..., α_n, and can be approximated uniformly in D by polynomials.
2. α₁, ..., α_n are in a real open interval (a, b) and f has two continuous derivatives in (a, b).

Then

f(A + tB) = f(A) + t L_A(B) + t² Q_A(B) + O(t³).   (9)

Here L_A : C^{n×n} → C^{n×n} is a linear operator, and Q_A : C^{n×n} → C^{n×n} is a quadratic homogeneous noncommutative polynomial in B. For i, j = 1, ..., n we have

[L_A(B)]_ij = △f(α_i, α_j) b_ij = ( ( f(α_i) − f(α_j) ) / ( α_i − α_j ) ) b_ij,   (10)
[Q_A(B)]_ij = Σ_{k=1}^n △²f(α_i, α_k, α_j) b_ik b_kj.   (11)

In particular,
Tr( L_A(B) ) = Σ_{j=1}^n f′(α_j) b_jj,   (12)
Tr( Q_A(B) ) = Σ_{i,j=1}^n ( ( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) ) b_ij b_ji.   (13)

Remark.
The expansion above can be naturally generalized to orders higher than the second, but for the purpose of this article we will only need to expand f(A + tB) up to second order in t. Moreover, for our purposes we will only need to assume that the α_i are real and that condition 2 on f holds. We kept condition 1 on f in the theorem just to be a bit more general.

Note that in all the expressions above, one must identify the case α_i = α_j with the limit α_j → α_i. For example,

( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) = f′′(α_i)/2   for α_i = α_j.

In particular, note that if B is diagonal, Eq. (13) gives the known second order term of the Taylor expansion.

Proof.
From the conditions on f, it is enough to prove the theorem assuming f is a polynomial. By linearity, it is enough to prove all the claims for f(x) = x^m. Clearly, in the expansion

(A + tB)^m = A^m + t L_A(B) + t² Q_A(B) + O(t³)

we must have

L_A(B) = Σ_{0 ≤ p,q, p+q=m−1} A^p B A^q,   (14)
Q_A(B) = Σ_{0 ≤ p,q,r, p+q+r=m−2} A^p B A^q B A^r,   (15)

where we expanded (A + tB)^m up to first and second order in t. All that is left to show is that these matrices coincide with the ones defined in Eqs. (10), (11). Indeed, since A is diagonal, the matrix elements of L_A(B) in Eq. (14) are given by

[L_A(B)]_ij = Σ_{0 ≤ p,q, p+q=m−1} α_i^p α_j^q b_ij = ( ( α_i^m − α_j^m ) / ( α_i − α_j ) ) b_ij,

which is equal to the exact same matrix element given in Eq. (10). In the same way, since A is diagonal, observe that the matrix elements of Q_A(B) in Eq. (15) are given by

[Q_A(B)]_ij = Σ_{k=1}^n Σ_{0 ≤ p,q,r, p+q+r=m−2} α_i^p α_k^q α_j^r b_ik b_kj.

On the other hand, a straightforward calculation gives, for f(x) = x^m,

△² x^m (α_i, α_k, α_j) = Σ_{0 ≤ p,q,r, p+q+r=m−2} α_i^p α_k^q α_j^r.

Thus, the expressions in Eq. (11) and Eq. (15) for Q_A(B) are the same.

We now prove Eq. (13). Observe first that Eq. (11) yields

Tr( Q_A(B) ) = Σ_{i,j=1}^n △²f(α_i, α_i, α_j) b_ij b_ji,   (16)

where we have used the symmetry △²f(α_i, α_j, α_i) = △²f(α_i, α_i, α_j). Now, since b_ij b_ji is symmetric under an exchange of i and j, we can replace △²f(α_i, α_i, α_j) in Eq. (16) with

(1/2)[ △²f(α_i, α_i, α_j) + △²f(α_j, α_j, α_i) ] = (1/2) △f′(α_i, α_j),

where for the last equality we used Eq. (7). This completes the proof.

We now use the above theorem for the Taylor expansion of the function S(σ(t)) in the neighbourhood of t = 0.

C. The first and second derivatives of E(x)

We first assume that ρ is non-singular. The case where ρ is singular will be treated separately in section IV.

Theorem 3.
Let ρ = diag{p₁, ..., p_n} with p_j > 0 for j = 1, ..., n. For this case, we get the following expressions:

D_y E(x) ≡ (d/dt) S(ρ(t)) |_{t=0} = −Tr( γ₁ log ρ ),
D²_y E(x) ≡ (d²/dt²) S(ρ(t)) |_{t=0} = −2 Tr[ γ₂ log ρ ] − Σ_{j,k} ( ( log p_j − log p_k )/( p_j − p_k ) ) |(γ₁)_jk|².   (17)

Remark.
The condition for x ∈ K to be critical is D_y E(x) = 0, which is equivalent to Tr[ (xy∗ + yx∗) log xx∗ ] = 0 for all y ∈ K such that Tr(xy∗) = 0. Moreover, if x is critical then we also have D_{iy} E(x) = 0 for all y ∈ x⊥ ⊂ K. Hence, if x is critical we must have

Tr( xy∗ log xx∗ ) = 0   (18)

for all y ∈ x⊥ ⊂ K.

Proof.
Theorem 2 implies that

S( ρ + tγ₁ ) = S(ρ) + t L_ρ(γ₁) + t² Q_ρ(γ₁) + O(t³),

where L_ρ and Q_ρ are the following linear and quadratic forms:

L_ρ(γ₁) ≡ Σ_{i=1}^n g′(p_i) (γ₁)_ii,
Q_ρ(γ₁) ≡ Σ_{i=1}^n Σ_{j=1}^n ( ( g′(p_i) − g′(p_j) ) / ( 2(p_i − p_j) ) ) (γ₁)_ij (γ₁)_ji,

and g(t) ≡ −t log t. Note that the expressions for L_ρ(γ₁) and Q_ρ(γ₁) above are the traces of the analogous expressions given in Theorem 2, since S(ρ) is defined as the trace of the matrix g(ρ) = −ρ log ρ. Since γ₁ is Hermitian with zero trace, and g′(t) = −1 − log t, we get

L_ρ(γ₁) = −Tr( γ₁ log ρ ),
Q_ρ(γ₁) = −(1/2) Σ_{j,k} ( ( log p_j − log p_k )/( p_j − p_k ) ) |(γ₁)_jk|².   (19)

Combining this with Proposition 1 proves the theorem.

In the following lemma, we rewrite the expression in Eq. (17) in a form that will be useful for the proof of local additivity.

Lemma 4.
Denote w = (y + y∗)/2 and z = i(y − y∗)/2. Denote also r_jk = √(p_j/p_k), where {p_j}_{j=1}^n are the eigenvalues of ρ = xx∗. Then, the expression in Eq. (17) for D²_y E(x) can be rewritten as

−(1/2) D²_y E(x) = E(x) + (1/2) Tr[ (yy∗ + y∗y) log ρ ] + Σ_{j,k} ( |w_jk|² Φ(r_jk) + |z_jk|² Φ(−r_jk) ),   (20)

where

Φ(r) ≡ ( (r + 1)/(r − 1) ) log|r|,   r ∈ R,   (21)

with the identification Φ(1) = 2.

Proof.
The expression in Eq. (17) for D²_y E(x) involves the terms |(γ₁)_jk|². The matrix γ₁ = xy∗ + yx∗ = xy∗ + yx, where x = diag{√p₁, ..., √p_n}. Note that y∗ = w + iz and y = w − iz, where w and z are the Hermitian matrices defined in the lemma. Thus,

γ₁ = xw + wx + i( xz − zx ).

In terms of the matrix elements w_jk and z_jk of w and z, we have

(γ₁)_jk = ( √p_j + √p_k ) w_jk + i( √p_j − √p_k ) z_jk.

The squared absolute value of this expression can be written as

|(γ₁)_jk|² = ( √p_j + √p_k )² |w_jk|² + ( √p_j − √p_k )² |z_jk|² + i( p_j − p_k )( w∗_jk z_jk − w_jk z∗_jk ).

Moreover, expressing w and z back in terms of y gives i( w∗_jk z_jk − w_jk z∗_jk ) = ( |y_kj|² − |y_jk|² )/2. We can therefore write

|(γ₁)_jk|² = ( √p_j + √p_k )² |w_jk|² + ( √p_j − √p_k )² |z_jk|² + (1/2)( p_j − p_k )( |y_kj|² − |y_jk|² ).

Substituting this expression, and the value γ₂ = yy∗ − xx∗, into Eq. (17) gives

−(1/2) D²_y E(x) = E(x) + Tr[ yy∗ log ρ ] + (1/2) Σ_{j,k} log(p_j/p_k) { ( ( √p_j + √p_k )²/( p_j − p_k ) ) |w_jk|² + ( ( √p_j − √p_k )²/( p_j − p_k ) ) |z_jk|² + (1/2)( |y_kj|² − |y_jk|² ) }.

Note first that the last term gives

(1/4) Σ_{j,k} log(p_j/p_k)( |y_kj|² − |y_jk|² ) = (1/2) Tr[ (y∗y − yy∗) log ρ ].

Moreover, denoting r_jk = √(p_j/p_k), we get

(1/2) ( ( √p_j + √p_k )²/( p_j − p_k ) ) log(p_j/p_k) = ( (r_jk + 1)/(r_jk − 1) ) log r_jk ≡ Φ(r_jk).

Similarly,

(1/2) ( ( √p_j − √p_k )²/( p_j − p_k ) ) log(p_j/p_k) = ( (r_jk − 1)/(r_jk + 1) ) log r_jk = Φ(−r_jk).

With these notations we get

−(1/2) D²_y E(x) = E(x) + (1/2) Tr[ (yy∗ + y∗y) log ρ ] + Σ_{j,k} ( |w_jk|² Φ(r_jk) + |z_jk|² Φ(−r_jk) ).

This completes the proof.

In the rest of the paper we will use the notations

M_x(y) ≡ Σ_{j,k=1}^n ( |w_jk|² Φ(r_jk) + |z_jk|² Φ(−r_jk) ) = Tr[ w Φ⁺_ρ(w) + z Φ⁻_ρ(z) ],
Γ_x(y) ≡ −E(x) − (1/2) Tr[ (y∗y + yy∗) log xx∗ ],   (22)

where Φ^±_ρ are self-adjoint linear operators defined in terms of the Hadamard product between the input matrix and the matrix with elements Φ(±r_jk). That is, [Φ^±_ρ(w)]_jk = Φ(±r_jk) w_jk. With these notations we get that

D²_y E(x) > 0  ⟺  M_x(y) < Γ_x(y).   (23)

D. The complex structure and additional necessary condition

If D²_y E(x) > 0 for all y orthogonal to x, then D²_{iy} E(x) is also positive, since iy is orthogonal to x. That is,

M_x(iy) < Γ_x(iy) = Γ_x(y).   (24)

Therefore, we get from Eqs. (23), (24) that if x is a non-degenerate local minimum then

(1/2)( M_x(y) + M_x(iy) ) = Σ_{j,k} |y_jk|² Φ̃(r_jk) < Γ_x(y),   (25)

where

Φ̃(r) := (1/2)( Φ(r) + Φ(−r) ) = ( (r² + 1)/(r² − 1) ) log|r|,   (26)

with the identification Φ̃(±1) = 1. Let Φ̃_ρ be the self-adjoint linear operator defined in terms of the Hadamard product between the input matrix and the matrix with components Φ̃(r_jk). With this notation the necessary condition given in Eq. (25) can be written as

Tr[ y∗ Φ̃_ρ(y) ] < Γ_x(y).   (27)

A simple analysis of the function Φ̃ shows that Φ̃(r) ≥ 1, with equality if and only if r = ±1. Thus, Eq. (27) also implies the following necessary condition on a local minimum: 1 ≤ Γ_x(y), which can be written as

E(y) − E(x) ≥ 1 − (1/2)[ S( yy∗ ‖ xx∗ ) + S( y∗y ‖ xx∗ ) ],   (28)

where

S( yy∗ ‖ xx∗ ) ≡ Tr( yy∗ log yy∗ ) − Tr( yy∗ log xx∗ )

is the relative entropy. Since S( yy∗ ‖ xx∗ ) ≥ 0 with equality if and only if yy∗ = xx∗, we always have S( yy∗ ‖ xx∗ ) > 0 when Tr(xy∗) = 0. Nevertheless, it is possible that Tr(xy∗) = 0 and yet (1/2)[ S( yy∗ ‖ xx∗ ) + S( y∗y ‖ xx∗ ) ] ≤ 1. In such cases Eq. (28) gives E(y) ≥ E(x), which is consistent with the fact that x is a local minimum.

III. LOCAL ADDITIVITY
We now state the main result of this paper.
Theorem 5.
Let x^(1) and x^(2) be two non-degenerate local minima of E(x) in K^(1) ⊂ C^{n₁×m₁} and K^(2) ⊂ C^{n₂×m₂}, respectively. Then x^(1) ⊗ x^(2) is a non-degenerate local minimum of E(x) in K^(1) ⊗ K^(2). Moreover, if x^(1) is a degenerate local minimum and x^(2) is a non-degenerate local minimum, then x^(1) ⊗ x^(2) is a degenerate local minimum.

The theorem above implies, in particular, that if x^(1) and x^(2) are critical points of E(x) in K^(1) and K^(2), respectively, then x^(1) ⊗ x^(2) is a critical point of E(x) in K^(1) ⊗ K^(2). This fact was observed in [6] (see also [24]), and later was stated in [10]. It follows from the linearity in y of the condition given in Eq. (18) for critical points. More precisely, if x^(1) and x^(2) are critical points, then x^(1) ⊗ x^(2) is also critical since (see Eq. (18))

0 = Tr[ ( x^(1) ⊗ x^(2) ) y∗ log( x^(1) x^(1)∗ ⊗ x^(2) x^(2)∗ ) ] = Tr[ x^(1) y^(1)∗ log( x^(1) x^(1)∗ ) ] + Tr[ x^(2) y^(2)∗ log( x^(2) x^(2)∗ ) ]

for all y ∈ ( x^(1) ⊗ x^(2) )⊥, where y^(1)∗ ≡ Tr₂[ ( I ⊗ x^(2) ) y∗ ] and y^(2)∗ ≡ Tr₁[ ( x^(1) ⊗ I ) y∗ ]. In the equation above we used the additivity of the logarithm function under tensor products. Moreover, since y ∈ ( x^(1) ⊗ x^(2) )⊥, we also have y^(1) ∈ ( x^(1) )⊥ and y^(2) ∈ ( x^(2) )⊥. Thus, if x^(1) and x^(2) are critical points, x^(1) ⊗ x^(2) is also critical [29].

In the following subsection we provide one of the main ingredients for the local additivity of the minimum von Neumann entropy output of a quantum channel.

A. The Subadditivity of Φ^±_ρ

Lemma 6.
Let Φ, Φ̃ : R → R be defined as in Eq. (21) and Eq. (26), respectively. Then, for any r, s ∈ R the following holds:

Φ(rs) ≤ Φ̃(r) + Φ̃(s),   (29)

with equality if and only if r = s. In the operator language of Eqs. (22), (27), the inequality (29) can be expressed as

Φ^±_{ρ_A ⊗ ρ_B} ≤ Φ̃_{ρ_A ⊗ I_B} + Φ̃_{I_A ⊗ ρ_B} = Φ̃_{ρ_A} ⊗ I_B + I_A ⊗ Φ̃_{ρ_B},   (30)

where two operators satisfy O₁ ≤ O₂ if and only if Tr[ y∗ O₁(y) ] ≤ Tr[ y∗ O₂(y) ] for all y.

Proof. We need to prove that

( (rs + 1)/(rs − 1) ) log|rs| ≤ ( (r² + 1)/(r² − 1) ) log|r| + ( (s² + 1)/(s² − 1) ) log|s|.

This inequality is equivalent to

( (r² + 1)/(r² − 1) − (rs + 1)/(rs − 1) ) log|r| + ( (s² + 1)/(s² − 1) − (rs + 1)/(rs − 1) ) log|s| ≥ 0,

which in turn is equivalent to

( 2(s − r)/(rs − 1) )( f(r) − f(s) ) ≥ 0,   (31)

where f(r) ≡ ( r/(r² − 1) ) log|r|. That is, we need to prove that f(r) ≥ f(s) if (s − r)/(rs − 1) > 0, and f(r) ≤ f(s) if (s − r)/(rs − 1) < 0. From symmetry under the exchange of r and s, both cases are equivalent, and therefore without loss of generality we assume (s − r)/(rs − 1) > 0. This condition holds in two cases: (a) s > r and rs > 1, or (b) s < r and rs < 1. A simple analysis of the function f shows that f is odd, that it is monotonically increasing for −1 ≤ r ≤ 1 and monotonically decreasing for |r| > 1. Moreover, note that f(1/r) = f(r).

Consider case (a): If s > r > 1 then f(r) ≥ f(s), since f is monotonically decreasing in this region. In the same way, if −1 > s > r then f(r) ≥ f(s). Another possibility in this case is that 0 < r < 1 < 1/r < s. But since both r and 1/s are positive and smaller than 1, we get f(r) ≥ f(1/s) = f(s), where we have used the fact that f is monotonically increasing for |r| ≤ 1. The last possibility in this case is that 1/r > s > −1 > r. For this last possibility both s and 1/r are negative numbers bigger than −1, and f is monotonically increasing there. Thus, f(r) = f(1/r) ≥ f(s).

Consider case (b): First note that if s < 0 < r then f(s) < 0 < f(r), and if −1 < s < r < 1 then f(r) ≥ f(s), since f is monotonically increasing in this region. Another possibility in this case is that s < 1 < r < 1/s. But since both r and 1/s are positive and bigger than 1, we get f(r) ≥ f(1/s) = f(s), where we have used the fact that f is monotonically decreasing for r ≥ 1. Finally, the last possibility in this case is that 1/r < s < −1 < r. For this last possibility both s and 1/r are negative numbers smaller than −1, and f is monotonically decreasing there. Thus, f(r) = f(1/r) ≥ f(s).

In order to prove the equality conditions, we need to show that the expression in Eq. (31) equals zero if and only if s = r. Before proceeding to prove that, we check the case r = 1/s. In this case, Φ(rs) = Φ(1) = 2 and Φ̃(s) = Φ̃(1/r) = Φ̃(r). That is, if r = 1/s then equality in Eq. (29) holds if and only if Φ̃(r) = 1. As pointed out earlier, Φ̃(r) = 1 if and only if r = ±1. We therefore conclude that if r = 1/s then equality in Eq. (29) holds if and only if r = s = ±1. Assume now rs ≠ 1. In this case, the expression in Eq. (31) equals zero if and only if f(r) = f(s). However, a simple analysis of the function f implies that f(r) = f(s) if and only if r = s or r = 1/s. Since we assumed rs ≠ 1, we get that r = s. This completes the proof.

B. Proof of Theorem 5
We can assume without loss of generality that n₁ = m₁ and n₂ = m₂. This can always be done by adding zero rows or columns. However, in this part of the proof we also assume that both x^(1) and x^(2) are non-singular; the singular case is treated separately in section IV. From the singular value decomposition (see the argument below Definition 1) we can assume without loss of generality that x^(1) = diag{√p₁, ..., √p_{n₁}} and x^(2) = diag{√q₁, ..., √q_{n₂}}, where the p_i and q_j are positive and Σ_i p_i = Σ_j q_j = 1.

We first assume that both x^(1) and x^(2) are non-degenerate local minima. We need to show that D²_y E(x) > 0 for all y ∈ x⊥, where x ≡ x^(1) ⊗ x^(2). The most general y ∈ ( x^(1) ⊗ x^(2) )⊥ can be written as

y = c₁ x^(1) ⊗ y^(2) + c₂ y^(1) ⊗ x^(2) + c₃ y′,   (32)

where y^(1) ∈ ( x^(1) )⊥, y^(2) ∈ ( x^(2) )⊥, and y′ ∈ ( x^(1) )⊥ ⊗ ( x^(2) )⊥ are all normalized. The numbers c_j can be chosen to be real, because we can absorb their phases into y^(1), y^(2), and y′. They also satisfy c₁² + c₂² + c₃² = 1, so that y is normalized.

Consider first the simple case y = x^(1) ⊗ y^(2). In this case,

E( (x + ty)/√(1 + t²) ) = E( x^(1) ⊗ ( x^(2) + ty^(2) )/√(1 + t²) ) = E( x^(1) ) + E( ( x^(2) + ty^(2) )/√(1 + t²) ).   (33)

Since x^(2) is a non-degenerate local minimum, we must have D²_y E(x) > 0. The case y = y^(1) ⊗ x^(2) is similar.

Consider now the case y ∈ ( x^(1) )⊥ ⊗ ( x^(2) )⊥. Using its Schmidt decomposition, we can write it as

y = Σ_l c_l y^(1)_l ⊗ y^(2)_l,   (34)

where

Tr[ y^(1)_l y^(1)∗_{l′} ] = Tr[ y^(2)_l y^(2)∗_{l′} ] = δ_{ll′},   (35)

and the c_l are real numbers such that Σ_l c_l² = 1. By definition we have

M_x(y) = Tr[ w_AB Φ⁺_{ρ_A ⊗ ρ_B}(w_AB) + z_AB Φ⁻_{ρ_A ⊗ ρ_B}(z_AB) ],   (36)

where w_AB = (y∗ + y)/2, z_AB = i(y∗ − y)/2, ρ_A ≡ x^(1) x^(1)∗ and ρ_B ≡ x^(2) x^(2)∗. Applying Lemma 6 to both Φ⁺_{ρ_A ⊗ ρ_B} and Φ⁻_{ρ_A ⊗ ρ_B} gives:

M_x(y) ≤ Tr[ w_AB Φ̃_{ρ_A ⊗ I_B}(w_AB) + w_AB Φ̃_{I_A ⊗ ρ_B}(w_AB) + z_AB Φ̃_{ρ_A ⊗ I_B}(z_AB) + z_AB Φ̃_{I_A ⊗ ρ_B}(z_AB) ]
       = Tr[ y∗ Φ̃_{ρ_A ⊗ I_B}(y) + y∗ Φ̃_{I_A ⊗ ρ_B}(y) ],

where I_A and I_B are the identity matrices on the respective spaces, and in the last equality we have used the definitions w_AB = (y∗ + y)/2 and z_AB = i(y∗ − y)/2. Now, substituting (34) into the above equation we get

M_x(y) ≤ Σ_l c_l² Tr[ y^(1)∗_l Φ̃_{ρ_A}( y^(1)_l ) + y^(2)∗_l Φ̃_{ρ_B}( y^(2)_l ) ],

where we have used the orthogonality relations in Eq. (35). Combining this with Eq. (27) gives

M_x(y) < Σ_l c_l² ( Γ_{x^(1)}( y^(1)_l ) + Γ_{x^(2)}( y^(2)_l ) ) = Γ_x(y),   (37)

where the last equality can be verified from the orthogonality relations given in Eq. (35) and the fact that

log xx∗ = log( x^(1) x^(1)∗ ) ⊗ I_B + I_A ⊗ log( x^(2) x^(2)∗ ).   (38)

This completes the proof for y ∈ ( x^(1) )⊥ ⊗ ( x^(2) )⊥.

Consider now the most general case, where y ∈ x⊥ has the form given in Eq. (32). Denote

w_AB = (y∗ + y)/2 = c₁ x^(1) ⊗ w^(2) + c₂ w^(1) ⊗ x^(2) + c₃ w′,   (39)

where w′ = (y′∗ + y′)/2 and we used

(1/2)( x^(1)∗ ⊗ y^(2)∗ + x^(1) ⊗ y^(2) ) = x^(1) ⊗ (1/2)( y^(2)∗ + y^(2) ) ≡ x^(1) ⊗ w^(2),
(1/2)( y^(1)∗ ⊗ x^(2)∗ + y^(1) ⊗ x^(2) ) = (1/2)( y^(1)∗ + y^(1) ) ⊗ x^(2) ≡ w^(1) ⊗ x^(2).

In the above equation we used the fact that x^(1) and x^(2) are square diagonal matrices with their singular values on the diagonal. We would like to substitute the expression in Eq. (39) for w_AB into the expression for M_x(y) given in Eq. (36). By doing so we will get expressions with several cross terms. We argue that these cross terms vanish. To see this, consider for example the cross term

c₁ c₃ Tr[ ( x^(1) ⊗ w^(2) ) Φ⁺_{ρ_A ⊗ ρ_B}(w′) ],

and recall that ρ_A ≡ x^(1) x^(1)∗ and ρ_B ≡ x^(2) x^(2)∗. Since Φ⁺_{ρ_A ⊗ ρ_B} is self-adjoint, the above expression can be written as

c₁ c₃ Tr[ ( x^(1) ⊗ w^(2) ) Φ⁺_{ρ_A ⊗ ρ_B}(w′) ] = c₁ c₃ Tr[ w′ Φ⁺_{ρ_A ⊗ ρ_B}( x^(1) ⊗ w^(2) ) ] = c₁ c₃ Tr[ w′ ( x^(1) ⊗ Φ⁺_{ρ_B}( w^(2) ) ) ],

where in the last equality we used the identity Φ⁺_{ρ_A ⊗ ρ_B}( x^(1) ⊗ w^(2) ) = x^(1) ⊗ Φ⁺_{ρ_B}( w^(2) ). This identity follows from the definition of Φ⁺_ρ, when working with a basis in which x^(1) is diagonal.
Now, since the partial-trace relation Tr[ w′ ( x^(1) ⊗ B ) ] = 0 holds for all matrices B, we have

c₁ c₃ Tr[ ( x^(1) ⊗ w^(2) ) Φ⁺_{ρ_A ⊗ ρ_B}(w′) ] = 0.

In the same way, we see that all the other cross terms vanish. Moreover, denote

z_AB = i( y∗ − y )/2 = c₁ x^(1) ⊗ z^(2) + c₂ z^(1) ⊗ x^(2) + c₃ z′,

where z^(1), z^(2), and z′ are defined similarly to w^(1), w^(2), and w′. Substituting this expression for z_AB in Eq. (36) will also lead to vanishing cross terms. To summarize, by substituting the above expressions for z_AB and w_AB in Eq. (36) we get

M_x(y) = c₁² M_x( x^(1) ⊗ y^(2) ) + c₂² M_x( y^(1) ⊗ x^(2) ) + c₃² M_x(y′).

However, since we have already proved that x is a non-degenerate local minimum in the directions x^(1) ⊗ y^(2), y^(1) ⊗ x^(2), and y′, we get

M_x(y) < c₁² Γ_x( x^(1) ⊗ y^(2) ) + c₂² Γ_x( y^(1) ⊗ x^(2) ) + c₃² Γ_x(y′).   (40)

Now, note the orthogonality relations in the partial traces: Tr₁[ ( x^(1) ⊗ y^(2) )( y′ )∗ ] = Tr₂[ ( x^(1) ⊗ y^(2) )( y′ )∗ ] = 0 and Tr₁[ ( y^(1) ⊗ x^(2) )( y′ )∗ ] = Tr₂[ ( y^(1) ⊗ x^(2) )( y′ )∗ ] = 0. With these relations and from Eq. (38), we get that the expression on the RHS of Eq. (40) is equal to Γ_x(y). This completes the proof of the main part of the theorem.

To prove the second part of the theorem, assume that x^(1) is a degenerate local minimum and x^(2) is a non-degenerate local minimum. Following the exact same lines of the proof above, we get that M_x(y′) < Γ_x(y′) for y′ ∈ ( x^(1) )⊥ ⊗ ( x^(2) )⊥. This is clear from Eq. (37) and the one above it, where we use the fact that

Tr[ y^(2)∗_l Φ̃_{ρ_B}( y^(2)_l ) ] < Γ_{x^(2)}( y^(2)_l ),

since x^(2) is a non-degenerate local minimum. Similarly, if y = x^(1) ⊗ y^(2) we get M_x(y) < Γ_x(y). The only y ∈ x⊥ for which it is possible to have M_x(y) = Γ_x(y) is y = y^(1) ⊗ x^(2).
However, in this case

E\Big( \frac{x + ty}{\sqrt{1+t^2}} \Big) = E\Big( \frac{x^{(1)} + ty^{(1)}}{\sqrt{1+t^2}} \Big) + E\big( x^{(2)} \big),   (41)

so x is a local minimum in this direction as well. Hence, x is a degenerate local minimum. This completes the proof of the second part of the theorem.

IV. THE SINGULAR CASE
In the previous section, we were able to derive the first and second directional derivatives D_y E(x) and D^2_y E(x) assuming x is non-singular. In this section we consider the case where x is singular. While the expression for D_y E(x) is the same as in the previous section, the expression for the second derivative is not the same in the singular case. In particular, in the singular case it is possible that D^2_y E(x) \equiv \frac{d^2}{dt^2} S(\rho(t)) \big|_{t=0} diverges. Nevertheless, we will see in this section that even if x is singular, E(x) is additive.

For simplicity of the exposition, we will consider here subspaces K \subset \mathbb{C}^n \otimes \mathbb{C}^m with n = m, since we can always embed K in \mathbb{C}^{\max\{n,m\}} \otimes \mathbb{C}^{\max\{n,m\}}. The following theorem provides the criterion for the divergence of the second derivative.

Theorem 7.
Let x, y \in K \subset \mathbb{C}^{n \times n}, with \mathrm{Tr}\, xx^* = \mathrm{Tr}\, yy^* = 1 and \mathrm{Tr}(xy^*) = 0. Change the standard orthonormal basis in \mathbb{C}^n to a new orthonormal basis such that x and y have the forms

x = \begin{bmatrix} x_{11} & 0_{r,n-r} \\ 0_{n-r,r} & 0_{n-r,n-r} \end{bmatrix} \quad and \quad y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix},   (42)

where r is the rank of x, 0_{i,j} are i \times j zero matrices, and x_{11}, y_{11} \in \mathbb{C}^{r \times r}. Then

S(\rho(t)) = f(t) - (K + t g(t))\, t^2 \log t^2, \qquad K = \mathrm{Tr}(y_{22} y_{22}^*),   (43)

where f(t), g(t) are analytic functions in a neighbourhood of 0. Hence D^2_y E(x) = +\infty if and only if y_{22} \neq 0. Furthermore, if y_{22} = 0 then either g(t) \equiv 0 or g(t) = a t^{2k-1}(1 + O(t)), where a > 0 and k is a positive integer.

A much weaker version of the theorem above can be found in [10]. For the clarity of the exposition in this section, we leave the proof of Theorem 7 to Appendix A.

From the theorem above it follows that w.l.o.g. we can set y_{22} = 0, since otherwise the second derivative is +\infty. This will be useful when proving local additivity for the singular case. However, in the tensor product space, y' can be written as in Eq. (34). Hence, while we assume that the (2,2) block of the bipartite state y is zero, it is not immediately obvious that the (2,2) blocks of the one-party states y_l^{(1)} and y_l^{(2)} are also zero. Nevertheless, this is indeed the case, as we show now.

A. Tensor product structure in the singular case
Let K \subset \mathbb{C}^{n \times n} be a subspace of matrices that are partitioned as in Eq. (42). We assume that K contains a matrix

x = \begin{bmatrix} x_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad \mathrm{Tr}(x^* x) = 1.   (44)

We now choose the following orthonormal basis x_1, \ldots, x_p, y_1, \ldots, y_q, z_1, \ldots, z_r, w_1, \ldots, w_s \in K. First, x_1 = x. Then:

1. x_1, \ldots, x_p is an orthonormal basis of the subspace of K of matrices of the form \begin{bmatrix} * & 0 \\ 0 & 0 \end{bmatrix}. (It is possible that p = 1.)

2. x_1, \ldots, x_p, y_1, \ldots, y_q is an orthonormal basis of the subspace of K of matrices of the form \begin{bmatrix} * & * \\ 0 & 0 \end{bmatrix}. (It is possible that q = 0.)

3. x_1, \ldots, x_p, y_1, \ldots, y_q, z_1, \ldots, z_r is an orthonormal basis of the subspace of K of matrices of the form \begin{bmatrix} * & * \\ * & 0 \end{bmatrix}. (It is possible that r = 0.)

4. x_1, \ldots, x_p, y_1, \ldots, y_q, z_1, \ldots, z_r, w_1, \ldots, w_s is an orthonormal basis of K. (It is possible that s = 0.)

We observe the following:

1. The projections of x_1, \ldots, x_p on the block (1,1) are linearly independent.
2. The projections of y_1, \ldots, y_q on the block (1,2) are linearly independent if q \ge 1.
3. The projections of z_1, \ldots, z_r on the block (2,1) are linearly independent if r \ge 1.
4. The projections of w_1, \ldots, w_s on the block (2,2) are linearly independent if s \ge 1.

Let K_i \subset \mathbb{C}^{n_i \times n_i} for i = 1, 2. We consider here the most complicated case, in which both matrices x^{(1)} \in K_1 and x^{(2)} \in K_2 are singular. So we assume that each x^{(i)} has the form (44). For i = 1, 2 we choose orthonormal bases

x^{(i)}_1, \ldots, x^{(i)}_{p_i},\; y^{(i)}_1, \ldots, y^{(i)}_{q_i},\; z^{(i)}_1, \ldots, z^{(i)}_{r_i},\; w^{(i)}_1, \ldots, w^{(i)}_{s_i} \in K_i   (45)

exactly as above. We now form the tensor product K_1 \otimes K_2 with respect to the partitions of K_1, K_2 as above. Let

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in K_1, \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \in K_2.   (46)

We then agree that the partition in K_1 \otimes K_2 has the form of the following partition of A \otimes B:

A \otimes B = \begin{bmatrix}
A_{11} \otimes B_{11} & A_{11} \otimes B_{12} & A_{12} \otimes B_{11} & A_{12} \otimes B_{12} \\
A_{11} \otimes B_{21} & A_{11} \otimes B_{22} & A_{12} \otimes B_{21} & A_{12} \otimes B_{22} \\
A_{21} \otimes B_{11} & A_{21} \otimes B_{12} & A_{22} \otimes B_{11} & A_{22} \otimes B_{12} \\
A_{21} \otimes B_{21} & A_{21} \otimes B_{22} & A_{22} \otimes B_{21} & A_{22} \otimes B_{22}
\end{bmatrix}.   (47)

Lemma 8.
Let K_1, K_2 be two subspaces in \mathbb{C}^{n_1 \times n_1} and \mathbb{C}^{n_2 \times n_2}, respectively. Let C = [C_{ij}]_{i,j=1}^4 \in K_1 \otimes K_2 be partitioned as in (47). Suppose that C_{ij} = 0 for all i, j \ge 2. Write C as a linear combination of the tensor products of the bases of K_1 and K_2, chosen as in (45). Then each term in this linear combination of C is of the form \alpha f \otimes g, where \alpha \in \mathbb{C}, f \in K_1, g \in K_2, and both f and g have the form \begin{bmatrix} * & * \\ * & 0 \end{bmatrix}.

Remark. It is also possible to show that at least one of the matrices f and g must have the form \begin{bmatrix} * & * \\ 0 & 0 \end{bmatrix} or \begin{bmatrix} * & 0 \\ * & 0 \end{bmatrix}. However, we will not be using this here.

Proof.
Suppose the expansion of C contains a term of the form w^{(1)}_i \otimes w^{(2)}_j. Look at the block (4,4): the only contributions to this block come from the projections of w^{(1)}_i and w^{(2)}_j on the block (2,2). Since these projections are linearly independent, we would get C_{44} \neq 0, contrary to our assumption.

Assume now that the expansion of C contains w^{(1)}_i \otimes z^{(2)}_j. Since the expansion of C does not have terms w^{(1)}_i \otimes w^{(2)}_j, the contribution to the block C_{43} comes only from the projection of w^{(1)}_i on the block (2,2) and the projection of z^{(2)}_j on the block (2,1). As before, these projections are linearly independent, so we would get C_{43} \neq 0, contrary to our assumptions.

Similarly, there are no terms in the expansion of C of the form w^{(1)}_i \otimes y^{(2)}_j, since C_{34} = 0, and there are no terms in the expansion of C of the form w^{(1)}_i \otimes x^{(2)}_j, since C_{33} = 0. That is, we have shown that the matrices w^{(1)}_i do not appear in the expansion of C. In exactly the same way, there are no terms in the expansion of C of the form z^{(1)}_i \otimes w^{(2)}_j, y^{(1)}_i \otimes w^{(2)}_j, and x^{(1)}_i \otimes w^{(2)}_j, since C_{42} = 0, C_{24} = 0, and C_{22} = 0, respectively. This completes the proof.

B. Local additivity in the singular case
In this subsection we prove Theorem 5 for the case in which x^{(1)} and x^{(2)} are singular local minima of K^{(1)} and K^{(2)}, respectively. We therefore choose bases such that x^{(1)} and x^{(2)} are of the form given in Eq. (42), and denote by r_1 and r_2 their respective ranks.

Assume first that both x^{(1)} and x^{(2)} are non-degenerate local minima. We need to show that D^2_y E(x) > 0 for all y \in x^\perp, where x \equiv x^{(1)} \otimes x^{(2)}. Note that the partition of x = [x_{ij}]_{i,j=1}^4 as in Eq. (47) gives x_{ij} = 0 for all (i,j) \neq (1,1), and x_{11} = x^{(1)}_{11} \otimes x^{(2)}_{11}. The most general y \in (x^{(1)} \otimes x^{(2)})^\perp can be written as in Eq. (32), where y' is of the form given in Eq. (34). Consider now the partition of y = [y_{ij}]_{i,j=1}^4 as in Eq. (47). From Theorem 7 we know that D^2_y E(x) = +\infty unless y_{ij} = 0 for all i, j = 2, 3, 4. We therefore assume now that y_{ij} = 0 for all i, j = 2, 3, 4. In this case, Lemma 8 implies that all the matrices y^{(1)}_l and y^{(2)}_l in Eq. (34) have the form \begin{bmatrix} * & * \\ * & 0 \end{bmatrix}; that is, their (2,2) block is zero. For this reason, we replace each subspace K^{(i)} \subset \mathbb{C}^{n_i \times n_i} (i = 1, 2) with a smaller subspace U^{(i)} \subset K^{(i)} such that each matrix in the basis of U^{(i)} has zeros on the (2,2) block. It is left to prove that x \equiv x^{(1)} \otimes x^{(2)} is a local minimum in U^{(1)} \otimes U^{(2)}.

Consider the new subspace U^{(i)}_\epsilon, for \epsilon > 0, where in the orthonormal basis of U^{(i)} we change only the first matrix x^{(i)}, i.e. the local-minimum matrix, to the normalized diagonal matrix

x^{(i)}_\epsilon \equiv \frac{1}{\sqrt{1 + (n_i - r_i)\epsilon^2}} \begin{bmatrix} x^{(i)}_{11} & 0_{r_i, n_i-r_i} \\ 0_{n_i-r_i, r_i} & \epsilon I_{n_i-r_i} \end{bmatrix}, \qquad i = 1, 2,

where 0_{i,j} are i \times j zero matrices and I_{n_i-r_i} are (n_i-r_i) \times (n_i-r_i) identity matrices.

Lemma 9.
Assume x^{(i)} is a non-degenerate local minimum in U^{(i)}. Then x^{(i)}_\epsilon is a non-degenerate local minimum in U^{(i)}_\epsilon. Moreover, there exist \delta > 0 and \epsilon_0 > 0 such that if \epsilon < \epsilon_0 then D^2_{y^{(i)}} E(x^{(i)}_\epsilon) > \delta for all y^{(i)} \in (x^{(i)}_\epsilon)^\perp.

Proof. For simplicity of the exposition we remove the superscript (i) from x^{(i)} and denote d \equiv n - r. That is, consider

x = \begin{bmatrix} x_{11} & 0_{r,d} \\ 0_{d,r} & 0_{d,d} \end{bmatrix} \quad and \quad x_\epsilon \equiv \frac{1}{\sqrt{1 + d\epsilon^2}} \begin{bmatrix} x_{11} & 0_{r,d} \\ 0_{d,r} & \epsilon I_d \end{bmatrix}.

We need to show that if x is a non-degenerate local minimum in U then x_\epsilon is a non-degenerate local minimum in U_\epsilon for small enough \epsilon.

First, we need to show that x_\epsilon remains critical. Indeed, since the condition (18) for criticality is satisfied for x, it is also satisfied for x_\epsilon. This is because x_\epsilon is a diagonal matrix and every y \in x_\epsilon^\perp \subset U is of the form \begin{bmatrix} * & * \\ * & 0_{d,d} \end{bmatrix}.

Second, we need to show that D^2_y E(x_\epsilon) > \delta. In Appendix B we show that D^2_y E(x_\epsilon) does not diverge in the limit \epsilon \to 0 (recall that y_{jk} = 0 when both j > r and k > r). Now, since we assume D^2_y E(x) > 0 for all y \in x^\perp, we can also assume that there exists \delta' > 0 such that D^2_y E(x) > \delta' for all y \in x^\perp. This is true because the set of all normalized matrices in x^\perp is compact. Hence, from the nice behaviour of D^2_y E(x_\epsilon) in the limit \epsilon \to 0, it follows that for small enough \epsilon there exists \delta > 0 such that D^2_y E(x_\epsilon) > \delta for all y \in (x_\epsilon)^\perp. This completes the proof of the lemma.

We now apply Theorem 5 to the non-singular case of x_\epsilon \equiv x^{(1)}_\epsilon \otimes x^{(2)}_\epsilon. From Lemma 9, the second derivatives satisfy D^2_{y^{(i)}} E(x^{(i)}_\epsilon) > \delta for all y^{(i)} \in (x^{(i)}_\epsilon)^\perp and i = 1, 2. Thus, we get that D^2_y E(x_\epsilon) > 2\delta for all y \in (x_\epsilon)^\perp; we obtain this by following precisely the same steps as in the proof of Theorem 5 (in the non-singular case). Letting \epsilon \to 0, we conclude that for all directions y the second derivative at x^{(1)} \otimes x^{(2)} is strictly positive (greater than or equal to 2\delta). This completes the proof of the main part of Theorem 5 for the singular case.

To prove the second part of the theorem, we assume now that x^{(1)} is a degenerate local minimum and x^{(2)} is a non-degenerate local minimum. In this case we only have D^2_{y^{(2)}} E(x^{(2)}_\epsilon) > \delta. Nevertheless, in Appendix B we show that D^2_{y^{(1)}} E(x^{(1)}_\epsilon) - D^2_{y^{(1)}} E(x^{(1)}) is of order \epsilon^2 \log \epsilon. Therefore, since D^2_{y^{(1)}} E(x^{(1)}) \ge 0, it follows that we can choose \epsilon small enough such that D^2_{y^{(1)}} E(x^{(1)}_\epsilon) > -\delta/2. As in the non-singular case, the only y \in x^\perp (recall x \equiv x^{(1)} \otimes x^{(2)}) for which it is possible to have D^2_y E(x) = 0 is y = y^{(1)} \otimes x^{(2)}. However, the equality in Eq. (41) implies that x is a local minimum in this direction, and this is true even if the x^{(i)} are singular. We will therefore assume now that y is not of the form y^{(1)} \otimes x^{(2)}. By following precisely the same steps as in the proof of Theorem 5 (in the non-singular case) we get that for all other y \in x^\perp we have D^2_y E(x_\epsilon) > \delta - \delta/2 = \delta/2. We therefore get D^2_y E(x) > 0 in the limit \epsilon \to 0.

V. DISCUSSION
We have shown that the minimum entropy output of a quantum channel is locally additive (assuming at least one of the two local minima is non-degenerate). Our proof consists of two key ingredients. The first is the use of the divided-difference approach, which enabled us to calculate directional derivatives explicitly; the second is the explicit use of the complex structure. In Appendix B of [10] we show that there exist counterexamples to local additivity over the real numbers. These counterexamples preclude the existence of a more straightforward differentiation argument than the complex-structure-based argument given here.

The fact that the minimum entropy output is not globally additive makes local additivity of even greater interest to quantum information theorists. It suggests that it is some global feature of the quantum channels involved that corresponds to cases of non-additivity of the minimum entropy output. Perhaps one way to improve our understanding in this direction is to study properties of generic channels. In particular, it seems quite possible to us that for generic channels (or generic subspaces) the entropy output has a finite number of isolated non-degenerate critical points.
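As a toy illustration of the quantity being studied (a sketch of our own; the sampling scheme and dimensions are arbitrary choices, not a method from the paper), one can scan the entropy output E(x) = S(xx^*) over normalized elements of a small random subspace K and inspect its minima numerically:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy S(rho) = -Tr[rho log rho]."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log(vals)))

def E(x):
    """Entropy output E(x) = S(x x*)."""
    return entropy(x @ x.conj().T)

rng = np.random.default_rng(3)
# a random 2-dimensional subspace K of 2x2 complex matrices
basis = [m / np.linalg.norm(m)
         for m in (rng.standard_normal((2, 2, 2)) +
                   1j * rng.standard_normal((2, 2, 2)))]

best = np.inf
for _ in range(2000):
    c = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    x = c[0] * basis[0] + c[1] * basis[1]
    x /= np.linalg.norm(x)          # Tr(x x*) = 1
    best = min(best, E(x))

# S of a 2x2 density matrix lies in [0, log 2]
assert -1e-6 <= best <= np.log(2) + 1e-6
```

For generic subspaces one expects such a scan to reveal a small number of isolated minima; a more serious study would replace the random sampling with local optimization on the unit sphere of K.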
Acknowledgments:—
We acknowledge many fruitful discussions with A. Roy and J. Yard in the earlier stages of this work. G.G.'s research is supported by NSERC. The authors acknowledge support from PIMS CRG MQI, MITACS, and iCore for Shmuel Friedland's visits to IQIS in Calgary.
Appendix A: Proof of Theorem 7
Proof.
Let \lambda_1(t) \ge \ldots \ge \lambda_n(t) \ge 0, for t > 0, be the eigenvalues of \rho(t). Rellich's theorem yields that each \lambda_i(t) is analytic in t in a neighbourhood of t = 0. So \lambda_i(0) = \lambda_i(\rho) > 0 for i = 1, \ldots, r and \lambda_i(0) = 0 for i = r+1, \ldots, n. Since each \lambda_i(t) \ge 0, each \lambda_i(t) that is not identically zero, for i > r, must start with t to a positive even power times a positive constant; i.e. \lambda_i(t) = \lambda_{i,2n_i} t^{2n_i}(1 + O(t)), where \lambda_{i,2n_i} > 0 and n_i is a positive integer for i > r. This shows that S(\rho(t)) = -\sum_{i=1}^n \lambda_i(t) \log \lambda_i(t) must be of the form (43). Furthermore, K = 0 if and only if n_i \ge 2 for all i > r. So if K = 0 and not all \lambda_i(t) are identically zero for i > r, then k = \min\{ n_i - 1 : \lambda_{i,2n_i} > 0 \}.

It is left to show that K = \mathrm{Tr}(y_{22} y_{22}^*). Let

X = \begin{bmatrix} 0 & x \\ x^* & 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} 0 & y \\ y^* & 0 \end{bmatrix}.

Recall that the pencil X + tY has n nonnegative and n nonpositive eigenvalues

\sigma_1(t) \ge \ldots \ge \sigma_n(t) \ge 0 \ge -\sigma_n(t) \ge \ldots \ge -\sigma_1(t).

The singular values of x + ty are the n nonnegative eigenvalues of X + tY. Hence, the eigenvalues of \rho(t) are \sigma_i(t)^2/(1+t^2) for i = 1, \ldots, n. Let \sigma_i(t) = \sigma_{i,1} t + O(t^2) for t > 0 and i > r. Hence the coefficient of t^2 in the i-th eigenvalue of \rho(t), for i > r, is \sigma_{i,1}^2. Thus K = \sum_{i=r+1}^n \sigma_{i,1}^2.

Let P \in \mathbb{C}^{2n \times 2n} be the orthogonal projection on the zero eigenspace of X. Then PYP((I - P)\mathbb{C}^{2n}) = 0. The other possible nonzero eigenvalues of PYP are

\sigma_{r+1,1} \ge \ldots \ge \sigma_{n,1} \ge 0 \ge -\sigma_{n,1} \ge \ldots \ge -\sigma_{r+1,1},

which are the eigenvalues of the restriction of PYP to the kernel of X [8, 18] or [9]. The restriction of PYP to the kernel of X is

\begin{bmatrix} 0 & y_{22} \\ y_{22}^* & 0 \end{bmatrix},

obtained by deleting the corresponding rows and columns in Y. Hence

2K = 2\sum_{i=r+1}^n \sigma_{i,1}^2 = \mathrm{Tr}\big((PYP)^2\big) = \mathrm{Tr}(y_{22} y_{22}^* + y_{22}^* y_{22}).

This completes the proof.
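Two facts used above are easy to check numerically: the eigenvalues of the pencil X + tY come in pairs \pm\sigma_i(t), the singular values of x + ty, and the small eigenvalues of \rho(t) grow as \sigma_{i,1}^2 t^2 with K = \mathrm{Tr}(y_{22} y_{22}^*). A minimal sketch (our own, with small hypothetical matrices):

```python
import numpy as np

# rank-1 singular x and a y with nonzero (2,2) block, Tr(x y*) = 0
x = np.array([[1.0, 0.0],
              [0.0, 0.0]])
y = np.array([[0.0, 0.6],
              [0.0, 0.8]])          # y22 = 0.8, so K = Tr(y22 y22*) = 0.64
K = 0.8**2

t = 1e-3
M = x + t * y
Z = np.zeros((2, 2))

# eigenvalues of the pencil X + tY are +/- the singular values of x + ty
pencil = np.block([[Z, M], [M.T, Z]])
eig = np.sort(np.linalg.eigvalsh(pencil))
sv = np.sort(np.linalg.svd(M, compute_uv=False))
assert np.allclose(eig[2:], sv) and np.allclose(eig[:2], -sv[::-1])

# the smallest eigenvalue of rho(t) behaves as K t^2 (1 + O(t)),
# which produces the -K t^2 log t^2 term in S(rho(t))
rho = M @ M.T
rho /= np.trace(rho)               # trace = 1 + t^2 since Tr(x y*) = 0
lam = np.sort(np.linalg.eigvalsh(rho))
assert abs(lam[0] / t**2 - K) < 10 * t
```

Here the matrices are real, so M.T plays the role of the conjugate transpose.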
Appendix B: Formula for the second derivative in the singular case
Proposition 10.
Let

x_\epsilon = \begin{bmatrix} x_{11} & 0_{r,n-r} \\ 0_{n-r,r} & \epsilon I_{n-r} \end{bmatrix} \quad and \quad y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & 0_{n-r,n-r} \end{bmatrix},   (B1)

where 0_{i,j} are i \times j zero matrices, x_{11}, y_{11} \in \mathbb{C}^{r \times r}, y_{12} \in \mathbb{C}^{r \times (n-r)}, and y_{21} \in \mathbb{C}^{(n-r) \times r}. We also assume that x_{11} = \mathrm{diag}\{\sqrt{p_1}, \ldots, \sqrt{p_r}\} is non-singular. Then, the limit of D^2_y E(x_\epsilon) when \epsilon goes to zero exists and equals

\lim_{\epsilon \to 0} D^2_y E(x_\epsilon) = D^2_y E(x) - 2\,\mathrm{Tr}\big[ (y_{12} y_{12}^* + y_{21}^* y_{21}) \log \rho_1 \big],   (B2)

where \rho_1 \equiv x_{11} x_{11}^*.

Remark. The contribution of the normalization factor of x_\epsilon is of order O(\epsilon^2) and is therefore ignored here.

Proof.
The proof is based on a straightforward calculation. The expression for the second derivative given in Eq. (17) can be written as

D^2_y E(x_\epsilon) \equiv \frac{d^2}{dt^2} S(\rho_\epsilon(t)) \Big|_{t=0} = -2 S(\rho_\epsilon) - 2 \sum_{j=1}^n \sum_{k=1}^n G_{jk},

where \rho_\epsilon \equiv x_\epsilon x_\epsilon^*, \gamma \equiv x_\epsilon y^* + y x_\epsilon^*, and

G_{jk} \equiv |y_{jk}|^2 \log p_j + \frac{\log p_j - \log p_k}{2(p_j - p_k)}\, |(\gamma)_{jk}|^2.   (B3)

If both j, k are smaller than or equal to r, then clearly those G_{jk} terms contribute to D^2_y E(x). Now, if both j > r and k > r then y_{jk} = 0 and we have G_{jk} = 0. Hence, we get

D^2_y E(x_\epsilon) = D^2_y E(x) - 2 \sum_{j=r+1}^n \sum_{k=1}^r (G_{jk} + G_{kj}).   (B4)

We therefore focus now on the expressions for G_{jk} and G_{kj} in the case j > r and k \le r. Writing x_\epsilon = \mathrm{diag}\{\sqrt{p_1}, \ldots, \sqrt{p_n}\} with p_j = \epsilon^2 for j > r, we have

(\gamma)_{jk} = \sqrt{p_j}\, \bar{y}_{kj} + \sqrt{p_k}\, y_{jk} = \sqrt{p_k}\, y_{jk} + O(\epsilon),
(\gamma)_{kj} = \sqrt{p_k}\, \bar{y}_{jk} + \sqrt{p_j}\, y_{kj} = \sqrt{p_k}\, \bar{y}_{jk} + O(\epsilon),

where the last equalities were obtained by setting \sqrt{p_j} = \epsilon. We therefore have |(\gamma)_{jk}|^2 = |(\gamma)_{kj}|^2 up to O(\epsilon). From the expressions above we get, for j > r and k \le r, the following formulas:

G_{jk} = |y_{jk}|^2 \log p_j + \frac{\log p_j - \log p_k}{2(p_j - p_k)} \big( p_k |y_{jk}|^2 + O(\epsilon) \big),
G_{kj} = |y_{kj}|^2 \log p_k + \frac{\log p_k - \log p_j}{2(p_k - p_j)} \big( p_k |y_{jk}|^2 + O(\epsilon) \big).

Since p_j = \epsilon^2 and p_k > 0, we have

G_{jk} = |y_{jk}|^2 \log \epsilon^2 + \frac{\log \epsilon^2 - \log p_k}{2(\epsilon^2 - p_k)} \big( p_k |y_{jk}|^2 + O(\epsilon) \big) = |y_{jk}|^2 \log \epsilon + \tfrac12 |y_{jk}|^2 \log p_k + O(\epsilon \log \epsilon),

G_{kj} = |y_{kj}|^2 \log p_k + \frac{\log p_k - \log \epsilon^2}{2(p_k - \epsilon^2)} \big( p_k |y_{jk}|^2 + O(\epsilon) \big) = |y_{kj}|^2 \log p_k + \tfrac12 |y_{jk}|^2 \log p_k - |y_{jk}|^2 \log \epsilon + O(\epsilon \log \epsilon).

Hence,

G_{jk} + G_{kj} = |y_{jk}|^2 \log p_k + |y_{kj}|^2 \log p_k + O(\epsilon \log \epsilon).

By substituting this expression into Eq. (B4) we get (B2). This completes the proof.

[1] K. M. R. Audenaert and S. L. Braunstein, Comm. Math. Phys., 443 (2004).
[2] G. G. Amosov, A. S. Holevo, and R. F. Werner, Problems in Inf. Trans., 305 (2000).
[3] C. H. Bennett, D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin, and B. M. Terhal, Phys. Rev. Lett., 5385 (1999).
[4] F. G. S. L. Brandao, J. Eisert, M. Horodecki, and D. Yang, Phys. Rev. Lett., in press (2011); arXiv:1010.5074 [quant-ph].
[5] F. G. S. L. Brandao and M. Horodecki, Open Syst. Inf. Dyn. 17, 31 (2010).
[6] H. Derksen, S. Friedland, G. Gour, D. Gross, L. Gurvits, A. Roy, and J. Yard, On minimum entropy output and the additivity conjecture. Notes of Quantum Information Group, American Institute for Mathematics workshop "Geometry and representation theory of tensors for computer science, statistics and other areas", July 21-25, 2008.
[7] M. Fannes, B. Haegeman, M. Mosonyi, and D. Vanpeteghem, quant-ph/0410195.
[8] Extremal eigenvalue problems, Bull. Brazilian Math. Soc.
[9] Matrices, a draft of a book in preparation, http://homepages.math.uic.edu/~friedlan/bookm.pdf
[10] S. Friedland, G. Gour, and A. Roy, Local extrema of entropy functions under tensor products, Quantum Information and Computation, Vol. 11, No. 11-12, pp. 1028-1044; eprint: arXiv:1105.5380 [math-ph].
[11] M. Fukuda, C. King, and D. K. Moser, Commun. Math. Phys., 111-143 (2010).
[12] G. Gour and S. Friedland, arXiv:1105.6122 [quant-ph].
[13] G. Gour and N. Wallach, Phys. Rev. A, 042309 (2007).
[14] M. B. Hastings, Nature Physics, 255 (2009).
[15] A. S. Holevo, The additivity problem in quantum information theory, in International Congress of Mathematicians, Vol. III, pp. 999-1018, Eur. Math. Soc., Zürich, 2006.
[16] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, 1999.
[17] R. Horodecki, P. Horodecki, and M. Horodecki, Rev. Mod. Phys., 865 (2009).
[18] T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 2nd ed., New York, 1980.
[19] C. King, J. Math. Phys., 4641 (2002).
[20] C. King, Quant. Inf. Comp., 186 (2003).
[21] C. King and M. B. Ruskai, IEEE Trans. Inf. Theory, 192 (2001).
[22] M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer (2004).
[23] D. Petz, Quantum Information Theory and Quantum Statistics, Springer (2008).
[24] M. E. Shirokov, Problems of Information Transmission, 23-40 (2006).
[25] P. W. Shor, J. Math. Phys., 4334 (2002).
[26] P. W. Shor, Equivalence of additivity questions in quantum information theory, Comm. Math. Phys. 246(3), 453-472 (2004).
[27] G. Smith and J. Yard, Science, 1812 (2008).
[28] J. Yard, private communication.
[29] In [6, 10] it was shown to be true for a large class of functions (not only for the von-Neumann entropy) including all the pp