Global convergence of diluted iterations in maximum-likelihood quantum tomography
D. S. Gonçalves∗
Departamento de Matemática Aplicada, Universidade Estadual de Campinas, Campinas, SP 13083-859, Brazil
and IRISA, University of Rennes 1, Rennes, France
M. A. Gomes-Ruggiero and C. Lavor
Departamento de Matemática Aplicada, Universidade Estadual de Campinas, Campinas, SP 13083-859, Brazil
(Dated: September 11, 2018)

In this paper we present an inexact stepsize selection for the Diluted RρR algorithm [1], used to obtain the maximum likelihood estimate of the density matrix in quantum state tomography. We give a new interpretation of the diluted RρR iterations that allows us to prove global convergence under weaker assumptions. Thus, we propose a new algorithm which is globally convergent and suitable for practical implementation.

PACS number(s): 03.65.Wj

∗ [email protected]
I. INTRODUCTION
In quantum state tomography, the aim is to find an estimate for the density matrix associated with an ensemble of identically prepared quantum states, based on measurement results [2–4]. This is an important procedure in quantum information and computation, for example, to verify the fidelity of the prepared state [5, 6] or in quantum process tomography [7].

Besides the experimental design needed to obtain a tomographically complete set of measurements, post-processing routines are required to recover information from the measurement results. Some approaches are based on direct inversion of the data, while others rely on statistical methods. For a survey, see [3].

Among the statistics-based methods, Maximum Likelihood (ML) estimation [3, 8] has often been used by experimentalists [9]. The Maximum Likelihood estimate for the density matrix is the one that maximizes the probability of the observed data. In [2, 8], an iterative procedure was proposed to solve the maximum likelihood estimation problem for density matrices. We refer to this procedure as the RρR algorithm. The main properties of the RρR algorithm are that it keeps the positivity and unit trace of the iterates and that it has low computational cost, involving only matrix products at each iteration.

Although in practice the RρR method works in most cases, there is no theoretical guarantee of convergence, regardless of the dataset and the initial point. In [1], the authors presented an example where the method gets into a cycle. In the same work, they proposed a relaxation of the RρR iterations, controlling the step size at each iteration by a positive parameter t. They called these iterations Diluted RρR iterations. It was proved that the Diluted RρR method converges to the maximum likelihood solution if, at each iteration, the optimal value of the step size t is chosen.

However, finding the optimal value of t means solving another optimization problem at each iteration, which represents an undesirable additional computational cost in practice. This issue was noted in [1], where some heuristics were suggested in order to obtain a reasonable guess for the step size t in practical implementations, but at the cost of losing the convergence guarantee.

In this work we propose a new stepsize selection procedure which is reliable and feasible in practice. We give a new interpretation of the Diluted RρR iteration, in which the search direction is a combination of two ascent directions controlled by the step size t. This allows us to apply an inexact line search to determine the step length. Instead of the optimal value of t, at each iteration it is enough to find a value which ensures a sufficient improvement in the likelihood function in order to prove global convergence. We propose an algorithm, using an Armijo-like condition [10, 11] and a backtracking procedure, and prove that it is globally convergent and also computationally practicable.

This paper is organized as follows. Section II reviews the theory of the RρR algorithms for quantum tomography. The concepts of nonlinear optimization used to prove the convergence of the Diluted RρR are presented in Section III. Section IV presents the proof of global convergence of the Diluted RρR algorithm under line search and the Armijo condition. Examples illustrating the differences and similarities of our proposal with respect to the traditional fixed step length are presented in Section V. Section VI closes this work with some final considerations.
II. RρR ITERATIONS FOR QUANTUM TOMOGRAPHY
In this section we address the theory and motivation behind the RρR and the Diluted RρR algorithms, following the references [1, 2, 8].

Here we consider measurements described by a POVM set {E_i}_i, where the E_i are positive semidefinite operators which sum to the identity. The relation between the density matrix ρ and the outcome probabilities is given by Born's rule [3]:

p_i(ρ) = tr(E_i ρ).

Linear inversion methods equate the predicted probabilities {p_i(ρ)}_i with the experimental data {f_i}_i,

f_i = tr(E_i ρ), for all i,

and the inversion of these linear equations gives an estimate of ρ. The main problem with this approach is that, in general, the frequencies are noisy, and this can lead to a matrix ρ outside the density matrix space (the Hermitian positive semidefinite trace-one matrices).

Among the statistics-based methods, Maximum Likelihood estimation [3, 8] has often been used by experimentalists [9]. Let ρ† denote the conjugate transpose of ρ, and write ρ ⪰ 0 (resp. ρ ≻ 0) when ρ is positive semidefinite (resp. positive definite). The Maximum Likelihood estimate is the matrix in

S = {ρ | ρ = ρ†, ρ ⪰ 0, tr(ρ) = 1}

that maximizes the likelihood function, that is, the probability of obtaining the observed data given the density matrix ρ. A commonly used likelihood [2, 8], for a given data set {f_i}, is

L(ρ) ∝ ∏_i p_i(ρ)^{N f_i},

and since the log-likelihood is more tractable, our goal is to find the ρ that solves the problem

max_ρ ∑_i f_i log p_i(ρ) ≡ F(ρ)
s.t. tr(ρ) = 1, ρ ⪰ 0.   (1)

The maximization of the objective function F(ρ) in (1) is constrained to the density matrix space S, which is the intersection of the positive semidefinite cone ρ ⪰ 0 with the hyperplane tr(ρ) = 1. The constraints may motivate one to try semidefinite programming (SDP) methods [12] for solving (1), but efficient solvers [13, 14] are available only for linear and quadratic objective functions.

Other methods are based on a reparameterization [9, 15] of the matrix variable, ρ = ρ(θ), in order to automatically fulfill the constraints, and then solve an unconstrained maximization problem in the new variable θ. However, generic numerical optimization methods are often slow when the number of parameters, of order d² (d is the dimension of the Hilbert space), is large.

Here, we study an alternative algorithm, proposed in [8], which takes advantage of the structure of problem (1) and has good convergence properties. Consider the gradient of the objective function F(ρ), given by

∇F(ρ) = ∑_i [f_i / tr(E_i ρ)] E_i ≡ R(ρ),   (2)

and let int(S) be the interior of S, that is, int(S) = {ρ ∈ S | ρ ≻ 0}. As shown in [8], a matrix ρ ∈ int(S) solves (1) if it satisfies the extremal equation

R(ρ)ρ = ρ,   (3)

or, equivalently,

R(ρ)ρR(ρ) = ρ.   (4)

If the density matrix ρ is restricted to diagonal matrices, equation (3) can be solved by the expectation-maximization (EM) algorithm [16]. The EM algorithm is guaranteed to increase the likelihood at each step and converges to a fixed point of (3). However, the EM algorithm cannot be applied to the quantum problem, because without the diagonal constraint it does not preserve the positivity of the density matrix. In [8], an iterative procedure based on equation (4) was proposed instead. Let k be the iteration index, so that ρ^k is the current approximation to the solution. An iteration of the RρR algorithm is given by

ρ^{k+1} = N R(ρ^k) ρ^k R(ρ^k),

where N is the normalization constant which ensures unit trace. Notice that positivity is explicitly preserved at each step.
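To make the iteration concrete, the following minimal sketch (ours, not the authors' code) implements the gradient R(ρ) of (2) and one RρR step in Python; `povm` (a list of positive semidefinite matrices E_i summing to the identity) and `freqs` (frequencies f_i summing to one) are assumed inputs, and the function names are illustrative.

```python
import numpy as np

def R(rho, povm, freqs):
    # R(rho) = sum_i f_i / tr(E_i rho) * E_i, Eq. (2)
    return sum(f / np.trace(E @ rho).real * E for f, E in zip(freqs, povm))

def rpr_step(rho, povm, freqs):
    # One RρR iteration: rho <- N R(rho) rho R(rho), with N enforcing unit trace
    Rr = R(rho, povm, freqs)
    out = Rr @ rho @ Rr
    return out / np.trace(out).real
```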
Another remarkable property of the RρR algorithm is its computational cost: each iteration only requires matrix-matrix products. This is a rather cheap iteration in contrast with an iteration of a semidefinite programming method.

Although the RρR algorithm is a generalization of the EM algorithm, its convergence is not guaranteed in general. In [1], a counterexample was presented where the method produces a cycle. For this reason, that work proposed the diluted iteration of the RρR algorithm, or simply "Diluted RρR". The idea is to control each iteration step by mixing the operator R(ρ) with the identity operator:

ρ^{k+1} = N [(I + tR(ρ^k))/(1 + t)] ρ^k [(I + tR(ρ^k))/(1 + t)],   (5)

where t > 0 and N is the normalization constant. It is important to observe that, as t → ∞, the iteration tends to the original RρR iteration. Moreover, for any t > 0, a fixed point of (5) satisfies R(ρ)ρ = ρ. It was also shown that the Diluted RρR converges to the ML density matrix if the initial approximation is the maximally mixed state ρ^0 = (1/d)I and the optimal value of t,

t = arg max_{t>0} F(ρ^{k+1}(t)),   (6)

is used at each iteration. In nonlinear optimization [10, 11], this is called exact line search. Though convergence can be achieved with this procedure, solving (6) is, in general, computationally demanding. Albeit the authors of [1] proved convergence with the exact line search, they suggest that, in practice, one could use an ad hoc scheme to determine the "best" value of the steplength t to be used throughout all iterations. A sketch of one diluted step is given below.
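Under the same assumptions as the previous snippet (and reusing its R()), a hedged sketch of one diluted step (5) is:

```python
import numpy as np

def diluted_step(rho, t, povm, freqs):
    # One Diluted RρR iteration, Eq. (5): mix R(rho) with the identity.
    d = rho.shape[0]
    M = (np.eye(d) + t * R(rho, povm, freqs)) / (1.0 + t)
    out = M @ rho @ M
    return out / np.trace(out).real  # normalization N; the (1+t) factors cancel here
```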
Here, instead of (6), we propose an inexact line search to determine the steplength t at each iteration (5). We do not search for the best possible t > 0, but for one that ensures a sufficient improvement in the log-likelihood. We prove that this procedure is well defined and that the iterations (5) converge to a solution of (1) from any positive definite initial matrix ρ^0. The implementation of the inexact line search is straightforward, and we also present some examples showing the improvements over an ad hoc fixed-t strategy.

III. GLOBAL CONVERGENCE THEORY FOR ASCENT DIRECTION METHODS
The purpose of this section is to present some basic concepts of nonlinear optimization which are necessary to prove the global convergence of the Diluted RρR algorithm under an inexact line search scheme. These concepts are classical in the optimization community and are detailed in [10, 11]. To ease the exposition, we have adapted them to the quantum tomography notation.

Consider the following maximization problem over the set of Hermitian matrices H:

max_ρ F(ρ)  s.t. ρ ∈ Ω,   (7)

where F : H → ℝ is a continuously differentiable function and Ω ⊂ H is a convex set. Given an approximation ρ^k to the solution of problem (7), ascent direction methods try to improve the current objective function value by generating an ascent direction D^k and updating the iterate:

ρ^{k+1} = ρ^k + t_k D^k,   (8)

where t_k is called the stepsize or steplength.

Definition III.1. A direction D^k is an ascent direction at the iterate ρ^k if

tr(∇F(ρ^k) D^k) > 0,

and this ensures that, for sufficiently small t_k > 0, the function value is increased. An ascent direction D^k is feasible if ρ^{k+1} belongs to Ω for t_k ∈ (0, ε), for some ε > 0.

One of the insights of this work is that the Diluted RρR iteration (5) can be written as an ascent direction iteration (8), and so, using the theory of this section, we can prove global convergence.

The iteration (8) can be repeated while there exists a feasible ascent direction. If at some point ρ∗ there is no feasible ascent direction, then ρ∗ is a stationary point. It is well known that every local maximizer is a stationary point, but the converse is not true in general. If the function F is concave on the convex set Ω, then a stationary point is also a maximizer. A maximization algorithm for problem (7) is called globally convergent [10, 11] if every limit point of the sequence generated by the algorithm is a stationary point, regardless of the initial approximation ρ^0.

Although feasible ascent directions ensure that, for sufficiently small t_k > 0, we can increase the function value, this is not enough to ensure global convergence. The reason is that a simple increase in the objective function, F(ρ^{k+1}) > F(ρ^k), along an ascent direction is too modest an objective. In order to reach local maximizers, or at least stationary points, a sufficient increase at each iteration is required.

A natural choice for the steplength t_k along the direction D^k is the solution of the problem

t_k = arg max_t F(ρ^k + tD^k),   (9)

which is called exact line search. However, finding the global maximizer of F along the direction D^k is itself a hard problem, and unless F has special structure, such as that of a quadratic function, the computational effort is considerable.

To avoid this effort, an inexact line search can be performed. A natural scheme is to consider successive stepsize reductions. Since the search is along an ascent direction, eventually, for a small enough t_k, we obtain F(ρ^k + t_kD^k) > F(ρ^k). However, this simple increase cannot eliminate certain convergence difficulties. One possible strategy is the Armijo rule, which asks for a steplength t such that a sufficient improvement in the objective function is obtained:

F(ρ^k + tD^k) > F(ρ^k) + γ t tr(∇F(ρ^k) D^k),   (10)

where γ ∈ (0, 1). In practice, t is successively reduced until condition (10) is verified. There are other alternatives to successive stepsize reduction, for instance, strategies based on quadratic or cubic interpolation [11].

Besides the steplength selection, requirements on the ascent directions D^k are also necessary to avoid certain problems. For example, it is not desirable to have directions D^k with small norm when we are far from the solution. It is also necessary to prevent the sequence of directions {D^k} from becoming orthogonal to the gradient of F, because in that case we are moving along directions of almost zero variation, where little or no improvement of the objective function can be achieved. A general condition that avoids such problems is the gradient related condition [10].

Definition III.2 (Gradient related). A sequence of directions {D^k} is gradient related if, for any subsequence {ρ^k}_{k∈K} that converges to a nonstationary point, the corresponding subsequence {D^k}_{k∈K} is bounded and satisfies

lim inf_{k→∞, k∈K} tr(∇F(ρ^k) D^k) > 0.

This condition means that ‖D^k‖ does not become 'too small' or 'too large' relative to ‖∇F(ρ^k)‖, and that D^k and ∇F(ρ^k) do not become orthogonal.

If an algorithm generates ascent directions satisfying the gradient related condition and the stepsizes are selected according to the Armijo rule, then it is possible to prove global convergence [10].

Theorem III.3 (Global convergence). Let {ρ^k} be a sequence generated by a feasible ascent direction method, ρ^{k+1} = ρ^k + t_kD^k, and assume that {D^k} is gradient related and t_k is chosen by the Armijo rule. Then every limit point of {ρ^k} is a stationary point.

Proof. See [10, Proposition 2.2.1].
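As an illustration of the Armijo rule with successive stepsize reductions, a generic backtracking routine could look as follows; the parameter values `gamma`, `beta` and the trial step `t0` are illustrative defaults, not values prescribed by [1] or [10]:

```python
import numpy as np

def armijo_stepsize(F, rho, D, grad, t0=1.0, gamma=1e-4, beta=0.5, max_halvings=50):
    # Shrink t until F(rho + t D) > F(rho) + gamma * t * tr(grad D), Eq. (10).
    base = F(rho)
    slope = np.trace(grad @ D).real  # > 0 for an ascent direction (Definition III.1)
    t = t0
    for _ in range(max_halvings):
        if F(rho + t * D) > base + gamma * t * slope:
            break
        t *= beta  # successive stepsize reduction
    return t
```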
IV. CONVERGENCE OF THE DILUTED RρR
Sections II and III gave us the necessary background to show the convergence of the Diluted RρR iterations using an inexact line search to determine the stepsize t. In this section, we first show that the diluted iteration (5) can be written as an ascent direction iteration (8). We then give a geometrical interpretation and prove that the corresponding sequence of directions {D^k} is gradient related. Finally, using the Armijo condition and a backtracking procedure, we present an algorithm which is globally convergent by Theorem III.3.

From now on, we use the notation ∇F(ρ) instead of R(ρ). Equation (5) then becomes

ρ^{k+1} = N [(I + t∇F(ρ^k))/(1 + t)] ρ^k [(I + t∇F(ρ^k))/(1 + t)],   (11)

or

ρ^{k+1} = (I + t∇F(ρ^k)) ρ^k (I + t∇F(ρ^k)) / tr((I + t∇F(ρ^k)) ρ^k (I + t∇F(ρ^k))) ≡ G(ρ^k).   (12)

The expression above can be seen as a fixed point iteration. Expanding it, we obtain

G(ρ) = [ρ + t(∇F(ρ)ρ + ρ∇F(ρ)) + t²∇F(ρ)ρ∇F(ρ)] / [1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ))].   (13)
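The equivalence between (12) and (13) can be checked numerically. In the sketch below (illustrative names, with `grad` standing for ∇F(ρ) computed, e.g., by the R() of Section II), the (1 + t)² factors of (11) cancel in the normalization, and the denominator of (13) equals q(t) = 1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ)) because tr(∇F(ρ)ρ) = 1 (Lemma IV.1 below):

```python
import numpy as np

def G_product(rho, grad, t):
    # Normalized product form, Eq. (12)
    A = np.eye(rho.shape[0]) + t * grad
    out = A @ rho @ A
    return out / np.trace(out).real

def G_expanded(rho, grad, t):
    # Expanded form, Eq. (13); denominator is q(t) when tr(grad @ rho) = 1
    q = 1.0 + 2.0 * t + t**2 * np.trace(grad @ rho @ grad).real
    return (rho + t * (grad @ rho + rho @ grad)
                + t**2 * grad @ rho @ grad) / q
```

For a valid density matrix and normalized frequencies, np.allclose(G_product(rho, grad, t), G_expanded(rho, grad, t)) should hold for any t ≥ 0.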
Notice that ρ∗ is a fixed point of G, that is, G(ρ∗) = ρ∗ for all t > 0, if the following conditions are satisfied:

ρ∗ = ∇F(ρ∗)ρ∗∇F(ρ∗) and ρ∗ = ∇F(ρ∗)ρ∗.   (14)

If these conditions are verified at a positive definite trace-one matrix ρ∗, then the optimality conditions [10, 11] for problem (1) are satisfied and ρ∗ is the maximum likelihood estimate. The following two lemmas are useful in the analysis of the RρR iterations.
Lemma IV.1.
For all ρ ∈ int(S), we have tr(∇F(ρ)ρ) = 1.

Proof.
Directly from (2): tr(∇F(ρ)ρ) = ∑_i f_i tr(E_i ρ)/tr(E_i ρ) = ∑_i f_i = 1, since the frequencies sum to one.
Lemma IV.2. If ρ ∈ int(S), then tr(∇F(ρ)ρ∇F(ρ)) ≥ 1, with equality if and only if ρ = ∇F(ρ)ρ.

Proof.
From Lemma IV.1,

1 = tr(∇F(ρ)ρ) = tr(∇F(ρ) ρ^{1/2} ρ^{1/2}),

and from the Cauchy–Schwarz inequality we obtain

1 = |tr(∇F(ρ) ρ^{1/2} ρ^{1/2})|² ≤ tr(∇F(ρ)ρ∇F(ρ)) tr(ρ) = tr(∇F(ρ)ρ∇F(ρ)).

Equality in Cauchy–Schwarz occurs when ∇F(ρ)ρ^{1/2} = αρ^{1/2}, or equivalently, when ∇F(ρ)ρ = ρ.

Let us simplify the expression (13) by defining, for a given ρ,

q(t) = 1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ)).
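Both lemmas, and the bound on q(t) used next, are easy to check numerically; the helper below is an illustrative sketch, with `grad` computed as in Section II from frequencies summing to one:

```python
import numpy as np

def check_lemmas(rho, grad):
    # Lemma IV.1: tr(grad rho) = 1;  Lemma IV.2: tr(grad rho grad) >= 1
    t1 = np.trace(grad @ rho).real
    t2 = np.trace(grad @ rho @ grad).real
    return t1, t2  # expect t1 ~= 1.0 and t2 >= 1.0

def q(t, rho, grad):
    # q(t) = 1 + 2t + t^2 tr(grad rho grad) >= (1 + t)^2 by Lemma IV.2
    return 1.0 + 2.0 * t + t**2 * np.trace(grad @ rho @ grad).real
```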
Since tr(∇F(ρ)ρ∇F(ρ)) ≥ 1 for any density matrix ρ, we have q(t) ≥ (1 + t)² > 0 for all t ≥ 0. Furthermore, if ρ ∈ S, the set of density matrices, then G(ρ) ∈ S as well, for any t ≥ 0. Thus, G(ρ) defines a path in the density matrix space S, parameterized by t, such that G(ρ) → ρ when t → 0, and G(ρ) → ρ̃ when t → ∞, where

ρ̃ = ∇F(ρ)ρ∇F(ρ) / tr(∇F(ρ)ρ∇F(ρ)),

as in the original RρR algorithm [2]. Let us also define the point

ρ̄ = (∇F(ρ)ρ + ρ∇F(ρ))/2.   (15)

Unlike the point ρ̃, the point ρ̄ is, in general, not in the set S.

Now, rewriting the expression (13), we obtain

G(ρ) = (1/q(t)) ρ + (2t/q(t)) [(∇F(ρ)ρ + ρ∇F(ρ))/2] + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) [∇F(ρ)ρ∇F(ρ)/tr(∇F(ρ)ρ∇F(ρ))],   (16)

that is,

G(ρ) = (1/q(t)) ρ + (2t/q(t)) ρ̄ + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) ρ̃.

Therefore, we have a convex combination of the points ρ, ρ̄, ρ̃, and the path defined by t lies in the convex set whose extreme points are ρ, ρ̄ and ρ̃, as illustrated in Figure 1.

Figure 1: Geometrical interpretation of G(ρ^k) as a curved path ρ^{k+1}(t), parameterized by t, from ρ^k (t = 0) toward ρ̃^k (t → ∞), inside the set with extreme points ρ^k, ρ̄^k, ρ̃^k on the hyperplane tr(ρ) = 1.

Finally, defining the directions

D̄ = ρ̄ − ρ = (∇F(ρ)ρ + ρ∇F(ρ))/2 − ρ,   (17)
D̃ = ρ̃ − ρ = ∇F(ρ)ρ∇F(ρ)/tr(∇F(ρ)ρ∇F(ρ)) − ρ,   (18)

and using (16), we obtain

G(ρ) = ρ + (2t/q(t)) D̄ + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) D̃,

which gives an iteration like (8),

ρ̂ = G(ρ) = ρ + tD,

where

D = (2/q(t)) D̄ + (t tr(∇F(ρ)ρ∇F(ρ))/q(t)) D̃.   (19)

The search direction D is a combination of the directions D̄ and D̃, controlled by the parameter t. From Figure 1, we can see that, as t → ∞, D approaches the direction D̃, whereas, as t → 0, D becomes parallel to D̄. It is worth proving that these are feasible ascent directions; a numerical sketch of the decomposition (19) is given below.
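The decomposition (17)–(19) can be verified numerically as well; the sketch below (illustrative names, `grad` = ∇F(ρ)) builds D̄, D̃ and the combined direction D, and one can check that G(ρ) ≈ ρ + tD with G as in (12)–(13):

```python
import numpy as np

def directions(rho, grad):
    # D_bar (17) and D_tilde (18); trf = tr(grad rho grad) >= 1 by Lemma IV.2
    trf = np.trace(grad @ rho @ grad).real
    rho_bar = 0.5 * (grad @ rho + rho @ grad)  # Eq. (15): trace one, maybe not psd
    rho_tilde = (grad @ rho @ grad) / trf      # pure RρR point (t -> infinity)
    return rho_bar - rho, rho_tilde - rho, trf

def search_direction(rho, grad, t):
    # D = (2/q(t)) D_bar + (t * trf / q(t)) D_tilde, Eq. (19)
    D_bar, D_tilde, trf = directions(rho, grad)
    qt = 1.0 + 2.0 * t + t**2 * trf
    return (2.0 / qt) * D_bar + (t * trf / qt) * D_tilde
```

Away from a stationary point, np.trace(grad @ search_direction(rho, grad, t)).real should be strictly positive, in agreement with the proposition below.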
Proposition IV.3. The directions D̄ and D̃ are feasible ascent directions at any nonstationary point ρ.

Proof. To prove that D̄ is an ascent direction, we need to show that tr(∇F(ρ)D̄) > 0. Using the definition of D̄, we get

D̄ = ρ̄ − ρ = (∇F(ρ)ρ + ρ∇F(ρ))/2 − ρ.

For a nonstationary ρ (that is, ∇F(ρ)ρ ≠ ρ),

tr(∇F(ρ)D̄) = tr(∇F(ρ)ρ∇F(ρ)) − tr(∇F(ρ)ρ) = tr(∇F(ρ)ρ∇F(ρ)) − 1 > 0,

by the Cauchy–Schwarz inequality (Lemma IV.2), which implies that D̄ is an ascent direction. If ρ ∈ int(S), then there exists t̄ > 0 such that ρ + tD̄ ∈ S for t ∈ (0, t̄), so the direction is feasible.

In a similar way for D̃,

tr(∇F(ρ)D̃) = tr(∇F(ρ)∇F(ρ)ρ∇F(ρ))/tr(∇F(ρ)ρ∇F(ρ)) − 1 > 0.

Using the fact that ρ̃ ∈ S, we have ρ + tD̃ ∈ S for t ∈ (0, 1] as well. Since the direction D is a positive combination of feasible ascent directions, it is also a feasible ascent direction.

Now, if we show that the sequence of directions {D^k} is gradient related, then we can prove global convergence under an inexact line search scheme. First, we present some technical lemmas which are useful to show the desired result.

Lemma IV.4.
For ρ^k ≻ 0 with tr(ρ^k) = 1, the matrix ρ̄^k defined in (15) is the solution of the problem

max_ρ tr(∇F(ρ^k)(ρ − ρ^k)) − ½ tr((ρ − ρ^k)(ρ^k)^{-1}(ρ − ρ^k))
s.t. tr(ρ) = 1.   (20)

Proof. Consider the optimality conditions for (20):

−∇F(ρ^k) + ½[(ρ − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ − ρ^k)] + λI = 0,   (21)
tr(ρ) = 1.   (22)

Multiplying equation (21) on the right by ρ^k and taking the trace, we have

−tr(∇F(ρ^k)ρ^k) + tr(ρ − ρ^k) + λ tr(ρ^k) = 0,

which implies that λ = 1. So, from

−∇F(ρ^k) + ½[(ρ − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ − ρ^k)] + I = 0,

we obtain

ρ(ρ^k)^{-1} + (ρ^k)^{-1}ρ = 2∇F(ρ^k).

Using the symmetry of the solution ρ and the symmetry of ρ^k and ∇F(ρ^k), we conclude that

ρ = (∇F(ρ^k)ρ^k + ρ^k∇F(ρ^k))/2 = ρ̄^k.
Lemma IV.5. The sequence of directions {D̄^k}, used to define the sequence {ρ^k} by ρ^{k+1} = ρ^k + tD̄^k, satisfies

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)(ρ̄^k − ρ^k)) > 0

for every subsequence {ρ^k}_{k∈K} that converges to a nonstationary point ρ′.

Proof. Suppose there is a subsequence {ρ^k}_{k∈K} that converges to a nonstationary point ρ′. Lemma IV.4 tells us that ρ̄^k is the solution of (20). Thus, at ρ̄^k, the gradient of the objective function of (20) is orthogonal to the hyperplane tr(ρ) = 1, that is,

tr([∇F(ρ^k) − ½((ρ̄^k − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ̄^k − ρ^k))](ρ − ρ̄^k)) = 0

for all ρ such that tr(ρ) = 1. Since the feasible set of (20) contains S, the same holds in particular for all ρ ∈ S. Expanding the last expression, we obtain

tr(∇F(ρ^k)(ρ − ρ̄^k)) = −½[tr((ρ^k − ρ̄^k)(ρ^k)^{-1}(ρ − ρ̄^k)) + tr((ρ − ρ̄^k)(ρ^k)^{-1}(ρ^k − ρ̄^k))], for all ρ ∈ S.

In particular, for ρ = ρ^k,

tr(∇F(ρ^k)(ρ̄^k − ρ^k)) = tr((ρ^k − ρ̄^k)(ρ^k)^{-1}(ρ^k − ρ̄^k)) = ‖ρ^k − ρ̄^k‖²_{(ρ^k)^{-1}}.   (23)

Using the continuity of the solution given by Lemma IV.4, we have

lim_{k→∞, k∈K} ρ̄^k = ρ̄′ = (∇F(ρ′)ρ′ + ρ′∇F(ρ′))/2.

Taking limits in (23), we obtain

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)(ρ̄^k − ρ^k)) = ‖ρ′ − ρ̄′‖²_{(ρ′)^{-1}}.

Since ρ′ is nonstationary, the right-hand side is strictly positive, and this completes the proof.

Finally, using the previous lemmas, we can prove the main assertion of this section.
Proposition IV.6. The sequence of directions {D^k} is gradient related.

Proof. First, let us show that {D^k} is bounded. In fact, ρ^{k+1}(t_k) = ρ^k + t_kD^k = G(ρ^k) is in S, since ρ^k ≻ 0 and t_k ≥ 0, by definition. In particular, for t_k = 1, we have ρ^{k+1}(1) = ρ^k + D^k ∈ S, and since S is bounded, {D^k} is also bounded.

Now, let {ρ^k}_{k∈K} be a subsequence of the sequence {ρ^k} generated by the iterations ρ^{k+1} = ρ^k + t_kD^k, and suppose that {ρ^k}_{k∈K} converges to a nonstationary point ρ′. Using the definition of D^k, we obtain

tr(∇F(ρ^k)D^k) = (2/q(t_k)) tr(∇F(ρ^k)D̄^k) + (t_k tr(∇F(ρ^k)ρ^k∇F(ρ^k))/q(t_k)) tr(∇F(ρ^k)D̃^k).

The second term on the right-hand side is nonnegative, so

tr(∇F(ρ^k)D^k) ≥ (2/q(t_k)) tr(∇F(ρ^k)D̄^k).

Considering t_k ∈ (0, t_max], we have

tr(∇F(ρ^k)D^k) ≥ (2/q(t_max)) tr(∇F(ρ^k)D̄^k).

Taking the limit over a subsequence converging to a nonstationary point,

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D^k) ≥ (2/q(t_max)) lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D̄^k),

and since {D̄^k} is gradient related by Lemma IV.5,

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D^k) > 0,

which implies that {D^k} is gradient related.

Thus, choosing the step size t_k at each iteration such that the Armijo condition (10) is fulfilled, we obtain a globally convergent algorithm by Theorem III.3. In this way, we can define the steps of a globally convergent algorithm that uses an inexact line search as follows.

Algorithm 1

Step 0. Given ρ^0 ≻ 0 with tr(ρ^0) = 1, t_max > 0, γ ∈ (0, 1) and 0 < α₁ ≤ α₂ < 1, set k = 0 and t_{−1} = t_max.

Step 1. If some stopping criterion is verified, stop. Otherwise, compute the directions D̄^k and D̃^k defined in (17) and (18). Set t = max{1, t_{k−1}}.

Step 2. Set

D = (2/q(t)) D̄^k + (t tr(∇F(ρ^k)ρ^k∇F(ρ^k))/q(t)) D̃^k.

If F(ρ^k + tD) ≤ F(ρ^k) + γ t tr(∇F(ρ^k)D), choose a new t ∈ [α₁t, α₂t] and repeat Step 2.

Step 3. Set t_k = t, D^k = D and ρ^{k+1} = ρ^k + t_kD^k; set k = k + 1 and go to Step 1.

A sketch of a full implementation of Algorithm 1 is given below.
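A compact Python sketch of Algorithm 1 follows; it reuses R() from Section II and search_direction() from above, and the stopping test, tolerance and parameter defaults are illustrative choices (with α₁ = α₂ = alpha), not the authors' reference implementation:

```python
import numpy as np

def log_likelihood(rho, povm, freqs):
    # F(rho) = sum_i f_i log tr(E_i rho), as in (1)
    probs = np.array([np.trace(E @ rho).real for E in povm])
    return float(np.dot(freqs, np.log(probs)))

def algorithm1(rho0, povm, freqs, t_max=1.0, gamma=1e-4, alpha=0.5,
               tol=1e-8, max_iter=10_000):
    rho, t_prev = rho0, t_max
    for _ in range(max_iter):
        grad = R(rho, povm, freqs)
        # Step 1: stop when the extremal equation (3) is nearly satisfied
        if np.linalg.norm(grad @ rho - rho) < tol:
            break
        t = max(1.0, t_prev)  # first trial step is at least one
        base = log_likelihood(rho, povm, freqs)
        # Step 2: Armijo backtracking; D depends on t, so it is recomputed
        for _ in range(60):
            D = search_direction(rho, grad, t)
            if log_likelihood(rho + t * D, povm, freqs) \
                    > base + gamma * t * np.trace(grad @ D).real:
                break
            t *= alpha  # new trial in [alpha1*t, alpha2*t] with alpha1 = alpha2
        # Step 3: accept the step
        rho, t_prev = rho + t * D, t
    return rho
```

For instance, algorithm1(np.eye(d)/d, povm, freqs) starts from the maximally mixed state, the setting used in the examples of Section V.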
Theorem IV.7 states the desired result: any limit point of the sequence generated by Algorithm 1 is a stationary point, regardless of the initial approximation. Since problem (1) is convex, a stationary point is also a solution.

Theorem IV.7. Every limit point ρ∗ of a sequence {ρ^k} generated by Algorithm 1 is a stationary point, that is,

∇F(ρ∗)ρ∗ = ρ∗ = ρ∗∇F(ρ∗).

Proof. By Proposition IV.6, the sequence {D^k} used in Algorithm 1 is gradient related. Since the step selection in Algorithm 1 satisfies the Armijo condition, we can apply Theorem III.3 to obtain the claimed result.

In Step 2 of Algorithm 1, instead of successive reductions of the steplength t, one could use, for instance, quadratic or cubic interpolation [11] to estimate the t that maximizes F(ρ^k + tD), in order to make the search more effective.

V. ILLUSTRATIVE EXAMPLES
In this section we present two illustrative examples showing that Algorithm 1 outperforms the Diluted RρR algorithm with fixed stepsize [1]. Besides converging on problems where the original RρR does not, Algorithm 1 also reduces the number of iterations when compared to the fixed stepsize version of the Diluted RρR, without harming the convergence behavior in cases where the latter works.

First, we consider the counterexample where the pure RρR method gets into a cycle [1]: three measurements on a qubit with an apparatus described by two rank-one projectors Π₁ and Π₂, the first outcome being detected once and the second twice. We used the completely mixed state as the starting point and declared convergence when the distance between two consecutive iterates was small enough. For each fixed t in the Diluted RρR, we set t_max = t in the algorithm that uses the line search, with the same Armijo parameter γ and reduction factors α₁ = α₂ in all runs.
Figure 2: Number of iterations as a function of t (fixed t vs. line search; the right panel zooms in on small values of t).

Figure 2 compares the version with fixed step size (stars) against the one with line search (circles), described in the previous section. In the left panel, we can see that, for the "fixed t" strategy, the number of iterations grows as the stepsize t increases. This was expected because, as t → ∞, the iterations tend to pure RρR iterations, and in this limit case there is no convergence. Conversely, the line search strategy keeps the number of iterations bounded, regardless of the value of t_max.

The right panel is a zoomed version of the left one, showing the behavior for small values of t. As expected, although the Diluted RρR guarantees a monotonic increase of the likelihood for sufficiently small steps, repeating too-small steps leads to more iterations of the method. Algorithm 1 ensures a substantial increase of the likelihood through the line search procedure and, to avoid extremely small steps, the first trial for t_k at each iteration is at least one.

Second, we consider as data the theoretical probabilities for the W state. Figure 3 presents the number of iterations for different values of t (log scale).

Figure 3: Number of iterations as a function of t (W state tomography).

Again, small fixed values of t produce a higher number of iterations. It is also important to note, in this example, that the behavior of the line search version becomes the same as that of the "fixed t" one as the suggested step length t_max increases. This means that, in Algorithm 1, the full step t_k = t_max was accepted (it fulfills the Armijo condition) at every iteration.

In [1], the authors suggest first trying a larger value of the step size t and performing Diluted RρR iterations with that same value; if the iterations do not converge, one then tries a smaller value of t. This ad hoc procedure is motivated by the fact that the pattern of Figure 3 often occurs in practice, so that a larger t means fewer iterations. However, what is a good guess for a large value of t that ensures few iterations? And if convergence does not occur, how should one choose a smaller value of t that guarantees convergence? These issues can result in many re-runs until a good value of t is found, and this value can change from one dataset to another.

These examples illustrate that the Armijo line search procedure represents an improvement of the Diluted RρR algorithm, adjusting the step length t just when necessary, and show that convergence does not depend on a specific choice of a fixed step length or on the starting point.

VI. FINAL REMARKS
We proved the global convergence of the Diluted RρR algorithm under a line search procedure with the Armijo condition. The inexact line search is a weaker assumption than the exact line search used in the convergence proof of previous work [1]. Moreover, the proposed globalization by line search does not depend on guessing a fixed step length for all iterations. Instead, as is usual in nonlinear optimization, the step length is adjusted just when necessary, in order to ensure a sufficient improvement in the likelihood at each iteration. Thus, the Armijo line search procedure is a reliable globalization and represents a practical improvement of the Diluted RρR algorithm for quantum tomography.
Acknowledgements
We thank the Brazilian research agencies FAPESP, CNPq and INCT-IQ (National Institute for Science and Technology for Quantum Information). DG also thanks the Brittany Region (France) and INRIA for partial financial support.

[1] J. Řeháček, Z. Hradil, E. Knill, and A. I. Lvovsky, Phys. Rev. A 75, 042108 (2007).
[2] Z. Hradil, Phys. Rev. A 55, R1561 (1997).
[3] M. Paris and J. Řeháček, eds., Quantum State Estimation, Lecture Notes in Physics, Vol. 649 (Springer, 2004).
[4] D. S. Gonçalves, C. Lavor, M. A. Gomes-Ruggiero, A. T. Cesário, R. O. Vianna, and T. O. Maciel, Phys. Rev. A 87, 052140 (2013).
[5] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, 1st ed. (Cambridge University Press, 2004).
[6] T. Evangelista, C. Lavor, and W. R. M. Rabelo, International Journal of Modern Physics C, 95 (2011).
[7] T. O. Maciel and R. O. Vianna, Quantum Information and Computation, 0442 (2012).
[8] Z. Hradil, J. Řeháček, J. Fiurášek, and M. Ježek, in Quantum State Estimation, Lecture Notes in Physics, Vol. 649 (Springer, 2004), pp. 163–172.
[9] D. F. V. James, P. G. Kwiat, W. J. Munro, and A. G. White, Phys. Rev. A 64, 052312 (2001).
[10] D. P. Bertsekas, Nonlinear Programming (Athena Scientific, 1999).
[11] J. Nocedal and S. J. Wright, Numerical Optimization (Springer, 1999).
[12] E. de Klerk, Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications (Kluwer Academic Publishers, 2002).
[13] K. Toh, M. Todd, and R. Tütüncü, Optimization Methods and Software 11, 545 (1999).
[14] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones" (1998).
[15] D. S. Gonçalves, M. A. Gomes-Ruggiero, C. Lavor, O. J. Farías, and P. H. S. Ribeiro, Quantum Information and Computation, 775 (2012).
[16] Y. Vardi and D. Lee, Journal of the Royal Statistical Society, Series B (Methodological) 55 (1993).