Global convergence of diluted iterations in maximum-likelihood quantum tomography
D. S. Gonçalves∗
Departamento de Matemática Aplicada, Universidade Estadual de Campinas, Campinas, SP 13083-859, Brazil
and IRISA, University of Rennes 1, Rennes, France
M. A. Gomes-Ruggiero and C. Lavor
Departamento de Matemática Aplicada, Universidade Estadual de Campinas, Campinas, SP 13083-859, Brazil
(Dated: September 11, 2018)

In this paper we present an inexact stepsize selection for the Diluted RρR algorithm [1], used to obtain the maximum likelihood estimate of the density matrix in quantum state tomography. We give a new interpretation of the diluted RρR iterations that allows us to prove global convergence under weaker assumptions. Thus, we propose a new algorithm which is globally convergent and suitable for practical implementation.

PACS number(s): 03.65.Wj

∗ [email protected]
I. INTRODUCTION
In quantum state tomography, the aim is to find an estimate for the density matrix associated with an ensemble of identically prepared quantum states, based on measurement results [2–4]. This is an important procedure in quantum information and computation, for example, to verify the fidelity of the prepared state [5, 6] or in quantum process tomography [7].

Besides the experimental design needed to obtain a tomographically complete set of measurements, post-processing routines are required to recover information from the measurement results. Some approaches are based on direct inversion of the data, while others rely on statistical methods. For a survey, see [3].

Among the statistics-based methods, Maximum Likelihood (ML) estimation [3, 8] has often been used by experimentalists [9]. The Maximum Likelihood estimate for the density matrix is the one that maximizes the probability of the observed data. In [2, 8], an iterative procedure was proposed to solve the maximum likelihood estimation problem for density matrices. We refer to this procedure as the RρR algorithm. The main properties of the RρR algorithm are that it keeps the positivity and unit trace of the iterates and that it has low computational cost, involving only matrix products at each iteration.

Although in practice the RρR method works in most cases, there is no theoretical guarantee of convergence, regardless of the dataset and the initial point. In [1], the authors presented an example where the method gets into a cycle. In the same work, they proposed a relaxation of the RρR iterations, controlling the step size at each iteration by a positive parameter t. They called these iterations Diluted RρR iterations. It was proved that the Diluted RρR method converges to the maximum likelihood solution if, at each iteration, the optimal value of the step size t is chosen.

However, finding the optimal value of t means solving another optimization problem at each iteration, which represents an undesirable additional computational cost in practice. This issue was noted in [1], where some heuristics were suggested in order to obtain a reasonable guess for the step size t in practical implementations, but at the cost of losing the convergence guarantee.

In this work we propose a new stepsize selection procedure which is reliable and feasible in practice. We give a new interpretation of the Diluted RρR iteration, in which the search direction is a combination of two ascent directions controlled by the step size t. This allows us to apply an inexact line search to determine the step length. Instead of the optimal value of t, at each iteration it is enough to find a value which ensures a sufficient improvement in the likelihood function in order to prove global convergence. We propose an algorithm, using an Armijo-like condition [10, 11] and a backtracking procedure, and prove that it is globally convergent and also computationally practicable.

This paper is organized as follows. Section II reviews the theory of the RρR algorithms for quantum tomography. The concepts of nonlinear optimization used to prove the convergence of the Diluted RρR are presented in Section III. Section IV presents the proof of global convergence of the Diluted RρR algorithm under line search and the Armijo condition. Examples illustrating the differences and similarities of our proposal with respect to the traditional fixed step length are presented in Section V. Section VI closes this work with some final considerations.
II. RρR ITERATIONS FOR QUANTUM TOMOGRAPHY
In this section we address the theory and motivation behind the RρR and the Diluted RρR algorithms, following the references [1, 2, 8].

Here we consider measurements described by a POVM set {E_i}_i, where the E_i are positive semidefinite operators which sum to the identity. The relation between the density matrix ρ and the outcome probabilities is given by Born's rule [3]:

p_i(ρ) = tr(E_i ρ).

Linear inversion methods equate the predicted probabilities {p_i(ρ)}_i with the experimental data {f_i}_i,

f_i = tr(E_i ρ), for all i,

and the inversion of these linear equations gives an estimate of ρ. The main problem with this approach is that, in general, the frequencies are noisy, and this can lead to a matrix ρ outside the density matrix space (the Hermitian positive semidefinite trace-one matrices).

Among the statistics-based methods, Maximum Likelihood estimation [3, 8] has often been used by experimentalists [9]. Let ρ† denote the conjugate transpose of ρ, and write ρ ⪰ 0 (resp. ρ ≻ 0) when ρ is positive semidefinite (resp. positive definite). The Maximum Likelihood estimate is the matrix in

S = {ρ | ρ = ρ†, ρ ⪰ 0, tr(ρ) = 1}

that maximizes the likelihood function, that is, the probability of obtaining the observed data given the density matrix ρ. A commonly used likelihood [2, 8], for a given data set {f_i}, is

L(ρ) ∝ ∏_i p_i(ρ)^{N f_i},

and since the log-likelihood is more tractable, our goal is to find the ρ that solves the problem

max_ρ ∑_i f_i log p_i(ρ) ≡ F(ρ)
s.t. tr(ρ) = 1, ρ ⪰ 0.   (1)

The maximization of the objective function F(ρ) in (1) is constrained to the density matrix space S, which is the intersection of the positive semidefinite cone ρ ⪰ 0 with the hyperplane tr(ρ) = 1. The constraints may motivate one to try semidefinite programming (SDP) methods [12] for solving (1), but efficient solvers [13, 14] are available only for linear and quadratic objective functions.

Other methods are based on a reparameterization [9, 15] of the matrix variable, ρ = ρ(θ), in order to automatically fulfill the constraints, and then solve an unconstrained maximization problem in the new variable θ. However, generic numerical optimization methods are often slow when the number of parameters, of order d² (d is the dimension of the Hilbert space), is large.

Here, we study an alternative algorithm, proposed in [8], which takes advantage of the structure of problem (1) and has good convergence properties. Consider the gradient of the objective function F(ρ), given by

∇F(ρ) = ∑_i [f_i / tr(E_i ρ)] E_i ≡ R(ρ),   (2)

and let int(S) be the interior of S, that is, int(S) = {ρ ∈ S | ρ ≻ 0}. As shown in [8], a matrix ρ ∈ int(S) solves (1) if it satisfies the extremal equation

R(ρ)ρ = ρ,   (3)

or, equivalently,

R(ρ)ρR(ρ) = ρ.   (4)

If the density matrix ρ is restricted to diagonal matrices, equation (3) can be solved by the expectation-maximization (EM) algorithm [16]. The EM algorithm is guaranteed to increase the likelihood at each step and converges to a fixed point of (3). However, the EM algorithm cannot be applied to the quantum problem, because without the diagonal constraint it does not preserve the positivity of the density matrix. In [8], an iterative procedure based on equation (4) was proposed instead. Let k be the iteration index, so that ρ^k is the current approximation to the solution. An iteration of the RρR algorithm is given by

ρ^{k+1} = N R(ρ^k) ρ^k R(ρ^k),

where N is the normalization constant which ensures unit trace. Notice that positivity is explicitly preserved at each step.
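To make the iteration concrete, the following minimal sketch (ours, not the authors' code) implements the gradient R(ρ) of (2) and one RρR step in Python; `povm` (a list of positive semidefinite matrices E_i summing to the identity) and `freqs` (frequencies f_i summing to one) are assumed inputs, and the function names are illustrative.

```python
import numpy as np

def R(rho, povm, freqs):
    # R(rho) = sum_i f_i / tr(E_i rho) * E_i, Eq. (2)
    return sum(f / np.trace(E @ rho).real * E for f, E in zip(freqs, povm))

def rpr_step(rho, povm, freqs):
    # One RρR iteration: rho <- N R(rho) rho R(rho), with N enforcing unit trace
    Rr = R(rho, povm, freqs)
    out = Rr @ rho @ Rr
    return out / np.trace(out).real
```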
Another remarkable property of the RρR algorithm is its computational cost: each iteration only requires matrix-matrix products. This is a rather cheap iteration in contrast with an iteration of a semidefinite programming method.

Although the RρR algorithm is a generalization of the EM algorithm, its convergence is not guaranteed in general. In [1], a counterexample was presented where the method produces a cycle. For this reason, that work proposed the diluted iteration of the RρR algorithm, or simply "Diluted RρR". The idea is to control each iteration step by mixing the operator R(ρ) with the identity operator:

ρ^{k+1} = N [(I + tR(ρ^k))/(1 + t)] ρ^k [(I + tR(ρ^k))/(1 + t)],   (5)

where t > 0 and N is the normalization constant. It is important to observe that, as t → ∞, the iteration tends to the original RρR iteration. Moreover, for any t > 0, a fixed point of (5) satisfies R(ρ)ρ = ρ. It was also shown that the Diluted RρR converges to the ML density matrix if the initial approximation is the maximally mixed state ρ^0 = (1/d)I and the optimal value of t,

t = arg max_{t>0} F(ρ^{k+1}(t)),   (6)

is used at each iteration. In nonlinear optimization [10, 11], this is called exact line search. Though convergence can be achieved with this procedure, solving (6) is, in general, computationally demanding. Albeit the authors of [1] proved convergence with the exact line search, they suggest that, in practice, one could use an ad hoc scheme to determine the "best" value of the steplength t to be used throughout all iterations. A sketch of one diluted step is given below.
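Under the same assumptions as the previous snippet (and reusing its R()), a hedged sketch of one diluted step (5) is:

```python
import numpy as np

def diluted_step(rho, t, povm, freqs):
    # One Diluted RρR iteration, Eq. (5): mix R(rho) with the identity.
    d = rho.shape[0]
    M = (np.eye(d) + t * R(rho, povm, freqs)) / (1.0 + t)
    out = M @ rho @ M
    return out / np.trace(out).real  # normalization N; the (1+t) factors cancel here
```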
Here, instead of (6), we propose an inexact line search to determine the steplength t at each iteration (5). We do not search for the best possible t > 0, but for one that ensures a sufficient improvement in the log-likelihood. We prove that this procedure is well defined and that the iterations (5) converge to a solution of (1) from any positive definite initial matrix ρ^0. The implementation of the inexact line search is straightforward, and we also present some examples showing the improvements over an ad hoc fixed-t strategy.

III. GLOBAL CONVERGENCE THEORY FOR ASCENT DIRECTION METHODS
The purpose of this section is to present some basic concepts of nonlinear optimization which are necessary to prove the global convergence of the Diluted RρR algorithm under an inexact line search scheme. These concepts are classical in the optimization community and are detailed in [10, 11]. To ease the exposition, we have adapted them to the quantum tomography notation.

Consider the following maximization problem over the set of Hermitian matrices H:

max_ρ F(ρ)  s.t. ρ ∈ Ω,   (7)

where F : H → ℝ is a continuously differentiable function and Ω ⊂ H is a convex set. Given an approximation ρ^k to the solution of problem (7), ascent direction methods try to improve the current objective function value by generating an ascent direction D^k and updating the iterate:

ρ^{k+1} = ρ^k + t_k D^k,   (8)

where t_k is called the stepsize or steplength.

Definition III.1. A direction D^k is an ascent direction at the iterate ρ^k if

tr(∇F(ρ^k) D^k) > 0,

and this ensures that, for sufficiently small t_k > 0, the function value is increased. An ascent direction D^k is feasible if ρ^{k+1} belongs to Ω for t_k ∈ (0, ε), for some ε > 0.

One of the insights of this work is that the Diluted RρR iteration (5) can be written as an ascent direction iteration (8), and so, using the theory of this section, we can prove global convergence.

The iteration (8) can be repeated while there exists a feasible ascent direction. If at some point ρ∗ there is no feasible ascent direction, then ρ∗ is a stationary point. It is well known that every local maximizer is a stationary point, but the converse is not true in general. If the function F is concave on the convex set Ω, then a stationary point is also a maximizer. A maximization algorithm for problem (7) is called globally convergent [10, 11] if every limit point of the sequence generated by the algorithm is a stationary point, regardless of the initial approximation ρ^0.

Although feasible ascent directions ensure that, for sufficiently small t_k > 0, we can increase the function value, this is not enough to ensure global convergence. The reason is that a simple increase in the objective function, F(ρ^{k+1}) > F(ρ^k), along an ascent direction is too modest an objective. In order to reach local maximizers, or at least stationary points, a sufficient increase at each iteration is required.

A natural choice for the steplength t_k along the direction D^k is the solution of the problem

t_k = arg max_t F(ρ^k + tD^k),   (9)

which is called exact line search. However, finding the global maximizer of F along the direction D^k is itself a hard problem, and unless F has special structure, such as that of a quadratic function, the computational effort is considerable.

To avoid this effort, an inexact line search can be performed. A natural scheme is to consider successive stepsize reductions. Since the search is along an ascent direction, eventually, for a small enough t_k, we obtain F(ρ^k + t_kD^k) > F(ρ^k). However, this simple increase cannot eliminate certain convergence difficulties. One possible strategy is the Armijo rule, which asks for a steplength t such that a sufficient improvement in the objective function is obtained:

F(ρ^k + tD^k) > F(ρ^k) + γ t tr(∇F(ρ^k) D^k),   (10)

where γ ∈ (0, 1). In practice, t is successively reduced until condition (10) is verified. There are other alternatives to successive stepsize reduction, for instance, strategies based on quadratic or cubic interpolation [11].

Besides the steplength selection, requirements on the ascent directions D^k are also necessary to avoid certain problems. For example, it is not desirable to have directions D^k with small norm when we are far from the solution. It is also necessary to prevent the sequence of directions {D^k} from becoming orthogonal to the gradient of F, because in that case we are moving along directions of almost zero variation, where little or no improvement of the objective function can be achieved. A general condition that avoids such problems is the gradient related condition [10].

Definition III.2 (Gradient related). A sequence of directions {D^k} is gradient related if, for any subsequence {ρ^k}_{k∈K} that converges to a nonstationary point, the corresponding subsequence {D^k}_{k∈K} is bounded and satisfies

lim inf_{k→∞, k∈K} tr(∇F(ρ^k) D^k) > 0.

This condition means that ‖D^k‖ does not become 'too small' or 'too large' relative to ‖∇F(ρ^k)‖, and that D^k and ∇F(ρ^k) do not become orthogonal.

If an algorithm generates ascent directions satisfying the gradient related condition and the stepsizes are selected according to the Armijo rule, then it is possible to prove global convergence [10].

Theorem III.3 (Global convergence). Let {ρ^k} be a sequence generated by a feasible ascent direction method, ρ^{k+1} = ρ^k + t_kD^k, and assume that {D^k} is gradient related and t_k is chosen by the Armijo rule. Then every limit point of {ρ^k} is a stationary point.

Proof. See [10, Proposition 2.2.1].
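As an illustration of the Armijo rule with successive stepsize reductions, a generic backtracking routine could look as follows; the parameter values `gamma`, `beta` and the trial step `t0` are illustrative defaults, not values prescribed by [1] or [10]:

```python
import numpy as np

def armijo_stepsize(F, rho, D, grad, t0=1.0, gamma=1e-4, beta=0.5, max_halvings=50):
    # Shrink t until F(rho + t D) > F(rho) + gamma * t * tr(grad D), Eq. (10).
    base = F(rho)
    slope = np.trace(grad @ D).real  # > 0 for an ascent direction (Definition III.1)
    t = t0
    for _ in range(max_halvings):
        if F(rho + t * D) > base + gamma * t * slope:
            break
        t *= beta  # successive stepsize reduction
    return t
```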
IV. CONVERGENCE OF THE DILUTED RρR
Sections II and III gave us the necessary background to show the convergence of the Diluted RρR iterations using an inexact line search to determine the stepsize t. In this section, we first show that the diluted iteration (5) can be written as an ascent direction iteration (8). We then give a geometrical interpretation and prove that the corresponding sequence of directions {D^k} is gradient related. Finally, using the Armijo condition and a backtracking procedure, we present an algorithm which is globally convergent by Theorem III.3.

From now on, we use the notation ∇F(ρ) instead of R(ρ). Equation (5) then becomes

ρ^{k+1} = N [(I + t∇F(ρ^k))/(1 + t)] ρ^k [(I + t∇F(ρ^k))/(1 + t)],   (11)

or

ρ^{k+1} = (I + t∇F(ρ^k)) ρ^k (I + t∇F(ρ^k)) / tr((I + t∇F(ρ^k)) ρ^k (I + t∇F(ρ^k))) ≡ G(ρ^k).   (12)

The expression above can be seen as a fixed point iteration. Expanding it, we obtain

G(ρ) = [ρ + t(∇F(ρ)ρ + ρ∇F(ρ)) + t²∇F(ρ)ρ∇F(ρ)] / [1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ))].   (13)
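The equivalence between (12) and (13) can be checked numerically. In the sketch below (illustrative names, with `grad` standing for ∇F(ρ) computed, e.g., by the R() of Section II), the (1 + t)² factors of (11) cancel in the normalization, and the denominator of (13) equals q(t) = 1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ)) because tr(∇F(ρ)ρ) = 1 (Lemma IV.1 below):

```python
import numpy as np

def G_product(rho, grad, t):
    # Normalized product form, Eq. (12)
    A = np.eye(rho.shape[0]) + t * grad
    out = A @ rho @ A
    return out / np.trace(out).real

def G_expanded(rho, grad, t):
    # Expanded form, Eq. (13); denominator is q(t) when tr(grad @ rho) = 1
    q = 1.0 + 2.0 * t + t**2 * np.trace(grad @ rho @ grad).real
    return (rho + t * (grad @ rho + rho @ grad)
                + t**2 * grad @ rho @ grad) / q
```

For a valid density matrix and normalized frequencies, np.allclose(G_product(rho, grad, t), G_expanded(rho, grad, t)) should hold for any t ≥ 0.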
Notice that ρ∗ is a fixed point of G, that is, G(ρ∗) = ρ∗ for all t > 0, if the following conditions are satisfied:

ρ∗ = ∇F(ρ∗)ρ∗∇F(ρ∗) and ρ∗ = ∇F(ρ∗)ρ∗.   (14)

If these conditions are verified at a positive definite trace-one matrix ρ∗, then the optimality conditions [10, 11] for problem (1) are satisfied and ρ∗ is the maximum likelihood estimate. The following two lemmas are useful in the analysis of the RρR iterations.
Lemma IV.1.
For all ρ ∈ int(S), we have tr(∇F(ρ)ρ) = 1.

Proof.
Directly from (2): tr(∇F(ρ)ρ) = ∑_i f_i tr(E_i ρ)/tr(E_i ρ) = ∑_i f_i = 1, since the frequencies sum to one.
Lemma IV.2. If ρ ∈ int(S), then tr(∇F(ρ)ρ∇F(ρ)) ≥ 1, with equality if and only if ρ = ∇F(ρ)ρ.

Proof.
From Lemma IV.1,

1 = tr(∇F(ρ)ρ) = tr(∇F(ρ) ρ^{1/2} ρ^{1/2}),

and from the Cauchy–Schwarz inequality we obtain

1 = |tr(∇F(ρ) ρ^{1/2} ρ^{1/2})|² ≤ tr(∇F(ρ)ρ∇F(ρ)) tr(ρ) = tr(∇F(ρ)ρ∇F(ρ)).

Equality in Cauchy–Schwarz occurs when ∇F(ρ)ρ^{1/2} = αρ^{1/2}, or equivalently, when ∇F(ρ)ρ = ρ.

Let us simplify the expression (13) by defining, for a given ρ,

q(t) = 1 + 2t + t² tr(∇F(ρ)ρ∇F(ρ)).
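Both lemmas, and the bound on q(t) used next, are easy to check numerically; the helper below is an illustrative sketch, with `grad` computed as in Section II from frequencies summing to one:

```python
import numpy as np

def check_lemmas(rho, grad):
    # Lemma IV.1: tr(grad rho) = 1;  Lemma IV.2: tr(grad rho grad) >= 1
    t1 = np.trace(grad @ rho).real
    t2 = np.trace(grad @ rho @ grad).real
    return t1, t2  # expect t1 ~= 1.0 and t2 >= 1.0

def q(t, rho, grad):
    # q(t) = 1 + 2t + t^2 tr(grad rho grad) >= (1 + t)^2 by Lemma IV.2
    return 1.0 + 2.0 * t + t**2 * np.trace(grad @ rho @ grad).real
```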
Since tr(∇F(ρ)ρ∇F(ρ)) ≥ 1 for any density matrix ρ, we have q(t) ≥ (1 + t)² > 0 for all t ≥ 0. Furthermore, if ρ ∈ S, the set of density matrices, then G(ρ) ∈ S as well, for any t ≥ 0. Thus, G(ρ) defines a path in the density matrix space S, parameterized by t, such that G(ρ) → ρ when t → 0, and G(ρ) → ρ̃ when t → ∞, where

ρ̃ = ∇F(ρ)ρ∇F(ρ) / tr(∇F(ρ)ρ∇F(ρ)),

as in the original RρR algorithm [2]. Let us also define the point

ρ̄ = (∇F(ρ)ρ + ρ∇F(ρ))/2.   (15)

Unlike the point ρ̃, the point ρ̄ is, in general, not in the set S.

Now, rewriting the expression (13), we obtain

G(ρ) = (1/q(t)) ρ + (2t/q(t)) [(∇F(ρ)ρ + ρ∇F(ρ))/2] + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) [∇F(ρ)ρ∇F(ρ)/tr(∇F(ρ)ρ∇F(ρ))],   (16)

that is,

G(ρ) = (1/q(t)) ρ + (2t/q(t)) ρ̄ + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) ρ̃.

Therefore, we have a convex combination of the points ρ, ρ̄, ρ̃, and the path defined by t lies in the convex set whose extreme points are ρ, ρ̄ and ρ̃, as illustrated in Figure 1.

Figure 1: Geometrical interpretation of G(ρ^k) as a curved path ρ^{k+1}(t), parameterized by t, from ρ^k (t = 0) toward ρ̃^k (t → ∞), inside the set with extreme points ρ^k, ρ̄^k, ρ̃^k on the hyperplane tr(ρ) = 1.

Finally, defining the directions

D̄ = ρ̄ − ρ = (∇F(ρ)ρ + ρ∇F(ρ))/2 − ρ,   (17)
D̃ = ρ̃ − ρ = ∇F(ρ)ρ∇F(ρ)/tr(∇F(ρ)ρ∇F(ρ)) − ρ,   (18)

and using (16), we obtain

G(ρ) = ρ + (2t/q(t)) D̄ + (t² tr(∇F(ρ)ρ∇F(ρ))/q(t)) D̃,

which gives an iteration like (8),

ρ̂ = G(ρ) = ρ + tD,

where

D = (2/q(t)) D̄ + (t tr(∇F(ρ)ρ∇F(ρ))/q(t)) D̃.   (19)

The search direction D is a combination of the directions D̄ and D̃, controlled by the parameter t. From Figure 1, we can see that, as t → ∞, D approaches the direction D̃, whereas, as t → 0, D becomes parallel to D̄. It is worth proving that these are feasible ascent directions; a numerical sketch of the decomposition (19) is given below.
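The decomposition (17)–(19) can be verified numerically as well; the sketch below (illustrative names, `grad` = ∇F(ρ)) builds D̄, D̃ and the combined direction D, and one can check that G(ρ) ≈ ρ + tD with G as in (12)–(13):

```python
import numpy as np

def directions(rho, grad):
    # D_bar (17) and D_tilde (18); trf = tr(grad rho grad) >= 1 by Lemma IV.2
    trf = np.trace(grad @ rho @ grad).real
    rho_bar = 0.5 * (grad @ rho + rho @ grad)  # Eq. (15): trace one, maybe not psd
    rho_tilde = (grad @ rho @ grad) / trf      # pure RρR point (t -> infinity)
    return rho_bar - rho, rho_tilde - rho, trf

def search_direction(rho, grad, t):
    # D = (2/q(t)) D_bar + (t * trf / q(t)) D_tilde, Eq. (19)
    D_bar, D_tilde, trf = directions(rho, grad)
    qt = 1.0 + 2.0 * t + t**2 * trf
    return (2.0 / qt) * D_bar + (t * trf / qt) * D_tilde
```

Away from a stationary point, np.trace(grad @ search_direction(rho, grad, t)).real should be strictly positive, in agreement with the proposition below.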
Proposition IV.3. The directions D̄ and D̃ are feasible ascent directions at any nonstationary point ρ.

Proof. To prove that D̄ is an ascent direction, we need to show that tr(∇F(ρ)D̄) > 0. Using the definition of D̄, we get

D̄ = ρ̄ − ρ = (∇F(ρ)ρ + ρ∇F(ρ))/2 − ρ.

For a nonstationary ρ (that is, ∇F(ρ)ρ ≠ ρ),

tr(∇F(ρ)D̄) = tr(∇F(ρ)ρ∇F(ρ)) − tr(∇F(ρ)ρ) = tr(∇F(ρ)ρ∇F(ρ)) − 1 > 0,

by the Cauchy–Schwarz inequality (Lemma IV.2), which implies that D̄ is an ascent direction. If ρ ∈ int(S), then there exists t̄ > 0 such that ρ + tD̄ ∈ S for t ∈ (0, t̄), so the direction is feasible.

In a similar way for D̃,

tr(∇F(ρ)D̃) = tr(∇F(ρ)∇F(ρ)ρ∇F(ρ))/tr(∇F(ρ)ρ∇F(ρ)) − 1 > 0.

Using the fact that ρ̃ ∈ S, we have ρ + tD̃ ∈ S for t ∈ (0, 1] as well. Since the direction D is a positive combination of feasible ascent directions, it is also a feasible ascent direction.

Now, if we show that the sequence of directions {D^k} is gradient related, then we can prove global convergence under an inexact line search scheme. First, we present some technical lemmas which are useful to show the desired result.

Lemma IV.4.
For ρ^k ≻ 0 with tr(ρ^k) = 1, the matrix ρ̄^k defined in (15) is the solution of the problem

max_ρ tr(∇F(ρ^k)(ρ − ρ^k)) − ½ tr((ρ − ρ^k)(ρ^k)^{-1}(ρ − ρ^k))
s.t. tr(ρ) = 1.   (20)

Proof. Consider the optimality conditions for (20):

−∇F(ρ^k) + ½[(ρ − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ − ρ^k)] + λI = 0,   (21)
tr(ρ) = 1.   (22)

Multiplying equation (21) on the right by ρ^k and taking the trace, we have

−tr(∇F(ρ^k)ρ^k) + tr(ρ − ρ^k) + λ tr(ρ^k) = 0,

which implies that λ = 1. So, from

−∇F(ρ^k) + ½[(ρ − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ − ρ^k)] + I = 0,

we obtain

ρ(ρ^k)^{-1} + (ρ^k)^{-1}ρ = 2∇F(ρ^k).

Using the symmetry of the solution ρ and the symmetry of ρ^k and ∇F(ρ^k), we conclude that

ρ = (∇F(ρ^k)ρ^k + ρ^k∇F(ρ^k))/2 = ρ̄^k.
Lemma IV.5. The sequence of directions {D̄^k}, used to define the sequence {ρ^k} by ρ^{k+1} = ρ^k + tD̄^k, satisfies

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)(ρ̄^k − ρ^k)) > 0

for every subsequence {ρ^k}_{k∈K} that converges to a nonstationary point ρ′.

Proof. Suppose there is a subsequence {ρ^k}_{k∈K} that converges to a nonstationary point ρ′. Lemma IV.4 tells us that ρ̄^k is the solution of (20). Thus, at ρ̄^k, the gradient of the objective function of (20) is orthogonal to the hyperplane tr(ρ) = 1, that is,

tr([∇F(ρ^k) − ½((ρ̄^k − ρ^k)(ρ^k)^{-1} + (ρ^k)^{-1}(ρ̄^k − ρ^k))](ρ − ρ̄^k)) = 0

for all ρ such that tr(ρ) = 1. Since the feasible set of (20) contains S, the same holds in particular for all ρ ∈ S. Expanding the last expression, we obtain

tr(∇F(ρ^k)(ρ − ρ̄^k)) = −½[tr((ρ^k − ρ̄^k)(ρ^k)^{-1}(ρ − ρ̄^k)) + tr((ρ − ρ̄^k)(ρ^k)^{-1}(ρ^k − ρ̄^k))], for all ρ ∈ S.

In particular, for ρ = ρ^k,

tr(∇F(ρ^k)(ρ̄^k − ρ^k)) = tr((ρ^k − ρ̄^k)(ρ^k)^{-1}(ρ^k − ρ̄^k)) = ‖ρ^k − ρ̄^k‖²_{(ρ^k)^{-1}}.   (23)

Using the continuity of the solution given by Lemma IV.4, we have

lim_{k→∞, k∈K} ρ̄^k = ρ̄′ = (∇F(ρ′)ρ′ + ρ′∇F(ρ′))/2.

Taking limits in (23), we obtain

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)(ρ̄^k − ρ^k)) = ‖ρ′ − ρ̄′‖²_{(ρ′)^{-1}}.

Since ρ′ is nonstationary, the right-hand side is strictly positive, and this completes the proof.

Finally, using the previous lemmas, we can prove the main assertion of this section.
Proposition IV.6. The sequence of directions {D^k} is gradient related.

Proof. First, let us show that {D^k} is bounded. In fact, ρ^{k+1}(t_k) = ρ^k + t_kD^k = G(ρ^k) is in S, since ρ^k ≻ 0 and t_k ≥ 0, by definition. In particular, for t_k = 1, we have ρ^{k+1}(1) = ρ^k + D^k ∈ S, and since S is bounded, {D^k} is also bounded.

Now, let {ρ^k}_{k∈K} be a subsequence of the sequence {ρ^k} generated by the iterations ρ^{k+1} = ρ^k + t_kD^k, and suppose that {ρ^k}_{k∈K} converges to a nonstationary point ρ′. Using the definition of D^k, we obtain

tr(∇F(ρ^k)D^k) = (2/q(t_k)) tr(∇F(ρ^k)D̄^k) + (t_k tr(∇F(ρ^k)ρ^k∇F(ρ^k))/q(t_k)) tr(∇F(ρ^k)D̃^k).

The second term on the right-hand side is nonnegative, so

tr(∇F(ρ^k)D^k) ≥ (2/q(t_k)) tr(∇F(ρ^k)D̄^k).

Considering t_k ∈ (0, t_max], we have

tr(∇F(ρ^k)D^k) ≥ (2/q(t_max)) tr(∇F(ρ^k)D̄^k).

Taking the limit over a subsequence converging to a nonstationary point,

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D^k) ≥ (2/q(t_max)) lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D̄^k),

and since {D̄^k} is gradient related by Lemma IV.5,

lim inf_{k→∞, k∈K} tr(∇F(ρ^k)D^k) > 0,

which implies that {D^k} is gradient related.

Thus, choosing the step size t_k at each iteration such that the Armijo condition (10) is fulfilled, we obtain a globally convergent algorithm by Theorem III.3. In this way, we can define the steps of a globally convergent algorithm that uses an inexact line search as follows.

Algorithm 1

Step 0. Given ρ^0 ≻ 0 with tr(ρ^0) = 1, t_max > 0, γ ∈ (0, 1) and 0 < α₁ ≤ α₂ < 1, set k = 0 and t_{−1} = t_max.

Step 1. If some stopping criterion is verified, stop. Otherwise, compute the directions D̄^k and D̃^k defined in (17) and (18). Set t = max{1, t_{k−1}}.

Step 2. Set

D = (2/q(t)) D̄^k + (t tr(∇F(ρ^k)ρ^k∇F(ρ^k))/q(t)) D̃^k.

If F(ρ^k + tD) ≤ F(ρ^k) + γ t tr(∇F(ρ^k)D), choose a new t ∈ [α₁t, α₂t] and repeat Step 2.

Step 3. Set t_k = t, D^k = D and ρ^{k+1} = ρ^k + t_kD^k; set k = k + 1 and go to Step 1.

A sketch of a full implementation of Algorithm 1 is given below.
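A compact Python sketch of Algorithm 1 follows; it reuses R() from Section II and search_direction() from above, and the stopping test, tolerance and parameter defaults are illustrative choices (with α₁ = α₂ = alpha), not the authors' reference implementation:

```python
import numpy as np

def log_likelihood(rho, povm, freqs):
    # F(rho) = sum_i f_i log tr(E_i rho), as in (1)
    probs = np.array([np.trace(E @ rho).real for E in povm])
    return float(np.dot(freqs, np.log(probs)))

def algorithm1(rho0, povm, freqs, t_max=1.0, gamma=1e-4, alpha=0.5,
               tol=1e-8, max_iter=10_000):
    rho, t_prev = rho0, t_max
    for _ in range(max_iter):
        grad = R(rho, povm, freqs)
        # Step 1: stop when the extremal equation (3) is nearly satisfied
        if np.linalg.norm(grad @ rho - rho) < tol:
            break
        t = max(1.0, t_prev)  # first trial step is at least one
        base = log_likelihood(rho, povm, freqs)
        # Step 2: Armijo backtracking; D depends on t, so it is recomputed
        for _ in range(60):
            D = search_direction(rho, grad, t)
            if log_likelihood(rho + t * D, povm, freqs) \
                    > base + gamma * t * np.trace(grad @ D).real:
                break
            t *= alpha  # new trial in [alpha1*t, alpha2*t] with alpha1 = alpha2
        # Step 3: accept the step
        rho, t_prev = rho + t * D, t
    return rho
```

For instance, algorithm1(np.eye(d)/d, povm, freqs) starts from the maximally mixed state, the setting used in the examples of Section V.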
Theorem IV.7 states the desired result: any limit point of the sequence generated by Algorithm 1 is a stationary point, regardless of the initial approximation. Since problem (1) is convex, a stationary point is also a solution.

Theorem IV.7. Every limit point ρ∗ of a sequence {ρ^k} generated by Algorithm 1 is a stationary point, that is,

∇F(ρ∗)ρ∗ = ρ∗ = ρ∗∇F(ρ∗).

Proof. By Proposition IV.6, the sequence {D^k} used in Algorithm 1 is gradient related. Since the step selection in Algorithm 1 satisfies the Armijo condition, we can apply Theorem III.3 to obtain the claimed result.

In Step 2 of Algorithm 1, instead of successive reductions of the steplength t, one could use, for instance, quadratic or cubic interpolation [11] to estimate the t that maximizes F(ρ^k + tD), in order to make the search more effective.

V. ILLUSTRATIVE EXAMPLES
In this section we present two illustrative examples showing that Algorithm 1 outperforms the Diluted RρR algorithm with fixed stepsize [1]. Besides converging on problems where the original RρR does not, Algorithm 1 also reduces the number of iterations when compared to the fixed stepsize version of the Diluted RρR, without harming the convergence behavior in cases where the latter works.

First, we consider the counterexample where the pure RρR method gets into a cycle [1]: three measurements on a qubit with an apparatus described by two rank-one projectors Π₁ and Π₂, the first outcome being detected once and the second twice. We used the completely mixed state as the starting point and declared convergence when the distance between two consecutive iterates was small enough. For each fixed t in the Diluted RρR, we set t_max = t in the algorithm that uses the line search, with the same Armijo parameter γ and reduction factors α₁ = α₂ in all runs.
Figure 2: Number of iterations as a function of t (fixed t vs. line search; the right panel zooms in on small values of t).

Figure 2 compares the version with fixed step size (stars) against the one with line search (circles), described in the previous section. In the left panel, we can see that, for the "fixed t" strategy, the number of iterations grows as the stepsize t increases. This was expected because, as t → ∞, the iterations tend to pure RρR iterations, and in this limit case there is no convergence. Conversely, the line search strategy keeps the number of iterations bounded, regardless of the value of t_max.

The right panel is a zoomed version of the left one, showing the behavior for small values of t. As expected, although the Diluted RρR guarantees a monotonic increase of the likelihood for sufficiently small steps, repeating too-small steps leads to more iterations of the method. Algorithm 1 ensures a substantial increase of the likelihood through the line search procedure and, to avoid extremely small steps, the first trial for t_k at each iteration is at least one.

Second, we consider as data the theoretical probabilities for the W state. Figure 3 presents the number of iterations for different values of t (log scale).

Figure 3: Number of iterations as a function of t (W state tomography).

Again, small fixed values of t produce a higher number of iterations. It is also important to note, in this example, that the behavior of the line search version becomes the same as that of the "fixed t" one as the suggested step length t_max increases. This means that, in Algorithm 1, the full step t_k = t_max was accepted (it fulfills the Armijo condition) at every iteration.

In [1], the authors suggest first trying a larger value of the step size t and performing Diluted RρR iterations with that same value; if the iterations do not converge, one then tries a smaller value of t. This ad hoc procedure is motivated by the fact that the pattern of Figure 3 often occurs in practice, so that a larger t means fewer iterations. However, what is a good guess for a large value of t that ensures few iterations? And if convergence does not occur, how should one choose a smaller value of t that guarantees convergence? These issues can result in many re-runs until a good value of t is found, and this value can change from one dataset to another.

These examples illustrate that the Armijo line search procedure represents an improvement of the Diluted RρR algorithm, adjusting the step length t just when necessary, and show that convergence does not depend on a specific choice of a fixed step length or on the starting point.

VI. FINAL REMARKS
We proved the global convergence of the Diluted RρR algorithm under a line search procedure with the Armijo condition. The inexact line search is a weaker assumption than the exact line search used in the convergence proof of previous work [1]. Moreover, the proposed globalization by line search does not depend on guessing a fixed step length for all iterations. Instead, as is usual in nonlinear optimization, the step length is adjusted just when necessary, in order to ensure a sufficient improvement in the likelihood at each iteration. Thus, the Armijo line search procedure is a reliable globalization and represents a practical improvement of the Diluted RρR algorithm for quantum tomography.
Acknowledgements
We thank the Brazilian research agencies FAPESP, CNPq and INCT-IQ (National Institute for Science and Technology for Quantum Information). DG also thanks the Brittany Region (France) and INRIA for partial financial support.

[1] J. Řeháček, Z. Hradil, E. Knill, and A. I. Lvovsky, Phys. Rev. A 75, 042108 (2007).
[2] Z. Hradil, Phys. Rev. A 55, R1561 (1997).
[3] M. Paris and J. Řeháček, eds., Quantum State Estimation, Lecture Notes in Physics, Vol. 649 (Springer, 2004).
[4] D. S. Gonçalves, C. Lavor, M. A. Gomes-Ruggiero, A. T. Cesário, R. O. Vianna, and T. O. Maciel, Phys. Rev. A 87, 052140 (2013).
[5] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, 1st ed. (Cambridge University Press, 2004).
[6] T. Evangelista, C. Lavor, and W. R. M. Rabelo, International Journal of Modern Physics C, 95 (2011).
[7] T. O. Maciel and R. O. Vianna, Quantum Information and Computation, 0442 (2012).
[8] Z. Hradil, J. Řeháček, J. Fiurášek, and M. Ježek, in Quantum State Estimation, Lecture Notes in Physics, Vol. 649 (Springer, 2004), pp. 163–172.
[9] D. F. V. James, P. G. Kwiat, W. J. Munro, and A. G. White, Phys. Rev. A 64, 052312 (2001).
[10] D. P. Bertsekas, Nonlinear Programming (Athena Scientific, 1999).
[11] J. Nocedal and S. J. Wright, Numerical Optimization (Springer, 1999).
[12] E. de Klerk, Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications (Kluwer Academic Publishers, 2002).
[13] K. Toh, M. Todd, and R. Tütüncü, Optimization Methods and Software 11, 545 (1999).
[14] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones" (1998).
[15] D. S. Gonçalves, M. A. Gomes-Ruggiero, C. Lavor, O. J. Farías, and P. H. S. Ribeiro, Quantum Information and Computation, 775 (2012).
[16] Y. Vardi and D. Lee, Journal of the Royal Statistical Society, Series B (Methodological) 55 (1993).