Quantum-state estimation problem via optimal design of experiments
Jun Suzuki∗

Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo, 182-8585 Japan

(Dated: December 23, 2020)

In this paper, we study the quantum-state estimation problem in the framework of optimal design of experiments. We first find the optimal designs for arbitrary qubit models for popular optimality criteria such as A-, D-, and E-optimal designs. We also give a one-parameter family of optimality criteria which includes these criteria. We then extend a classical result in the design problem, the Kiefer-Wolfowitz theorem, to a qubit system, showing that the D-optimal design is equivalent to a certain type of A-optimal design. We next compare and analyze several optimal designs based on their efficiency. We explicitly demonstrate that an optimal design for a certain criterion can be highly inefficient for other optimality criteria.

I. INTRODUCTION
Studies on any experiment governed by the law of statistics consist of two different stages. The first step is to design or prepare a good experimental setup to extract information of interest. The second is to analyze an actual datum obtained from the chosen experiment. Standard textbooks on statistics focus only on the latter part; that is, how to extract a quantity of interest from a given datum. The first element, known as optimal design of experiments (DoE), is a well-established branch of classical statistics.[1–3] It provides a systematic and powerful tool to search for an optimal DoE under a given optimality criterion.

Quantum state estimation problems[4–9] are naturally divided into two stages, since measurement outcomes obey the statistical rule by the axioms of quantum theory. It seems, however, that textbooks on the subject neither emphasize this point clearly nor analyze the problem at hand in the language of optimal DoE. Although several authors applied this theory to a quantum system,[10–14] its use has been limited so far. One important message of this paper is that the so-called "incompatibility" of estimating two different parameters is already a well-known phenomenon in the classical theory of optimal DoE. Therefore, we cannot immediately attribute this kind of trade-off relation to the quantum nature of the problem.

In a recent paper [15], we developed a general theory for estimating a family of quantum channels based on the theory of optimal DoE. We made an explicit comment there that quantum-state estimation problems can be handled as a special case. The aim of this paper is twofold. First, we provide a framework of optimal DoE for the problem of quantum state estimation. We then apply the standard methodology of characterizing an optimal design. We show that the qubit case is completely solved as an optimal DoE problem. Second, we wish to compare different optimal designs for a qubit system.
Thereby, we explicitly demonstrate that a particular optimal design is not optimal for others.

In the classical theory of DoE, a systematic comparison of different optimal designs does not seem to be a common subject. Rather, one analyzes a problem at hand based on a particularly chosen optimality criterion. In this study, we emphasize that a proper comparison among different optimality criteria is necessary rather than adopting one particular optimality criterion. This is because no universally accepted optimality criterion exists; the choice is rather subjective. A second reason is that one particular optimal design may become inefficient for other optimality criteria. Indeed, our study suggests that one of the common optimality criteria in classical statistics, the D-optimality, may not be suited for quantum-state estimation problems. This is based on the result for the general qubit model in the tomographic scenario. Other optimal designs are shown to perform very poorly for the D-optimal criterion when the purity of quantum states is high.

The outline of this paper is as follows. In Sec. II, a brief summary of the classical theory of optimal DoE is given. We then apply it to the problem of quantum-state estimation in Sec. III. We also analytically solve common optimal designs for the general qubit system. In Sec. V, we derive a quantum version of the equivalence theorem in the qubit system. Section VI studies comparisons of different optimality criteria. We close our paper with conclusions and remarks in Sec. VII.

∗ [email protected]

II. PRELIMINARIES
In this section, we provide a brief summary of optimal DoE based on non-linear response theory.[2, 16–18] Our formulation is based on the result presented in Ref. [15].
A. Formulation
1. Terminologies and definitions
Suppose a physical system of interest is specified by a state s, and we denote the set of all possible states by S. We call S the state space. Typically, S is a subspace of a vector space, which could be real or complex in general. Denote by θ = (θ_1, θ_2, ..., θ_n) an n-parameter coordinate system for the state space, called a model parameter, or simply a parameter, to describe a state by s_θ. Our interest is to analyze a family of states {s_θ | θ ∈ Θ}, where the parameter θ = (θ_i) takes values in Θ, an open subset of R^n. In the following, we assume that θ ↦ s_θ is one-to-one and smooth in θ. Therefore, the model parameter θ identifies the state s_θ uniquely. A design e describes a particular experimental setup, and E denotes the set of all possible designs, which will be called a design space. A model function f is a mapping from S × E to a set of probability distributions on X (=: P(X)). That is, f : (s, e) ↦ p_s(·|e) ∈ P(X), where ∀x ∈ X, p_s(x|e) ≥ 0 and Σ_{x∈X} p_s(x|e) = 1 hold due to the axioms of probability theory. A familiar example of this kind is a linear regression model, which has been intensively studied in the field of optimal DoE.[3, 16–18]

One of the main objectives of optimal DoE is to infer the unknown parameter θ, and hence s_θ ∈ S, by choosing an appropriate design e. A difference from the usual setting of classical statistics is that a statistical model is specified by the conditional distribution according to a chosen design e: M(e) = {p_θ(·|e) | θ ∈ Θ}, where p_θ(·|e) = p_{s_θ}(·|e) is a shorthand convention. Thus, a datum X is a random variable drawn according to p_θ(·|e), which depends on a particular choice of design e. Denoting the conditional expectation value by E_θ[X|e] := Σ_{x∈X} x p_θ(x|e), the mean-square error (MSE) matrix for an estimator θ̂ : X → Θ is defined by

V_θ[θ̂|e] := [ E_θ[ (θ̂_i(X) − θ_i)(θ̂_j(X) − θ_j) | e ] ]_{i,j}.

We look for the best estimator and design under the condition of local unbiasedness, defined as follows. An estimator θ̂ = (θ̂_i) is said to be locally unbiased at θ under a design e if E_θ[θ̂_i(X)|e] = θ_i and ∂/∂θ_j E_θ[θ̂_i(X)|e] = δ_ij are satisfied for all i, j at θ. When an estimator is locally unbiased at all points θ, it is an unbiased estimator. As explained later, local unbiasedness of θ̂ under the design e is fundamental in the theory of DoE. This is in contrast to the standard parameter estimation problem, where locally unbiased estimators are of no practical importance in general.

Fix a design e ∈ E, and assume that the model M(e) satisfies certain regularity conditions. We can then apply the Cramér-Rao (CR) inequality,

V_θ[θ̂|e] ≥ ( J_θ[e] )^{-1},

which holds for any locally unbiased estimator. Here J_θ[e] is the Fisher information matrix about the statistical model M(e) at θ, which is defined by

J_θ[e] := [ E_θ[ (∂ℓ_θ(X|e)/∂θ_i)(∂ℓ_θ(X|e)/∂θ_j) | e ] ]_{i,j},

where ℓ_θ(x|e) := log p_θ(x|e) is the logarithmic likelihood function. Note that a locally unbiased estimator always exists at each point, and thus the right-hand side of the CR inequality provides the fundamental limit for the MSE matrix at a given point. With this fact, we aim to minimize the inverse of the Fisher information matrix over all possible designs.

In passing, we make remarks on the figure of merit formulated in this paper and other formulations. It is also possible to minimize other quantities related to estimation errors, such as the fidelity, trace distance, and quantum relative entropy between an estimated state and the true state.
The current paper focuses on the design problem before actual experiments are performed. Thus, it is more natural to maximize information about an unknown state as much as possible, which is measured by the classical Fisher information. In particular, we are interested in the best experimental design at each point. That is to say, an optimal design here is locally optimal. By contrast, we can also investigate optimal designs on average over possible states with some prior distribution on the state space. This is called the Bayesian design problem.[3, 19–21] Another formulation is to find the optimal design for the worst case. This is known as the min-max design problem. Extensions of the present work to these settings shall be presented elsewhere. Finally, a recent paper [22] studied a family of precision bounds for a function of the MSE matrix using the concept of the weighted f-mean, which is known in the positive matrix theory. While we have similar optimization problems, the current paper is based on standard statistical tools rather than purely mathematical ones.
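As a concrete numerical illustration of the Fisher information matrix defined above, the following NumPy sketch computes J_θ[e] for a hypothetical two-parameter, three-outcome model (the model and parameter values are illustrative assumptions, not taken from this paper); the partial derivatives are taken by central differences:

```python
import numpy as np

def fisher_information(p, dp):
    """Fisher information J_ij = sum_x dp_i(x) dp_j(x) / p(x) for a
    finite-outcome distribution p (shape (K,)) with derivatives dp (shape (n, K))."""
    return np.einsum('ik,jk->ij', dp / p, dp)

# Hypothetical trinomial model: p_theta = (theta_1, theta_2, 1 - theta_1 - theta_2)
theta = np.array([0.3, 0.2])

def model(t):
    return np.array([t[0], t[1], 1.0 - t[0] - t[1]])

# Central-difference partial derivatives dp_i(x) = d p(x) / d theta_i
eps = 1e-6
dp = np.array([(model(theta + eps * np.eye(2)[i]) - model(theta - eps * np.eye(2)[i])) / (2 * eps)
               for i in range(2)])
J = fisher_information(model(theta), dp)
```

For this trinomial model the result agrees with the analytic expression J_ij = δ_ij/θ_i + 1/(1 − θ_1 − θ_2), and the matrix is symmetric, as the definition requires.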
2. Optimality criteria
To proceed further, we consider different types of optimal designs defined by each optimality function. Let Ψ be a real-valued function of a non-negative matrix, called an optimality function, such that Ψ(A) ≥ 0 for A ≥ 0. We can then formulate our optimization problem as follows:

Ψ∗ := min_{e∈E} Ψ( J_θ[e] ),   e∗ := arg min_{e∈E} Ψ( J_θ[e] ).   (1)

We call this optimal design e∗ a Ψ-optimal design. The optimality function Ψ is assumed to satisfy the following three properties.

i) Isotonicity: For J_1 ≥ J_2 ≥ 0, Ψ(J_1) ≤ Ψ(J_2) holds.

ii) Homogeneity: For a constant a > 0, Ψ(aJ) = ψ(a)Ψ(J) with ψ(a) a non-negative function.

iii) Convexity: For λ ∈ [0, 1] and J_1, J_2 ≥ 0, Ψ(λJ_1 + (1−λ)J_2) ≤ λΨ(J_1) + (1−λ)Ψ(J_2) holds.

In addition to the above three conditions, we often impose the following condition.

iv) Orthogonal invariance: For any orthogonal matrix O, Ψ(J) = Ψ(OJO^t). That is, the optimality criterion depends only on the eigenvalues of the Fisher information matrix.

The first condition is also known as an operator monotone function in matrix analysis. The second condition is needed to incorporate the additivity property of the Fisher information matrix. In the following discussion, we assume that the optimality function Ψ always satisfies conditions i), ii), iii) unless stated otherwise.

We list some of the popular criteria:

1. A-optimality: Ψ_A(J) = Tr{J^{-1}}.
2. D-optimality: Ψ_D(J) = Det{J^{-1}}.
3. E-optimality: Ψ_E(J) = λ_max(J^{-1}) with λ_max the maximum eigenvalue.
4. c-optimality: Ψ_c(J) = c^t J^{-1} c for a given vector c ∈ R^n.
5. γ-optimality: Ψ_γ(J) = ( (1/n) Tr{J^{-γ}} )^{1/γ} (γ ∈ R).

Here γ is a fixed parameter, and n = |Θ| is the dimension of the parameter set Θ. It is easy to observe that the γ-optimality includes the first three optimality criteria as follows. First, A-optimality is obtained as Ψ_A(J) = n Ψ_γ(J) when we set γ = 1. Second, D-optimality corresponds to the limit γ → 0 as Ψ_D(J) = ( lim_{γ→0} Ψ_γ(J) )^n. Last, E-optimality is related to the limit γ → ∞, since lim_{γ→∞} Ψ_γ(J) = λ_max(J^{-1}). Here, we make a brief comment on the case of a non-invertible Fisher information matrix. It is clear that when J is not full rank, a certain regularization is needed to invert J. In this study, we mainly focus on finding optimal designs that give rise to a regular statistical model. That is to guarantee matrix inversion of the Fisher information matrix. (We will come back to this point in Sec. II E.)

Other than the above optimality criteria, there is a special optimal design known as the Löwner optimality. This is defined by the existence of a design e_L such that the matrix inequality J_θ[e_L] ≥ J_θ[e] holds for all other designs e. In general, the Löwner optimal design does not exist.[3] If it does, in fact, it dominates all other Ψ-optimality criteria due to the isotonicity of the function Ψ. It is straightforward to show the necessary and sufficient condition for the existence of the Löwner optimal design: the c-optimal design is independent of c for all c ∈ R^n.[3]

Convexity of the design space E is important for finding an optimal design. We introduce a convex sum of two designs e_1, e_2 ∈ E defined as e_λ = λe_1 + (1−λ)e_2 ∈ E. Then, the design space E becomes a convex set. With a proper definition of the convex sum of two designs, it is straightforward to check that it preserves the local unbiasedness condition. In other words, if an estimator θ̂ is locally unbiased at θ under e_1 and e_2, then θ̂ is also locally unbiased at θ for e_λ. Convexity of the optimality function states that the inequality Ψ(λJ_θ[e_1] + (1−λ)J_θ[e_2]) ≤ λΨ(J_θ[e_1]) + (1−λ)Ψ(J_θ[e_2]) holds for e_1, e_2 ∈ E and λ ∈ [0, 1]. The A- and E-optimality criteria satisfy this convexity condition. However, D-optimality violates this condition, and the standard remedy is to optimize log Det{J_θ[e]^{-1}} = −log Det{J_θ[e]} instead, which is a convex function. With these additional structures, we can formulate our problem as a convex optimization problem over a convex set. This problem can then be implemented efficiently in an appropriate convex optimization algorithm.[2, 16–18]

B. Discrete design problem
In this subsection, we extend an estimation strategy to the situation of N repetitions of experiments. There are two distinct strategies as follows.

I.i.d. strategy: This strategy corresponds to repeating exactly the same design e for N times; this design is denoted by e^N ∈ E^N. The probability distribution for this case is the independent and identically distributed (i.i.d.) one:

p_θ(x^N | e^N) = Π_{t=1}^{N} p_θ(x_t | e).

Additivity of the Fisher information matrix applies to get J_θ[e^N] = N J_θ[e]. Thus, the problem is reduced to the case N = 1.

Mixed strategy: Let N(m) be an m-partition of an integer N, i.e., N(m) = (n_1, n_2, ..., n_m) such that Σ_{i=1}^{m} n_i = N and n_i ≥ 0. The mixed strategy is to repeat a design e_1 for n_1 times, e_2 for n_2 times, ..., and e_m for n_m times (N experiments in total). The design for this mixed strategy is specified by the pair (p, e), where p = (p_1, ..., p_m) and e = (e_1, ..., e_m) are vectors of the relative frequencies and designs, respectively. This mixed strategy is denoted by e[N(m)] = (p, e). The probability distribution for the design e[N(m)] is

p_θ(x^N | e[N(m)]) = Π_{i=1}^{m} p_θ(x^{n_i} | e_i^{n_i}) = Π_{i=1}^{m} Π_{t_i=1}^{n_i} p_θ(x_{t_i} | e_i).

The normalized Fisher information matrix, which is divided by N, for the design e[N(m)] is

J_θ[e[N(m)]] = (1/N) Σ_{i=1}^{m} n_i J_θ[e_i].   (2)

The problem is now to find the best partition N(m) for a given N and a set of designs e = (e_1, ..., e_m) such that the value of the optimality function Ψ(J_θ[e[N(m)]]) is minimized. This discrete design problem, also known as the exact design problem, is of practical importance for finding the best design. However, this is a combinatorial optimization problem, and it is a hard problem even numerically. Thus, one has to find an approximate optimal solution to the problem at hand, which will be given in the next subsection.

C. Continuous design problem
When the sample size N is large enough, we approximate the exact design problem by taking the limit N → ∞ with fixed ratios p_i = lim_{N→∞}(n_i/N) in Eq. (2). This optimization problem is called the continuous design problem or the approximate design problem. In general, the optimal continuous design is a good approximation to the exact design problem for sufficiently large N.

The problem here is to find an optimal relative frequency p = (p_i) ∈ P(m) (:= the set of probability vectors for m events) and a set of designs e = (e_i) ∈ E^m such that the value of a given optimality function Ψ(J_θ[e(m)]) is minimized. Here, we denote the design of this continuous design problem by

e(m) = (p, e) ∈ P(m) × E^m.   (3)

The Fisher information matrix for the design e(m) takes the form of a convex mixture of the individual Fisher information matrices:

J_θ[e(m)] = Σ_{i=1}^{m} p_i J_θ[e_i].   (4)

We can also state that this is equivalent to the Fisher information about the joint probability distribution p_i p_θ(x|e_i). In other words, the mixed strategy is to consider a statistical model,

M(e(m)) = { p_i p_θ(·|e_i) | θ ∈ Θ },   (5)

with known p_i, which is also to be optimized.

Summarizing the above arguments, the following optimization problem needs to be solved: given an optimality function and an integer m, find an optimal design e∗(m) = (p∗, e∗) defined by

e∗(m) = arg min_{e(m) ∈ P(m)×E^m} Ψ( Σ_{i=1}^{m} p_i J_θ[e_i] ).   (6)

A convex structure for mixed strategies is naturally constructed from two continuous designs e(m) = (p, e), e′(m) = (p′, e′) as λe(m) + (1−λ)e′(m) = ( λp + (1−λ)p′, λe + (1−λ)e′ ), where λe + (1−λ)e′ = ( λe_i + (1−λ)e′_i ) is a well-defined convex sum of two designs.
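To make Eqs. (4)–(6) concrete for a fixed m = 2, the following sketch mixes two hypothetical designs whose Fisher information matrices are each rank one (so neither design alone can estimate both parameters) and optimizes the relative frequency p by a grid search; the matrices J1, J2 are illustrative assumptions, not the paper's qubit results:

```python
import numpy as np

# Hypothetical Fisher information matrices of two fixed designs e1, e2:
# e1 informs only theta_1, e2 informs only theta_2 (with larger strength).
J1 = np.diag([1.0, 0.0])
J2 = np.diag([0.0, 4.0])

def J_mix(p):
    # Eq. (4): Fisher information of the continuous design with weights (p, 1 - p)
    return p * J1 + (1.0 - p) * J2

def psi_A(J):  # A-optimality
    return np.trace(np.linalg.inv(J))

def psi_D(J):  # D-optimality
    return np.linalg.det(np.linalg.inv(J))

# Eq. (6): optimize the relative frequency for each criterion (grid search)
ps = np.linspace(0.01, 0.99, 9801)
p_A = ps[int(np.argmin([psi_A(J_mix(p)) for p in ps]))]  # -> 2/3 analytically
p_D = ps[int(np.argmin([psi_D(J_mix(p)) for p in ps]))]  # -> 1/2 analytically
```

Note that the mixture J_mix(p) is full rank for 0 < p < 1 even though each J_i is singular, and that the two criteria already pick different optimal weights (p_A ≈ 2/3 versus p_D = 1/2), a small-scale instance of the criterion dependence compared in Sec. VI.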
However, it is in general not unique how to define a convex sum of two designs e(m) and e′(m′) for m ≠ m′. The usual treatment of this difficulty is to introduce a measure ξ on the design space E. This is to consider an experimental design of the form e_ξ := ξ(de). This formalism is certainly more general, since a discrete measure reduces to the above continuous design problem. The Fisher information about this design then takes the form:

J(ξ) := ∫ ξ(de) J_θ[e].   (7)

With this notation, the problem is expressed as

Ψ∗ = min_{ξ∈Ξ} Ψ( J(ξ) ),   ξ∗ = arg min_{ξ∈Ξ} Ψ( J(ξ) ),   (8)

with Ξ the totality of probability measures on the design space E. We call ξ the design measure, or simply a design when no confusion arises. This is the object of our interest in the theory of optimal DoE. It is easy to see from Carathéodory's theorem that an optimal design measure can be found using not more than n(n+1)/2 + 1 design points, with n the number of parameters to be estimated.

Another important problem, other than finding an optimal design, is to characterize the structure of the Fisher information matrix for all possible designs:

J(E) := { J_θ[e] | e ∈ E },   J(Ξ) := { ∫ ξ(de) J_θ[e] | ξ ∈ Ξ }.   (9)

Clearly, J(Ξ) is the convex hull of J(E). We call the sets J(E) and J(Ξ) the Fisher information regions. The Fisher information region is a well-known concept in classical statistics. Several authors studied it in the context of quantum estimation theory.[7, 23, 24] The optimization problem takes the following alternative form as a minimization over the convex set:

Ψ∗ := min_{J∈J(Ξ)} Ψ(J),   J∗ := arg min_{J∈J(Ξ)} Ψ(J).   (10)

From the optimal Fisher information matrix, we then associate it with the optimal design via J_θ[e∗] = J∗.

D. Necessary and sufficient condition
Under the assumptions made in our discussion, we can derive the necessary and sufficient condition for the optimal design in various different forms. This is one of the central subjects in the theory of optimal DoE.[2, 3, 16–18]

For a given optimality function Ψ satisfying the conditions in Sec. II A, we consider the directional derivative:

Ψ′(ξ; ξ_1) := lim_{ε→0} (1/ε) [ Ψ( (1−ε)J(ξ) + εJ(ξ_1) ) − Ψ( J(ξ) ) ],

where J(ξ) = ∫ ξ(de) J_θ[e]. It is straightforward to see that ξ∗ is an optimal design measure if and only if the directional derivative is nonnegative, Ψ′(ξ∗; ξ_1) ≥ 0 for all ξ_1 ∈ Ξ. In the theory of optimal DoE, many of the optimality functions admit the following special form for the directional derivative:

Ψ′(ξ; ξ_1) = ∫ ξ_1(de) ψ(e, ξ),

with ψ(e, ξ) some function of the design measure and design. It is convenient to introduce the sensitivity function ϕ for the optimality function Ψ by

ϕ(e, ξ) := −ψ(e, ξ) + C(ξ),   (11)

where C is a function of the design measure ξ, defined for each Ψ.[16, 18] As examples, let us list two popular optimal designs: the A-optimality Ψ(J) = Tr{W J^{-1}} with a weight matrix W > 0, and the D-optimality with Ψ(J) = log Det{J^{-1}}.

A-optimality: ψ(e, ξ) = Tr{ W J(ξ)^{-1} J_θ[e] J(ξ)^{-1} }, C(ξ) = Ψ(ξ), where Ψ(ξ) = Ψ(J(ξ)).

D-optimality: ψ(e, ξ) = Tr{ J(ξ)^{-1} J_θ[e] }, C(ξ) = n.

With these notations, the following theorem holds.[2, 3, 16, 18]

Theorem II.1
For an optimality function satisfying the conditions discussed before, the following design problems are equivalent.
1) min_{ξ∈Ξ} Ψ(ξ),
2) min_{ξ∈Ξ} max_{e∈E} ϕ(e, ξ),
3) max_{e∈E} ϕ(e, ξ) = Ψ(ξ).

This theorem can be regarded as a generalization of one of the celebrated results in the theory of optimal DoE, known as the equivalence theorem due to Kiefer and Wolfowitz.[25] See Refs. [2, 3, 16–18] for more details.
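For D-optimality, the equivalence theorem yields the classical Kiefer–Wolfowitz condition: a design measure ξ is D-optimal if and only if max_e Tr{J(ξ)^{-1} J_θ[e]} = n. This condition can be checked numerically; the following sketch uses hypothetical rank-one Fisher information matrices parametrized by an angle φ (an illustrative construction, not the paper's qubit model):

```python
import numpy as np

def J_of(phi):
    # Hypothetical rank-one Fisher information of a design labeled by angle phi
    u = np.array([np.cos(phi), np.sin(phi)])
    return np.outer(u, u)

def max_sensitivity(J_xi, n_grid=1000):
    # Kiefer-Wolfowitz sensitivity: max over designs of Tr{J(xi)^{-1} J_theta[e]}
    J_inv = np.linalg.inv(J_xi)
    return max(np.trace(J_inv @ J_of(phi)) for phi in np.linspace(0.0, np.pi, n_grid))

# Candidate 1: equal weights on two orthogonal designs -> J(xi) = I/2
J_opt = 0.5 * J_of(0.0) + 0.5 * J_of(np.pi / 2)

# Candidate 2: unbalanced weights on the same two designs
J_bad = 0.9 * J_of(0.0) + 0.1 * J_of(np.pi / 2)
```

The balanced measure attains max sensitivity n = 2 everywhere on the grid (so it passes the D-optimality check), while the unbalanced one exceeds n, flagging its suboptimality.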
E. Miscellaneous items
To introduce some more terminology of optimal DoE, we list a few items. First, we need to be clear about the concept of optimality in optimal DoE. Let us consider the optimization problem (1) for simplicity. More general cases of optimal designs can be treated similarly.

The optimal design e∗ = arg min_{e∈E} Ψ(J_θ[e]) is called a local optimal design in the sense that it is optimal at a specific point θ. In general, this local optimal design depends on the unknown value θ. In other words, we should express it as e∗(θ) = arg min_{e∈E} Ψ( J_θ[e] ). When dealing with generic statistical models, one always finds a local optimal design only. Only when one simplifies a model, such as a simple linear regression model, can we find the global optimal design, which is optimal uniformly in θ, i.e., ∀θ, θ′, e∗(θ) = e∗(θ′). In practice, one then has to combine other techniques of DoE to realize an optimal design. This has been studied in the field of classical optimal DoE in the past under the name of the adaptive or sequential design problem.[2, 3, 16–18] The adaptive estimation scheme will not be a subject of our paper due to space limitations. It is interesting to learn that these adaptive schemes were independently discovered in the context of quantum state estimation problems. Nagaoka first proposed such an adaptive method based on updating the likelihood function.[26] Later, others proposed different variants of adaptive methods.[27, 28] The latter method is based on splitting N samples into two sets. The first set is used to give a rough estimate, and then we apply a near optimal strategy for the second set. We note that this method was already well studied in classical statistics.[2, 3, 16–18] As a word of caution, the two-step method for the asymptotic case is a method of proof for convergence.
A practical problem in the theory of DoE is to find the optimal division of N samples into two sets, or more generally several sets, which gives the lowest estimation error.

Second, we say that a design e is singular when the resulting statistical model M(e) is not regular. See for example Ref. [29] for a detailed discussion of non-regular models. One common instance of a singular design is when the classical Fisher information matrix is rank deficient. In fact, we often deal with singular models in the theory of optimal DoE. In this case, we may use the generalized inverse matrix method to evaluate the inverse of the classical Fisher information matrix. However, we cannot estimate all parameters simultaneously. There are alternative techniques known in the theory of optimal DoE.[2, 16–18] Appendix Sec. 5 of Ref. [15] gives a short summary of these methodologies. We will make a few more comments on local optimality and the problem of singular designs in Sec. III F for the quantum case.

In passing, we note that a recent paper [30] discussed non-regular measurements. They call a measurement Π (a design e in our terminology) regular when it is θ-independent. We stress that θ-independence is different from the concept of a local optimal design. Further, they introduced a non-regular measurement e_θ that also comprises a part of the parametric dependence in the resulting statistical model:

M(e_θ) = { p_θ(·|e_θ) | θ ∈ Θ }.   (12)

We note that this setting is unusual in the sense that the design e is no longer under our control. It is rather a part of the statistical model itself under consideration. In this special case, one has to differentiate θ for the family of designs e_θ, since we do not have precise knowledge of it.

Third, a family of states {s_θ | θ ∈ Θ} is said to be locally identifiable at θ_0 if there exists some neighborhood B_{θ_0} of θ_0 such that the following condition is satisfied:

∀θ_1, θ_2 ∈ B_{θ_0}, ∀e ∈ E, p_{θ_1}(·|e) = p_{θ_2}(·|e) ⇒ θ_1 = θ_2.   (13)

When this property holds for the whole parameter set, i.e., B_{θ_0} = Θ, we say this family is (globally) identifiable. Clearly, if the statistical models M(e) for all designs e ∈ E are regular, θ ↦ p_θ(·|e) is one-to-one. Thus, the identifiability condition is satisfied.

In addition to identifiability of states, we have the issue of estimability. It is easy to check that we cannot estimate all parameters when a design is singular. In this case, only certain linear combinations of the parameters can be estimated by this singular design. In the following, we focus on the case of a linear combination of the parameters. See for example Refs. [3, 17] for the more general case. Suppose one is only interested in estimating a linear combination of parameters:

θ_c := c^t θ = Σ_{i=1}^{n} c_i θ_i,   (14)

for a given n-dimensional (column) vector c. In the language of optimal DoE, this setting is the c-optimal design problem. The parameter θ_c is said to be estimable if there exists a design e such that the range of J_θ[e] includes the vector c. Otherwise, the design e cannot be used to estimate θ_c. We can also express this condition by the concept of the feasibility cone as follows. Define the feasibility cone for c as the subset of non-negative matrices:

A(c) := { A ∈ R^{n×n} | A ≥ 0, c ∈ range(A) }.   (15)

Then, θ_c is estimable if and only if J_θ[e] ∈ A(c) for some design e. Therefore, the c-optimal design problem should be reformulated as

e∗ := arg min_{e∈E: J_θ[e]∈A(c)} c^t ( J_θ[e] )^{-1} c.   (16)

Here, the inverse of the Fisher information matrix is evaluated in the sense of the generalized inverse.

As another remark on the singular design problem, we comment on the E-optimal design. The E-optimal design problem can also be expressed in the following alternative form:

Ψ_E(J) = λ_max(J^{-1}) = max_{c∈R^n: |c|=1} c^t J^{-1} c = ( λ_min(J) )^{-1} = ( min_{c∈R^n: |c|=1} c^t J c )^{-1}.
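Both the estimability condition and the variational expression for Ψ_E above can be checked numerically; a small sketch (the matrices are illustrative assumptions) uses the Moore–Penrose generalized inverse to test whether c lies in the range of J:

```python
import numpy as np

def estimable(J, c, tol=1e-9):
    # theta_c = c^t theta is estimable iff c lies in range(J), i.e., the
    # projection of c onto range(J) via the generalized inverse returns c itself.
    return np.allclose(J @ np.linalg.pinv(J) @ c, c, atol=tol)

J_sing = np.diag([1.0, 0.0])  # singular design: no information about theta_2
# c = (1, 0) is estimable under J_sing, while c = (0, 1) is not

# Variational form of E-optimality: Psi_E(J) = max_{|c|=1} c^t J^{-1} c
J = np.diag([1.0, 4.0])
cs = (np.array([np.cos(t), np.sin(t)]) for t in np.linspace(0.0, np.pi, 2001))
psi_E_minmax = max(c @ np.linalg.inv(J) @ c for c in cs)
psi_E_direct = np.linalg.eigvalsh(np.linalg.inv(J)).max()
```

The grid maximum over unit vectors c reproduces the largest eigenvalue of J^{-1}, in agreement with the chain of equalities above.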
From this expression, we see that the E-optimal design amounts to a min-max optimization of a certain c-optimal design problem. Unlike the standard c-optimality criterion, however, we are interested in estimating all parameters in the E-optimality criterion. Therefore, we should avoid singular optimal designs.

Related to the issue of local optimal designs and singular designs, we have a remark on the value of the optimality function. Let us denote the minimum value of an optimality function at θ by Ψ_θ. Consider two arbitrary different points θ and θ′, and the corresponding optimal designs:

e∗(θ) := arg min_{e∈E} Ψ( J_θ[e] ),   e∗(θ′) := arg min_{e∈E} Ψ( J_{θ′}[e] ).

The design e∗(θ) is optimal at θ, but not at θ′. Generally speaking, there is no ordering relation between the two values Ψ_θ and Ψ_{θ′}, nor a matrix ordering between the two optimal Fisher information matrices J_θ[e∗(θ)] and J_{θ′}[e∗(θ′)]. To see this, let us consider the case when the two points are near each other. In this case, a small deviation θ′ = θ + δ, with δ = (δ_i) a small vector, results in the following approximation up to first order in |δ|:

J_{θ′}[e∗(θ′)] = J_{θ+δ}[e∗(θ′)]   (17)
≃ J_θ[e∗(θ′)] + Σ_k δ_k ∂J_θ[e∗(θ′)]/∂θ_k,   (18)

where the partial differentiation of a matrix is done componentwise. The second term of Eq. (18) is symmetric, but it does not have a definite sign as a matrix in general. Upon assuming closeness between the two designs, we substitute the relation e∗(θ′) ≃ (1−ε)e∗(θ) + εe for some design e and small ε, in the sense of a randomized design. Then, the first term of Eq. (18) is expressed as

J_θ[e∗(θ′)] ≃ J_θ[(1−ε)e∗(θ) + εe]   (19)
= (1−ε)J_θ[e∗(θ)] + εJ_θ[e].   (20)

Therefore, we obtain an approximate relationship between the two Fisher information matrices J_θ[e∗(θ)] and J_{θ′}[e∗(θ′)] without a definite matrix ordering.

As a final remark on the singular design problem, we comment on the use of the generalized inverse of the Fisher information matrix. When an optimal design e∗ is singular, an extra complication may arise. In this case, we often use an appropriate generalized inverse of the Fisher information matrix for J_θ[e∗]. In some circumstances, the obtained result may depend on the particular choice of generalized inverse. See for example Refs. [3, 17]. This point will be important for the c-optimality, for example. We will expand this discussion in Sec. III F for the quantum case.

To end this subsection, we briefly list three extensions of the theory of optimal DoE. First is the optimal design under constraints. The design is typically subject to additional constraints in order to take into account realistic experimental situations. Optimal design of experiments with constraints can also be formulated.[16, 18]

Second is a compound optimal design. Consider two optimality functions Ψ_1, Ψ_2 to define a new function Ψ_ν := νΨ_1 + (1−ν)Ψ_2 with ν a fixed positive parameter. The design e∗ = arg min Ψ_ν[e] is called a compound optimal design, and it represents a tradeoff relation between the two different optimal designs defined by Ψ_1, Ψ_2.

Last is to evaluate the efficiency of a design. Given an optimality function Ψ, we can define the optimal design e∗ for this optimality. In practice, one is not only interested in finding the optimal design, but also in the performance of a suboptimal design, say e, which can be easily implemented. To this end, we need to know how small the value Ψ(e) is.
Note that Ψ[e] is a relative quantity, and hence we cannot immediately judge the performance of the design e based on the value Ψ[e] alone. The standard way to handle this problem is to consider the function Ψ normalized by its optimal value Ψ[e∗], which is defined as

η_Ψ[e] := Ψ[e∗] / Ψ[e] = ( min_{e′∈E} Ψ[e′] ) / Ψ[e].   (21)

We call η_Ψ[e] the efficiency of the design e with respect to the optimality function Ψ. By definition, the normalized function satisfies 0 ≤ η_Ψ[e] ≤ 1. Notably, the equality η_Ψ[e] = 1 does not necessarily imply that e is the optimal design for the Ψ-optimality.

Another application of the efficiency of a design is the comparison of different optimality criteria. Consider two optimality criteria based on Ψ_1 and Ψ_2. The optimal design e∗_1 = arg min Ψ_1[e] is optimal for Ψ_1 but not for Ψ_2. One may naively expect that it is also good for Ψ_2. To quantify how good it is, we can analyze the efficiency

η_{Ψ_2}[e∗_1] = ( min_e Ψ_2[e] ) / Ψ_2[e∗_1].   (22)

If this quantity is close to 1, it means that e∗_1 is also good for the other criterion. This will be studied in Sec. VI. Applications of the above extended optimal designs have been discussed in various statistical problems; see Refs. [3, 16–18].

III. QUANTUM STATE ESTIMATION AS OPTIMAL DESIGN OF EXPERIMENTS
We now apply the theory of optimal DoE to the parameter estimation problem in quantum systems.
A. Definitions

A quantum system is represented by a d-dimensional complex vector space C^d. With the standard inner product, it becomes a Hilbert space, denoted by H = C^d. When the dimension of the system is two, we speak of a "qubit," the simplest quantum system. To simplify our discussion we only consider quantum systems with a fixed dimension d < ∞. A quantum state ρ is a non-negative matrix on H with unit trace. The set of all quantum states on H is denoted by S(H) := {ρ ∈ C^{d×d} | ρ ≥ 0, tr{ρ} = 1}. A measurement Π on a given quantum state ρ is described by a set of positive semidefinite matrices Π = {Π_x}_{x∈X} (∀x, Π_x ≥ 0) such that the completeness condition Σ_{x∈X} Π_x = I_d (the identity matrix) is satisfied. When Π is performed on ρ, the measurement outcomes are drawn according to the probability distribution

p_ρ(x|Π) := tr{ρ Π_x}.

Here the set X is a label set for the measurement outcomes. This probabilistic rule (Born's rule) will be used to define the model function.

B. Formulation of the problem
We are now in a position to formulate the parameter estimation problem for quantum states as a problem of optimal DoE. We are given a family of n-parameter quantum states M_Q := {ρ_θ | θ ∈ Θ ⊂ R^n}, under the assumption that θ ↦ ρ_θ is a one-to-one and smooth mapping. We identify the quantum state ρ with the state s. The design in our setting is a measurement, e = Π, and the model function is given by Born's rule:

f : (ρ_θ, e) ↦ p_θ(x|e) = tr{ρ_θ Π_x}  (x ∈ X).

Thus, the design space is the set of all possible POVMs. The statistical model for a design e is obtained as M(e) = {p_θ(·|e) | θ ∈ Θ}. We wish to find an optimal design ξ_* ∈ Ξ that minimizes a properly chosen optimality criterion as in Eq. (8). An important aspect of the optimal design problem for quantum-state estimation is that the design e = Π (measurement) is subject to the constraints

∀x, Π_x ≥ 0,  Σ_{x∈X} Π_x = I_d.

A unique feature of DoE in the quantum case is that these constraints appear in the design space E by the laws of quantum theory.
As stated before, the convex structure of the design space (the set of all POVMs) is important. A convex mixture of two POVMs is defined as follows. Let Π = {Π_1, Π_2, ..., Π_k} and Π' = {Π'_1, Π'_2, ..., Π'_{k'}} be two POVMs. For a given λ ∈ [0, 1], define

Π_λ = λΠ ∪ (1−λ)Π' := {λΠ_1, λΠ_2, ..., λΠ_k} ∪ {(1−λ)Π'_1, (1−λ)Π'_2, ..., (1−λ)Π'_{k'}}
  = {λΠ_1, ..., λΠ_k, (1−λ)Π'_1, ..., (1−λ)Π'_{k'}},

whose number of measurement outcomes is k + k'. The convex structure of the POVM space plays an important role, since the problem can then be cast into a convex optimization problem.
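As a quick numerical sketch of this convex structure (our own illustration, not taken from the paper), the snippet below builds the POVM λΠ ∪ (1−λ)Π' for a qubit and checks that its classical Fisher information matrix is the convex combination λ J_θ[Π] + (1−λ) J_θ[Π']. The Stokes parametrization and the helper names `pvm` and `fisher` are assumptions made here for illustration:

```python
import numpy as np

# Pauli matrices and the Stokes parametrization rho = (I + theta.sigma)/2
# (conventions chosen for this illustration, not fixed by the paper)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def pvm(u):
    """Projective measurement {(I + u.sigma)/2, (I - u.sigma)/2} along unit vector u."""
    s = sum(ui * p for ui, p in zip(u, paulis))
    return [0.5 * (np.eye(2) + s), 0.5 * (np.eye(2) - s)]

def fisher(theta, povm):
    """Classical Fisher matrix of the Born distribution p(x) = tr{rho Pi_x};
    for the Stokes parametrization, d rho / d theta_i = sigma_i / 2."""
    rho = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, paulis)))
    J = np.zeros((3, 3))
    for Pi in povm:
        p = np.real(np.trace(rho @ Pi))
        if p < 1e-12:
            continue
        dp = np.array([np.real(np.trace(0.5 * s @ Pi)) for s in paulis])
        J += np.outer(dp, dp) / p
    return J

theta = np.array([0.2, 0.1, 0.4])
A, B = pvm([1, 0, 0]), pvm([0, 0, 1])
lam = 0.3
mix = [lam * E for E in A] + [(1 - lam) * E for E in B]   # the POVM lam*Pi U (1-lam)*Pi'
print(np.allclose(fisher(theta, mix),
                  lam * fisher(theta, A) + (1 - lam) * fisher(theta, B)))  # True
```

The equality holds exactly for any λ and any pair of POVMs, since each scaled element cΠ_x contributes c times the original rank-1 term to the Fisher matrix; this linearity is what makes the design problem convex.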
This point was already pointed out in the literature.[23, 31, 32] When some POVM elements are proportional to each other, one can combine them without affecting the measurement statistics. For example, assume Π'_1 = cΠ_1 with c a positive constant; then the new POVM element Π_1^new = (λ + c(1−λ))Π_1 provides the same design.

C. Extensions of the problem
In this subsection, we briefly list possible extensions of the DoE formalism for the quantum-state estimation problem. We note that most of these results are already known in the literature, yet we can present them in a unified manner in the language of the theory of optimal DoE.
1. Restricted measurement
When only some measurements are accessible in the laboratory, it does not make sense to find an optimal POVM among all possible POVMs. Let E_0 ⊂ E be a subset of the design space, and consider the following optimization problem:

Ψ_* := min_{e∈E_0} Ψ(J_θ[e]),  e_* := arg min_{e∈E_0} Ψ(J_θ[e]).  (23)

Clearly, this optimal design e_* represents the best we can do among all "accessible" POVMs. A typical case is when only projection measurements are allowed. In this case, we optimize over the PVM space E_PVM = {Π | Π is a PVM}. We know that an optimal measurement is not in general given by a PVM. By considering the continuous design problem, we can in general do better. The problem to be solved now is

Ψ_*^Random(m) := min_{e(m) ∈ P(m)×E_PVM^m} Ψ(J_θ[e(m)]),  e_*^Random(m) := arg min_{e(m) ∈ P(m)×E_PVM^m} Ψ(J_θ[e(m)]).  (24)

We then wish to find an optimal m_* minimizing the number of different designs. By randomizing over different designs, the optimal design e_*(m_*) can perform better: Ψ_*^Random(m_*) ≤ Ψ_*.
Note that in some cases one can attain the optimal precision by solving the above continuous design problem within the restricted design space. In other words, one can do best simply by measuring several PVMs randomly according to a proper distribution. In Sec. IV A, we will show that all qubit models admit such an optimal solution. In the higher-dimensional case, it seems that this is not true. In Sec. III D, we give more discussion on this point.
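The gain from randomization can be made concrete with a small numerical sketch (our own construction, with hypothetical helper names): for the three-parameter qubit model in the Stokes parametrization, a single projective measurement yields a rank-1, hence singular, Fisher matrix, while a uniform randomization over three orthogonal PVMs gives a full-rank matrix, so all three parameters become estimable:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def pvm(u):
    s = sum(ui * p for ui, p in zip(u, paulis))
    return [0.5 * (np.eye(2) + s), 0.5 * (np.eye(2) - s)]

def fisher(theta, povm):
    # classical Fisher matrix of p(x) = tr{rho Pi_x}; d rho/d theta_i = sigma_i/2
    rho = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, paulis)))
    J = np.zeros((3, 3))
    for Pi in povm:
        p = np.real(np.trace(rho @ Pi))
        if p < 1e-12:
            continue
        dp = np.array([np.real(np.trace(0.5 * s @ Pi)) for s in paulis])
        J += np.outer(dp, dp) / p
    return J

theta = np.array([0.2, 0.1, 0.4])

# a single PVM is a singular design for the 3-parameter model: rank-1 Fisher matrix
J_single = fisher(theta, pvm([0, 0, 1]))
print(np.linalg.matrix_rank(J_single))     # 1

# uniform randomization over three orthogonal PVMs: full-rank Fisher matrix
axes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
J_rand = sum(fisher(theta, pvm(u)) for u in axes) / 3
print(np.linalg.matrix_rank(J_rand))       # 3
```

The randomized design here is exactly the "standard tomography" design discussed later in Sec. VI; whether it is optimal for a given criterion is a separate question.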
2. Classical-quantum state formalism
The continuous design problem can also be interpreted as follows. The basic idea is to use a classical-quantum (CQ) state. Let us rewrite a quantum state ρ_θ as a random mixture of an extended state of the form

ρ̂_θ = Σ_{i=1}^m p_i |i)(i| ⊗ ρ_θ,  (25)

where {|i)}_{i=1}^m is an orthonormal basis for the m-dimensional real vector space H_C := R^m, and p = (p_i) denotes a known probability vector. Thus, the total Hilbert space is extended to Ĥ = H_C ⊗ H. Next, we consider a set of POVMs e(m) = (Π^(1), Π^(2), ..., Π^(m)), each of whose members Π^(k) = {Π^(k)_x}_{x∈X_k} forms a valid POVM. If we perform a POVM on the extended space Ĥ of the form Π̂ := {Π̂_{k,x_k}}_{(k,x_k)} with Π̂_{k,x_k} := |k)(k| ⊗ Π^(k)_{x_k}, the resulting statistical model is given by

M̂(Π̂) = {p̂_θ(·|Π̂) | θ ∈ Θ},  (26)

where the measurement outcomes are labeled by the double index as p̂_θ(k, x_k|Π̂). By construction, we have

p̂_θ(k, x_k|Π̂) = p_k tr{ρ_θ Π^(k)_{x_k}},  (27)

which forms a joint probability distribution. [See Eq. (5).] Additivity of the classical Fisher information matrix yields the formula

J_θ[Π̂] = Σ_{k=1}^m p_k J_θ[Π^(k)].  (28)

This is exactly the same formula as Eq. (4), which is obtained in the continuous design problem. Although this mathematical equivalence is almost trivial, the result might come as a surprise when interpreted as follows. Consider process tomography, or a channel estimation problem, in the framework of DoE.[15] The task here is to design a set of good input states and send them through an unknown channel. The output states are then measured with appropriate POVMs. It is clear that we need to prepare multiple input states to find an optimal strategy. If we phrase the whole process in the CQ-state scenario, we might then interpret it as if we only need to prepare one big CQ state. However, we should not call it a "one-shot" estimation strategy. The trick here is, of course, that we are working in the infinite-sample-size limit to approximate the exact design problem.
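The additivity formula (28) can be checked directly by building the joint distribution (27). The sketch below uses our own Stokes-parametrization conventions and helper names; since p_k does not depend on θ, the joint probabilities and their gradients both scale by p_k, and the Fisher matrix of the joint distribution reduces to the weighted sum:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def pvm(u):
    s = sum(ui * p for ui, p in zip(u, paulis))
    return [0.5 * (np.eye(2) + s), 0.5 * (np.eye(2) - s)]

def born(theta, povm):
    """Outcome probabilities p(x) = tr{rho Pi_x} and their theta-gradients."""
    rho = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, paulis)))
    probs = [np.real(np.trace(rho @ Pi)) for Pi in povm]
    grads = [np.array([np.real(np.trace(0.5 * s @ Pi)) for s in paulis]) for Pi in povm]
    return probs, grads

def fisher_of(probs, grads):
    return sum(np.outer(g, g) / p for p, g in zip(probs, grads) if p > 1e-12)

theta = np.array([0.2, 0.1, 0.4])
p_cl = [0.5, 0.5]                          # the known probability vector p = (p_k)
povms = [pvm([1, 0, 0]), pvm([0, 0, 1])]   # Pi^(1), Pi^(2)

# joint distribution p(k, x_k) = p_k tr{rho Pi^(k)_{x_k}} as in Eq. (27);
# p_k is theta-independent, so the gradients are scaled by p_k as well
probs, grads = [], []
for pk, P in zip(p_cl, povms):
    ps, gs = born(theta, P)
    probs += [pk * p for p in ps]
    grads += [pk * g for g in gs]
J_joint = fisher_of(probs, grads)

# additivity, Eq. (28): J[Pi-hat] = sum_k p_k J[Pi^(k)]
J_sum = sum(pk * fisher_of(*born(theta, P)) for pk, P in zip(p_cl, povms))
print(np.allclose(J_joint, J_sum))          # True
```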
3. Collective measurement strategy
It is well known that collective measurements on multiple copies of a state can perform equally well or better than individual measurements, depending on the nature of the model. The case of the collective strategy can be handled similarly. Consider N identical copies of the unknown state, ρ_θ^⊗N := ρ_θ ⊗ ρ_θ ⊗ ... ⊗ ρ_θ. The design is now described by a POVM on the N-fold tensor-product Hilbert space H^⊗N = H ⊗ H ⊗ ... ⊗ H. The optimization problem is then given by

Ψ_*^(N) := min_{e∈E^(N)} Ψ(J_θ[e]),  e_*^(N) := arg min_{e∈E^(N)} Ψ(J_θ[e]),  (29)

where E^(N) denotes the set of all possible POVMs on H^⊗N.
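As a baseline for the collective setting, a separable (product) POVM applied to two independent copies simply doubles the single-copy Fisher information; genuinely collective POVMs with entangled elements can improve on this baseline for some models, as stated above. The following minimal sketch (our own construction, Stokes parametrization) verifies the doubling for N = 2:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def fisher(r, drs, povm):
    """Classical Fisher matrix for state r, its parameter derivatives drs, and a POVM."""
    n = len(drs)
    J = np.zeros((n, n))
    for Pi in povm:
        p = np.real(np.trace(r @ Pi))
        if p < 1e-12:
            continue
        dp = np.array([np.real(np.trace(dr @ Pi)) for dr in drs])
        J += np.outer(dp, dp) / p
    return J

theta = np.array([0.2, 0.1, 0.4])
r1 = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, paulis)))
drs1 = [0.5 * s for s in paulis]                 # d rho / d theta_i = sigma_i / 2
P1 = [0.5 * (np.eye(2) + sz), 0.5 * (np.eye(2) - sz)]
J1 = fisher(r1, drs1, P1)

# two copies: rho (x) rho, Leibniz rule for the derivative of the product,
# and the separable POVM {Pi_x (x) Pi_y}
r2 = np.kron(r1, r1)
drs2 = [np.kron(d, r1) + np.kron(r1, d) for d in drs1]
P2 = [np.kron(A, B) for A in P1 for B in P1]
J2 = fisher(r2, drs2, P2)
print(np.allclose(J2, 2 * J1))   # independent copies: Fisher information doubles
```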
4. Holevo-Nagaoka type bound
In the theory of quantum-state estimation, the Holevo bound [5] established the fundamental precision limit. This bound is defined by minimizing a function of the n × n positive semi-definite matrix Z[X] whose components are

Z[X] = [tr{ρ_θ X_k X_j}]_{jk},  (30)

over n Hermitian operators X = (X_1, X_2, ..., X_n) satisfying the locally unbiased condition:

𝒳 := {X | X_i Hermitian, ∀i,j: tr{ρ_θ X_i} = 0, tr{(∂ρ_θ/∂θ^i) X_j} = δ_ij}.  (31)

It is important to note that when X is in the set 𝒳, the conditions tr{(∂ρ_θ/∂θ^i) X_j} = δ_ij require X_1, X_2, ..., X_n to be linearly independent, and hence Z[X] > 0 for X ∈ 𝒳. The Holevo bound sets the lowest achievable convergence rate in the asymptotic limit (N → ∞). [33–36]
In the language of the theory of DoE, the Holevo bound gives the first-order asymptotics for the A-optimality under the collective POVM strategy explained in the previous subsection. It is then natural to extend the Holevo bound to other optimality criteria. This is done in a straightforward manner, and we only provide the final result without details. The derivation follows exactly the same lines as Nagaoka's formulation. [43] We shall call the resulting bound the Holevo-Nagaoka type bound. For a given optimality function Ψ, it is given as follows.

Theorem III.1 The minimum value of the optimality function is bounded by

min_{ξ∈Ξ} Ψ(ξ) ≥ Ψ_HN.  (32)

The Ψ-optimal Holevo-Nagaoka bound Ψ_HN is defined by the minimization

Ψ_HN := min_{X∈𝒳} χ_Ψ(Z[X]),  (33)

where χ_Ψ is defined indirectly by the minimization

χ_Ψ(Z) := min_J {Ψ(J) | J > 0, J^{-1} ≥ Z}.  (34)

As an example, the A-optimal Holevo-Nagaoka type bound is χ_{Ψ_A}(Z) = Tr{Re Z} + Tr{|Im Z|}.
Another straightforward extension is to bound the Holevo-Nagaoka type bound further by a quantum Fisher information matrix J_θ^Q, such as the SLD and right logarithmic derivative (RLD) Fisher information matrices. Then we obtain

min_{ξ∈Ξ} Ψ(ξ) ≥ Ψ(J_θ^Q).

This also follows from the fact that a quantum Fisher information matrix dominates the classical Fisher information matrix, J_θ^Q ≥ J_θ[e], for all designs.

D. Fisher information region
As emphasized in Sec. II C, the Fisher information region is a key concept for analyzing the problem of finding an optimal DoE. Generally speaking, it is a hard task to obtain the exact structure of the Fisher information region analytically. In some cases, this is even harder than finding an optimal design itself. Nevertheless, it is worth deriving an approximate Fisher information region J_approx that contains the true Fisher information region as a subset. Such a larger set J_approx can be used to derive lower bounds on the estimation errors for the optimality function under consideration. The celebrated Gill-Massar bound [42] was derived by this logic, although the concept of the Fisher information region was not utilized explicitly.
We now discuss an important property of the Fisher information region for the quantum-state estimation problem. Let us define two Fisher information regions as in Eq. (9):

J(E_POVM) := {J_θ[e] | e ∈ E_POVM},  J(Ξ) := {∫ ξ(de) J_θ[e] | ξ ∈ Ξ},  (35)

where E_POVM denotes the set of all POVMs. By definition, J(Ξ) is the convex hull of J(E_POVM), and hence J(E_POVM) ⊂ J(Ξ) holds. The difference between the two sets represents how much we could gain by considering randomized POVMs, or by considering the continuous design problem in the asymptotic limit. It is worth emphasizing that the quantum-state estimation problem is special in the sense that there is no gap between the two strategies. The reason is that a general POVM by its nature already contains this kind of randomized POVM. To summarize, we have the following result.

Proposition III.2
The two Fisher information regions are identical for the quantum-state estimation problem: J(E_POVM) = J(Ξ).

Proof:
To prove the statement, it is enough to show the inclusion J(Ξ) ⊂ J(E_POVM), since the converse relation holds by definition. Let us consider an arbitrary continuous design e(m) = (p, Π), where Π = (Π^(1), Π^(2), ..., Π^(m)) is a set of m different POVMs. The Fisher information matrix is expressed as

J_θ[e(m)] = Σ_{k=1}^m p_k J_θ[Π^(k)].  (36)

Next, consider the following single POVM:

Π = ∪_{k=1}^m p_k Π^(k).  (37)

By construction, Π is a convex mixture of the m different POVMs, with Σ_{k=1}^m |X_k| outcomes in total. It is straightforward to show that this single POVM gives the same classical Fisher information matrix as Eq. (36). The case of an integral form, ∫ ξ(de) J_θ[e], can be treated similarly by taking an appropriate limit. In summary, every J ∈ J(Ξ) is also in the set J(E_POVM), and thus J(Ξ) ⊂ J(E_POVM) holds.
E. Analytically solvable cases
1. Single (scalar) parameter model
When the number of parameters characterizing the quantum state is equal to one, we can find an optimal solution analytically. Let M_Q = {ρ_θ | θ ∈ Θ ⊂ R} be a one-parameter quantum-state model. Then the well-known property of the Fisher information results in the inequality

J_θ[Π] ≤ J_θ^SLD[ρ_θ]  ∀Π ∈ E,  (38)

where J_θ^SLD[ρ_θ] is the symmetric logarithmic derivative (SLD) Fisher information about the parametric state ρ_θ. To remind ourselves, the SLD Fisher information matrix for a mixed-state model is defined as follows. Consider a general n-parameter family of states M_Q := {ρ_θ | θ ∈ Θ}. The SLD operator for the ith direction is defined as the solution of the operator equation ∂ρ_θ/∂θ^i = (ρ_θ L_{θ,i} + L_{θ,i} ρ_θ)/2. The SLD Fisher information matrix about the model {ρ_θ} is then defined by J_θ^SLD[ρ_θ] = [tr{ρ_θ(L_{θ,i} L_{θ,j} + L_{θ,j} L_{θ,i})}/2]_{i,j}. The SLD Fisher information is a quantum version of the Fisher information and is calculated solely from a given parametric quantum state. In the following, we denote it by J_θ^SLD = J_θ^SLD[ρ_θ] for simplicity when no confusion arises.
An optimal measurement attaining equality in (38) is known.[37–39] Hence, we can bound all possible Fisher information by the optimal one as in (38). This corresponds to the Löwner optimal design, and we can therefore conclude that it is optimal among all possible designs, including the mixed strategy.

c-optimal design

In the literature, the c-optimal design for quantum-state estimation is known; see for example Chap. 7 of Ref. [40].

Theorem III.3
Given an n-parameter model M_Q, for each n-dimensional (column) vector c = (c_1, c_2, ..., c_n)^t ∈ R^n, the infimum of the MSE matrix in the direction of c is

inf_{e∈E} c^t J_θ[e]^{-1} c = c^t (J_θ^SLD)^{-1} c.  (39)

An optimal measurement is given by a set of projectors of the operator

L_{θ,c} = Σ_{i,j=1}^n c_i J_θ^{SLD,ij} L_{θ,j},  (40)

with J_θ^{SLD,ij} the (i,j) component of the inverse of the SLD Fisher information matrix and L_{θ,j} the SLD operator for the jth parameter θ^j.

This theorem provides an operational meaning for the SLD Fisher information matrix. In Sec. 5 of the review [41], a detailed discussion of this theorem was given in the context of the nuisance parameter problem.
In general, the optimal design given in this theorem depends on the unknown parameter θ as well as on the choice of the known vector c. Furthermore, the resulting classical Fisher information matrix is singular, and hence we face the singular design problem. To circumvent the singular design problem, one should solve the refined optimization problem given by Eq. (16). Otherwise, the obtained optimized design is purely mathematical and useless in practice. We illustrate this point by a simple example in the next subsection.

F. Local optimal design and singular design
In this subsection, we expand the discussion of local optimal designs and singular designs, which were briefly presented in Sec. II E.
Consider the two-parameter qubit model

M_Q = { ρ_θ = (1/2) [[1, θ_2 e^{-iθ_1}], [θ_2 e^{iθ_1}, 1]] | (θ_1, θ_2) ∈ [0, 2π) × (0, 1) }.  (41)

The SLD Fisher information matrix of this model is

J_θ^SLD = diag(θ_2^2, (1 − θ_2^2)^{-1}).  (42)

Suppose we are only interested in estimating the phase θ_1 of this state, whereas θ_2 is treated as the nuisance parameter. The optimal design for the parameter of interest θ_1 is obtained by the c-optimality with c = (1, 0)^t. Theorem III.3 provides an optimal projection measurement,

Π_*(θ_1) = { (1/2) [[1, ±i e^{-iθ_1}], [∓i e^{iθ_1}, 1]] }.  (43)

Clearly, this measurement depends on the unknown parameter θ_1, and hence it is a local optimal design. The classical Fisher information matrix of this optimal measurement is

J_θ^* := J_θ[Π_*(θ_1)] = diag(θ_2^2, 0).  (44)

Thus, this optimal Π_*(θ_1) is a singular design in our terminology.
First, let us discuss the issue of estimability raised in Sec. II E. The feasibility cone (15) for the parameter θ_1 = c^t θ is given by

A(c) = {A ∈ R^{2×2} | A > 0} ∪ {a cc^t = diag(a, 0) | a > 0}.  (45)

We see that the Fisher information matrix (44) for the c-optimal design Π_*(θ_1) is in this feasibility cone. Hence, θ_1 is estimable by this optimal design. Next, we touch on the singular design problem. Define the set of all generalized inverses of J_θ^* by

GI(J_θ^*) := {J^- ∈ R^{2×2} | J_θ^* J^- J_θ^* = J_θ^*}.  (46)

It is easy to obtain the explicit expression

GI(J_θ^*) = { [[(θ_2^2)^{-1}, a], [b, c]] | a, b, c ∈ R }.  (47)

Therefore, every generalized inverse of J_θ^* attains the optimal value:

c^t J^- c = c^t (J_θ^SLD)^{-1} c,  ∀J^- ∈ GI(J_θ^*).  (48)

This suggests that there may be other optimal designs whose Fisher information matrices admit a generalized inverse in the set GI(J_θ^*). However, one should only consider an optimal design lying in the feasibility cone; otherwise one obtains a meaningless design.
Finally, we elaborate on the local optimality of the design Π_*(θ_1). In reality, we can only perform an approximately optimal design with uncertainty in θ_1 in the finite sample case. Upon using e_δ := Π_*(θ_1 + δ) with an uncertainty δ in the knowledge about θ_1, the Fisher information matrix and its (Moore-Penrose) generalized inverse are calculated as

J_θ[e_δ] = (1 − θ_2^2 sin^2 δ)^{-1} (θ_2 cos δ, sin δ)^t (θ_2 cos δ, sin δ),  (49)
(J_θ[e_δ])^- = (1 − θ_2^2 sin^2 δ) (θ_2^2 cos^2 δ + sin^2 δ)^{-2} (θ_2 cos δ, sin δ)^t (θ_2 cos δ, sin δ).  (50)

By evaluating the (1,1) component of the generalized inverse, we obtain

Ψ_c[e_δ] = c^t (J_θ[e_δ])^- c = (1 − θ_2^2 sin^2 δ) (θ_2^2 cos^2 δ + sin^2 δ)^{-2} θ_2^2 cos^2 δ.  (51)

We can show that Ψ_c[e_δ] can be lower than its optimal value Ψ_c[e_*] = (θ_2^2)^{-1} for δ ≠ 0. For example, consider the case of small δ; the Taylor series expansion gives

Ψ_c[e_δ] ≃ θ_2^{-2} − θ_2^{-2} [θ_2^2 + 2θ_2^{-2} − 1] δ^2.  (52)

The second term is always negative. In fact, the infimum of Ψ_c[e_δ] is zero, which is attained by the choice δ = π/2. However, θ_1 is estimable by the design e_δ if and only if its Fisher information matrix is in the feasibility cone A(c). This condition singles out the true optimal design with δ = 0.

IV. QUBIT MODEL
In this section, we consider a general qubit model, M_Q = {ρ_θ ∈ S(C^2) | θ ∈ Θ ⊂ R^n}. The single-parameter case is solved in Sec. III E 1, and we consider two- or three-parameter models (n = 2, 3).

Lemma IV.1
For a qubit model {ρ_θ | θ ∈ Θ}, the Fisher information matrix for a given measurement Π can be expressed in the form

J_θ[Π] = (J_θ^SLD)^{1/2} F (J_θ^SLD)^{1/2},

where F is some nonnegative-definite matrix satisfying the condition Tr{F} ≤ 1.

This immediately yields the following corollary, known as the Gill-Massar inequality.[42]
Corollary IV.2
In a qubit system, the Fisher information matrix for any design ξ satisfies

Tr{(J_θ^SLD)^{-1} J(ξ)} ≤ 1,  (53)

where the equality holds if and only if the measurement consists of rank-1 operators.

Another important property is the following lemma.
Lemma IV.3
For any qubit model M_Q = {ρ_θ | θ ∈ Θ} and any positive matrix T, the following optimization has the solution

max_{e∈E} Tr{J_θ[e] T} = λ_max(J_θ^SLD T),  (54)

where λ_max(A) denotes the maximum eigenvalue of the matrix A.

A. Fisher information region for a qubit model
We can apply Lemma IV.1 to obtain the Fisher information region (9), the set of all possible Fisher information matrices:

J(Ξ) = {(J_θ^SLD)^{1/2} F (J_θ^SLD)^{1/2} | F ≥ 0, Tr{F} ≤ 1}.  (55)

For a better understanding, we give an explicit construction of the Fisher information matrix based on the SLD operators. Let L_{θ,i} be the SLD operator for the ith direction. Given a unit vector u = (u_i) ∈ R^n (u^t u = |u|^2 = 1), performing a projection measurement of the observable

L_u := Σ_{i,j} u_i (J_θ^SLD)^{-1/2}_{ij} L_{θ,j}  (56)

yields the Fisher information matrix

J_θ[Π(L_u)] = (J_θ^SLD)^{1/2} u u^t (J_θ^SLD)^{1/2},

which is rank-1. Consider a general experimental design e(n) = (p, e) for the n-parameter case of the form p = (p_1, ..., p_n) ∈ P(n) and e = u := (u_1, ..., u_n) ↔ (Π(L_{u_1}), ..., Π(L_{u_n})). (Note here that Π(L_u) is uniquely specified by the unit vector u.) The Fisher information matrix for this design is

J_θ[e(n)] = Σ_{i=1}^n p_i (J_θ^SLD)^{1/2} u_i u_i^t (J_θ^SLD)^{1/2}.  (57)

The matrix F = Σ_i p_i u_i u_i^t satisfies Tr{F} = 1 and can span all nonnegative-definite matrices F appearing in Eq. (55). Thus, we can take the n vectors u_1, ..., u_n to be mutually orthogonal when optimizing the function Ψ(J_θ[e(n)]) over p and u; that is, u forms an orthonormal basis of R^n. One can confirm that the design e(n) for the n-parameter case achieves the optimum among all possible designs e(m) with m ∈ N, that is, e_*(m_*) = e_*(n) holds. Combining the discussions above, we arrive at the following statement.

Proposition IV.4
For any qubit model, let J(Ξ) be the Fisher information region for all possible designs, and denote by J(E_PVM) the Fisher information region generated by convex mixtures of all possible projection measurements. The two Fisher information regions are identical for the quantum-state estimation problem: J(E_PVM) = J(Ξ).

B. Analytical forms of optimal designs
In this subsection, we study the A-, D-, E-, and γ-optimal designs. Each optimal design is constructed as a randomized mixture of PVMs, consistent with the above proposition. We first derive the γ-optimal design, and then list the A-, D-, and E-optimal designs.

γ-optimal design

We now construct the γ-optimal design for the qubit model. Its optimality function is Ψ_γ[J_θ[e(n)]] = (n^{-1} Tr{J_θ[e(n)]^{-γ}})^{1/γ} (γ ∈ R). The result is given as follows.

Theorem IV.5 Given an n-parameter qubit model (n = 2, 3), an optimal design e_*(n) and the minimum value of the γ-optimality function (γ ≠ 0) are given by

min_{e(n)} Ψ_γ[J_θ[e(n)]] = n^{-1/γ} (Tr{(J_θ^SLD)^{-γ/(γ+1)}})^{(γ+1)/γ},
p_* = (p_i) with p_i = (λ_i^SLD)^{-γ/(γ+1)} / Σ_j (λ_j^SLD)^{-γ/(γ+1)},  u_* = (u_i^SLD),  (58)

where λ_i^SLD and u_i^SLD are the eigenvalues and eigenvectors of the SLD Fisher information matrix. The necessary and sufficient condition for the optimal design e_* is that the Fisher information matrix for e_* satisfies

J_θ[e_*] = (J_θ^SLD)^{1/(γ+1)} / Tr{(J_θ^SLD)^{-γ/(γ+1)}}.  (59)

Proof:
We extend the proof used in Ref. [23]. First note that it suffices to find the minimum of Tr{J_θ[e(n)]^{-γ}}. Consider the functional of the n × n positive matrix F > 0,

f(F) := Tr{(S F^{-1} S)^γ} + λ (Tr{F} − 1),  (60)

where S = (J_θ^SLD)^{-1/2} and λ is a Lagrange multiplier. Taking the variation with respect to F gives

δf(F) = −γ Tr{(S F^{-1} S)^{γ−1} S F^{-1} δF F^{-1} S} + λ Tr{δF}
  = −Tr{[γ S^{-1} (S F^{-1} S)^{γ+1} S^{-1} − λ I] δF}.

Therefore, the stationarity condition yields the relations

γ S^{-1} (S F_*^{-1} S)^{γ+1} S^{-1} − λ I = 0  ⇔  (S F_*^{-1} S)^{γ+1} = (λ/γ) S^2  ⇔  F_* = (γ/λ)^{1/(γ+1)} (J_θ^SLD)^{-γ/(γ+1)}.

The condition Tr{F_*} = 1 determines λ as

λ = γ (Tr{(J_θ^SLD)^{-γ/(γ+1)}})^{γ+1},

and the Fisher information matrix for the optimal design e_* is obtained as

J_θ[e_*] = (J_θ^SLD)^{1/2} F_* (J_θ^SLD)^{1/2} = (J_θ^SLD)^{1/(γ+1)} / Tr{(J_θ^SLD)^{-γ/(γ+1)}}.

This is equivalent to the condition (59), and it immediately gives the expression for Ψ_γ[J_θ[e_*(n)]] in the theorem. To find an optimal design, we can solve F_* = Σ_i p_i u_i u_i^t. It is straightforward to check that the optimal design e_*(n) given in the theorem satisfies this relation.

A-optimal design

The A-optimal design for the qubit model is known. The Nagaoka bound corresponds to the case n = 2,[43] and the Hayashi-Gill-Massar bound is identical to the case n = 3.[42, 44] Yamagata gave a unified treatment of the qubit case as follows.[23]

min_{e(n)} Ψ_A[J_θ[e(n)]] = (1/n) (Tr{(J_θ^SLD)^{-1/2}})^2,
p_* = (p_i) with p_i = (λ_i^SLD)^{-1/2} / Σ_j (λ_j^SLD)^{-1/2},  u_* = (u_i^SLD),  (61)

where λ_i^SLD and u_i^SLD are the eigenvalues and eigenvectors of the SLD Fisher information matrix J_θ^SLD.

D-optimal design

Let us discuss the D-optimal design. Since the Fisher information matrix is expressed as in Eq. (57), minimizing the determinant of J_θ[e(n)]^{-1} is equivalent to maximizing the value Π_{i=1}^n p_i. It is straightforward to see that p_* = (1/n, ..., 1/n) is the optimal choice for the D-optimal design, and we have

Ψ_D(J_θ[e_*(n)]) = n^n Det{(J_θ^SLD)^{-1}}.  (62)

Furthermore, an optimal set of projection measurements is specified by an arbitrary set of orthonormal vectors u_* = {u_i} through expression (56).

E-optimal design

We next give the E-optimal design for the qubit model. As remarked earlier, we only consider full-rank Fisher information matrices, since a singular design cannot be used to estimate all parameters. An optimal measurement is again a set of measurements along the eigendirections of the SLD Fisher information matrix, as in Theorem IV.5. This leads to the minimization

Ψ_E(J_θ[e_*(n)]) = min_p max_i {(p_i λ_i(J_θ^SLD))^{-1}} = (max_p min_i {p_i λ_i(J_θ^SLD)})^{-1}.

The optimal relative frequency for the E-optimal design takes the form

p_i^* = (λ_i^SLD)^{-1} / Σ_{j=1}^n (λ_j^SLD)^{-1},  (63)

where λ_i^SLD are the eigenvalues of the SLD Fisher information matrix. The minimum value of the maximum eigenvalue of the inverse Fisher information matrix is given by

Ψ_E(J_θ[e_*(n)]) = Tr{(J_θ^SLD)^{-1}}.  (64)

V. QUANTUM EQUIVALENCE THEOREM FOR A QUBIT SYSTEM
In this section, we prove a quantum version of the equivalence theorem. Combining the results on the qubit model yields the following theorem.
Theorem V.1
For any qubit model, the following two optimization problems are equivalent:

1) min_{ξ∈Ξ} Det{J(ξ)^{-1}},
2) min_{ξ∈Ξ} Tr{J_θ^SLD J(ξ)^{-1}};

that is, the D-optimal design coincides with the A-optimal design with the weight matrix J_θ^SLD.

Proof:
Let us consider an alternative expression of the D-optimality function, Ψ_D(J) = log[Det{J^{-1}}] = −log[Det{J}]. The sensitivity function is given by φ(e, ξ) = Tr{J_θ[e] J(ξ)^{-1}}. From Theorem II.1, we have the equivalence

ξ_* = arg min_{ξ∈Ξ} Ψ_D(J(ξ))  ⇔  max_{e∈E} Tr{J_θ[e] J(ξ_*)^{-1}} = n.

The maximization problem is solved by Lemma IV.3 to get the condition

λ_max(Ĵ(ξ_*)^{-1}) = n,  (65)

where Ĵ(ξ_*) := (J_θ^SLD)^{-1/2} J(ξ_*) (J_θ^SLD)^{-1/2}. Next, from the sensitivity function for the A-optimality function with the weight matrix J_θ^SLD, we find that ξ_* is A-optimal if and only if

max_{e∈E} Tr{J_θ^SLD J(ξ_*)^{-1} J_θ[e] J(ξ_*)^{-1}} = Tr{J_θ^SLD J(ξ_*)^{-1}}  ⇔  λ_max(Ĵ(ξ_*)^{-2}) = Tr{Ĵ(ξ_*)^{-1}},  (66)

where Lemma IV.3 was used again. Let j_1 ≥ j_2 ≥ ... ≥ j_n (> 0) be the eigenvalues of Ĵ(ξ_*); then Corollary IV.2 states Σ_k j_k ≤ 1. With this notation, the D-optimality condition is equivalent to j_n^{-1} = n. This is also equivalent to j_k = 1/n for all k, due to the constraint Σ_k j_k ≤ 1. Finally, the A-optimality condition is expressed as j_n^{-2} = Σ_k j_k^{-1}. Together with the same constraint, this also implies j_k = 1/n for all k. This completes the proof.

VI. COMPARISON OF OPTIMAL DESIGNS
In this section, we compare the optimal designs for the A-, D-, and E-optimality criteria, denoted by e_A, e_D, and e_E, respectively. As another reference, we consider the so-called standard tomography design e_ST. This is defined by e_ST = (p_ST, e_ST) with p_ST = (1/3, 1/3, 1/3) and e_ST = (Π^(1), Π^(2), Π^(3)), where Π^(k) (k = 1, 2, 3) is the projection measurement of the kth Pauli matrix σ_k.
We first list the Fisher information matrices for these designs:

J_A := J_θ[e_A] = (J_θ^SLD)^{1/2} / Tr{(J_θ^SLD)^{-1/2}},  (67)
J_D := J_θ[e_D] = (1/n) J_θ^SLD,  (68)
J_E := J_θ[e_E] = I_n / Tr{(J_θ^SLD)^{-1}},  (69)
J_ST := J_θ[e_ST] = (1/3) Σ_{k=1,2,3} (1 − s_{θ,k}^2)^{-1} (∂s_{θ,k}/∂θ^i)(∂s_{θ,k}/∂θ^j),  (70)

where n is the number of parameters and I_n denotes the n × n identity matrix. For the Fisher information matrix of the standard tomography, we use the Bloch vector representation of the state, s_{θ,j} = tr{ρ_θ σ_j} with σ_j (j = 1, 2, 3) the Pauli matrices. For comparison, we also list the Fisher information matrix of the γ-optimal design e_γ:

J_γ := J_θ[e_γ] = (J_θ^SLD)^{1/(γ+1)} / Tr{(J_θ^SLD)^{-γ/(γ+1)}}.  (71)

Let us observe from this result that the structure of each optimal Fisher information matrix is very different. Explicitly, the eigenvalues of J_A, J_D, and J_E are all different in general.
As a concrete example, we consider the standard parametrization of the general qubit state with the Stokes parameters. The model is given by

M_Q = { ρ_θ = (1/2) [[1 + θ_3, θ_1 − iθ_2], [θ_1 + iθ_2, 1 − θ_3]] | θ ∈ Θ }.  (72)

Here, Θ = {θ ∈ R^3 | |θ|^2 = Σ_i (θ_i)^2 < 1}. For this model, the SLD Fisher information matrix can be computed, and its inverse is

(J_θ^SLD)^{-1} = I − θθ^t,

where θ := (θ_1, θ_2, θ_3)^t denotes the column vector. We list the inverse Fisher information matrices for each optimal design:

J_A^{-1} = (2 + √(1−|θ|^2)) [ I + ((√(1−|θ|^2) − 1)/|θ|^2) θθ^t ],  (73)
J_D^{-1} = 3(I − θθ^t),  (74)
J_E^{-1} = (3 − |θ|^2) I,  (75)
J_ST^{-1} = 3 diag(1 − (θ_1)^2, 1 − (θ_2)^2, 1 − (θ_3)^2).  (76)

A. A-optimality

Let us consider the A-optimality function Ψ_A(J) = Tr{J^{-1}}.
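Before listing the A-optimality values, the closed forms above can be checked numerically. The sketch below (our own code, not from the paper) solves the SLD operator equation by eigendecomposition for the Stokes model, verifies (J_θ^SLD)^{-1} = I − θθ^t, and confirms the A-optimal value (Tr{(J_θ^SLD)^{-1/2}})^2 = (2 + √(1−|θ|^2))^2:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def sld(r, dr):
    """Solve dr = (r L + L r)/2 for the SLD L in the eigenbasis of r."""
    lam, U = np.linalg.eigh(r)
    drb = U.conj().T @ dr @ U                      # dr in the eigenbasis
    return U @ (2 * drb / (lam[:, None] + lam[None, :])) @ U.conj().T

theta = np.array([0.3, -0.2, 0.5])
r = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, paulis)))
Ls = [sld(r, 0.5 * s) for s in paulis]            # d rho/d theta_i = sigma_i/2
# J_ij = Re tr{rho L_i L_j} (equal to the symmetrized form for Hermitian L)
J_sld = np.array([[np.real(np.trace(r @ Li @ Lj)) for Lj in Ls] for Li in Ls])

# closed form used in the text: (J^SLD)^{-1} = I - theta theta^t
print(np.allclose(np.linalg.inv(J_sld), np.eye(3) - np.outer(theta, theta)))  # True

# A-optimal value (Tr{(J^SLD)^{-1/2}})^2 = (2 + sqrt(1 - |theta|^2))^2
w = np.linalg.eigvalsh(np.linalg.inv(J_sld))
psi_A = np.sum(np.sqrt(w)) ** 2
print(np.isclose(psi_A, (2 + np.sqrt(1 - theta @ theta)) ** 2))               # True
```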
First, the values of the A-optimality function for the optimal designs are

\Psi_A(J_A) = \left(\mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1/2}\}\right)^2,
\Psi_A(J_D) = n\,\mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1}\},
\Psi_A(J_E) = n\,\mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1}\} = \Psi_A(J_D).

Here we omit the expression for the standard tomography design, since it is rather lengthy. From the above results, we immediately see that e_D and e_E perform exactly the same in terms of the A-optimality.

We next consider model (72). The results, including the standard tomography design, are

\Psi_A(J_A) = \left(2 + \sqrt{1-|\theta|^2}\right)^2,
\Psi_A(J_D) = \Psi_A(J_E) = \Psi_A(J_{ST}) = 3\,(3 - |\theta|^2).

Interestingly, \Psi_A takes the same value for e_D, e_E, and e_ST. As a normalized version of these values, we compare the efficiency \eta_A[e] = \Psi_A(J_A)/\Psi_A(J[e]) defined in Sec. II C. By definition, \eta_A[e_A] = 1, and the others are

\eta_A(e_D) = \eta_A(e_E) = \eta_A(e_{ST}) = \frac{\left(2 + \sqrt{1-|\theta|^2}\right)^2}{3\,(3 - |\theta|^2)}.

In Fig. 1, we plot the efficiency functions \eta_A[e_A] = 1 (black solid curve) and \eta_A(e_D) = \eta_A(e_E) = \eta_A(e_{ST}) (gray solid curve) as functions of |\theta|^2. Clearly, \eta_A(e_D) = \eta_A(e_E) = \eta_A(e_{ST}) is a monotonically decreasing function of |\theta|^2 = \sum_i (\theta^i)^2. The infimum is attained in the pure-state limit |\theta|^2 \to 1, where the value is 2/3. For small values of |\theta|^2, on the other hand, it is close to one. This means that there is no significant difference among the different optimal designs when the state is close to the completely mixed state.

FIG. 1. Efficiency functions for the A-optimality criterion.

B. D-optimality

Let us consider the D-optimality function \Psi_D(J) = \mathrm{Det}\{J^{-1}\}. The values of the D-optimality function for the optimal designs are

\Psi_D(J_A) = \left(\mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1/2}\}\right)^n \mathrm{Det}\{(J^{\mathrm{SLD}}_\theta)^{-1/2}\},
\Psi_D(J_D) = n^n\,\mathrm{Det}\{(J^{\mathrm{SLD}}_\theta)^{-1}\},
\Psi_D(J_E) = \left(\mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1}\}\right)^n.

For model (72), the results are

\Psi_D(J_A) = \left(2 + \sqrt{1-|\theta|^2}\right)^3 \sqrt{1-|\theta|^2},
\Psi_D(J_D) = 27\,(1 - |\theta|^2),
\Psi_D(J_E) = (3 - |\theta|^2)^3,
\Psi_D(J_{ST}) = 27 \prod_{k=1,2,3} \left(1-(\theta^k)^2\right).

The efficiencies are calculated as

\eta_D(e_A) = \frac{27\sqrt{1-|\theta|^2}}{\left(2 + \sqrt{1-|\theta|^2}\right)^3},
\eta_D(e_E) = \frac{27\,(1 - |\theta|^2)}{(3 - |\theta|^2)^3},
\eta_D(e_{ST}) = (1 - |\theta|^2) \prod_{k=1,2,3} \left(1-(\theta^k)^2\right)^{-1}.

Compared with the A-optimal case, the performance of the standard tomography is not rotationally symmetric. To be specific, its efficiency \eta_D(e_{ST}) explicitly depends on the direction of the Bloch vector.

In Fig. 2, we plot the four efficiency functions \eta_D[e_A] (black solid curve), \eta_D(e_D) = 1 (dotted curve), \eta_D(e_E) (dashed curve), and \eta_D(e_{ST}) (gray solid curve) as functions of |\theta|^2. To produce these figures, we fix a particular direction of the Bloch vector, given by (\sin\theta_0\cos\phi_0, \sin\theta_0\sin\phi_0, \cos\theta_0), and then vary the squared length |\theta|^2; the left and right plots correspond to two different choices of (\theta_0, \phi_0), with \phi_0 = \pi/4 in the left plot. For the D-optimality, we have the following ordering relations:

1 = \eta_D(e_D) \ge \eta_D(e_A), \quad \eta_D(e_{ST}) \ge \eta_D(e_E).

The relation \eta_D(e_A) \ge \eta_D(e_E) \Leftrightarrow \Psi_D(J_A) \le \Psi_D(J_E) holds for all |\theta|^2 > 0; this can be shown by analyzing \Psi_D(J_E) - \Psi_D(J_A) as a function of |\theta|^2. The other relation, \eta_D(e_{ST}) \ge \eta_D(e_E) \Leftrightarrow \Psi_D(J_{ST}) \le \Psi_D(J_E), follows from the inequality of arithmetic and geometric means. From Fig. 2, we see that there is no ordering between \eta_D(e_A) and \eta_D(e_{ST}).

Next, we note that \eta_D(e_A), \eta_D(e_E), and \eta_D(e_{ST}) all tend to zero as |\theta|^2 approaches one (the pure-state limit). This indicates that the designs e_A, e_E, and e_ST become completely useless in terms of the D-optimality. We elaborate on this point in Sec. VI D.

FIG. 2. Efficiency functions for the D-optimality criterion.

C. E-optimality

Let us consider the E-optimality function \Psi_E(J) = \lambda_{\max}\{J^{-1}\}. The values of the E-optimality function for the optimal designs are

\Psi_E(J_A) = \mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1/2}\}\, \lambda_{\max}\{(J^{\mathrm{SLD}}_\theta)^{-1/2}\},
\Psi_E(J_D) = n\, \lambda_{\max}\{(J^{\mathrm{SLD}}_\theta)^{-1}\},
\Psi_E(J_E) = \mathrm{Tr}\{(J^{\mathrm{SLD}}_\theta)^{-1}\}.

For model (72), we have

\Psi_E(J_A) = 2 + \sqrt{1-|\theta|^2},
\Psi_E(J_D) = 3,
\Psi_E(J_E) = 3 - |\theta|^2,
\Psi_E(J_{ST}) = 3\left(1 - \min_i\{(\theta^i)^2\}\right).

The efficiencies are obtained as

\eta_E(e_A) = \frac{3 - |\theta|^2}{2 + \sqrt{1-|\theta|^2}},
\eta_E(e_D) = \frac{1}{3}\,(3 - |\theta|^2),
\eta_E(e_{ST}) = \frac{3 - |\theta|^2}{3\left(1 - \min_i\{(\theta^i)^2\}\right)}.

In Fig. 3, we plot the four efficiency functions \eta_E[e_A] (black solid curve), \eta_E(e_D) (dotted curve), \eta_E(e_E) = 1 (dashed curve), and \eta_E(e_{ST}) (gray solid curve) as functions of |\theta|^2. As in Fig. 2, we choose two particular directions of the Bloch vector, parametrized by (\theta_0, \phi_0) as before.

FIG. 3. Efficiency functions for the E-optimality criterion.

For the E-optimality, we have the following ordering relations:

1 = \eta_E(e_E) \ge \eta_E(e_A), \quad \eta_E(e_{ST}) \ge \eta_E(e_D).

The relation \Psi_E(J_A) \le \Psi_E(J_D) can be shown by a straightforward exercise. The other inequality, \Psi_E(J_{ST}) \le \Psi_E(J_D), also holds trivially. Figure 3 shows that there is no ordering between \eta_E(e_A) and \eta_E(e_{ST}).

Another interesting characteristic is that \eta_E(e_A) tends to one as |\theta|^2 approaches one. Also, \eta_E(e_D) and \eta_E(e_{ST}) do not vanish at the boundary |\theta|^2 = 1.

D. Discussions
The first observation in our study is that each optimality criterion defines a different optimal design, whose characteristics can be very different. Although this is to be expected, we have demonstrated it explicitly for the popular optimality criteria. This point is illustrated by the expressions for the Fisher information matrices (67), (68), (69), and (71). In the following, we list more specific findings.

The Fisher information matrix of the standard tomography design, Eq. (76), shows an asymmetry in the Bloch-vector representation. However, it becomes rotationally invariant for the A-optimality function upon taking the trace with a unit weight matrix. This point was also demonstrated by Yamagata [23], who derived a necessary and sufficient condition on the weight matrix for the standard tomography to coincide with the A-optimal design; see also the related work [46]. When the standard tomography is evaluated for the other optimality criteria, we see that it is not the worst design among the optimal designs considered. This might be another justification for adopting the standard tomography in practice.

Next, we analyze the A-optimal design. By evaluating its efficiency for the other optimality criteria, we see that it is relatively stable. In particular, it behaves well for the E-optimality when compared with the D-optimal design and the standard tomography. One of the reasons behind this observation is that the A-optimality with a unit weight matrix treats the three parameters on an equal footing. Thus, we expect it to perform well on average.

The D-optimality is known to be one of the most popular criteria in the classical theory of optimal DoE. However, its applicability in the quantum case needs further justification. Figure 2 shows that the other optimal designs, as well as the standard tomography, become less efficient when the model becomes pure.
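These comparisons are easy to reproduce from the closed-form efficiencies derived above. The following sketch (the sample Bloch vectors are arbitrary choices of ours, not the ones used in the figures) checks the stated ordering relations and the pure-state behavior:

```python
import numpy as np

def efficiencies(theta):
    """Closed-form efficiencies for model (72); theta is the Bloch vector."""
    r2 = theta @ theta
    s = np.sqrt(1 - r2)
    # A-optimality: the efficiency is the same for e_D, e_E, and e_ST.
    eta_A = (2 + s) ** 2 / (3 * (3 - r2))
    # D-optimality efficiencies of e_A, e_E, e_ST (eta_D(e_D) = 1).
    eta_D = {"A": 27 * s / (2 + s) ** 3,
             "E": 27 * (1 - r2) / (3 - r2) ** 3,
             "ST": (1 - r2) / np.prod(1 - theta ** 2)}
    # E-optimality efficiencies of e_A, e_D, e_ST (eta_E(e_E) = 1).
    eta_E = {"A": (3 - r2) / (2 + s),
             "D": (3 - r2) / 3,
             "ST": (3 - r2) / (3 * (1 - np.min(theta ** 2)))}
    return eta_A, eta_D, eta_E

# Ordering relations of the D- and E-optimality comparisons at a generic point.
eta_A, eta_D, eta_E = efficiencies(np.array([0.5, 0.3, 0.6]))
assert 1 >= eta_D["A"] >= eta_D["E"] and eta_D["ST"] >= eta_D["E"]
assert 1 >= eta_E["A"] and eta_E["ST"] >= eta_E["D"]

# Pure-state limit |theta|^2 -> 1: the D-efficiencies degrade, while
# eta_E(e_D) approaches 2/3 and eta_A approaches its infimum 2/3.
eta_A_p, eta_D_p, eta_E_p = efficiencies(np.sqrt(0.9999 / 3) * np.ones(3))
assert max(eta_D_p.values()) < 0.2
assert abs(eta_E_p["D"] - 2 / 3) < 0.01 and abs(eta_A_p - 2 / 3) < 0.01
```

Here `efficiencies` is a helper named for this illustration; its dictionaries collect the efficiencies of the non-optimal designs under each criterion.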
This is because the D-optimality criterion concerns the product of the eigenvalues of the Fisher information matrix, and thus it is sensitive to small eigenvalues. In particular, the D-optimality function \Psi_D(J_D) of the D-optimal design vanishes in the pure-state limit. Therefore, the efficiency function \eta_D also vanishes unless \Psi_D(J) vanishes at the same rate so that the two factors cancel. The classical Fisher information matrix of the D-optimal design (74) is proportional to the SLD Fisher information matrix. In the literature [47, 48], the existence of such POVMs is known to be nontrivial in general; this has been studied under the name of Fisher-symmetric informationally complete measurements, and our result on the D-optimal design is thus related. From this observation, the D-optimality in higher-dimensional cases is worth further study.

Last, let us make a brief comment on the E-optimal design. This optimality criterion is related to the philosophy of the minimax strategy: one tries to avoid the worst-case value of the MSE matrix. Interestingly, the classical Fisher information matrix of the E-optimal design is proportional to the identity matrix, as seen in Eq. (69). This result exhibits the maximum symmetry of the Fisher information matrix. This optimal design is not so common in the quantum domain so far. It should play an important role when one wishes to guarantee the best estimate for the smallest eigenvalue of the Fisher information matrix.

VII. SUMMARY AND OUTLOOK
In summary, we have formulated the problem of quantum-state estimation in the framework of optimal design of experiments (DoE). This formulation shows that the problem at hand is a usual statistical optimization problem, except that the relevant quantities are represented by non-negative complex matrices. We have solved the qubit case analytically, deriving the popular optimal designs. A quantum version of the equivalence theorem has also been proven in the qubit case. Another important contribution of this paper is a comparison among the popular optimal designs: the A-, D-, and E-optimal designs. In particular, we have shown that some of the optimal designs do not perform well for other choices of optimality criterion. Although this is likely to happen in general, we have demonstrated it explicitly for the standard parametrization of qubit states.

An important future work is to apply our formulation to various physically important problems and to find good experimental setups by solving the optimization problem numerically. There are several open problems along this line of research: first, to develop an efficient optimization algorithm for finding an optimal design; second, to generalize the equivalence theorem to higher-dimensional systems; third, the singular-design problem, which is common when finding an optimal design in the presence of nuisance parameters [41, 49, 50]. The classical theory of optimal DoE is a rich and mature subject in classical statistics. There are many unexplored subjects of DoE in the quantum case that would be of great importance in quantum information processing, such as sequential design, block design, Bayesian design, minimax design, robust design, and model discrimination, to list a few.

ACKNOWLEDGEMENT
The work is partly supported by JSPS KAKENHI Grant Number JP17K05571 and the FY2020 UEC Research Support Program, the University of Electro-Communications. The author would like to thank Prof. Hui Khoon Ng for invaluable discussions and her kind hospitality at the Centre for Quantum Technologies in Singapore, where part of this work was done.

[1] R. A. Fisher, The Design of Experiments, 7th ed. (Oliver and Boyd, London and Edinburgh, 1960). [2] V. V. Fedorov,
Theory of optimal experiments (Academic Press, 1972).[3] F. Pukelsheim,
Optimal design of experiments (SIAM, 2006).[4] C. W. Helstrom,
Quantum detection and estimation theory (Academic press, 1976).[5] A. S. Holevo,
Probabilistic and statistical aspects of quantum theory (Edizioni della Normale, 2011). [6] M. G. A. Paris and J. Řeháček,
Quantum State Estimation (Springer, 2004).[7] M. Hayashi,
Quantum Information Theory: Mathematical Foundation (Springer, 2016).[8] D. Petz,
Quantum information theory and quantum statistics (Springer Science & Business Media, 2007).[9] Y. S. Teo,
Introduction to quantum-state estimation (World Scientific, 2016).[10] R. Kosut, I. A. Walmsley and H. Rabitz, Optimal experiment design for quantum state and process tomography andhamiltonian parameter estimation (2004).[11] J. Nunn, B. J. Smith, G. Puentes, I. A. Walmsley and J. S. Lundeen,
Physical Review A (Apr 2010) p. 042109. [12] G. Balló, K. M. Hangos and D. Petz, IEEE Transactions on Automatic Control (2012) 2056. [13] L. Ruppert, D. Virosztek and K. Hangos, Journal of Physics A: Mathematical and Theoretical (2012) p. 265305. [14] T. Sugiyama, P. S. Turner and M. Murao, Physical Review A (2012) p. 052107. [15] Y. Gazit, H. K. Ng and J. Suzuki, Physical Review A (Jul 2019) p. 012350. [16] V. V. Fedorov and P. Hackl,
Model-oriented design of experiments (Springer Science & Business Media, 2012). [17] L. Pronzato and A. Pázman,
Design of experiments in nonlinear models (Springer & Business Media, 2013).[18] V. V. Fedorov and S. L. Leonov,
Optimal design for nonlinear response models (CRC Press, 2013).[19] K. Chaloner and I. Verdinelli,
Statistical Science (1995) 273.[20] A. DasGupta,
Handbook of Statistics (1996) 1099.[21] E. G. Ryan, C. C. Drovandi, J. M. McGree and A. N. Pettitt, International Statistical Review (2016) 128.[22] X.-M. Lu, Z. Ma and C. Zhang, Physical Review A (Feb 2020) p. 022303.[23] K. Yamagata,
International Journal of Quantum Information (2011) 1167. [24] H. Zhu, Scientific reports (2015) 1.[25] J. Kiefer and J. Wolfowitz, Canadian Journal of Mathematics (1960) 363.[26] H. Nagaoka, On the parameter estimation problem for quantum statistical models, in Asymptotic Theory of QuantumStatistical Inference: Selected Papers , ed. M. Hayashi (World Scientific, 2005).[27] M. Hayashi and K. Matsumoto, Statistical model with measurement degree of freedom and quantum physics, in
Surikaiseki Kenkyusho Kokyuroku, 1998 (English translation available in [51]). [28] O. Barndorff-Nielsen and R. Gill,
Journal of Physics A: Mathematical and General (2000) p. 4481.[29] M. Akahira and K. Takeuchi, Non-regular statistical estimation (Springer Science & Business Media, 2012).[30] L. Seveso and M. G. A. Paris,
International Journal of Quantum Information (2020) p. 2030001.[31] G. M. D’Ariano, P. L. Presti and P. Perinotti, Journal of Physics A: Mathematical and General (jun 2005) p. 5979.[32] A. Fujiwara, Journal of Physics A: Mathematical and General (2006) p. 12489.[33] M. Hayashi and K. Matsumoto, Journal of Mathematical Physics (2008) p. 102101.[34] J. Kahn and M. Guta, Communications in Mathematical Physics (2009) 597.[35] K. Yamagata, A. Fujiwara and R. D. Gill,
The Annals of Statistics (2013) 2197.[36] Y. Yang, G. Chiribella and M. Hayashi, Communications in Mathematical Physics (2019) 223.[37] T. Y. Young,
Information Sciences (1975) 25.[38] H. Nagaoka, On fisher information of quantum statistical models, in Asymptotic Theory of Quantum Statistical Inference:Selected Papers , ed. M. Hayashi (World Scientific, 2005).[39] S. L. Braunstein and C. M. Caves,
Physical Review Letters (May 1994) 3439.[40] S.-I. Amari and H. Nagaoka, Methods of information geometry (American Mathematical Soc., 2007).[41] J. Suzuki, Y. Yang and M. Hayashi,
Journal of Physics A: Mathematical and Theoretical (2020) p. 453001.[42] R. D. Gill and S. Massar, Physical Review A (Mar 2000) p. 042312.[43] H. Nagaoka, IEICE Tech Report
IT 89-42 (1989) 9. (Reprinted in [51]).[44] M. Hayashi, A linear programming approach to attainable cramer-rao type bound, in
Quantum Communication, Comput-ing, and Measurement , eds. O. Hirota, A. S. Holevo and C. M. Caves (Plenum, New York, 1997).[45] R. Bhatia,
Matrix analysis (Springer Science & Business Media, 2013).[46] H. Zhu,
Physical Review A (Jul 2014) p. 012115.[47] N. Li, C. Ferrie, J. A. Gross, A. Kalev and C. M. Caves, Physical Review Letters (2016) p. 180402.[48] H. Zhu and M. Hayashi,
Physical Review Letters (2018) p. 030404.[49] J. Suzuki,