Computing exact D-optimal designs by mixed integer second-order cone programming
The Annals of Statistics
Institute of Mathematical Statistics, 2015
By Guillaume Sagnol and Radoslav Harman

Zuse Institut Berlin and Comenius University, Bratislava
Let the design of an experiment be represented by an $s$-dimensional vector $w$ of weights with nonnegative components. Let the quality of $w$ for the estimation of the parameters of the statistical model be measured by the criterion of D-optimality, defined as the $m$th root of the determinant of the information matrix $M(w) = \sum_{i=1}^s w_i A_i A_i^T$, where $A_i$, $i = 1, \ldots, s$, are known matrices with $m$ rows. In this paper, we show that the criterion of D-optimality is second-order cone representable. As a result, the method of second-order cone programming can be used to compute an approximate D-optimal design with any system of linear constraints on the vector of weights. More importantly, the proposed characterization allows us to compute an exact D-optimal design, which is possible thanks to high-quality branch-and-cut solvers specialized to solve mixed integer second-order cone programming problems. Our results extend to the case of the criterion of $D_K$-optimality, which measures the quality of $w$ for the estimation of a linear parameter subsystem defined by a full-rank coefficient matrix $K$. We prove that some other widely used criteria are also second-order cone representable, for instance, the criteria of A-, $A_K$-, G- and I-optimality. We present several numerical examples demonstrating the efficiency and general applicability of the proposed method. We show that in many cases the mixed integer second-order cone programming approach allows us to find a provably optimal exact design, while the standard heuristics systematically miss the optimum.

Received December 2013; revised January 2015. Supported by the VEGA 1/0163/13 grant of the Slovak Scientific Grant Agency.
AMS 2000 subject classifications.
Primary 62K05; secondary 65K05.
Key words and phrases.
Optimal experimental design, exact optimal designs, second-order cone programming, mixed integer programming, D-criterion.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2015, Vol. 43, No. 5, 2198–2224. This reprint differs from the original in pagination and typographic detail.
1. Introduction.
Consider an optimal experimental design problem of the form

$$\max_{w \in \mathcal{W}} \Phi\Big(\sum_{i=1}^s w_i A_i A_i^T\Big), \tag{1.1}$$

where $\Phi$ is a criterion mapping the space $\mathbb{S}_m^+$ of $m \times m$ positive semidefinite matrices into the set $\mathbb{R}_+ := [0, \infty)$. In (1.1), $A_i \in \mathbb{R}^{m \times \ell_i}$, $i = 1, \ldots, s$, are known matrices, and $\mathcal{W}$ is a compact subset of $\mathbb{R}^s_+$ representing the set of all permissible designs.

Problem (1.1) arises in linear regression models with a design space $\mathcal{X} \equiv [s] := \{1, \ldots, s\}$, independent trials and a vector $\theta \in \mathbb{R}^m$ of unknown parameters, provided that the trial in the $i$th design point results in an $\ell_i$-dimensional response $y_i$ satisfying $E(y_i) = A_i^T \theta$ and $\mathrm{Var}(y_i) = \sigma^2 I_{\ell_i}$, where $I_k$ is the $k \times k$ identity matrix. For a design $w \in \mathcal{W}$, the moment matrix $M(w) := \sum_{i=1}^s w_i A_i A_i^T$ represents the total information gained from the design $w$.

When the criterion $\Phi$ satisfies certain properties, problem (1.1) can be interpreted as selecting the weights $w_i$ that yield the most accurate estimation of $\theta$. In this paper, we mainly focus on the D-optimal problem, where the criterion $\Phi$ is set to

$$\Phi_D : M \to (\det M)^{1/m}. \tag{1.2}$$

In the case of Gaussian measurement error, this corresponds to the problem of minimizing the volume of the standard confidence ellipsoid for the best linear unbiased estimator (BLUE) $\hat{\theta}$ of $\theta$.

More generally, if the experimenter is interested in the estimation of the parameter subsystem $\vartheta = K^T \theta$, where $K$ is an $m \times k$ matrix ($k \le m$) of full column rank [$\mathrm{rank}(K) = k$], a relevant criterion is $D_K$-optimality, obtained when the D-criterion is applied to the information matrix $C_K(M)$ for the linear parametric subsystem given by the coefficient matrix $K$, defined by (Section 3.2 in [31])

$$C_K(M) = \min_{\preceq} \{\, L M L^T : L \in \mathbb{R}^{k \times m},\ LK = I_k \,\}.$$

Here the minimum is taken with respect to the Löwner ordering, over all left inverses $L$ of $K$.
This information matrix is equal to $(K^T M^- K)^{-1}$ if the estimability condition holds ($\mathrm{range}\,K \subseteq \mathrm{range}\,M$); otherwise $C_K(M)$ is a singular matrix, so

$$\Phi_{D|K} : M \to \begin{cases} (\det K^T M^- K)^{-1/k}, & \text{if } \mathrm{range}\,K \subseteq \mathrm{range}\,M; \\ 0, & \text{otherwise.} \end{cases} \tag{1.3}$$

In the previous formula $M^-$ denotes a generalized inverse of $M$, that is, a matrix satisfying $M M^- M = M$. Although $M^-$ is not unique in general, the definition of $\Phi_{D|K}$ is consistent. Indeed, the matrix $K^T M^- K$ does not depend on the choice of the generalized inverse $M^-$ if the columns of $K$ are included in the range of $M$; cf. Pukelsheim [31]. Note that if $k = 1$, that is, if the matrix $K = c$ is a nonzero vector, then the criterion $\Phi_{D|K}$ is equivalent to the criterion of c-optimality. Other optimality criteria, such as A-, $A_K$-, G- and I-optimality, are also discussed in the Appendix.

In the standard form of the problem, $\mathcal{W}$ is the probability simplex

$$\mathcal{W}_\Delta := \Big\{ w \in \mathbb{R}^s_+ : \sum_{i=1}^s w_i = 1 \Big\},$$

and the design $w$ is a weight vector indicating the proportions of trials in the individual design points. This problem, called the optimal approximate design problem in the literature, is in fact a relaxation of a much more difficult and more fundamental discrete optimization problem: the optimal exact design problem of size $N$, where $\mathcal{W}$ takes the form

$$\mathcal{W}_N := \Big\{ \frac{n}{N} : n \in \mathbb{N}^s,\ \sum_{i=1}^s n_i = N \Big\}.$$

Here, the experiment consists of $N$ trials, and if $w \in \mathcal{W}_N$, then $n_i = N w_i$ indicates the number of trials in the design point $i$. (In the above definition, $\mathbb{N}$ denotes the set of all nonnegative integers, i.e., $0 \in \mathbb{N}$.) Note that the constraint $w \in \mathcal{W}_\Delta$ is obtained from $w \in \mathcal{W}_N$ by relaxing the integer constraints on $N w_i$.

Many different approaches have been proposed to solve problems of type (1.1).
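Both criteria above can be evaluated directly from the weights. The following minimal numpy sketch (a toy model with illustrative matrices, not one from the paper) computes $\Phi_D$ and $\Phi_{D|K}$, using the Moore–Penrose pseudoinverse as one particular choice of generalized inverse:

```python
import numpy as np

def phi_D(M):
    """D-criterion (1.2): the m-th root of det(M) for an m x m moment matrix."""
    m = M.shape[0]
    return np.linalg.det(M) ** (1.0 / m)

def phi_D_K(M, K):
    """D_K-criterion (1.3): (det K^T M^- K)^(-1/k) under estimability, else 0."""
    k = K.shape[1]
    Mg = np.linalg.pinv(M)               # one choice of generalized inverse
    if not np.allclose(M @ Mg @ K, K):   # estimability: range(K) within range(M)?
        return 0.0
    return np.linalg.det(K.T @ Mg @ K) ** (-1.0 / k)

# toy model with m = 2 and s = 3 (illustrative matrices)
A = [np.array([[1.0], [0.0]]),
     np.array([[0.0], [1.0]]),
     np.array([[1.0], [1.0]])]
w = np.array([0.4, 0.4, 0.2])
M = sum(wi * Ai @ Ai.T for wi, Ai in zip(w, A))

print(phi_D(M))                # = det(M)^(1/2) for this 2x2 example
print(phi_D_K(M, np.eye(2)))   # K = I_m recovers the D-criterion
```

With $K = I_m$ the two criteria coincide, which serves as a quick consistency check.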
However, most methods are specialized and work only if the feasibility set $\mathcal{W}$ is the probability simplex $\mathcal{W}_\Delta$ or the standard discrete simplex $\mathcal{W}_N$. In the former case (approximate optimal design, $\mathcal{W} = \mathcal{W}_\Delta$), the traditional methods are the Fedorov–Wynn type vertex-direction algorithms [13, 45] and the multiplicative algorithms [39, 41, 47, 48], possibly combined with adaptive changes of the finite grid $\mathcal{X}$ [19, 30, 46]. In the latter case (exact optimal design, $\mathcal{W} = \mathcal{W}_N$), the classical methods are heuristics such as exchange algorithms [3, 13, 27], rounding methods [32] and metaheuristics such as simulated annealing [15] or genetic algorithms [20]. For some small to medium size models, branch-and-bound methods [43] have been used to compute provably optimal solutions.

In many practical situations, however, more complicated constraints are imposed on the design [9], and there is a need for more general algorithms. For example, assume that the experimental region can be partitioned as $\mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2$, and that 40% (resp., 60%) of the trials should be chosen in $\mathcal{X}_1$ (resp., $\mathcal{X}_2$); that is, the constraint $w \in \mathcal{W}_\Delta$ is replaced by $w \in \mathcal{W}$ with

$$\mathcal{W} := \Big\{ w \in \mathbb{R}^s_+ : \sum_{i \in \mathcal{X}_1} w_i = 0.4,\ \sum_{i \in \mathcal{X}_2} w_i = 0.6 \Big\}.$$

This is an example of a stratified design [16], which is a generalization of the well-known marginally constrained design [10]. Other examples of relevant design domains $\mathcal{W}$ defined by a set of linear inequalities are discussed in [42]. For example, it is possible to consider a case in which a total budget is allocated, and the design points are associated to possibly unequal costs $c_1, \ldots, c_s$.
It is also possible to consider decreasing costs when trials of specific design points are grouped, or to avoid designs that are concentrated on a small number of design points.

For some special linear constraints, the approximate D-optimal design problem can be solved by modifications of the vertex-direction and the multiplicative algorithms (see, e.g., [9, 16, 26]), but the convergence of these methods is usually slow. Recently, modern mathematical programming algorithms [12, 14, 18, 25, 28, 34, 36, 42] have been gaining in popularity. The idea is to reformulate the optimal design problem under a canonical form that specialized solvers can handle, such as maxdet programs (MAXDET), semidefinite programs (SDP) or second-order cone programs (SOCP).

Reformulating an optimal design problem as an SOCP or an SDP is useful in many regards. First, it allows one to use modern software to compute an optimal solution efficiently. Second, the available interior point methods are known to return an $\varepsilon$-optimal solution in polynomial time with respect to the size of the instance and $\log(1/\varepsilon)$, because a self-concordant barrier exists for these problems; cf. [6]. Third, mathematical programming methods are general in the sense that they are not restricted to the use of special linear constraints. Nevertheless, the inclusion of general linear constraints within mathematical programming characterizations is not completely straightforward. For instance, we show in Section 2 that the SOCP formulation of [34] for the standard approximate D-optimal design problem (over $\mathcal{W}_\Delta$) does not yield a valid SOCP formulation of the constrained D-optimal design problem when the constraint $w \in \mathcal{W}_\Delta$ is replaced by $w \in \mathcal{W}$.

The main result of this paper is proved in Section 4 and states that the determinant criterion is SOC-representable. More precisely, it is possible to express that $(t, w)$ belongs to the hypograph of $w \to \Phi_D(M(w))$, that is, $t^m \le \det M(w)$, as a set of second-order cone inequalities.
Consequently, we obtain an alternative SOCP formulation for D-optimality, which remains valid for any weight domain $\mathcal{W}$ that can be expressed by SOC inequalities; see Section 3.

In the Appendix, we prove that other widely used criteria, such as A-, G- or I-optimality, are also SOC-representable. We have summarized the SOCP formulations of constrained D-, A- and G-optimality in Table 1.

Before this paper, the state-of-the-art method for solving optimal design problems with arbitrary linear constraints was the MAXDET formulation of Vandenberghe, Boyd and Wu [42], which is in fact reformulated as an SDP by most interfaces, such as YALMIP [24] or PICOS [35], by using the construction described in [5]. Having an SOCP instead of an SDP formulation has two main advantages. The first is purely computational: it is well known that the computational effort per iteration required by the interior point methods to solve an SOCP is much less than that required to solve an SDP; cf. [1]. When the parameter $\theta$ is of large dimension $m$, or when the number of candidate support points $s$ is large, the SOCP can improve the computational time by one or two orders of magnitude (compared to MAXDET), as was already evidenced in [34] for D-optimality over the probability simplex $\mathcal{W}_\Delta$.

The second and probably more important benefit of SOCP formulations (compared to SDP) is that specialized solvers can handle SOCP problems with integer variables, while there is currently no reliable solver to handle SDPs with integer variables. Indeed, much progress has been made recently in the development of algorithms for second-order cone programming when some of the variables are constrained in the integral domain (MISOCP: mixed integer second-order cone programming). Thus the SOCP formulation of D-optimality presented in this article, unlike the existing SOCP and SDP formulations, allows us to use those specialized codes to solve exact design problems.
Indeed, our formulation is valid for any compact weight domain $\mathcal{W}$, so in particular it is valid for the set $\mathcal{W}_N$ of exact designs of size $N$, and more generally for any polyhedron intersected with a lattice of integer points. Compared to the raw branch-and-bound method for computing exact designs proposed by Welch [43], the MISOCP approach is not only easier to implement, but also much more efficient. The reason is that specialized solvers such as CPLEX [21] or MOSEK [2] rely on branch-and-cut algorithms with sophisticated branching heuristics, and they use cut inequalities to separate noninteger solutions.

In Section 5, we demonstrate the general applicability of the proposed approach, incorporating illustrative examples taken from two application areas of the theory of optimal experimental designs. The following key aspects of the MISOCP approach will be emphasized:

(1) the ability to handle any system of linear constraints on the weights;

(2) the ability to compute exact-optimal designs with a proof of optimality;
Table 1
SOCP formulations of the $D_K$-, $A_K$- and G-optimal design problems over a compact weight region $\mathcal{W} \subseteq \mathbb{R}^s_+$. In the above, $K$ represents a given $m \times k$ matrix of full column rank. The particular case $k = 1$ (where $c = K$ is a column vector) gives SOCP formulations for the c-optimal design problem, and the case $K = I_m$ yields the standard D- and A-optimality problems. The variables $Z_i, Y_i$ ($i \in [s]$) are of size $\ell_i \times k$, the variables $H_{ji}$ ($i \in [s]$, $j \in [s]$) are of size $\ell_j \times \ell_i$, $J$ is of size $k \times k$, the weight vector is $w \in \mathcal{W}$, and the variables $t_{ij}$ ($i \in [s]$, $j \in [k]$), $u_{ji}$ ($i \in [s]$, $j \in [s]$), $\mu_i$ ($i \in [s]$) and $\rho$ are scalar.

$$\max_{w \in \mathcal{W}} \Phi_{D|K}(M(w)) = \max_{w, Z_i, t_{ij}, J}\ \prod_{j=1}^k (J_{j,j})^{1/k}$$
$$\text{s.t.} \quad \sum_{i \in [s]} A_i Z_i = KJ, \quad J \text{ is lower triangular},$$
$$\|Z_i e_j\|^2 \le t_{ij} w_i \ (i \in [s], j \in [k]), \quad \sum_{i=1}^s t_{ij} \le J_{j,j} \ (j \in [k]),$$
$$t_{ij} \ge 0 \ (i \in [s], j \in [k]), \quad w \in \mathcal{W};$$

$$\max_{w \in \mathcal{W}} \Phi_{A|K}(M(w)) = \max_{w, Y_i, \mu_i}\ \sum_{i \in [s]} \mu_i$$
$$\text{s.t.} \quad \sum_{i \in [s]} A_i Y_i = \Big(\sum_{i \in [s]} \mu_i\Big) K,$$
$$\|Y_i\|_F^2 \le \mu_i w_i \ (i \in [s]), \quad \mu_i \ge 0 \ (i \in [s]), \quad w \in \mathcal{W};$$

$$\max_{w \in \mathcal{W}} \Phi_{G}(M(w)) = \max_{w, H_{ji}, u_{ji}, \rho}\ \rho$$
$$\text{s.t.} \quad \sum_{j \in [s]} A_j H_{ji} = \Big(\sum_{j \in [s]} u_{ji}\Big) A_i \ (i \in [s]),$$
$$\|H_{ji}\|_F^2 \le w_j u_{ji} \ (i \in [s], j \in [s]), \quad u_{ji} \ge 0 \ (i \in [s], j \in [s]),$$
$$\rho \le \sum_{j \in [s]} u_{ji} \ (i \in [s]), \quad w \in \mathcal{W}.$$

(3) the ability to rapidly identify a near exact-optimal design for applications where the computing time must remain short, while giving a lower bound on its efficiency; moreover this bound is usually much better than the standard bound obtained from the approximate optimal design.

In particular, our algorithm can compute constrained exact optimal designs, a feature out of reach of the standard computing methods, although some authors have proposed heuristics to handle some special cases such as cost constraints [40, 44].
A notable exception is the recent DQ-optimality approach of Harman and Filová [17], which is a heuristic based on integer quadratic programming (IQP) that can handle the general case of linearly constrained exact designs. However, for some specific D-optimum design problems, the IQP approach leads to very inefficient designs; cf. Section 4 in [17].

In practice, the MISOCP solvers take an input tolerance parameter $\varepsilon > 0$ and stop as soon as a design $w^*$ is found, with a guarantee that no design $w$ with value $\Phi(M(w)) \ge (1 + \varepsilon)\Phi(M(w^*))$ exists. In some cases, such as D-optimal block designs, there is a positive value of $\varepsilon$ for which such a design $w^*$ is found quickly; its efficiency with respect to the best exact design is then guaranteed to be at least $(1 + \varepsilon)^{-1} \ge 1 - \varepsilon$, which is usually a much better efficiency bound than the one based on the comparison with the approximate optimal design. In many situations, the solver is further able to terminate with an optimality status, which means that the branch and bound tree has been completely trimmed and constitutes a proof of optimality. Moreover, it often produces better designs than the standard heuristics (also in cases when perfect optimality is not guaranteed).
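The $\varepsilon$-guarantee translates directly into an efficiency bound. The following trivial sketch (with hypothetical solver output values, for illustration only) shows the arithmetic:

```python
def efficiency_bound(incumbent_value, proven_upper_bound):
    """Lower bound on the efficiency of the incumbent exact design w*:
    the solver proves that no design exceeds proven_upper_bound, so
    eff(w*) >= incumbent_value / proven_upper_bound >= 1 / (1 + eps)."""
    return incumbent_value / proven_upper_bound

# with a relative MISOCP gap tolerance eps = 0.05 (hypothetical values):
eps = 0.05
print(efficiency_bound(1.0, (1 + eps) * 1.0))   # 1/1.05, i.e. at least 1 - eps
```

This bound only uses the solver's proven upper bound on the criterion, not the (typically looser) comparison with the approximate optimal design.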
2. Former SOCP formulation of D-optimality. A second-order cone program (SOCP) is an optimization problem where a linear function $f^T x$ must be maximized among the vectors $x$ belonging to a set $S$ defined by second-order cone inequalities, that is,

$$S = \{ x \in \mathbb{R}^n : \forall i = 1, \ldots, N_c,\ \|G_i x + h_i\| \le c_i^T x + d_i \}$$

for some $G_i, h_i, c_i, d_i$ of appropriate dimensions. Optimization problems of this class can be solved efficiently to the desired precision using interior point techniques; see [6].

We first recall the result from [34] about D-optimality, rewritten with the notation of the present article. Note that $\|Z\|_F := \sqrt{\mathrm{trace}\,ZZ^T}$ denotes the Frobenius norm of the matrix $Z$, which also corresponds to the Euclidean norm of the vectorization of $Z$: $\|Z\|_F = \|\mathrm{vec}(Z)\|$. In the following formulation, the restriction to lower triangular matrices is just a compact notation for the set of linear constraints that appears in [34]:
Proposition 2.1 (Former SOCP for D-optimality [34]). Let $(Z_1, \ldots, Z_s, L, w)$ be optimal for the following SOCP:

$$\max_{Z_i \in \mathbb{R}^{\ell_i \times m},\ L \in \mathbb{R}^{m \times m},\ w \in \mathbb{R}^s_+} \Big(\prod_{k=1}^m L_{k,k}\Big)^{1/m}$$
$$\text{s.t.} \quad \sum_{i=1}^s A_i Z_i = L, \quad L \text{ is lower triangular}, \tag{2.1}$$
$$\|Z_i\|_F \le \sqrt{m}\,w_i \quad \forall i \in [s], \qquad w \in \mathcal{W}_\Delta.$$

Then $\Phi_D(M(w)) = \det^{1/m} M(w) = (\prod_k L_{k,k})^{1/m}$, and $w \in \mathcal{W}_\Delta$ is optimal for the standard approximate D-optimal design problem.

If we want to solve a D-optimal design problem over another design region $\mathcal{W}$, it is very tempting to replace the last constraint in problem (2.1) by $w \in \mathcal{W}$. However, this approach fails. Consider, for example, the following experimental design problem with three regression vectors in a two-dimensional space: $A_1 = [1, 0]^T$, $A_2 = [-1/2, \sqrt{3}/2]^T$, $A_3 = [-1/2, -\sqrt{3}/2]^T$. For reasons of symmetry, it is clear that the approximate D-optimal design (over $\mathcal{W}_\Delta$) is $w_1 = w_2 = w_3 = 1/3$, and this is indeed the vector $w$ returned by problem (2.1). Define now $\mathcal{W}$ by adding to the simplex constraints a linear inequality of the form $w_i \ge w_j + 0.25$ for a pair of coordinates. The optimal design $w^*$ over $\mathcal{W}$ then differs from the design returned by problem (2.1) with the last constraint replaced by $w \in \mathcal{W}$, and the design returned by the modified problem is strictly suboptimal. The reason is that any optimal solution $(w^*, L^*)$ for problem (2.1) satisfies $M(w^*) = (L^*)(L^*)^T$; that is, $L^*$ is a Cholesky factor of the optimal information matrix. However, this relation is only true for optimality over the unit simplex $\mathcal{W}_\Delta$, which is a consequence of a generalization of Elfving's theorem; cf. [34]. In the present article, we give an alternative SOCP formulation of the D-optimal problem, which remains valid for any compact weight domain $\mathcal{W}$. The main idea of our new formulation is that the Cholesky factorization of a matrix $HH^T$ can be computed by solving an SOCP that mimics the Gram–Schmidt orthogonalization process of the rows of $H$. Moreover, our new SOCP handles the more general case of $D_K$-optimality. To derive our result, we use the notion of SOC-representability, which we next present.
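As an aside, the symmetry claim in the counterexample above is easy to check numerically. The sketch below does a brute-force grid search over the probability simplex and confirms that the uniform weights maximize $\det M(w)$ for the three regression vectors at 120 degrees:

```python
import numpy as np

# the three regression vectors of the counterexample
regs = [np.array([1.0, 0.0]),
        np.array([-0.5,  np.sqrt(3) / 2]),
        np.array([-0.5, -np.sqrt(3) / 2])]

def detM(w):
    """det of the moment matrix M(w) = sum_i w_i A_i A_i^T."""
    return np.linalg.det(sum(wi * np.outer(a, a) for wi, a in zip(w, regs)))

grid = np.linspace(0.0, 1.0, 201)
best_w, best_val = None, -1.0
for w1 in grid:
    for w2 in grid:
        if w1 + w2 <= 1.0:
            v = detM((w1, w2, 1.0 - w1 - w2))
            if v > best_val:
                best_val, best_w = v, (w1, w2, 1.0 - w1 - w2)

print(best_w)    # close to (1/3, 1/3, 1/3)
print(best_val)  # close to det((1/2) I_2) = 1/4
```

At the uniform design, $M(w) = \tfrac{1}{3}\sum_i A_i A_i^T = \tfrac{1}{2} I_2$, so the maximal determinant is $1/4$, which the grid search recovers up to discretization error.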
3. SOC-representability.
In this section, we briefly review some basic notions about second-order cone representability. The following definition was introduced by Ben-Tal and Nemirovski [5]:
Definition 3.1 (SOC-representability of a set). A convex set $S \subseteq \mathbb{R}^n$ is said to be second-order cone representable, abbreviated SOC-representable, if $S$ is the projection of a set in a higher-dimensional space that can be described by a set of second-order cone inequalities. More precisely, $S$ is SOC-representable if and only if there exist $G_i \in \mathbb{R}^{n_i \times (n+m)}$, $h_i \in \mathbb{R}^{n_i}$, $c_i \in \mathbb{R}^{n+m}$, $d_i \in \mathbb{R}$ ($i = 1, \ldots, N_c$), such that

$$x \in S \iff \exists y \in \mathbb{R}^m : \forall i = 1, \ldots, N_c, \quad \Big\| G_i \begin{bmatrix} x \\ y \end{bmatrix} + h_i \Big\| \le c_i^T \begin{bmatrix} x \\ y \end{bmatrix} + d_i.$$

An important example of an SOC-representable set is the following:
Lemma 3.2 (Rotated second-order cone inequalities).
The set

$$S = \{ (x, t, u) \in \mathbb{R}^n \times \mathbb{R} \times \mathbb{R} : \|x\|^2 \le tu,\ t \ge 0,\ u \ge 0 \} \subseteq \mathbb{R}^{n+2}$$

is SOC-representable. In fact, it is easy to see that

$$S = \Big\{ (x, t, u) \in \mathbb{R}^n \times \mathbb{R} \times \mathbb{R} : \Big\| \begin{bmatrix} 2x \\ t - u \end{bmatrix} \Big\| \le t + u \Big\}.$$

The notion of SOC-representability is also defined for functions:
Definition 3.3 (SOC-representability of a function). A convex (resp., concave) function $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be SOC-representable if and only if the epigraph of $f$, $\{(t, x) : f(x) \le t\}$ [resp., the hypograph $\{(t, x) : t \le f(x)\}$], is SOC-representable.

It follows immediately from these two definitions that the problem of maximizing a concave SOC-representable function (or minimizing a convex one) over an SOC-representable set can be cast as an SOCP. It is also easy to verify that sets defined by linear (in)equalities (i.e., polyhedra) are SOC-representable, that intersections of SOC-representable sets are SOC-representable, and that the (pointwise) minimum of concave SOC-representable functions is still concave and SOC-representable.

We next give another example which is of major importance for this article: the geometric mean of $n$ nonnegative variables is SOC-representable.

Lemma 3.4 (SOC-representability of a geometric mean [5]).
Let $n \ge 1$ be an integer. The function $f$ mapping $x \in \mathbb{R}^n_+$ to $\prod_{i=1}^n x_i^{1/n}$ is SOC-representable.
For the construction of the SOC representation of $f$, see [23] or [1]. In what follows, we show the case $n = 5$. For all $t \in \mathbb{R}_+$, $x \in \mathbb{R}^5_+$, we have

$$t^5 \le x_1 x_2 x_3 x_4 x_5 \iff t^8 \le x_1 x_2 x_3 x_4 x_5\, t^3$$
$$\iff \exists u \in \mathbb{R}^5_+ :\ u_1^2 \le x_1 x_2,\ u_2^2 \le x_3 x_4,\ u_3^2 \le x_5 t,\ u_4^2 \le u_1 u_2,\ u_5^2 \le u_3 t,\ t^2 \le u_4 u_5,$$

and each of these inequalities can be transformed to a standard second-order cone inequality by Lemma 3.2.
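Both constructions of this section can be checked numerically. The sketch below (illustrative code, not from the paper) first tests the equivalence of Lemma 3.2 on random points, then recovers the geometric mean of five numbers by iterating the tower of rotated-cone inequalities above with every inequality set to equality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Lemma 3.2:  ||x||^2 <= t*u, t >= 0, u >= 0   <=>   ||(2x, t - u)|| <= t + u
for _ in range(10000):
    x = rng.normal(size=3)
    t, u = rng.uniform(-1.0, 3.0, size=2)
    if abs(x @ x - t * u) < 1e-9:   # skip near-boundary samples (float rounding)
        continue
    rotated = (t >= 0) and (u >= 0) and (x @ x <= t * u)
    socp = np.linalg.norm(np.append(2 * x, t - u)) <= t + u
    assert rotated == socp

# Lemma 3.4, n = 5: the fixed point of the tight tower is (x1 x2 x3 x4 x5)^(1/5)
def geomean5_tower(x, iters=100):
    x1, x2, x3, x4, x5 = x
    t = 1.0
    for _ in range(iters):
        u1, u2, u3 = np.sqrt(x1 * x2), np.sqrt(x3 * x4), np.sqrt(x5 * t)
        u4, u5 = np.sqrt(u1 * u2), np.sqrt(u3 * t)
        t = np.sqrt(u4 * u5)        # t^2 = u4 * u5, the last cone of the tower
    return t

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(geomean5_tower(x))            # converges to 120 ** (1/5)
```

The fixed-point iteration contracts (each step maps $t \mapsto P^{1/8} t^{3/8}$ with $P = \prod_i x_i$), so it converges quickly to the geometric mean.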
4. SOC-representability of the D-criterion. The key to the SOC representation of the D-criterion is a Cholesky decomposition of the moment matrix, as given by the following lemma. Note that the lemma is general in the sense that it does not require the estimability conditions to be satisfied.

Lemma 4.1.
Let $H$ be an $m \times n$ matrix ($m \le n$), and let $K$ be an $m \times k$ matrix ($k \le m$) of full column rank. If $k = m$, let $U = K$, and if $k < m$, let $U$ be a nonsingular matrix of the form $[V, K]$, where $V \in \mathbb{R}^{m \times (m-k)}$. Then there exists a QR-decomposition $H^T U^{-T} = \tilde{Q}\tilde{R}$, where $\tilde{Q}$ is an orthogonal $n \times n$ matrix and $\tilde{R}$ is an upper triangular $n \times m$ matrix, satisfying

$$\tilde{R}_{ii} \ge 0 \text{ for all } i \in [m], \quad \text{and} \quad \tilde{R}_{ii} = 0 \text{ implies } \tilde{R}_{i,i+1} = \cdots = \tilde{R}_{i,m} = 0 \text{ for all } i \in [m]. \tag{4.1}$$

Let $L_*^T$ be the $k \times k$ upper triangular sub-block of $\tilde{R}$ with elements $(L_*^T)_{ij} = \tilde{R}_{m-k+i,\,m-k+j}$ for all $i, j \in [k]$. Then $C_K(HH^T) = L_* L_*^T$; that is, $L_* L_*^T$ is a Cholesky factorization of the information matrix for the linear parametric system given by the coefficient matrix $K$, corresponding to the moment matrix $HH^T$.

Proof.
It is simple to show that a QR decomposition satisfying (4.1) can be obtained from any QR-decomposition $H^T U^{-T} = \bar{Q}\bar{R}$, using an appropriate sequence of Givens rotations and row permutations applied on $\bar{R}$.

Consider the decomposition $H^T U^{-T} = \tilde{Q}\tilde{R}$ satisfying (4.1). Assume that $k < m < n$. Partition the orthogonal matrix $\tilde{Q}$ and the upper triangular matrix $\tilde{R}$ as follows:

$$\tilde{Q} = [\, Q_1 \mid Q_* \mid Q_2 \,], \qquad \tilde{R} = \begin{bmatrix} L_1^T & B \\ 0 & L_*^T \\ 0 & 0 \end{bmatrix}, \tag{4.2}$$

where the blocks $Q_1$, $Q_*$, $Q_2$ have $m-k$, $k$ and $n-m$ columns, respectively, the row blocks of $\tilde{R}$ have $m-k$, $k$ and $n-m$ rows, and its column blocks have $m-k$ and $k$ columns. Let $U^{-1} = [Z^T, X^T]^T$, where $X$ is a $k \times m$ matrix. Note that $[Z^T, X^T]^T K = U^{-1}K = [0, I_k]^T$, which implies $XK = I_k$; that is, $X$ is a left inverse of $K$. Define $Y = I_m - KX$. By a direct calculation, we obtain $XH = B^T Q_1^T + L_* Q_*^T$ and

$$YH = H - KXH = [V, K]\tilde{R}^T\tilde{Q}^T - K(B^T Q_1^T + L_* Q_*^T) = V L_1 Q_1^T.$$

Therefore, using the orthogonality of $\tilde{Q}$, that is, $Q_1^T Q_1 = I_{m-k}$, $Q_*^T Q_* = I_k$, $Q_1^T Q_* = 0$, and a representation of $C_K$ given by [31], Section 3.2, we have

$$C_K(HH^T) = XHH^TX^T - XHH^TY^T(YHH^TY^T)^- YHH^TX^T \tag{4.3}$$
$$= B^TB + L_*L_*^T - B^T \underbrace{L_1^TV^T(VL_1L_1^TV^T)^-VL_1}_{P}\,B,$$

where $P$ is the orthogonal projector on $\mathrm{range}(L_1^TV^T)$. Note that (4.1) implies $\mathrm{range}(B) \subseteq \mathrm{range}(L_1^T)$, and $\mathrm{rank}(V) = m-k$ gives $\mathrm{range}(L_1^T) = \mathrm{range}(L_1^TV^T)$. That is, $PB = B$, and from (4.3) we obtain the required result $C_K(HH^T) = L_*L_*^T$.

If $k = m$ or $n = m$, the lemma can be proved in a completely analogous way, treating the matrices $Q_1$, $L_1$, $B$ (if $k = m$) and $Q_2$ (if $m = n$) as empty. $\square$

The next theorem shows that the blocks $Q_*$ and $L_*$ from decomposition (4.2) can be computed by solving an optimization problem over an SOC-representable set.

Theorem 4.2.
Let $H$ be an $m \times n$ matrix ($m \le n$), let $K$ be an $m \times k$ matrix ($k \le m$) of full column rank, and let $L_*$ be optimal for the following problem:

$$\max_{Q \in \mathbb{R}^{n \times k},\ L \in \mathbb{R}^{k \times k}} \det L$$
$$\text{s.t.} \quad L \text{ is lower triangular}, \tag{4.4}$$
$$HQ = KL, \qquad \|Qe_j\| \le 1 \quad (j \in [k]).$$

Then $\Phi_{D|K}(HH^T) = (\det(L_*))^{2/k}$.

Proof.
Consider the QR decomposition $H^TU^{-T} = \tilde{Q}\tilde{R}$ from the statement of Lemma 4.1, and the block partition (4.2). We will show that the blocks $Q_*$ and $L_*$ form an optimal solution to the problem from the theorem. First, $L_*$ is clearly lower triangular, and using direct block multiplication together with $Q_*^TQ_* = I_k$, we can verify that $Q_*^TH^T = L_*^TK^T$, that is, $HQ_* = KL_*$. Second, $Q_*$ has columns of unit length, which implies $\|Q_*e_j\| = 1$ for all $j \in [k]$. Therefore, $Q_*, L_*$ are feasible. From Lemma 4.1, we know that $C_K(HH^T) = L_*L_*^T$, that is, $(\det(L_*))^{2/k} = \Phi_{D|K}(HH^T)$. To complete the proof of the theorem, we only need to show that any feasible $L$ satisfies $(\det(L))^{2/k} \le \Phi_{D|K}(HH^T)$.

Let $Q, L$ be a feasible pair of matrices. As in the proof of Lemma 4.1, let $U = [V, K]$ be an invertible matrix, and let $U^{-1} = [Z^T, X^T]^T$, where $X$ is a $k \times m$ matrix. Obviously, $U^{-1}H = [C^T, D^T]^T$, where $C = ZH$ and $D = XH$, and $[C^T, D^T]^TQ = U^{-1}HQ = U^{-1}KL = [0, I_k]^TL = [0, L^T]^T$, which implies $CQ = 0$ and $DQ = L$. Define the projector $P = I_n - C^T(CC^T)^-C$, that is, $P^2 = P$, and then observe that $CQ = 0$ entails $PQ = Q$. From the previous equalities and the Cauchy–Schwarz inequality for determinants [e.g., [38], formula 12.5(c)], we have

$$\det(LL^T) = (\det(DQ))^2 = (\det(DPQ))^2 \le \det(DPD^T)\det(Q^TQ). \tag{4.5}$$

The Hadamard determinant inequality (e.g., [38], formula 12.27) and the feasibility of $Q$ give

$$\det(Q^TQ) \le \prod_{i=1}^k (Q^TQ)_{ii} = \prod_{i=1}^k \|Qe_i\|^2 \le 1. \tag{4.6}$$

Combining (4.5) and (4.6), we obtain $\det(LL^T) \le \det(DPD^T)$, and the proof will be complete once we prove $DPD^T = C_K(HH^T)$.

Note that $I_m = UU^{-1} = [V, K][Z^T, X^T]^T = VZ + KX$, that is, $Y := I_m - KX = VZ$. Moreover, $\mathrm{rank}(V) = m-k$ implies $\mathrm{range}(H^TY^T) = \mathrm{range}(H^TZ^TV^T) = \mathrm{range}(H^TZ^T)$; that is, the orthogonal projectors $H^TY^T(YHH^TY^T)^-YH$ and $H^TZ^T(ZHH^TZ^T)^-ZH$ coincide.
Consequently, using [31], Section 3.2, we have

$$C_K(HH^T) = XHH^TX^T - XHH^TY^T(YHH^TY^T)^-YHH^TX^T$$
$$= XHH^TX^T - XHH^TZ^T(ZHH^TZ^T)^-ZHH^TX^T$$
$$= DD^T - DC^T(CC^T)^-CD^T = DPD^T. \qquad \square$$

We next apply Theorem 4.2 to the matrix $H = [\sqrt{w_1}A_1, \ldots, \sqrt{w_s}A_s]$. This will allow us to express $\Phi_{D|K}(M(w))$ as the optimal value of an SOCP. Moreover, we make a change of variables which transforms the optimization problem into an SOCP where $w$ may play the role of a variable.

Theorem 4.3.
Let $K$ be an $m \times k$ matrix ($k \le m$) of full column rank. For all nonnegative weight vectors $w \in \mathbb{R}^s_+$, denote by $\mathrm{OPT}(w)$ the optimal value of the following optimization problem, where the optimization variables are $t_{ij} \in \mathbb{R}_+$ ($\forall i \in [s], \forall j \in [k]$), $Z_i \in \mathbb{R}^{\ell_i \times k}$ ($\forall i \in [s]$) and $J \in \mathbb{R}^{k \times k}$:

$$\max_{Z_i, t_{ij}, J} \Big(\prod_{j=1}^k J_{j,j}\Big)^{1/k} \tag{4.7a}$$
$$\text{s.t.} \quad \sum_{i=1}^s A_i Z_i = KJ, \tag{4.7b}$$
$$J \text{ is lower triangular}, \tag{4.7c}$$
$$\|Z_i e_j\|^2 \le t_{ij} w_i \quad (i \in [s], j \in [k]), \tag{4.7d}$$
$$\sum_{i=1}^s t_{ij} \le J_{j,j} \quad (j \in [k]). \tag{4.7e}$$

Then we have $\mathrm{OPT}(w) = \Phi_{D|K}(M(w))$.

Proof.
Let $w \in \mathbb{R}^s_+$, and define $H := [\sqrt{w_1}A_1, \ldots, \sqrt{w_s}A_s]$. We are going to show that every feasible solution to problem (4.7a)–(4.7e) yields a feasible solution for problem (4.4) in which $J_{j,j} = L_{j,j}^2$ for all $j \in [k]$, and vice versa. Hence the optimal value of problem (4.7a)–(4.7e) is

$$\mathrm{OPT}(w) = (\det J)^{1/k} = (\det L)^{2/k} = \Phi_{D|K}(HH^T) = \Phi_{D|K}(M(w)),$$

from which the conclusion follows.

Consider a feasible solution $(Z_i, t_{ij}, J)$ to problem (4.7a)–(4.7e). We denote by $z_{ij}$ the $j$th column of $Z_i$: $z_{ij} := Z_i e_j$. We now make the following change of variables: denote by $Q_i$ the matrix whose $j$th column is $q_{ij}$, where

$$q_{ij} = \begin{cases} \dfrac{z_{ij}}{\sqrt{w_i}\sqrt{J_{j,j}}}, & \text{if } w_i > 0 \text{ and } J_{j,j} > 0; \\ 0, & \text{otherwise,} \end{cases}$$

and define $Q$ as the vertical concatenation of the $Q_i$: $Q = [Q_1^T, \ldots, Q_s^T]^T$. Let $j \in [k]$. If $J_{j,j} = 0$, then $q_{ij} = 0$ for all $i$, so $\|Qe_j\|^2 = \sum_i \|q_{ij}\|^2 = 0 \le 1$. If $J_{j,j} > 0$, then constraint (4.7d) implies $\|q_{ij}\|^2 \le t_{ij}/J_{j,j}$, and by constraint (4.7e), we must have

$$\|Qe_j\|^2 = \sum_i \|q_{ij}\|^2 \le \sum_i \frac{t_{ij}}{J_{j,j}} \le 1.$$

Observe that constraints (4.7d) and (4.7e) also imply that $z_{ij} = 0$ whenever $w_i = 0$ or $J_{j,j} = 0$, so that for all $i \in [s]$, $j \in [k]$, we can write $z_{ij} = \sqrt{w_i}\sqrt{J_{j,j}}\,q_{ij}$. Now, we define the matrix $L$ column-wise as follows:

$$\forall j \in [k], \quad Le_j := \begin{cases} \dfrac{Je_j}{\sqrt{J_{j,j}}}, & \text{if } J_{j,j} > 0; \\ 0, & \text{otherwise.} \end{cases}$$

Note that $L$ is lower triangular [because so is $J$; see (4.7c)]. We can now prove that $HQ = KL$, which we do column-wise. If $J_{j,j} = 0$, then we know that $Qe_j = 0$, so the $j$th columns of $HQ$ and $KL$ are zero. If $J_{j,j} > 0$, then using (4.7b) we have

$$KLe_j = \frac{KJe_j}{\sqrt{J_{j,j}}} = \frac{\sum_i A_i z_{ij}}{\sqrt{J_{j,j}}} = \sum_i \sqrt{w_i}A_i q_{ij} = HQe_j.$$

Hence the proposed change of variables transforms a feasible solution $(Z_i, t_{ij}, J)$ to problem (4.7a)–(4.7e) into a feasible pair $(Q, L)$ for problem (4.4), with the property $J_{j,j} = L_{j,j}^2$ for all $j \in [k]$.

Conversely, let $(Q, L)$ be feasible for problem (4.4), where $H$ has been set to $[\sqrt{w_1}A_1, \ldots, \sqrt{w_s}A_s]$. For $i \in [s]$, define $Z_i$ as the matrix of size $\ell_i \times k$ whose $j$th column is $z_{ij} = \sqrt{w_i}L_{j,j}q_{ij}$, and $J$ as the lower triangular matrix whose $j$th column is $Je_j = L_{j,j}Le_j$. We have $\sum_i A_i Z_i = KJ$, which can be verified column-wise as follows:

$$KJe_j = L_{j,j}KLe_j = L_{j,j}HQe_j = L_{j,j}\sum_i \sqrt{w_i}A_i q_{ij} = \sum_i A_i z_{ij} = \sum_i A_i Z_i e_j.$$

Define further $t_{ij} = L_{j,j}^2\|q_{ij}\|^2$, so that constraints (4.7d) and (4.7e) hold. This shows that $(Z_i, t_{ij}, J)$ is feasible, with $J_{j,j} = L_{j,j}^2$ for all $j \in [k]$, and the proof is complete. $\square$

Corollary 4.4 (SOC-representability of $\Phi_{D|K}$). For any $m \times k$ matrix $K$ of rank $k$, the function $w \to \Phi_{D|K}(M(w))$ is SOC-representable.

Proof.
Problem (4.7a)–(4.7e) can be reformulated as an SOCP, because by Lemmas 3.4 and 3.2 the geometric mean in (4.7a) and inequalities of the form $\|Z_ie_j\|^2 \le t_{ij}w_i$ are SOC-representable. Hence the optimal value of (4.7a)–(4.7e), $w \to \mathrm{OPT}(w)$, is SOC-representable, and we know from Theorem 4.3 that $\mathrm{OPT}(w) = \Phi_{D|K}(M(w))$. $\square$

Corollary 4.5 [(MI)SOCP formulation of the D-optimal design problem]. If the set $\mathcal{W}$ is SOC-representable (in particular, if $\mathcal{W}$ is defined by a set of linear inequalities), then the constrained $D_K$-optimal design problem (1.1) can be cast as an SOCP. If $\mathcal{W}$ is the intersection of an SOC-representable set with the integer lattice $\mathbb{Z}^s$, then the exact $D_K$-optimal design problem over $\mathcal{W}$ can be cast as an MISOCP.

For $K = I_m$, Corollaries 4.4 and 4.5 cover the case of the standard D-optimality. The (MI)SOCP formulation of problem (1.1) for $D_K$-optimality ($\Phi = \Phi_{D|K}$) is summarized in Table 1, together with formulations for the other criteria presented in the Appendix. Finally, we note that the SOCP formulation of the optimal design problem with constraints on the weights has consequences in terms of complexity, which we next present.

Complexity of computing constrained approximate $D_K$-optimal designs. Recall that $s$ denotes the number of candidate support points, and $k \le m$ denotes the number of features that we wish to estimate. (The full rank coefficient matrix $K$ is in $\mathbb{R}^{m \times k}$.) Assume for simplicity that $\ell_i = \ell$ for all $i \in [s]$, that the set of design weights $\mathcal{W}$ is defined by a set of $n$ inequalities, and that $k$ is a power of 2, so that the geometric mean can be represented by $k$ inequalities and $k$ auxiliary variables; cf. Lemma 3.4 or [36] for more details.
Then the SOCP formulation for D_K-optimality of Table 1 contains:

• s + sℓk + sk + k(k+1)/2 + k variables,
• mk + k + n linear (in)equalities,
• k SOC inequalities of size 2 and ks SOC inequalities of size ℓ + 1.

The number of iterations required by interior point methods (IPM) to compute an ε-approximate solution depends only on the number q of second-order cones. Indeed, it is shown in [5] that the IPM finds an ε-approximate solution after at most O(√q log(1/ε)) iterations, which is O(√(k(s+1)) log(1/ε)) iterations in our setting. However, it is well known that this bound is over-conservative, and in practice the IPM always returns an excellent solution after 10 to 40 iterations, almost independently of the problem size. In other words, the critical point is the algorithmic complexity of one iteration. Again, a result of [5] (Section 4.6.2) allows us to bound the number of algorithmic operations for one iteration by O(ksℓ((ksℓ)² + (mk + n)²)), which is O((ksℓ)³) if m and n are not too large. But it is well known that this bound is very conservative, too. In fact, the bottleneck of one iteration is the resolution of a linear system of the form Bδ = β, where B is an O(ksℓ) × O(ksℓ) symmetric positive semidefinite matrix. In practice, for SOCPs the matrix B has a "diagonal + sparse low rank" structure, which allows for an efficient computation of the Newton direction δ [1].
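The geometric-mean step invoked above (Lemma 3.4) relies on the standard second-order-cone representation of the inequality t² ≤ xy for x, y ≥ 0, namely ‖(2t, x − y)‖₂ ≤ x + y; chaining k − 1 such cones represents the geometric mean of k numbers when k is a power of 2. The following is a small self-contained sketch of this standard construction (plain Python; the helper name `in_soc` is ours, not from the paper):

```python
import math
import random

def in_soc(t, x, y, tol=1e-9):
    """Rotated-cone membership test: ||(2t, x - y)||_2 <= x + y.
    For x, y >= 0 this is algebraically equivalent to t**2 <= x*y."""
    return math.hypot(2.0 * t, x - y) <= x + y + tol

# The equivalence, checked on random nonnegative points away from the boundary:
random.seed(0)
for _ in range(10000):
    x, y, t = random.random(), random.random(), random.random()
    if t * t <= x * y - 1e-6:          # clearly satisfies t^2 <= x*y
        assert in_soc(t, x, y)
    if t * t >= x * y + 1e-6:          # clearly violates t^2 <= x*y
        assert not in_soc(t, x, y)

# Chaining the cones: for k = 4, the constraints u1^2 <= x1*x2, u2^2 <= x3*x4
# and t^2 <= u1*u2 force t <= (x1*x2*x3*x4)**(1/4), i.e. t is bounded by the
# geometric mean using k - 1 = 3 rotated cones.
x1, x2, x3, x4 = 0.3, 1.2, 0.7, 1.9
u1, u2 = math.sqrt(x1 * x2), math.sqrt(x3 * x4)
t = (x1 * x2 * x3 * x4) ** 0.25
assert in_soc(u1, x1, x2) and in_soc(u2, x3, x4) and in_soc(t, u1, u2)
```

This chaining is what produces the k auxiliary variables and k SOC inequalities of small size counted above.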
5. Examples.
In this section, we will present numerical results for several examples taken from various application areas of the theory of optimal designs. With these examples, we aim to demonstrate the general applicability of the (MI)SOCP technique for the computation of exact or approximate D-optimal designs.
Our computations were conducted on a PC with a 4-core processor at 3 GHz. We used MOSEK [2] to solve the approximate optimal design problems and CPLEX [21] for the exact optimal design problems (with integer constraints). The solvers were interfaced through the Python package PICOS [35], which allows users to pass (MI)SOCP models to different solvers in a simple fashion. We refer the reader to the example section of the PICOS documentation for a practical implementation of the (MI)SOCP approach for optimal design problems.

It is common to compare several designs against each other by using the metric of D-efficiency, which is defined as

eff_D(w) = Φ_D(M(w)) / Φ_D(M(w*)) = (det M(w) / det M(w*))^{1/m},

where w* is a reference design such that M(w*) is nonsingular. Unless stated otherwise, we always give D-efficiencies relative to the optimal design; that is, w* is a solution to problem (1.1).

Block designs with blocks of size two.
An important category of models studied in the experimental design literature is the class of block designs. Here the effect of t treatments should be compared, but their effects can only be measured inside a number b of blocks, each inducing a block effect on the measurements. The optimal design problem entails choosing which treatments should be tested together in each block. We refer the reader to Bailey and Cameron [4] for a comprehensive review on the combinatorics of block designs.

In the case where the blocks are of size two, that is, the treatments can be tested pairwise against each other, a design can be represented by a vector w = [w_{1,2}, w_{1,3}, . . . , w_{1,t}, . . . , w_{t−1,t}] of size s = t(t−1)/2. For i < j, w_{i,j} indicates the number of blocks where treatments i and j are tested simultaneously. The observation matrix associated with the block (i, j) can be chosen as the column vector of dimension m = (t − 1)

A_{i,j} = P(e_i − e_j),   (5.1)

where e_i denotes the ith unit vector in the canonical basis of R^t and P is the matrix that transforms a t-dimensional vector v to the vector obtained by keeping the first (t − 1) coordinates of v.

The problem of D-optimality has a nice graph theoretic interpretation: let w ∈ N^s be a feasible block design, and denote by G the graph with t vertices and an edge of multiplicity w_{i,j} for every pair of nodes (i, j). (If w_{i,j} = 0, then there is no edge from i to j.) This graph is called the concurrence graph of the design. We have M(w) = P L(w) P^T, where L(w) := ∑_{i,j} w_{i,j} (e_i − e_j)(e_i − e_j)^T ∈ R^{t×t} is the Laplacian of G. In other words, M(w) is the submatrix of the Laplacian of G obtained by removing its last row and last column. So by Kirchhoff's theorem, the determinant of M(w) is the number of spanning trees of G. In other words, the exact D-optimal designs of size N correspond to the graphs with t nodes and N edges that have a maximum number of spanning trees.

Remark 5.1.
There is an alternative parametrization of block designs with blocks of size two; see [17]. Define the observation matrices by

A′_{i,j} = U^T (e_i − e_j),   (5.2)

where the columns of U ∈ R^{t×(t−1)} form an orthonormal basis of Ker(1^T) (1 is the vector with all components equal to 1); that is, the t × t matrix [U, (1/√t) 1] is orthogonal. It can be seen that the t − 1 eigenvalues of M′(w) = ∑_{i,j} w_{i,j} A′_{i,j} A′^T_{i,j} = U^T L(w) U coincide with the t − 1 largest eigenvalues of L(w), and the smallest eigenvalue of L(w) is 0. So the sets of D-optimal designs for observation models (5.1) and (5.2) coincide. In our experiments, we have used the former model (5.1) because it involves sparse information matrices and yields more efficient computations. However, note that for some other criteria depending on the eigenvalues of the information matrix, the model given by (5.2) should be used.

To illustrate the new capability of the MISOCP approach, we computed designs of N = 15 blocks on t = 10 treatments by imposing different types of constraints on the replication numbers (i.e., the numbers of times that each treatment is tested). Such constraints can be easily expressed by linear (in)equalities. For example, a design w has treatment j replicated r_j times if and only if

∑_{i=1}^{j−1} w_{i,j} + ∑_{i=j+1}^{t} w_{j,i} = r_j.

The concurrence graphs of these constrained optimal designs are displayed in Figure 1. Note that these constrained exact optimal designs cannot be computed by any of the standard methods.

Mixed integer optimization solvers rely on sophisticated branch-and-cut algorithms. After each iteration, the value L = Φ_D(M(ŵ)) of the best solution ŵ found so far is compared to an upper bound U provided by a series of continuous relaxations of the problem, and the gap defined by δ = (U − L)/L is displayed. Note that δ can directly be interpreted as a guarantee on the D-efficiency of ŵ, namely eff_D(ŵ) ≥ (1 + δ)^{−1}.
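Kirchhoff's characterization det M(w) = number of spanning trees of the concurrence graph is easy to check numerically. A minimal sketch (numpy assumed; the helper `reduced_laplacian` is ours): for the complete graph on t = 4 treatments, Cayley's formula gives t^{t−2} = 16 spanning trees, and a 4-cycle has 4 of them (remove any one of its edges).

```python
import numpy as np

def reduced_laplacian(t, weights):
    """M(w) for a block design with blocks of size two: the Laplacian of the
    concurrence graph with its last row and column removed."""
    L = np.zeros((t, t))
    for (i, j), w in weights.items():
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L[:-1, :-1]

# Complete graph K4: every pair of treatments tested in exactly one block.
w_K4 = {(i, j): 1 for i in range(4) for j in range(i + 1, 4)}
print(round(np.linalg.det(reduced_laplacian(4, w_K4))))   # 16 = 4**2 (Cayley)

# 4-cycle on the treatments: 4 spanning trees.
w_C4 = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1}
print(round(np.linalg.det(reduced_laplacian(4, w_C4))))   # 4
```

Maximizing det M(w) over integer vectors w with ∑ w_{i,j} = N is thus exactly the search for the (t, N)-graph with the most spanning trees, which is what the MISOCP solver performs.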
The following remark shows that for block designs, the current best solution is actually proved to be exact D-optimal as soon as the gap δ reaches a small tolerance parameter ε > 0.
Design                          (a)      (b)      (c)      (d)
CPU (s)                         9.07     4.9      13.8     5.7
Lower bound on eff_D (initial)  90.15%   92.56%   91.36%   91.27%
Lower bound on eff_D (10 min)   100.0%   96.56%   98.04%   100.0%

Fig. 1.
Concurrence graphs of the D-optimal designs of N = 15 blocks on t = 10 treatments, among the class of 15-block designs that (a) are equireplicate; (b) have half of the treatments replicated fewer times than the other half; (c) have one treatment with a prescribed minimum replication number; (d) have two treatments with a prescribed minimum replication number. For each case, the table gives: the time required by the MISOCP solver to find the optimal design; the (initial) lower bound on the D-efficiency of the optimal design, compared to the constrained approximate design; and the lower bound on the D-efficiency of the optimal design that is guaranteed after 10 min of computing time.

Remark 5.2.
Let T_w denote the number of spanning trees of the concurrence graph G corresponding to an exact design w, and let T* denote the maximal number of spanning trees for a particular block design problem. By using the fact that T_w = det M(w) is an integer, it can be seen that a tolerance parameter of

ε = (T* / (T* − 1))^{1/m} − 1 ≃ 1/(m T*)

ensures that the design w* returned by the MISOCP approach is (perfectly) optimal. We have used this value of ε in our numerical experiments. When the value of T* is unknown, note that an upper bound can be used [e.g., the bound T* ≤ (1/t)(2N/(t−1))^{t−1} given by the optimal design w = [N/s, . . . , N/s]^T for the relaxed problem without integer constraints].

To achieve a faster convergence, a few variables can be set equal to 0 or 1 in order to break the symmetry of the problem. For example, if we search for a D-optimal design in a class of exact designs with at least one treatment replicated exactly 4 times, we can assume without loss of generality that treatment 1 has replication number 4, so w_{1,2} = w_{1,3} = w_{1,4} = w_{1,5} = 1 and w_{1,i} = 0 for all i ∈ {6, . . . , t}.

The table in Figure 1 gives information on the computing time required by CPLEX. In all four situations, the optimal design was found in the first seconds of computation. However, note that the time required to obtain a certificate of optimality can be much longer [a few minutes for cases (a) and (d), and as much as 3 hours for case (b)]. On the other hand, the bound on the D-efficiency provided by the MISOCP solver after a few minutes is already much better than the standard bound of D-efficiency relative to the (constrained) approximate optimal design.

This example also demonstrates that sometimes we can use independent theoretical results to add linear constraints to the original optimum design problem that greatly improve the computational efficiency.
Indeed, it has been conjectured that every optimal block design with blocks of size two is (almost) equireplicate for t − 1 ≤ N ≤ t(t−1)/2. The conjecture is known to hold for t ≤ 11 [8] and for all pairs (t, N) such that N ≥ t(t−1)/2 − t + 2 [29]. The MISOCP solver required 333.7 s to obtain a certificate of optimality of the design plotted in Figure 1(a) in the class of equireplicate designs. In contrast, several hours of computation are required if we omit the constraints on the replication numbers in the MISOCP formulation.

More computational results for optimal block designs can be found in an earlier version of this manuscript that is available on the web [37]. In particular, we show that even for the case of standard (unconstrained) exact design problems (W = W_N), the MISOCP approach sometimes outperforms state-of-the-art algorithms such as the KL-exchange procedure [3]. The manuscript [37] also presents numerical results on other criteria, such as A-optimality and G-optimality.

Locally D-optimal design in a study of chemical kinetics.

Another classical field of application of the theory of optimal experimental designs is the study of chemical kinetics. Here, the goal is to select the points in time at which a chemical reaction should be observed, in order to estimate the kinetic parameters θ ∈ R^m of the reaction (rates, orders, etc.). The measurements at time t are of the form y_t = η_t(θ) + ε_t, where η_t(θ) = [η_{1t}, . . . , η_{kt}]^T is the vector of the concentrations of k reactants at time t and ε_t is a random error. The kinetic models are usually given as a set of differential equations, which can be solved numerically to find the concentrations η_t(θ) over time. Unlike the linear model described in the introduction of this paper, in chemical kinetics the expected measurements E[y_t] = η_t(θ) at time t depend nonlinearly on the vector θ of unknown parameters of the reaction. So a classical approach is to search for a locally optimal design using a prior estimate θ_0 of the parameter, that is, a design which would be optimal if the true value of the parameters was θ_0.
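The "local" approach can be illustrated on a toy model: given a prior θ_0, each trial's observation matrix is replaced by the gradient of the mean response at θ_0, and the (local) information matrix is assembled as usual. In the sketch below, the mean function η_t(θ) = θ_1 e^{−θ_2 t} and all numerical values are illustrative assumptions of ours, not the kinetic model studied in this section:

```python
import math

def eta(theta, t):
    """Toy nonlinear mean response theta1 * exp(-theta2 * t) (illustrative only)."""
    return theta[0] * math.exp(-theta[1] * t)

def sensitivity(theta0, t, h=1e-6):
    """F_t = d eta / d theta at the prior theta0, by central finite differences."""
    F = []
    for i in range(len(theta0)):
        tp = list(theta0); tp[i] += h
        tm = list(theta0); tm[i] -= h
        F.append((eta(tp, t) - eta(tm, t)) / (2 * h))
    return F

def local_information_matrix(design, theta0):
    """M(w, theta0) = sum_t w_t F_t F_t^T for a scalar-response model with m = 2."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for t, w in design.items():
        F = sensitivity(theta0, t)
        for a in range(2):
            for b in range(2):
                M[a][b] += w * F[a] * F[b]
    return M

theta0 = [1.0, 0.5]            # prior estimate (illustrative)
design = {0.5: 2, 3.0: 3}      # exact design: number of observations per time point
M = local_information_matrix(design, theta0)
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
# det_M > 0 here: the two time points give linearly independent sensitivities.
# A locally D-optimal algorithm maximizes det_M (equivalently Phi_D = det_M**(1/m)).
```

For the actual kinetic model below, the only change is that the sensitivities are obtained by integrating the sensitivity differential equations rather than by differentiating a closed-form mean.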
To do this, the observation equations are linearized around θ_0, so in practice we replace the observation matrix A_t of each individual trial at time t by its sensitivity at θ_0, which is defined as

F_t := ∂η_t(θ)/∂θ |_{θ=θ_0} = [ ∂η_{1t}/∂θ_1 · · · ∂η_{kt}/∂θ_1 ; ⋮ ⋱ ⋮ ; ∂η_{1t}/∂θ_m · · · ∂η_{kt}/∂θ_m ] |_{θ=θ_0} ∈ R^{m×k}.

A classical example is presented in [3]: the study of two consecutive reactions

A →(θ_1) B →(θ_2) C.

The chemical reactions are assumed to be of order θ_3 and θ_4, respectively, so the concentrations of the reactants are determined by the differential equations

d[A]/dt = −θ_1 [A]^{θ_3},
d[B]/dt = θ_1 [A]^{θ_3} − θ_2 [B]^{θ_4},   (5.3)
d[C]/dt = θ_2 [B]^{θ_4},

together with the initial condition ([A], [B], [C])|_{t=0} = (1, 0, 0). These equations can be differentiated with respect to θ_1, . . . , θ_4, which yields another set of differential equations that determines the elements ∂η_{jt}/∂θ_i of the sensitivity matrices.

We now assume that measurements can be performed at each t ∈ X = {0.2, 0.4, . . . , 20}, where the time is expressed in seconds, and that the observed quantities are the concentrations of the reactants A and C, that is, k = 2 and η_t^T = ([A](t), [C](t)). We have solved numerically the differential equations governing the entries of (F_t)_{t∈X} for the prior estimate θ_0. These sensitivities are plotted in Figure 2.

Fig. 2. Measurement sensitivities (entries of F_t) plotted against time for θ = θ_0.

We used the MISOCP method to compute the exact D-optimal design of size N = 5 for this problem (for the prior estimate θ_0). The optimum consists in taking 1 measurement at t = 0.8, 3 measurements at t = 2.6 and 1 measurement at t = 16.6. In comparison, the exchange algorithm (using the same settings as described for the block designs, with N_R = 100) found a design with 1 measurement at each of three time points and 2 measurements at t = 2.6. This design is of course very close to the optimum (its D-efficiency is above 98%), but the true optimum could not be identified by the exchange algorithm, even with a very large number of tries. We ran the exchange procedure N_R = 5000 times, which took 100 s and returned a design of D-efficiency above 99%, but not the optimal design (MIP OPTIMAL).

We plotted these designs in Figure 3, together with the concentrations of the reactants over time when we assume θ = θ_0. In the figure, we have also plotted other designs which can be of interest to practitioners. For example, it might be natural to search for designs where at most 1 measurement is taken at a given point in time. The exchange algorithm can also be adapted to the case of such binary designs (by rejecting candidate points that already support the design during the exchange procedure); it returned a design of D-efficiency above 98%. Another natural restriction is to require that at least 1 second separates any two measurements, which corresponds to the set of linear constraints

{ w_{0.2} + w_{0.4} + w_{0.6} + w_{0.8} + w_{1.0} ≤ 1,
  w_{0.4} + w_{0.6} + w_{0.8} + w_{1.0} + w_{1.2} ≤ 1,
  . . . ,
  w_{19.2} + w_{19.4} + w_{19.6} + w_{19.8} + w_{20} ≤ 1 }.

This model was solved in 42 s with CPLEX, and the corresponding optimal design is depicted in the last row of Figure 3. We do not know of any other algorithm that can handle this type of exact design problem with several linear constraints.

Fig. 3. Concentration of the reactants against time (determined by solving equation (5.3), assuming θ = θ_0). Several designs are represented below the graph. The marks indicate the times at which the measurements should be performed, and the size of the marks indicates the number of measurements at a given point in time. Binary means that the design space is restricted to designs having at most one measurement for each t ∈ X, and 1 second means that at least 1 second must separate 2 measurements.

APPENDIX: OTHER OPTIMALITY CRITERIA
A.1. A_K-optimality.

Another widely used criterion in optimal design is A-optimality, which is defined by

Φ_A : M ↦ (trace M^{−1})^{−1}, if M is nonsingular; 0, otherwise.

More generally, it is possible to use the criterion of A_K-optimality if the experimenter is interested in the estimation of the parameter subsystem ϑ = K^T θ:

Φ_{A|K} : M ↦ (trace K^T M^− K)^{−1}, if range K ⊆ range M; 0, otherwise.

Here M^− denotes a generalized inverse of M; see the discussion following equation (1.3) in the Introduction. Note that Φ_{A|K} coincides with Φ_A if K = I_m, and Φ_{A|K} reduces to the criterion of c-optimality when K = c is a column vector.

The following lemma was already used in [34], in a slightly different form. In fact, this lemma is a consequence of the Gauss–Markov theorem, which states that the variance–covariance matrix of the best linear unbiased estimator of K^T θ is proportional to K^T M(w)^− K (e.g., Pukelsheim [31]).

Lemma A.1.
Let K be an (m × k)-matrix, and let w ∈ R^s_+ be a vector of design weights such that the estimability condition range K ⊆ range M(w) is satisfied. Define I := {i ∈ [s] : w_i > 0}. Then

trace K^T M(w)^− K = min_{(Z_i)_{i∈I}} ∑_{i∈I} ‖Z_i‖²_F / w_i   (A.1)
    s.t. ∑_{i∈I} A_i Z_i = K,

where the variables Z_i (i ∈ I) are of size ℓ_i × k.

After some changes of variable, we obtain an SOC representation of Φ_{A|K}:

Proposition A.2.
Let K be an (m × k)-matrix, and let w ∈ R^s_+ be a vector of design weights. Then

Φ_{A|K}(M(w)) = max_{µ ∈ R^s_+, Y_i ∈ R^{ℓ_i×k}} ∑_{i∈[s]} µ_i
    s.t. ∑_{i∈[s]} A_i Y_i = (∑_{i∈[s]} µ_i) K,   (A.2)
         ∀ i ∈ [s], ‖Y_i‖²_F ≤ w_i µ_i.

Proof.
We first handle the case where the estimability condition is not satisfied. In this situation, we have Φ_{A|K}(M(w)) = 0, and we will see that the first constraint of problem (A.2) can only be satisfied if ∑_{i=1}^s µ_i = 0. Note that the second constraint of problem (A.2) implies Y_i = 0 for all i ∉ I. Hence every column of the matrix ∑_{i=1}^s A_i Y_i must be in the set range[√w_1 A_1, . . . , √w_s A_s] = range M(w). Thus if (at least) one column of K is not included in the range of M(w), then we must have ∑_i µ_i = 0.

Now, assume that the estimability condition range K ⊆ range M(w) holds, so that Φ_{A|K}(M(w)) = (trace K^T M(w)^− K)^{−1} > 0.
Let Z_i (i ∈ I) be optimal matrices for the problem on the right-hand side of (A.1). Then for all i ∈ I, define λ_i := trace w_i^{−1} Z_i^T Z_i = w_i^{−1} ‖Z_i‖²_F, µ_i := (∑_j λ_j)^{−2} λ_i, and Y_i := (∑_j λ_j)^{−1} Z_i [note that ∑_{i∈I} λ_i = trace K^T M(w)^− K > 0]. For i ∈ [s] \ I, let µ_i := 0 and Y_i := 0 ∈ R^{ℓ_i×k}. We have ∑_{i∈[s]} µ_i = (∑_{i∈I} λ_i)^{−1} = Φ_{A|K}(M(w)), and by construction the variables µ_i and Y_i satisfy the constraints of problem (A.2).

Conversely, let µ_i and Y_i be feasible variables for problem (A.2). If ∑_i µ_i = 0, then we clearly have ∑_i µ_i ≤ Φ_{A|K}(M(w)). Otherwise, define Z_i := (∑_{i∈[s]} µ_i)^{−1} Y_i, so that the variables Z_i (i ∈ I) are feasible for the problem on the right-hand side of (A.1). Hence

trace K^T M(w)^− K ≤ ∑_{i∈I} ‖Z_i‖²_F / w_i = ∑_{i∈I} ‖Y_i‖²_F / (w_i (∑_{i∈[s]} µ_i)²) ≤ ∑_{i∈I} w_i µ_i / (w_i (∑_{i∈[s]} µ_i)²) ≤ (∑_{i∈[s]} µ_i)^{−1}.

Finally, we obtain the desired inequality by taking the inverse:

∑_{i∈[s]} µ_i ≤ Φ_{A|K}(M(w)).

This completes the proof of the proposition. □
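The minimum in (A.1) is attained at the Gauss–Markov solution Z_i = w_i A_i^T M(w)^− K: this choice is feasible and its objective value equals trace K^T M(w)^− K, which by Lemma A.1 is the optimal value. A numerical consistency check of this fact (numpy assumed; the closed form is verified, not derived, here):

```python
import numpy as np

rng = np.random.default_rng(1)
s, m, ell, k = 6, 4, 3, 2
A = [rng.standard_normal((m, ell)) for _ in range(s)]   # observation matrices A_i
w = rng.uniform(0.5, 2.0, size=s)                        # strictly positive weights
K = rng.standard_normal((m, k))

M = sum(w[i] * A[i] @ A[i].T for i in range(s))          # information matrix M(w)
Minv = np.linalg.inv(M)                                   # nonsingular a.s. here

# Candidate minimizer of (A.1):
Z = [w[i] * A[i].T @ Minv @ K for i in range(s)]

# Feasibility: sum_i A_i Z_i = sum_i w_i A_i A_i^T M^{-1} K = M M^{-1} K = K.
assert np.allclose(sum(A[i] @ Z[i] for i in range(s)), K)

# Objective equals trace K^T M^{-1} K, the optimal value stated in Lemma A.1.
obj = sum(np.linalg.norm(Z[i], 'fro') ** 2 / w[i] for i in range(s))
assert np.isclose(obj, np.trace(K.T @ Minv @ K))
```

The same computation also illustrates why (A.1) is SOC-representable: the objective is a sum of rotated-cone terms ‖Z_i‖²_F / w_i subject to linear equality constraints.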
Corollary A.3.
Let K be an m × k matrix. The function w ↦ Φ_{A|K}(M(w)) is SOC-representable.

The reformulation of problem (1.1) for the criterion Φ = Φ_{A|K} as an (MI)SOCP is indicated in Table 1.

Remark A.4 (The case of c-optimality). The case of c-optimality arises as a special case of both A_K- and D_K-optimality when the matrix K = c is a column vector (k = 1). The two SOCP formulations (for Φ_{A|c} and Φ_{D|c} in Table 1) are equivalent, which can be verified by the change of variables Y_i = J^{−1}_{1,1} Z_i, µ_i = J^{−1}_{1,1} t_i. (Note that here the matrix J is of size 1 × 1, i.e., a scalar.)

We next show how Proposition A.2 can be used to obtain an SOC representation of G- and I-optimality.

A.2. G-optimality.

A criterion closely related to D-optimality is the criterion of G-optimality,

Φ_G : M ↦ (max_{i∈[s]} trace A_i^T M^{−1} A_i)^{−1} = min_{i∈[s]} Φ_{A|A_i}(M),

where the equality holds if we use the convention trace K^T M^− K := +∞ for all matrices M that do not satisfy the estimability condition (range K ⊄ range M). In the common case of single-response experiments for linear models, the matrices A_i are column vectors, and the scalar σ² A_i^T M(w)^{−1} A_i represents the variance of the prediction ŷ_i = A_i^T θ̂. Hence G-optimal designs minimize the maximum variance of the predicted values ŷ_1, . . . , ŷ_s.

The G- and D-optimality criteria are related to each other by the celebrated equivalence theorem of Kiefer and Wolfowitz [22], which was generalized to the case of multivariate regression (ℓ_i > 1) by Fedorov in 1972 [13]. An important consequence of this theorem is that D- and G-optimal designs coincide when the weight domain W is the standard probability simplex W_∆. However, exact G-optimal designs do not necessarily coincide with their D-optimal counterparts. In a recent article [33], the Brent minimization algorithm was proposed to compute near exact G-optimal factorial designs. But in general, we do not know any standard algorithm for the computation of exact G-optimal designs, or of G-optimal designs over arbitrary weight domains W that are defined by a set of linear inequalities.

We know from Corollary A.3 that the concave functions f_i : w ↦ Φ_{A|A_i}(M(w)) are SOC-representable, and hence their minimum is also concave and SOC-representable. An (MI)SOCP formulation of problem (1.1) for the criterion Φ = Φ_G is indicated in Table 1. For the case where the weight domain W is the probability simplex W_∆, it gives a new alternative SOCP formulation for D-optimality. Note, however, that in this situation the SOCP formulation (2.1) for D-optimality from [34] is usually more compact (i.e., it involves fewer variables and fewer constraints) than the G-optimality SOCP of Table 1.

A.3. I-optimality.

Another widely used criterion is the one of I-optimality (or V-optimality). Here, the criterion is the inverse of the average of the variances of the predicted values ŷ_1, . . . , ŷ_s:

Φ_I : M ↦ ( (1/s) ∑_{i∈[s]} trace A_i^T M^{−1} A_i )^{−1}.

In fact, this criterion coincides with the Φ_{A|K} criterion, by setting K to any matrix of full column rank satisfying K K^T = (1/s) ∑_{i=1}^s A_i A_i^T; see, for example, Section 9.8 in [31]. Hence Φ_I-optimal designs can be computed by SOCP. Note that there is also a weighted version of I-optimality, which can be reduced to an A_K-optimal design problem in the same manner.

A.4. Bayesian optimal designs for nonlinear models.
For nonlinear models, the information matrix of a design w depends on the value of the unknown parameter θ [we denote it by M(w, θ)]; see, for example, [7]. One way to handle this challenging cyclic problem is to search for a design w maximizing the expected value Φ_π(w) of the criterion Φ with respect to some prior π,

Φ_π(w) := ∫_{θ∈R^m} Φ(M(w, θ)) π(dθ).

Another alternative, known as standardized Bayesian design, is to search for a design maximizing the expected efficiency

φ_π(w) := ∫_{θ∈R^m} [ Φ(M(w, θ)) / max_{ω∈W} Φ(M(ω, θ)) ] π(dθ).

In a recent article, Duarte and Wong approximated such integrals by finite sums using Gaussian quadrature formulas [11], in order to obtain SDP formulations of Bayesian optimal design problems. By using the same technique, we immediately see that the Bayesian versions Φ_π and φ_π of an SOC-representable criterion Φ are also SOC-representable (modulo the approximation of the integral by a finite sum). This offers the possibility of computing (constrained) exact Bayesian designs by using MISOCP solvers.

Finally, we point out that the standard Bayesian versions of the D- and A-criteria have forms that slightly differ from the formulas given above, and which have other statistical interpretations (see [7] for more details):

Φ_{D,π}(w) := ∫_{θ∈R^m} log det M(w, θ) π(dθ),
Φ_{A,π}(w) := − ∫_{θ∈R^m} trace M(w, θ)^{−1} π(dθ).

Bayesian optimality with respect to the above criteria can also be formulated as an (MI)SOCP, by combining the techniques used in the present paper with those of [11].

REFERENCES

[1]
Alizadeh, F. and Goldfarb, D. (2003). Second-order cone programming. Math. Program.
[2] Andersen, E. D., Jensen, B., Jensen, J., Sandvik, R. and Worsøe, U. (2009). MOSEK Version 6. Technical Report TR-2009-3, MOSEK.
[3] Atkinson, A. C. and Donev, A. N. (1992). Optimum Experimental Designs. Oxford Univ. Press, Oxford.
[4] Bailey, R. A. and Cameron, P. J. (2009). Combinatorics of optimal designs. In Surveys in Combinatorics 2009. London Mathematical Society Lecture Note Series.
[5] Ben-Tal, A. and Nemirovski, A. (2001). Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia.
[6] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. MR2061575
[7] Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statist. Sci.
[8] Chen, A. and Esfahanian, A. H. (2005). A demography of t-optimal (n, m) graphs where n ≤ 12. In AMCS.
[9] Cook, D. and Fedorov, V. (1995). Constrained optimization of experimental design. Statistics.
[10] Cook, R. D. and Thibodeau, L. A. (1980). Marginally restricted D-optimal designs. J. Amer. Statist. Assoc.
[11] Duarte, B. P. M. and Wong, W. K. (2015). Finding Bayesian optimal designs for nonlinear models: A semidefinite programming-based approach. Int. Stat. Rev. To appear. DOI:10.1111/insr.12073.
[12] Fedorov, V. and Lee, J. (2000). Design of experiments in statistics. In Handbook of Semidefinite Programming (H. Wolkowicz, R. Saigal and L. Vandenberghe, eds.). Kluwer, Dordrecht.
[13] Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York. MR0403103
[14] Filová, L., Trnovská, M. and Harman, R. (2012). Computing maximin efficient experimental designs using the methods of semidefinite programming. Metrika.
[15] Haines, L. M. (1987). The application of the annealing algorithm to the construction of exact optimal designs for linear-regression models. Technometrics.
[16] Harman, R. (2014). Multiplicative methods for computing D-optimal stratified designs of experiments. J. Statist. Plann. Inference.
[17] Harman, R. and Filová, L. (2014). Computing efficient exact designs of experiments using integer quadratic programming. Comput. Statist. Data Anal.
[18] Harman, R. and Jurík, T. (2008). Computing c-optimal experimental designs using the simplex method of linear programming. Comput. Statist. Data Anal.
[19] Harman, R. and Pronzato, L. (2007). Improvements on removing nonoptimal support points in D-optimum design algorithms. Statist. Probab. Lett.
[20] Heredia-Langner, A., Carlyle, W. M., Montgomery, D. C., Borror, C. M. and Runger, G. C. (2003). Genetic algorithms for the construction of D-optimal designs. J. Qual. Technol.
[22] Kiefer, J. and Wolfowitz, J. (1960). The equivalence of two extremum problems. Canad. J. Math.
[23] Lobo, M. S., Vandenberghe, L., Boyd, S. and Lebret, H. (1998). Applications of second-order cone programming. Linear Algebra Appl.
[24] Löfberg, J. (2004). YALMIP: A toolbox for modeling and optimization in Matlab. In Proceedings of the CACSD Conference.
[25] Lu, Z. and Pong, T. K. (2013). Computing optimal experimental designs via interior point method. SIAM J. Matrix Anal. Appl.
[26] Martín-Martín, R., Torsney, B. and López-Fidalgo, J. (2007). Construction of marginally and conditionally restricted designs using multiplicative algorithms. Comput. Statist. Data Anal.
[27] Mitchell, T. J. (1974). An algorithm for the construction of "D-optimal" experimental designs. Technometrics.
[28] Papp, D. (2012). Optimal designs for rational function regression. J. Amer. Statist. Assoc.
[29] Petingi, L., Boesch, F. and Suffel, C. (1998). On the characterization of graphs with maximum number of spanning trees. Discrete Math.
[30] Pronzato, L. and Zhigljavsky, A. (2014). Algorithmic construction of optimal designs on compact sets for concave and differentiable criteria. J. Statist. Plann. Inference.
[31] Pukelsheim, F. (1993). Optimal Design of Experiments. Wiley, New York. MR1211416
[32] Pukelsheim, F. and Rieder, S. (1992). Efficient rounding of approximate designs. Biometrika.
[33] Rodriguez, M., Jones, B., Borror, C. M. and Montgomery, D. C. (2010). Generating and assessing exact G-optimal designs. J. Qual. Technol.
[34] Sagnol, G. (2011). Computing optimal designs of multiresponse experiments reduces to second-order cone programming. J. Statist. Plann. Inference.
[35] Sagnol, G. (2012). PICOS, a Python interface to conic optimization solvers. Technical Report No. 12-48, ZIB. http://picos.zib.de.
[36] Sagnol, G. (2013). On the semidefinite representation of real functions applied to symmetric matrices. Linear Algebra Appl.
[37] Sagnol, G. and Harman, R. (2013). Computing exact D-optimal designs by mixed integer second order cone programming. Preprint. Available at arXiv:1307.4953v2.
[38] Seber, G. A. F. (2008). A Matrix Handbook for Statisticians. Wiley, Hoboken, NJ. MR2365265
[39] Silvey, S. D., Titterington, D. M. and Torsney, B. (1978). An algorithm for optimal designs on a finite design space. Comm. Statist. Theory Methods.
[40] Tack, L. and Vandebroek, M. (2004). Budget constrained run orders in optimum design. J. Statist. Plann. Inference.
[41] Titterington, D. M. (1976). Algorithms for computing D-optimal design on finite design spaces. In Proceedings of the 1976 Conf. on Information Science and Systems.
[42] Vandenberghe, L., Boyd, S. and Wu, S.-P. (1998). Determinant maximization with linear matrix inequality constraints. SIAM J. Matrix Anal. Appl.
[43] Welch, W. J. (1982). Branch-and-bound search for experimental designs based on D-optimality and other criteria. Technometrics.
[44] Wright, S. E., Sigal, B. M. and Bailer, A. J. (2010). Workweek optimization of experimental designs: Exact designs for variable sampling costs. J. Agric. Biol. Environ. Stat.
[45] Wynn, H. P. (1970). The sequential generation of D-optimum experimental designs. Ann. Math. Statist.
[46] Yang, M., Biedermann, S. and Tang, E. (2013). On optimal designs for nonlinear models: A general and efficient algorithm. J. Amer. Statist. Assoc.
[47] Yu, Y. (2010). Monotonic convergence of a general algorithm for computing optimal designs. Ann. Statist.
[48] Yu, Y. (2011). D-optimal designs via a cocktail algorithm. Stat. Comput.

Department Optimization
Zuse Institut Berlin
Takustr. 7
14195 Berlin
Germany
E-mail: [email protected]