Equivalence theorems for compound design problems with application in mixed models
Maryna Prus
Abstract
In the present paper we consider design criteria which depend on several designs simultaneously. We formulate equivalence theorems based on moment matrices (if criteria depend on designs via moment matrices) or with respect to the designs themselves (for finite design regions). We apply the obtained optimality conditions to multiple-group random coefficient regression models and illustrate the results by simple examples.
Keywords:
Optimal design, optimality condition, multiple-group, mixed model, random coefficient regression, multi-response
1 Introduction

The subject of this work is compound design problems: optimization problems with optimality criteria depending on several designs simultaneously. Such optimality criteria can be, for example, commonly used design criteria for the estimation of unknown model parameters in the case when the covariance matrix of the estimation depends on several designs (see e.g. Fedorov and Jones (2005), Schmelter (2007a)). For such criteria the general equivalence theorem proposed in Kiefer (1974) cannot be used directly. In Fedorov and Jones (2005) optimal designs were obtained for specific regression functions. In Schmelter (2007a) particular group-wise identical designs have been discussed.

In this paper we formulate equivalence theorems for two kinds of compound design problems: 1) problems on finite experimental regions and 2) problems with optimality criteria depending on designs via moment (or information) matrices. For both cases we assume the optimality criteria to be convex and differentiable in the designs themselves or in the moment matrices, respectively. In case 1) we formulate optimality conditions with respect to the designs directly (as proposed in Whittle (1973) for one-design problems). These results can be useful in situations where design criteria cannot be presented as functions of moment matrices (see e.g. Bose and Mukerjee (2015)). In case 2) optimality conditions are formulated with respect to the moment matrices. Therefore, no additional restrictions on the experimental regions are needed.

We apply the equivalence theorems to multiple-group random coefficient regression (RCR) models. In these models observational units (individuals) are assigned to several groups. Within one group the same design (group-design) is assumed for all individuals. Group-designs for individuals from different groups are in general not the same. Most of the commonly used design criteria in multiple-group RCR models are functions of several group-designs.
The particular case of these models with one observation per individual has been considered in Graßhoff et al. (2012). In Prus (2015), ch. 6, models with group-specific mean parameters were briefly discussed. Bludowsky et al. (2015), Kunert et al. (2010), Lemme et al. (2015) and Prus (2019) considered models with particular regression functions and specific covariance structures of random effects. In Entholzner et al. (2005) and Prus and Schwabe (2016) the same design for all observational units has been assumed.

The paper has the following structure: Section 2 provides equivalence theorems for the compound design problems. In Section 3 we apply the obtained optimality conditions to the multiple-group RCR models. In Section 4 we illustrate the results by a simple example. The paper is concluded by a short discussion in Section 5.

2 Optimality Conditions for Compound Design Problems
We consider a compound design problem in which $\xi_1,\dots,\xi_s$ are probability measures (designs) on experimental regions $\mathcal{X}_1,\dots,\mathcal{X}_s$, respectively, and $\phi$ is a design criterion which depends on $\xi_1,\dots,\xi_s$ simultaneously and has to be minimized. $\Xi_i$ denotes the set of all designs on $\mathcal{X}_i$, $i=1,\dots,s$. For any $x_i\in\mathcal{X}_i$, $\delta_{x_i}$ denotes the particular design $\xi_i$ with all observations at the point $x_i$. For convenience we use the notation $\boldsymbol{\xi}=(\xi_1,\dots,\xi_s)$ for a vector of designs $\xi_i\in\Xi_i$, $i=1,\dots,s$. Then $\boldsymbol{\xi}\in\Xi$ for $\Xi=\times_{i=1}^{s}\Xi_i$, where "$\times$" denotes the Cartesian product.

In Section 2.1 we consider compound design problems where all design regions are assumed to be finite. We formulate an equivalence theorem (Theorem 1) with respect to the designs directly. In Section 2.2 we consider design criteria depending on designs via moment matrices, and we propose an equivalence theorem (Theorem 2) based on this structure. In this case no additional restrictions on the experimental regions are needed.

2.1 Design problems on finite experimental regions

In this section we restrict ourselves to optimization problems on finite design regions: $|\mathcal{X}_i|=k_i<\infty$ for all $i=1,\dots,s$. $\phi:\Xi\to(-\infty,\infty]$ denotes a design criterion. We use the notation $\Phi(\boldsymbol{\xi},\tilde{\boldsymbol{\xi}})$ for the directional derivative of $\phi$ at $\boldsymbol{\xi}$ in the direction of $\tilde{\boldsymbol{\xi}}$:
$$\Phi(\boldsymbol{\xi},\tilde{\boldsymbol{\xi}})=\lim_{\alpha\searrow 0}\frac{1}{\alpha}\Big(\phi\big((1-\alpha)\boldsymbol{\xi}+\alpha\tilde{\boldsymbol{\xi}}\big)-\phi(\boldsymbol{\xi})\Big). \qquad (1)$$
Further we define the partial directional derivative of $\phi$ at $\xi_i$ in the direction of $\tilde{\xi}_i$ as follows:
$$\Phi_{\xi_{i'},\,i'\neq i}(\xi_i,\tilde{\xi}_i)=\Phi(\boldsymbol{\xi},\breve{\boldsymbol{\xi}}), \qquad (2)$$
where $\breve{\boldsymbol{\xi}}=(\breve{\xi}_1,\dots,\breve{\xi}_s)$ with $\breve{\xi}_{i'}=\xi_{i'}$, $i'=1,\dots,s$, $i'\neq i$, and $\breve{\xi}_i=\tilde{\xi}_i$.

Theorem 1.
Let $\phi:\Xi\to(-\infty,\infty]$ be convex and differentiable.

a) The following statements are equivalent:
(i) $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimizes $\phi(\boldsymbol{\xi})$;
(ii) $\sum_{i=1}^{s}\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)\ge 0$, $\forall\,\xi_i\in\Xi_i$, $i=1,\dots,s$;
(iii) $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)\ge 0$, $\forall\,\xi_i\in\Xi_i$, $i=1,\dots,s$;
(iv) $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_i})\ge 0$, $\forall\,x_i\in\mathcal{X}_i$, $i=1,\dots,s$.

b) Let $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimize $\phi(\boldsymbol{\xi})$. Let $x_i$ be a support point of $\xi_i^*$, $i\in\{1,\dots,s\}$. Then $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_i})=0$.

c) Let $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimize $\phi(\boldsymbol{\xi})$. Then the point $(\boldsymbol{\xi}^*,\boldsymbol{\xi}^*)$ is a saddle point of $\Phi$ in that
$$\Phi(\boldsymbol{\xi}^*,\boldsymbol{\xi})\ \ge\ \Phi(\boldsymbol{\xi}^*,\boldsymbol{\xi}^*)\ \ge\ \Phi(\tilde{\boldsymbol{\xi}},\boldsymbol{\xi}^*),\quad\forall\,\boldsymbol{\xi},\tilde{\boldsymbol{\xi}}\in\Xi, \qquad (3)$$
and the point $(\xi_i^*,\xi_i^*)$ is a saddle point of $\Phi_{\xi_{i'}^*,\,i'\neq i}$ in that
$$\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)\ \ge\ \Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)\ \ge\ \Phi_{\xi_{i'}^*,\,i'\neq i}(\tilde{\xi}_i,\xi_i^*),\quad\forall\,\xi_i,\tilde{\xi}_i\in\Xi_i, \qquad (4)$$
for all $i=1,\dots,s$.

Proof. a) (i) $\Leftrightarrow$ (ii): For this proof we present designs in the form of row vectors $\xi_i=(w_{i1},\dots,w_{ik_i})$, where $w_{it}\ge 0$ is the weight of observations at $x_{it}$, the $t$-th point of the experimental region $\mathcal{X}_i$, $t=1,\dots,k_i$, and $\sum_{t=1}^{k_i}w_{it}=1$ (see also Boyd and Vandenberghe (2004), ch. 7). Then $\boldsymbol{\xi}=(\xi_1,\dots,\xi_s)$ is the full (row) vector of all weights of observations at all points of all experimental regions.

We use the notation $\nabla_{\xi_i}\phi$ for the gradient of $\phi$ with respect to $\xi_i$: $\nabla_{\xi_i}\phi=\big(\partial\phi/\partial w_{it}\big)_{t=1,\dots,k_i}$, and $\nabla_{\boldsymbol{\xi}}\phi$ for the gradient of $\phi$ with respect to $\boldsymbol{\xi}$, which means $\nabla_{\boldsymbol{\xi}}\phi=(\nabla_{\xi_1}\phi,\dots,\nabla_{\xi_s}\phi)$. (The gradients $\nabla_{\xi_i}\phi$ and $\nabla_{\boldsymbol{\xi}}\phi$ are row vectors.)

According to convex optimization theory (see e.g. Boyd and Vandenberghe (2004), ch.
4), $\boldsymbol{\xi}^*$ minimizes $\phi$ iff $\Phi(\boldsymbol{\xi}^*,\boldsymbol{\xi})\ge 0$ for all $\boldsymbol{\xi}\in\Xi$. Then the equivalence of (i) and (ii) follows from
$$\Phi(\boldsymbol{\xi}^*,\boldsymbol{\xi})=\nabla_{\boldsymbol{\xi}}\phi(\boldsymbol{\xi}^*)\,(\boldsymbol{\xi}-\boldsymbol{\xi}^*)^\top=\sum_{i=1}^{s}\nabla_{\xi_i}\phi(\boldsymbol{\xi}^*)\,(\xi_i-\xi_i^*)^\top=\sum_{i=1}^{s}\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i).$$

(ii) $\Leftrightarrow$ (iii): (iii) $\Rightarrow$ (ii) is straightforward. (ii) $\Rightarrow$ (iii): Let there exist $\tilde{\xi}_{i_0}\in\Xi_{i_0}$ with $\Phi_{\xi_{i'}^*,\,i'\neq i_0}(\xi_{i_0}^*,\tilde{\xi}_{i_0})<0$. Let $\xi_{i_0}=\tilde{\xi}_{i_0}$ and $\xi_i=\xi_i^*$, $\forall\,i\neq i_0$. Then for all $i\neq i_0$ we have $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)=\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)=0$, which results in
$$\sum_{i=1}^{s}\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)=\Phi_{\xi_{i'}^*,\,i'\neq i_0}(\xi_{i_0}^*,\tilde{\xi}_{i_0})+\sum_{i\in\{1,\dots,s\}\setminus i_0}\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)=\Phi_{\xi_{i'}^*,\,i'\neq i_0}(\xi_{i_0}^*,\tilde{\xi}_{i_0})<0.$$

(iii) $\Leftrightarrow$ (iv): (iii) $\Rightarrow$ (iv) is straightforward. (iv) $\Rightarrow$ (iii): Let $x_i=x_{it}$ be the $t$-th point in $\mathcal{X}_i$, $t=1,\dots,k_i$. Then the one-point design with all observations at $x_i$ is given by $\delta_{x_{it}}=\mathbf{e}_t$, where $\mathbf{e}_t$ is the $t$-th unit (row) vector of length $k_i$. A design $\xi_i$ can be written as $\xi_i=\sum_{t=1}^{k_i}w_{it}\,\mathbf{e}_t$. Then the directional derivative of $\phi$ at $\xi_i^*$ in the direction of $\xi_i$ can be presented in the form
$$\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)=\sum_{t=1}^{k_i}w_{it}\,\nabla_{\xi_i}\phi(\boldsymbol{\xi}^*)\,(\mathbf{e}_t-\xi_i^*)^\top,$$
which results in
$$\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i)=\sum_{t=1}^{k_i}w_{it}\,\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_{it}})\ge 0.$$

b) Let the support point $x_i=x_{it'}$ be the $t'$-th point in $\mathcal{X}_i$, $t'\in\{1,\dots,k_i\}$. Then for $\xi_i^*=(w_{i1}^*,\dots,w_{ik_i}^*)$ we have $w_{it'}^*>0$ and
$$\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)=\sum_{t=1}^{k_i}w_{it}^*\,\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_{it}}).$$
Let $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_{it'}})>0$. Then, since $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\delta_{x_{it}})\ge 0$, $t=1,\dots,k_i$, we obtain $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)>0$, which contradicts $\Phi_{\xi_{i'}^*,\,i'\neq i}(\xi_i^*,\xi_i^*)=0$.

c) The left-hand sides of both (3) and (4) are straightforward. From formula (1) and the convexity of $\phi$ we obtain
$$\Phi(\tilde{\boldsymbol{\xi}},\boldsymbol{\xi}^*)\le\phi(\boldsymbol{\xi}^*)-\phi(\tilde{\boldsymbol{\xi}}),$$
which is non-positive for optimal $\boldsymbol{\xi}^*$ and all $\tilde{\boldsymbol{\xi}}\in\Xi$. Similarly, using formula (2), we obtain the right-hand side of (4).

2.2 Design criteria based on moment matrices

We use the notation $\mathbf{M}_i(\xi_i)$ for a matrix which characterizes a design $\xi_i$. We assume $\mathbf{M}_i(\xi_i)$ to satisfy the condition
$$\mathbf{M}_i(\xi_i)=\int_{\mathcal{X}_i}\mathbf{M}_i(\delta_{x_i})\,\xi_i(\mathrm{d}x_i) \qquad (5)$$
for all $i=1,\dots,s$. We call this matrix the moment matrix of a design $\xi_i$ (see e.g. Pukelsheim (1993)). $\mathcal{M}_i$ denotes the set of all moment matrices $\mathbf{M}_i(\xi_i)$, $\xi_i\in\Xi_i$. For $\mathbf{M}(\boldsymbol{\xi})=(\mathbf{M}_1(\xi_1),\dots,\mathbf{M}_s(\xi_s))$, $\mathcal{M}$ denotes the set of all $\mathbf{M}(\boldsymbol{\xi})$, $\boldsymbol{\xi}\in\Xi$. Then $\mathcal{M}=\times_{i=1}^{s}\mathcal{M}_i$ and $\mathcal{M}$ is convex. $\phi:\mathcal{M}\to(-\infty,\infty]$ is a design criterion. $\Phi(\mathbf{M},\tilde{\mathbf{M}})$ denotes the directional derivative of $\phi$ at $\mathbf{M}$ in the direction of $\tilde{\mathbf{M}}$:
$$\Phi(\mathbf{M},\tilde{\mathbf{M}})=\lim_{\alpha\searrow 0}\frac{1}{\alpha}\Big(\phi\big((1-\alpha)\mathbf{M}+\alpha\tilde{\mathbf{M}}\big)-\phi(\mathbf{M})\Big). \qquad (6)$$
We define the partial directional derivative of $\phi$ at $\mathbf{M}_i$ in the direction of $\tilde{\mathbf{M}}_i$ as follows:
$$\Phi_{\mathbf{M}_{i'},\,i'\neq i}(\mathbf{M}_i,\tilde{\mathbf{M}}_i)=\Phi(\mathbf{M},\breve{\mathbf{M}}), \qquad (7)$$
where $\breve{\mathbf{M}}_{i'}=\mathbf{M}_{i'}$, $i'=1,\dots,s$, $i'\neq i$, and $\breve{\mathbf{M}}_i=\tilde{\mathbf{M}}_i$.

Theorem 2.
Let $\phi:\mathcal{M}\to(-\infty,\infty]$ be convex and differentiable.

a) The following statements are equivalent:
(i) $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimizes $\phi(\mathbf{M}(\boldsymbol{\xi}))$;
(ii) $\sum_{i=1}^{s}\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big)\ge 0$, $\forall\,\xi_i\in\Xi_i$, $i=1,\dots,s$;
(iii) $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big)\ge 0$, $\forall\,\xi_i\in\Xi_i$, $i=1,\dots,s$;
(iv) $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\delta_{x_i})\big)\ge 0$, $\forall\,x_i\in\mathcal{X}_i$, $i=1,\dots,s$.

b) Let $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimize $\phi(\mathbf{M}(\boldsymbol{\xi}))$. Let $x_i$ be a support point of $\xi_i^*$, $i\in\{1,\dots,s\}$. Then $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\delta_{x_i})\big)=0$.

c) Let $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ minimize $\phi(\mathbf{M}(\boldsymbol{\xi}))$. Then the point $(\mathbf{M}(\boldsymbol{\xi}^*),\mathbf{M}(\boldsymbol{\xi}^*))$ is a saddle point of $\Phi$ in that
$$\Phi\big(\mathbf{M}(\boldsymbol{\xi}^*),\mathbf{M}(\boldsymbol{\xi})\big)\ \ge\ \Phi\big(\mathbf{M}(\boldsymbol{\xi}^*),\mathbf{M}(\boldsymbol{\xi}^*)\big)\ \ge\ \Phi\big(\mathbf{M}(\tilde{\boldsymbol{\xi}}),\mathbf{M}(\boldsymbol{\xi}^*)\big),\quad\forall\,\boldsymbol{\xi},\tilde{\boldsymbol{\xi}}\in\Xi, \qquad (8)$$
and the point $(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i^*))$ is a saddle point of $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}$ in that
$$\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big)\ \ge\ \Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i^*)\big)\ \ge\ \Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\tilde{\xi}_i),\mathbf{M}_i(\xi_i^*)\big),\quad\forall\,\xi_i,\tilde{\xi}_i\in\Xi_i, \qquad (9)$$
for all $i=1,\dots,s$.

Proof. a) (i) $\Leftrightarrow$ (ii): We use for the gradients of $\phi$ with respect to $\mathbf{M}_i$ and $\mathbf{M}$ the notation $\nabla_{\mathbf{M}_i}\phi=\big(\partial\phi/\partial m_{kl}\big)_{k,l}$ for $\mathbf{M}_i=(m_{kl})_{k,l}$ and $\nabla_{\mathbf{M}}\phi=\big(\partial\phi/\partial m_{kl}\big)_{k,l}$ for $\mathbf{M}=(m_{kl})_{k,l}$, respectively. $\boldsymbol{\xi}^*$ minimizes $\phi$ iff $\Phi(\mathbf{M}(\boldsymbol{\xi}^*),\mathbf{M}(\boldsymbol{\xi}))\ge 0$ for all $\boldsymbol{\xi}\in\Xi$. The directional derivative can be computed by the formula
$$\Phi(\mathbf{M},\tilde{\mathbf{M}})=\frac{\partial}{\partial\alpha}\,\phi\big((1-\alpha)\mathbf{M}+\alpha\tilde{\mathbf{M}}\big)\Big|_{\alpha=0}. \qquad (10)$$
Then, using some matrix differentiation rules (see e.g. Seber (2007), ch.
17), we receive
$$\Phi\big(\mathbf{M}(\boldsymbol{\xi}^*),\mathbf{M}(\boldsymbol{\xi})\big)=\operatorname{tr}\Big(\nabla_{\mathbf{M}}\phi(\mathbf{M}(\boldsymbol{\xi}^*))\,\big(\mathbf{M}(\boldsymbol{\xi})-\mathbf{M}(\boldsymbol{\xi}^*)\big)^\top\Big)=\sum_{i=1}^{s}\operatorname{tr}\Big(\nabla_{\mathbf{M}_i}\phi(\mathbf{M}(\boldsymbol{\xi}^*))\,\big(\mathbf{M}_i(\xi_i)-\mathbf{M}_i(\xi_i^*)\big)\Big)=\sum_{i=1}^{s}\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big),$$
which implies the equivalence of (i) and (ii).

(ii) $\Leftrightarrow$ (iii): (iii) $\Rightarrow$ (ii) is straightforward. (ii) $\Rightarrow$ (iii): Let there exist $\tilde{\xi}_{i_0}$ with $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i_0}\big(\mathbf{M}_{i_0}(\xi_{i_0}^*),\mathbf{M}_{i_0}(\tilde{\xi}_{i_0})\big)<0$. Then for $\xi_{i_0}=\tilde{\xi}_{i_0}$ and $\xi_i=\xi_i^*$, $\forall\,i\neq i_0$, we obtain
$$\sum_{i=1}^{s}\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big)=\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i_0}\big(\mathbf{M}_{i_0}(\xi_{i_0}^*),\mathbf{M}_{i_0}(\tilde{\xi}_{i_0})\big)+\sum_{i\in\{1,\dots,s\}\setminus i_0}\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i^*)\big)=\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i_0}\big(\mathbf{M}_{i_0}(\xi_{i_0}^*),\mathbf{M}_{i_0}(\tilde{\xi}_{i_0})\big)<0.$$

(iii) $\Leftrightarrow$ (iv): (iii) $\Rightarrow$ (iv) is straightforward. (iv) $\Rightarrow$ (iii): The directional derivative of $\phi$ at $\mathbf{M}_i$ in the direction of $\tilde{\mathbf{M}}_i$ is linear in the second argument:
$$\Phi_{\mathbf{M}_{i'},\,i'\neq i}(\mathbf{M}_i,\tilde{\mathbf{M}}_i)=\operatorname{tr}\Big(\nabla_{\mathbf{M}_i}\phi(\mathbf{M})\,(\tilde{\mathbf{M}}_i-\mathbf{M}_i)\Big).$$
Then, using formula (5), we obtain
$$\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i)\big)=\int_{\mathcal{X}_i}\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\delta_{x_i})\big)\,\xi_i(\mathrm{d}x_i) \qquad (11)$$
for each $\xi_i\in\Xi_i$.

b) The result follows from formula (11), $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\delta_{x_i})\big)\ge 0$ for all $x_i\in\mathcal{X}_i$, and $\Phi_{\mathbf{M}_{i'}(\xi_{i'}^*),\,i'\neq i}\big(\mathbf{M}_i(\xi_i^*),\mathbf{M}_i(\xi_i^*)\big)=0$.

c) The left-hand sides of both (8) and (9) are straightforward.
From the convexity of $\phi$ and formula (6) we obtain
$$\Phi\big(\mathbf{M}(\tilde{\boldsymbol{\xi}}),\mathbf{M}(\boldsymbol{\xi}^*)\big)\le\phi(\mathbf{M}(\boldsymbol{\xi}^*))-\phi(\mathbf{M}(\tilde{\boldsymbol{\xi}})),\quad\forall\,\tilde{\boldsymbol{\xi}}\in\Xi,$$
which implies the right-hand side of (8). Similarly, using formula (7), we obtain the right-hand side of (9).

3 Multiple-Group RCR Models

We consider multiple-group RCR models in which $N$ observational units are assigned to $s$ groups: $n_i$ observational units in the $i$-th group, $\sum_{i=1}^{s}n_i=N$. The group sizes and the group allocation of observational units are fixed. Experimental designs are assumed to be the same for all observational units within one group (group-design): $m_i$ observations per unit at design points $x_{ih}$, $h=1,\dots,m_i$, in group $i$. However, for units from different groups the experimental designs are in general not the same: $m_{i'}\neq m_{i''}$ and (or) $x_{i'h}\neq x_{i''h}$, $i'\neq i''$.

Note that the experimental settings $x_{i1},\dots,x_{im_i}$ in group $i$ are not necessarily all distinct (repeated measurements are not excluded). Note also that observational units (often called individuals in the literature) are usually expected to be people, animals or plants. However, they may also be studies, centers, clinics, plots, etc.

In multiple-group random coefficient regression models the $h$-th observation of the $j$-th observational unit in the $i$-th group is given by the following $l$-dimensional random column vector:
$$\mathbf{Y}_{ijh}=\mathbf{G}_i(x_{ih})\,\boldsymbol{\beta}_{ij}+\boldsymbol{\varepsilon}_{ijh},\quad x_{ih}\in\mathcal{X}_i,\ h=1,\dots,m_i,\ j=1,\dots,n_i,\ i=1,\dots,s, \qquad (12)$$
where $\mathbf{G}_i$ denotes a group-specific $(l\times p)$ matrix of known regression functions in group $i$ (in the particular case $l=1$: $\mathbf{G}_i=\mathbf{f}_i^\top$, where $\mathbf{f}_i$ is a $p$-dimensional column vector of regression functions), the experimental settings $x_{ih}$ come from some experimental region $\mathcal{X}_i$, $\boldsymbol{\beta}_{ij}=(\beta_{ij1},\dots,\beta_{ijp})^\top$ are unit-specific random parameters with unknown mean $\boldsymbol{\beta}$ and given $(p\times p)$ covariance matrix $\mathbf{D}_i$, and $\boldsymbol{\varepsilon}_{ijh}$ denote column vectors of observational errors with zero mean and non-singular $(l\times l)$ covariance matrix $\boldsymbol{\Sigma}_i$. All observational errors and all random parameters are assumed to be uncorrelated.

For the vector $\mathbf{Y}_{ij}=(\mathbf{Y}_{ij1}^\top,\dots,\mathbf{Y}_{ijm_i}^\top)^\top$ of observations at the $j$-th observational unit in the $i$-th group we obtain
$$\mathbf{Y}_{ij}=\mathbf{F}_i\,\boldsymbol{\beta}_{ij}+\boldsymbol{\varepsilon}_{ij},\quad j=1,\dots,n_i,\ i=1,\dots,s, \qquad (13)$$
where $\mathbf{F}_i=(\mathbf{G}_i^\top(x_{i1}),\dots,\mathbf{G}_i^\top(x_{im_i}))^\top$ is the design matrix in group $i$ and $\boldsymbol{\varepsilon}_{ij}=(\boldsymbol{\varepsilon}_{ij1}^\top,\dots,\boldsymbol{\varepsilon}_{ijm_i}^\top)^\top$. Then the vector $\mathbf{Y}_i=(\mathbf{Y}_{i1}^\top,\dots,\mathbf{Y}_{in_i}^\top)^\top$ of all observations in group $i$ is given by
$$\mathbf{Y}_i=(\mathbb{I}_{n_i}\otimes\mathbf{F}_i)\,\boldsymbol{\beta}_{(i)}+\boldsymbol{\varepsilon}_{(i)},\quad i=1,\dots,s, \qquad (14)$$
where $\boldsymbol{\beta}_{(i)}=(\boldsymbol{\beta}_{i1}^\top,\dots,\boldsymbol{\beta}_{in_i}^\top)^\top$, $\boldsymbol{\varepsilon}_{(i)}=(\boldsymbol{\varepsilon}_{i1}^\top,\dots,\boldsymbol{\varepsilon}_{in_i}^\top)^\top$, $\mathbb{I}_{n_i}$ is the $n_i\times n_i$ identity matrix and "$\otimes$" denotes the Kronecker product, and the total vector $\mathbf{Y}=(\mathbf{Y}_1^\top,\dots,\mathbf{Y}_s^\top)^\top$ of all observations in all groups results in
$$\mathbf{Y}=\begin{pmatrix}\mathbf{1}_{n_1}\otimes\mathbf{F}_1\\ \vdots\\ \mathbf{1}_{n_s}\otimes\mathbf{F}_s\end{pmatrix}\boldsymbol{\beta}+\tilde{\boldsymbol{\varepsilon}} \qquad (15)$$
with $\tilde{\boldsymbol{\varepsilon}}=\operatorname{block\mbox{-}diag}(\mathbb{I}_{n_1}\otimes\mathbf{F}_1,\dots,\mathbb{I}_{n_s}\otimes\mathbf{F}_s)\,\mathbf{b}+\boldsymbol{\varepsilon}$, where $\operatorname{block\mbox{-}diag}(\mathbf{A}_1,\dots,\mathbf{A}_s)$ is the block-diagonal matrix with blocks $\mathbf{A}_1,\dots,\mathbf{A}_s$, $\mathbf{b}=\boldsymbol{\beta}_{(\cdot)}-\mathbf{1}_N\otimes\boldsymbol{\beta}$ for $\boldsymbol{\beta}_{(\cdot)}=(\boldsymbol{\beta}_{(1)}^\top,\dots,\boldsymbol{\beta}_{(s)}^\top)^\top$, $\boldsymbol{\varepsilon}=(\boldsymbol{\varepsilon}_{(1)}^\top,\dots,\boldsymbol{\varepsilon}_{(s)}^\top)^\top$, and $\mathbf{1}_{n_i}$ is the column vector of length $n_i$ with all entries equal to $1$.

Using Gauss–Markov theory we obtain the following best linear unbiased estimator (BLUE) for the mean parameters $\boldsymbol{\beta}$:
$$\hat{\boldsymbol{\beta}}=\left[\sum_{i=1}^{s}n_i\Big(\big(\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i\big)^{-1}+\mathbf{D}_i\Big)^{-1}\right]^{-1}\sum_{i=1}^{s}n_i\Big(\big(\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i\big)^{-1}+\mathbf{D}_i\Big)^{-1}\hat{\boldsymbol{\beta}}_{i}, \qquad (16)$$
where $\hat{\boldsymbol{\beta}}_{i}=\big(\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i\big)^{-1}\tilde{\mathbf{F}}_i^\top\tilde{\bar{\mathbf{Y}}}_i$ is the estimator based only on the observations in the $i$-th group, and $\tilde{\mathbf{F}}_i=(\mathbb{I}_{m_i}\otimes\boldsymbol{\Sigma}_i^{-1/2})\mathbf{F}_i$ and $\tilde{\bar{\mathbf{Y}}}_i=(\mathbb{I}_{m_i}\otimes\boldsymbol{\Sigma}_i^{-1/2})\bar{\mathbf{Y}}_i$ (for the mean observational vector $\bar{\mathbf{Y}}_i=\frac{1}{n_i}\sum_{j=1}^{n_i}\mathbf{Y}_{ij}$ in the $i$-th group and the symmetric positive definite matrix $\boldsymbol{\Sigma}_i^{1/2}$ with the property $\boldsymbol{\Sigma}_i=\boldsymbol{\Sigma}_i^{1/2}\boldsymbol{\Sigma}_i^{1/2}$) are the transformed design matrix and the transformed mean observational vector with respect to the covariance structure of the observational errors.

Note that the BLUE (16) exists only if all matrices $\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i$ are non-singular. Therefore, we restrict ourselves to the case where the design matrices $\mathbf{F}_i$ are of full column rank for all $i$. The covariance matrix of the best linear unbiased estimator $\hat{\boldsymbol{\beta}}$ is given by
$$\operatorname{Cov}\big(\hat{\boldsymbol{\beta}}\big)=\left[\sum_{i=1}^{s}n_i\Big(\big(\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i\big)^{-1}+\mathbf{D}_i\Big)^{-1}\right]^{-1}. \qquad (17)$$
In Fedorov and Jones (2005) similar results were obtained for multi-center trial models.

We define an exact design in group $i$ as
$$\xi_i=\begin{pmatrix}x_{i1}&\dots&x_{ik_i}\\ m_{i1}&\dots&m_{ik_i}\end{pmatrix},$$
where $x_{i1},\dots,x_{ik_i}$ are the (distinct) experimental settings in $\mathcal{X}_i$ with the related numbers of observations $m_{i1},\dots,m_{ik_i}$, $\sum_{h=1}^{k_i}m_{ih}=m_i$. For analytical purposes we also introduce approximate designs:
$$\xi_i=\begin{pmatrix}x_{i1}&\dots&x_{ik_i}\\ w_{i1}&\dots&w_{ik_i}\end{pmatrix},$$
where $w_{ih}\ge 0$ denotes the weight of observations at $x_{ih}$, $h=1,\dots,k_i$, and $\sum_{h=1}^{k_i}w_{ih}=1$.

We will use the following notation for the moment matrix in group $i$:
$$\mathbf{M}_i(\xi_i)=\sum_{h=1}^{k_i}w_{ih}\,\tilde{\mathbf{G}}_i(x_{ih})^\top\tilde{\mathbf{G}}_i(x_{ih}), \qquad (18)$$
where $\tilde{\mathbf{G}}_i=\boldsymbol{\Sigma}_i^{-1/2}\mathbf{G}_i$. For exact designs we have $w_{ih}=m_{ih}/m_i$ and $\mathbf{M}_i(\xi_i)=\frac{1}{m_i}\tilde{\mathbf{F}}_i^\top\tilde{\mathbf{F}}_i$. We will also use the notation $\boldsymbol{\Delta}_i=m_i\mathbf{D}_i$ for the adjusted dispersion matrix of the random effects in group $i$.

Then we extend the definition of the variance–covariance matrix (17) with respect to approximate designs:
$$\operatorname{Cov}_{\xi_1,\dots,\xi_s}=\left[\sum_{i=1}^{s}n_i m_i\big(\mathbf{M}_i(\xi_i)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\right]^{-1}. \qquad (19)$$
Further we focus on the linear ($L$-) and determinant ($D$-) criteria for the estimation of the mean parameters $\boldsymbol{\beta}$. The linear criterion is defined as
$$\phi_L=\operatorname{tr}\Big[\operatorname{Cov}\big(\mathbf{L}\hat{\boldsymbol{\beta}}\big)\Big], \qquad (20)$$
where $\mathbf{L}$ is some linear transformation of $\boldsymbol{\beta}$. For approximate designs we obtain
$$\phi_L(\xi_1,\dots,\xi_s)=\operatorname{tr}\left(\left[\sum_{i=1}^{s}n_i m_i\big(\mathbf{M}_i(\xi_i)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\right]^{-1}\mathbf{V}\right), \qquad (21)$$
where $\mathbf{V}=\mathbf{L}^\top\mathbf{L}$.

Remark 1.
The $A$- and $c$-criteria for the estimation of $\boldsymbol{\beta}$ are the particular linear criteria with $\mathbf{V}=\mathbb{I}_p$ and $\mathbf{V}=\mathbf{c}\mathbf{c}^\top$, for a $p$-dimensional real column vector $\mathbf{c}$, respectively.

If the regression matrices and the experimental regions are the same among all groups, $\mathbf{G}_i=\mathbf{G}$, $\mathcal{X}_i=\mathcal{X}$, $i=1,\dots,s$, the integrated mean squared error (IMSE-) criterion can also be used for multiple-group models. We define the IMSE-criterion for the estimation of the mean parameters $\boldsymbol{\beta}$ as follows:
$$\phi_{\mathrm{IMSE}}=\operatorname{tr}\left(\mathrm{E}\left[\int_{\mathcal{X}}\Big(\mathbf{G}(x)\big(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\big)\Big)\Big(\mathbf{G}(x)\big(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\big)\Big)^\top\nu(\mathrm{d}x)\right]\right), \qquad (22)$$
where $\nu$ is some suitable measure on the experimental region $\mathcal{X}$, typically chosen to be uniform on $\mathcal{X}$ with $\nu(\mathcal{X})=1$. The IMSE-criterion (22) can be recognized as the particular linear criterion with $\mathbf{V}=\int_{\mathcal{X}}\mathbf{G}(x)^\top\mathbf{G}(x)\,\nu(\mathrm{d}x)$.

The $D$-criterion is defined as the logarithm of the determinant of the covariance matrix of the estimation, which results in
$$\phi_D(\xi_1,\dots,\xi_s)=-\ln\det\left(\sum_{i=1}^{s}n_i m_i\big(\mathbf{M}_i(\xi_i)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\right) \qquad (23)$$
for approximate designs.

Note that optimal designs depend on the group sizes $n_i$ only via the proportions $n_i/N$, $i=1,\dots,s$, and are, hence, independent of the total number of observational units $N$ itself. This statement is easy to verify by formulas (21) and (23).

To make use of the equivalence theorems proposed in Section 2, we verify the convexity of the design criteria.
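For concreteness, the covariance matrix (19) and the criteria (21) and (23) can be evaluated numerically. The sketch below is our own illustration, not code from the paper; the helper names (`group_moment_matrix`, `precision`, `phi_L`, `phi_D`) are hypothetical, and it assumes the single-response case $l=1$ with $\boldsymbol{\Sigma}_i=1$, so that $\tilde{\mathbf{G}}_i=\mathbf{f}^\top$:

```python
import numpy as np

def group_moment_matrix(points, weights, f):
    """Moment matrix (18) for l = 1 and Sigma_i = 1: sum_h w_ih f(x_ih) f(x_ih)^T."""
    return sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))

def precision(Ms, Deltas, n, m):
    """Inverse of the covariance matrix (19): sum_i n_i m_i (M_i^{-1} + Delta_i)^{-1}."""
    return sum(n_i * m_i * np.linalg.inv(np.linalg.inv(M) + D)
               for M, D, n_i, m_i in zip(Ms, Deltas, n, m))

def phi_L(Ms, Deltas, n, m, V):
    """Linear criterion (21): tr(Cov * V)."""
    return float(np.trace(np.linalg.inv(precision(Ms, Deltas, n, m)) @ V))

def phi_D(Ms, Deltas, n, m):
    """D-criterion (23): -ln det of the precision matrix."""
    return float(-np.log(np.linalg.det(precision(Ms, Deltas, n, m))))

# Example: two identical groups, straight line f(x) = (1, x)^T on [0, 1],
# design with weights (1/2, 1/2) at the points 0 and 1
f = lambda x: np.array([1.0, x])
M = group_moment_matrix([0.0, 1.0], [0.5, 0.5], f)  # equals [[1, 1/2], [1/2, 1/2]]
zero = np.zeros((2, 2))                              # Delta_i = 0: fixed effects case
```

With $\boldsymbol{\Delta}_i=0$ the precision reduces to $\sum_i n_i m_i\mathbf{M}_i$, so the values can be checked by hand.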
Lemma 1.
The $L$- and $D$-criteria for the estimation of the mean parameters $\boldsymbol{\beta}$ are convex with respect to $\mathbf{M}(\boldsymbol{\xi})=(\mathbf{M}_1(\xi_1),\dots,\mathbf{M}_s(\xi_s))$.

Proof. The function $\phi_0(\mathbf{N})=\mathbf{N}^{-1}$ is matrix-convex for any positive definite matrix $\mathbf{N}$, i.e.
$$\big(\alpha\mathbf{N}_1+(1-\alpha)\mathbf{N}_2\big)^{-1}\le\alpha\mathbf{N}_1^{-1}+(1-\alpha)\mathbf{N}_2^{-1} \qquad (24)$$
in the Loewner ordering for any $\alpha\in[0,1]$ and all positive definite $\mathbf{N}_1$ and $\mathbf{N}_2$ (see e.g. Seber (2007), ch. 10). Since $\phi_0$ is non-increasing in the Loewner ordering, it is easy to verify that $\psi_i(\mathbf{M}_i)=\big(\mathbf{M}_i^{-1}+\boldsymbol{\Delta}_i\big)^{-1}$ is matrix-concave for any positive definite $\mathbf{M}_i$ (see e.g. Bernstein (2018), ch. 10). Then $\psi(\mathbf{M}_1,\dots,\mathbf{M}_s)=\sum_{i=1}^{s}n_i m_i\,\psi_i(\mathbf{M}_i)$ is matrix-concave with respect to $\mathbf{M}=(\mathbf{M}_1,\dots,\mathbf{M}_s)$. The functions $\phi_1(\mathbf{N})=-\ln\det(\mathbf{N})$ and $\phi_2(\mathbf{N})=\operatorname{tr}\big(\mathbf{N}^{-1}\mathbf{V}\big)$ are non-increasing in the Loewner ordering and convex for any positive definite matrix $\mathbf{N}$ and any positive semi-definite matrix $\mathbf{V}$, as the standard $D$- and $L$-criteria (see e.g. Pázman (1986), ch. 4, or Fedorov and Leonov (2013), ch. 2). Then the functions $\phi_1\circ\psi$ and $\phi_2\circ\psi$ are convex.

We formulate optimality conditions for criteria (21) and (23) based on the results of Theorem 2.

Theorem 3.
Approximate designs $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ are $L$-optimal for the estimation of the mean parameters $\boldsymbol{\beta}$ iff, with $\mathbf{S}=\sum_{r=1}^{s}n_r m_r\big(\mathbf{M}_r(\xi_r^*)^{-1}+\boldsymbol{\Delta}_r\big)^{-1}$,
$$\begin{aligned}
&\operatorname{tr}\Big\{\tilde{\mathbf{G}}_i(x_i)\,\mathbf{M}_i(\xi_i^*)^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\,\mathbf{S}^{-1}\,\mathbf{V}\,\mathbf{S}^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\mathbf{M}_i(\xi_i^*)^{-1}\,\tilde{\mathbf{G}}_i(x_i)^\top\Big\}\\
&\qquad\le\ \operatorname{tr}\Big\{\mathbf{M}_i(\xi_i^*)^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\,\mathbf{S}^{-1}\,\mathbf{V}\,\mathbf{S}^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\Big\}
\end{aligned} \qquad (25)$$
for all $x_i\in\mathcal{X}_i$, $i=1,\dots,s$. For the support points of $\xi_i^*$ equality holds in (25).

Proof. We obtain the results using Lemma 1 and parts a) (equivalence of (i) and (iv)) and b) of Theorem 2 for the partial directional derivatives
$$\Phi_{L,\,\mathbf{M}_{i'},\,i'\neq i}(\mathbf{M}_i,\tilde{\mathbf{M}}_i)=-\,n_i m_i\operatorname{tr}\Big\{\mathbf{S}^{-1}\big(\mathbf{M}_i^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\mathbf{M}_i^{-1}\big(\tilde{\mathbf{M}}_i-\mathbf{M}_i\big)\mathbf{M}_i^{-1}\big(\mathbf{M}_i^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\mathbf{S}^{-1}\,\mathbf{V}\Big\}, \qquad (26)$$
here with $\mathbf{S}=\sum_{r=1}^{s}n_r m_r\big(\mathbf{M}_r^{-1}+\boldsymbol{\Delta}_r\big)^{-1}$, for $i=1,\dots,s$.

Theorem 4.
Approximate designs $\boldsymbol{\xi}^*=(\xi_1^*,\dots,\xi_s^*)$ are $D$-optimal for the estimation of the mean parameters $\boldsymbol{\beta}$ iff, with $\mathbf{S}=\sum_{r=1}^{s}n_r m_r\big(\mathbf{M}_r(\xi_r^*)^{-1}+\boldsymbol{\Delta}_r\big)^{-1}$,
$$\begin{aligned}
&\operatorname{tr}\Big\{\tilde{\mathbf{G}}_i(x_i)\,\mathbf{M}_i(\xi_i^*)^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\,\mathbf{S}^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\mathbf{M}_i(\xi_i^*)^{-1}\,\tilde{\mathbf{G}}_i(x_i)^\top\Big\}\\
&\qquad\le\ \operatorname{tr}\Big\{\mathbf{M}_i(\xi_i^*)^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\,\mathbf{S}^{-1}\big(\mathbf{M}_i(\xi_i^*)^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\Big\}
\end{aligned} \qquad (27)$$
for all $x_i\in\mathcal{X}_i$, $i=1,\dots,s$. For the support points of $\xi_i^*$ equality holds in (27).

Proof. The optimality condition follows from Lemma 1 and Theorem 2 with the partial directional derivatives
$$\Phi_{D,\,\mathbf{M}_{i'},\,i'\neq i}(\mathbf{M}_i,\tilde{\mathbf{M}}_i)=-\,n_i m_i\operatorname{tr}\Big\{\mathbf{S}^{-1}\big(\mathbf{M}_i^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\mathbf{M}_i^{-1}\big(\tilde{\mathbf{M}}_i-\mathbf{M}_i\big)\mathbf{M}_i^{-1}\big(\mathbf{M}_i^{-1}+\boldsymbol{\Delta}_i\big)^{-1}\Big\}, \qquad (28)$$
here with $\mathbf{S}=\sum_{r=1}^{s}n_r m_r\big(\mathbf{M}_r^{-1}+\boldsymbol{\Delta}_r\big)^{-1}$, for $i=1,\dots,s$.

Note that for $l=1$ and $n_1=n_2=1$ the results of Theorems 3 and 4 coincide with the optimality conditions for group-wise identical designs in Schmelter (2007b), ch. 8.

We now consider the particular multiple-group models where the regression matrices $\mathbf{G}_i$, the numbers $m_i$ of observations per observational unit, the covariance matrices $\mathbf{D}_i$ and $\boldsymbol{\Sigma}_i$ of random effects and observational errors, and the experimental regions $\mathcal{X}_i$ are the same among all groups. For these models we have $m_i=m$, $\boldsymbol{\Delta}_i=\boldsymbol{\Delta}$ and $\mathbf{M}_i(\xi)=\mathbf{M}(\xi)$, $i=1,\dots,s$. Then the $L$- and $D$-criteria (21) and (23) simplify to
$$\phi_L(\xi_1,\dots,\xi_s)=\operatorname{tr}\left(\left[m\sum_{i=1}^{s}n_i\big(\mathbf{M}(\xi_i)^{-1}+\boldsymbol{\Delta}\big)^{-1}\right]^{-1}\mathbf{V}\right) \qquad (29)$$
and
$$\phi_D(\xi_1,\dots,\xi_s)=-\ln\det\left(m\sum_{i=1}^{s}n_i\big(\mathbf{M}(\xi_i)^{-1}+\boldsymbol{\Delta}\big)^{-1}\right). \qquad (30)$$
We denote by $\xi_L^*$ an optimal design for the classical linear criterion
$$\phi_L(\xi)=\operatorname{tr}\big(\mathbf{M}(\xi)^{-1}\mathbf{V}\big) \qquad (31)$$
and by $\xi_D^*$ an optimal design for the $D$-criterion in the single-group model
$$\phi_D(\xi)=\ln\det\big(\mathbf{M}(\xi)^{-1}+\boldsymbol{\Delta}\big). \qquad (32)$$
Then it can be easily verified that the designs $\xi_i^*=\xi_L^*$ and $\xi_i^*=\xi_D^*$, $i=1,\dots,s$, satisfy the optimality conditions in Theorems 3 and 4, respectively (see Fedorov and Hackl (1997), ch. 5, for the optimality condition with respect to the $D$-criterion (32)).

Corollary 1. $L$-optimal designs in the fixed effects model are $L$-optimal as group-designs in the multiple-group RCR model in which the numbers of observations $m_i$, the regression matrices $\mathbf{G}_i$, the covariance matrices of random effects and observational errors $\mathbf{D}_i$ and $\boldsymbol{\Sigma}_i$, and the experimental regions $\mathcal{X}_i$ are the same for all groups.

Corollary 2.
$D$-optimal designs in the single-group RCR model are $D$-optimal as group-designs in the multiple-group RCR model in which the numbers of observations $m_i$, the regression matrices $\mathbf{G}_i$, the covariance matrices of random effects and observational errors $\mathbf{D}_i$ and $\boldsymbol{\Sigma}_i$, and the experimental regions $\mathcal{X}_i$ are the same for all groups.

The latter statements are expected results in that all observational units in all groups have the same statistical properties and there are no group-specific restrictions on the designs. Note that the group sizes have no influence on the designs in this case. However, as we will see in Section 4, even for statistically identical observational units optimal designs may depend on the group sizes if the numbers of observations per unit differ from group to group.
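The optimality condition of Theorem 4 can be checked numerically. The following sketch is our own illustration (the function names `lhs_27` and `rhs_27` are hypothetical, and $l=1$, $\boldsymbol{\Sigma}_i=1$ is assumed). It evaluates the two sides of (27) for the straight line random intercept model considered in Section 4, with two identical groups, where the weight $1/2$ at each of the points $0$ and $1$ is $D$-optimal; the sensitivity then attains the bound exactly at the support points:

```python
import numpy as np

def lhs_27(x, i, Ms, Deltas, n, m, f):
    """Left-hand side of condition (27) at point x for group i (l = 1, Sigma_i = 1)."""
    Sinv = np.linalg.inv(sum(n_r * m_r * np.linalg.inv(np.linalg.inv(M) + D)
                             for M, D, n_r, m_r in zip(Ms, Deltas, n, m)))
    Minv = np.linalg.inv(Ms[i])
    A = np.linalg.inv(Minv + Deltas[i])        # (M_i^{-1} + Delta_i)^{-1}
    v = f(x)
    return float(v @ Minv @ A @ Sinv @ A @ Minv @ v)

def rhs_27(i, Ms, Deltas, n, m):
    """Right-hand side of condition (27) for group i."""
    Sinv = np.linalg.inv(sum(n_r * m_r * np.linalg.inv(np.linalg.inv(M) + D)
                             for M, D, n_r, m_r in zip(Ms, Deltas, n, m)))
    Minv = np.linalg.inv(Ms[i])
    A = np.linalg.inv(Minv + Deltas[i])
    return float(np.trace(Minv @ A @ Sinv @ A))

# Two identical groups, random intercept (D_i = diag(1, 0)), m_i = 2,
# candidate design: weight 1/2 at x = 1 (and 1/2 at x = 0)
f = lambda x: np.array([1.0, x])
M = np.array([[1.0, 0.5], [0.5, 0.5]])   # moment matrix for w = 1/2
Delta = np.diag([2.0, 0.0])              # Delta_i = m_i * D_i
Ms, Deltas, n, m = [M, M], [Delta, Delta], (1, 1), (2, 2)
```

On a grid over $\mathcal{X}=[0,1]$ the sensitivity stays below the bound, with equality at the support points $x=0$ and $x=1$, as part b) of Theorem 2 predicts.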
4 Example

We consider the two-group model of the general form (12) with the regression functions $\mathbf{G}_i(x)=(1,x)^\top$, $x\in\mathcal{X}_i$, and the design region $\mathcal{X}_i=[0,1]$, $i=1,2$:
$$Y_{ijh}=\beta_{ij1}+\beta_{ij2}\,x_{ih}+\varepsilon_{ijh},\quad h=1,\dots,m_i,\ j=1,\dots,n_i. \qquad (33)$$
The covariance structures of the random effects and observational errors are given by $\mathbf{D}_i=\operatorname{diag}(d_{i1},d_{i2})$ and $\boldsymbol{\Sigma}_i=1$ for both groups. The group sizes $n_i$ and the numbers of observations per unit $m_i$ are in general not the same for the first and the second group.

The left-hand sides of the optimality conditions (25) and (27) result in parabolas with positive leading terms for model (33). Then $L$- and $D$-optimal approximate designs have the form
$$\xi_i=\begin{pmatrix}0&1\\ 1-w_i&w_i\end{pmatrix}, \qquad (34)$$
where $w_i$ denotes the weight of observations at the point $x=1$ for the $i$-th group, and the moment matrices are given by
$$\mathbf{M}_i=\begin{pmatrix}1&w_i\\ w_i&w_i\end{pmatrix}. \qquad (35)$$

4.1 Random intercept

We consider first the particular case of model (33) in which only the intercept $\beta_{ij1}$ is random, i.e. $d_{i2}=0$. We focus on the $A$- and $D$-criteria, which are given by (21) for $\mathbf{V}=\mathbb{I}_2$ and by (23), respectively. The $D$-criterion for the random intercept model is given by
$$\phi_D(w_1,w_2)=-\ln\left[\sum_{i=1}^{2}\frac{n_i m_i}{d_{i1}m_i+1}\,\sum_{i=1}^{2}\frac{n_i m_i w_i\big(d_{i1}m_i(1-w_i)+1\big)}{d_{i1}m_i+1}-\left(\sum_{i=1}^{2}\frac{n_i m_i w_i}{d_{i1}m_i+1}\right)^{2}\right]. \qquad (36)$$
This function achieves its minimum at the point $w_1^*=w_2^*=0.5$ for all values of $n_i$ and $m_i$, which coincides with the optimal design in the fixed effects model and in the single-group random intercept model (see Schwabe and Schmelter (2008)). The $A$-criterion for the random intercept model is given by
$$\phi_A(w_1,w_2)=\frac{\sum_{i=1}^{2}\frac{n_i m_i\big(d_{i1}m_i w_i(1-w_i)+1+w_i\big)}{d_{i1}m_i+1}}{\sum_{i=1}^{2}\frac{n_i m_i}{d_{i1}m_i+1}\,\sum_{i=1}^{2}\frac{n_i m_i w_i\big(d_{i1}m_i(1-w_i)+1\big)}{d_{i1}m_i+1}-\left(\sum_{i=1}^{2}\frac{n_i m_i w_i}{d_{i1}m_i+1}\right)^{2}}. \qquad (37)$$

[Table 1: $A$-optimal designs in the random intercept model in dependence on the group sizes $n_i$ and the numbers of observations $m_i$ for $d_{i1}=1$, $i=1,2$; twelve cases with columns: case no., $n_1$, $n_2$, $m_1$, $m_2$, $m_1/m_2$, $w_1^*$, $1-w_1^*$, $w_2^*$, $1-w_2^*$. The numerical entries could not be recovered from the source.]

In contrast to the $D$-criterion, the $A$-optimal weights in general depend on the group sizes $n_i$ and the numbers of observations $m_i$. Some numerical results for $d_{i1}=1$, $i=1,2$, are presented in Table 1. As we can see in the table, if the numbers of observations $m_i$ are the same for both groups (cases 2, 5, 8 and 11), the optimal weight $w_A^*=\sqrt{2}-1\approx 0.414$ of the fixed effects model retains its optimality for the multiple-group model, which is in accordance with Corollary 1. In these cases the optimal designs are independent of the group sizes $n_i$ and the numbers of observations $m_i$. For all other cases the optimal weight $w_i^*$ is smaller (larger) than $w_A^*$ if the number of observations $m_i$ is smaller (larger) than the mean number of observations $(m_1+m_2)/2$. For equal group sizes ($n_1=n_2$) the optimal designs "swap places" if the numbers of observations "swap places": the optimal weight $w_1^*$ in case 1 is the same as the optimal weight $w_2^*$ in case 3 (same for cases 4 and 6). This property, however, does not hold for different group sizes (cases 7 and 9 or 10 and 12).

4.2 Random slope

Now we consider the particular case of the straight line regression model (33) in which only the slope $\beta_{ij2}$ is random: $d_{i1}=0$. For this model the $D$-criterion is given by
$$\phi_D(w_1,w_2)=-\ln\left(\sum_{i=1}^{2}\frac{n_i m_i w_i}{d_{i2}m_i w_i+1}\,\sum_{i=1}^{2}n_i m_i(1-w_i)\right). \qquad (38)$$
In contrast to the random intercept model, $D$-optimal designs for the random slope depend on the group sizes and the numbers of observations. Numerical results for $d_{i2}=1$, $i=1,2$, are presented in Table 2. As we can see in the table, if $m_1=m_2$ optimal designs are the same for both groups (cases 2, 5, 8 and 11). However, they depend on the numbers of observations $m_i$ themselves (optimal weights in cases 2 and 8 differ from those in cases 5 and 11).
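The dependence of the $D$-optimal weights on the numbers of observations can be reproduced by direct minimization of (38). The sketch below is our own illustration, not code from the paper (the function names are hypothetical; it assumes $d_{i2}=1$ and $n_1=n_2=1$). For $m_1=m_2$ both groups receive the common single-group weight, while for $m_1<m_2$ the first group receives the larger weight, as described above:

```python
import numpy as np

def phi_D_slope(w, n, m, d=1.0):
    """D-criterion (38) for the random slope model, in its product form."""
    q = sum(n_i * m_i * w_i / (d * m_i * w_i + 1.0) for w_i, n_i, m_i in zip(w, n, m))
    r = sum(n_i * m_i * (1.0 - w_i) for w_i, n_i, m_i in zip(w, n, m))
    return -np.log(q * r)

def best_weights(n, m, grid=np.linspace(0.01, 0.99, 99)):
    """Grid search for the D-optimal weights of design (34); the boundary
    weights 0 and 1 are excluded because they give singular moment matrices."""
    return min(((phi_D_slope((w1, w2), n, m), w1, w2) for w1 in grid for w2 in grid))[1:]
```

For $m_1=m_2=2$ the minimizer is close to the single-group $D$-optimal weight $w_D^*=(\sqrt{3}-1)/2\approx 0.366$, the positive root of $2w^2+2w-1=0$ obtained by minimizing criterion (32) for $d=1$, $m=2$.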
This phenomenon is in accordance with Corollary 2, since the $D$-optimal weight $w_D^*$ in the single-group model (which minimizes criterion (32)) also depends on the number of observations (via the matrix $\Delta$). In contrast to the random intercept model, for the random slope the optimal weight $w_i^*$ is larger (smaller) than $w_D^*$ if $m_i$ is smaller (larger) than $(m_1 + m_2)/2$.

The minimization of the $A$-criterion for the random slope model leads (with the only exception of the case $m_1 = m_2$) to a singular design matrix. Therefore, this criterion will not be considered further.

[Table 2: $D$-optimal designs in the random slope model in dependence on the group sizes $n_i$ and the numbers of observations $m_i$ for $d_{i2} = 1$, $i = 1, 2$. Columns: case no., $n_1$, $n_2$, $m_1$, $m_2$, $m_1/m_2$, $w_1^*$, $1 - w_1^*$, $w_2^*$, $1 - w_2^*$.]

In this work we considered design problems depending on several designs simultaneously. We proposed equivalence theorems based on the assumptions of convexity and differentiability of the optimality criteria. For design problems with finite experimental regions we formulated optimality conditions with respect to the designs themselves (Theorem 1). If the optimality criteria depend on the designs via the moment matrices only, the optimality conditions are formulated with respect to the moment matrices (Theorem 2).

We applied the proposed optimality conditions to the multiple-group RCR models. If all observational units have the same statistical properties and there are no group-specific design restrictions, optimal designs in the single-group models are also optimal as group-designs in the multiple-group models. In this case the group sizes have no influence on the designs. However, if the numbers of observations differ from group to group, optimal group-designs may depend on the numbers of observations and the group sizes. This behavior has been illustrated by the example of straight line regression models.

The proposed results solve the problem of optimal designs for the estimation of fixed effects in the multiple-group RCR models.
The problem of prediction of random parameters remains, however, open. Moreover, for the computation of optimal approximate and exact designs, new computational approaches still have to be developed. We also assumed fixed numbers of observational units $n_i$ and observations per unit $m_i$; design optimization with respect to these numbers may be an interesting direction for future research.

Acknowledgment
This research was supported by grant SCHW 531/16 of the German Research Foundation (DFG). The author is grateful to Norbert Gaffke for fruitful discussions on the optimality conditions in Theorem 2 and the convexity in Lemma 1. Discussions with Rainer Schwabe on the application to mixed models have also been very helpful.

References
Bernstein, D. S. (2018). Scalar, Vector, and Matrix Mathematics. Princeton University Press.

Bludowsky, A., Kunert, J., and Stufken, J. (2015). Optimal designs for the carryover model with random interactions between subjects and treatments. Australian and New Zealand Journal of Statistics, 517–533.

Bose, M. and Mukerjee, R. (2015). Optimal design measures under asymmetric errors, with application to binary design points. Journal of Statistical Planning and Inference, 28–36.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Entholzner, M., Benda, N., Schmelter, T., and Schwabe, R. (2005). A note on designs for estimating population parameters. Biometrical Letters - Listy Biometryczne, 25–41.

Fedorov, V. and Hackl, P. (1997). Model-Oriented Design of Experiments. Springer, New York.

Fedorov, V. and Jones, B. (2005). The design of multicentre trials. Statistical Methods in Medical Research, 205–248.

Fedorov, V. and Leonov, S. (2013). Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton.

Graßhoff, U., Doebler, A., Holling, H., and Schwabe, R. (2012). Optimal design for linear regression models in the presence of heteroscedasticity caused by random coefficients. Journal of Statistical Planning and Inference, 1108–1113.

Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Annals of Statistics, 849–879.

Kunert, J., Martin, R. J., and Eccleston, J. (2010). Optimal block designs comparing treatments with a control when the errors are correlated. Journal of Statistical Planning and Inference, 2719–2738.

Lemme, F., van Breukelen, G. J. P., and Berger, M. P. F. (2015). Efficient treatment allocation in two-way nested designs. Statistical Methods in Medical Research, 494–512.

Pázman, A. (1986). Foundations of Optimum Experimental Design. D. Reidel Publishing Company, New York.

Prus, M. (2015). Optimal Designs for the Prediction in Hierarchical Random Coefficient Regression Models. Ph.D. thesis, Otto-von-Guericke University, Magdeburg.

Prus, M. (2019). Optimal designs in multiple group random coefficient regression models. TEST. https://doi.org/10.1007/s11749-019-00654-6.

Prus, M. and Schwabe, R. (2016). Optimal designs for the prediction of individual parameters in hierarchical models. Journal of the Royal Statistical Society: Series B, 175–191.

Pukelsheim, F. (1993). Optimal Design of Experiments. Wiley, New York.

Schmelter, T. (2007a). Considerations on group-wise identical designs for linear mixed models. Journal of Statistical Planning and Inference, 4003–4010.

Schmelter, T. (2007b). Experimental Design for Mixed Models with Application to Population Pharmacokinetic Studies. Ph.D. thesis, Otto-von-Guericke University, Magdeburg.

Schwabe, R. and Schmelter, T. (2008). On optimal designs in random intercept models. Tatra Mountains Mathematical Publications, 145–153.

Seber, G. A. F. (2007). A Matrix Handbook for Statisticians. Wiley Series in Probability and Statistics.

Whittle, P. (1973). Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society, Series B, 35.