Identifying locally optimal designs for nonlinear models: A simple extension with profound consequences
aa r X i v : . [ m a t h . S T ] O c t The Annals of Statistics (cid:13)
Institute of Mathematical Statistics, 2012
IDENTIFYING LOCALLY OPTIMAL DESIGNS FOR NONLINEARMODELS: A SIMPLE EXTENSION WITH PROFOUNDCONSEQUENCES
By Min Yang and John Stufken University of Illinois at Chicago and University of Georgia
We extend the approach in [
Ann. Statist. (2010) 2499–2524] foridentifying locally optimal designs for nonlinear models. Conceptuallythe extension is relatively simple, but the consequences in terms ofapplications are profound. As we will demonstrate, we can obtainresults for locally optimal designs under many optimality criteria andfor a larger class of models than has been done hitherto. In manycases the results lead to optimal designs with the minimal number ofsupport points.
1. Introduction.
During the last decades nonlinear models have becomea workhorse for data analysis in many applications. While there is now anextensive literature on data analysis for such models, research on designselection has not kept pace, even though there has seen a spike in activityin recent years. Identifying optimal designs for nonlinear models is indeedmuch more difficult than the much better studied corresponding problem forlinear models. For nonlinear models results can typically only be obtainedon a case-by-case basis, meaning that each combination of model, optimalitycriterion and objective of the experiment requires its own proof.Another challenge is that for a nonlinear model an optimal design typ-ically depends on the unknown parameters. This leads to the concept oflocally optimal designs, which are optimal for a priori chosen values of theparameters. The designs may be poor if the choice of values is far fromthe true values. Where feasible, a multistage approach could help with this.A small initial design is then used to obtain some information about theparameters, and this information is used at the next stage to estimate the
Received August 2011; revised February 2012. Supported by NSF Grants DMS-07-07013 and DMS-07-48409. Supported by NSF Grants DMS-07-06917 and DMS-10-07507.
AMS 2000 subject classifications.
Primary 62K05; secondary 62J12.
Key words and phrases.
Locally optimal, Loewner ordering, principal submatrix, sup-port points.
This is an electronic reprint of the original article published by theInstitute of Mathematical Statistics in
The Annals of Statistics ,2012, Vol. 40, No. 3, 1665–1681. This reprint differs from the original inpagination and typographic detail. 1
M. YANG AND J. STUFKEN true parameter values and to extend the initial design in a locally optimalway to a larger design. The design at this second stage could be the finaldesign, or there could be additional stages at which more design points areselected. The solution presented in this paper is applicable for a one-shotapproach for finding a locally optimal design as well as for a multistage ap-proach. The argument that our method can immediately be applied for themultistage approach is exactly as in Yang and Stufken (2009).For a broader discussion on the challenges to identify optimal designsfor generalized linear models, many of which apply also for other nonlinearmodels, we refer the reader to Khuri et al. (2006).The work presented here is an extension of Yang and Stufken (2009), Yang(2010) and Dette and Melas (2011). The analytic approach in those papersunified and extended many of the results on locally optimal designs that wereavailable through the so-called geometric approach. The extension in thecurrent paper has major consequences for two reasons. First, it enables theapplication of the basic approach in the three earlier papers to many modelsfor which it could until now not be used. As a result, this paper opens thedoor to finding locally optimal designs for models where no feasible approachwas known so far. Second, for a number of models for which answers couldbe obtained by earlier work, the current extension enables the identificationof locally optimal designs with a smaller support. This is important becauseit simplifies the search for optimal designs, whether by computational oranalytical methods. Section 4 will illustrate the impact of our results.The basic approach in Yang and Stufken (2009), Yang (2010) and Detteand Melas (2011), which is also adopted here, is to identify a subclass ofdesigns with a simple format, so that for any given design ξ , there existsa design ξ ∗ in that subclass with I ξ ∗ ≥ I ξ under the Loewner ordering. 
Wewill refer to this subclass as a complete class for this problem. Here, I ξ ∗ and I ξ are information matrices for a parameter vector θ under ξ ∗ and ξ ,respectively. Others, such as Pukelsheim (1989) have called such a class es-sentially complete, which is admittedly indeed more accurate, but also morecumbersome. When searching for a locally optimal design, for the commoninformation-based optimality criteria, including A -, D -, E - and Φ p -criteria,one can thus restrict consideration to this complete class, both for a one-shot or multistage approach. Also, as shown in Yang and Stufken (2009), thisconclusion holds for arbitrary functions of the parameters. Ideally, the samecomplete class results would apply for all a priori values of the parametervector θ . However, it turns out, as we will see in Section 4, that there are in-stances where complete class results hold only for certain a priori values of θ .Yang and Stufken (2009), Yang (2010) and Dette and Melas (2011) iden-tify small complete classes for certain models. They do so by showing thatfor any design ξ that is not in their complete class, there is a design ξ ∗ thatis in the complete class such that all elements of I ξ ∗ are the same as thecorresponding elements in I ξ , except that one diagonal element in I ξ ∗ is at OCALLY OPTIMAL DESIGNS least as large as that in I ξ . This guarantees of course that I ξ ∗ ≥ I ξ . The con-tribution of this paper is that we focus on increasing a principal submatrix rather than just a single diagonal element . This allows us to obtain resultsfor more models than could be addressed by Yang and Stufken (2009), Yang(2010) and Dette and Melas (2011), and also facilitates the identification ofsmaller complete classes for some models considered in these earlier papers.In Section 2 we will present the necessary background, while the mainresults are featured in Section 3. The power of the proposed extension isseen through applications in Section 4. 
We conclude with a short discussionin Section 5.
2. Information matrix and approximate designs.
Consider a nonlinearregression model for which a response variable y depends on a single regres-sion variable x . We assume that the y ’s are independent and follow someexponential distribution G with mean η ( x, θ ), where θ is the p × x can be chosen by the experimenter. Typically,approximate designs are used to study optimality in this context. An approx-imate design ξ can be written as ξ = { ( x i , ω i ) , i = 1 , . . . , N } , where ω i > x i and P Ni =1 ω i = 1. It is often more conve-nient to present ξ as ξ = { ( c i , ω i ) , i = 1 , . . . , N } , c i ∈ [ A, B ], with the c i ’sobtained from the x i ’s through a bijection that may depend on θ . Typically,the information matrix for θ under design ξ can be written as I ξ ( θ ) = P ( θ ) N X i =1 ω i C ( θ, c i ) ! ( P ( θ )) T , (2.1)where C ( θ, c ) = Ψ ( c )Ψ ( c ) Ψ ( c )... ... . . .Ψ p ( c ) Ψ p ( c ) · · · Ψ pp ( c ) . (2.2)The functions Ψ are allowed to depend on θ not just through c , but inan attempt to simplify notation we write, for example, Ψ ( c ) rather thanΨ ( θ, c ). In (2.2), C ( θ, c ) is a symmetric matrix, and P ( θ ) is a p × p non-singular matrix that depends only on θ . Some examples of (2.1) and (2.2)will be seen in Section 4.For some p , 1 ≤ p < p , we partition C ( θ, c ) as C ( θ, c ) = (cid:18) C ( c ) C T ( c ) C ( c ) C ( c ) (cid:19) . (2.3)Here, C ( c ) is the lower p × p principal submatrix of C ( θ, c ), that is, C ( c ) = Ψ p − p +1 ,p − p +1 ( c ) · · · Ψ p − p +1 ,p ( c )... . . . ...Ψ p,p − p +1 ( c ) · · · Ψ pp ( c ) . (2.4) M. YANG AND J. STUFKEN
In the context of local optimality, if designs ξ = { ( c i , ω i ) , i = 1 , . . . , N } and˜ ξ = { (˜ c j , ˜ ω j ) , j = 1 , . . . , ˜ N } satisfy P Ni =1 ω i C ( θ, c i ) ≤ P ˜ Ni =1 ˜ ω i C ( θ, ˜ c i ), then itfollows from (2.1) that I ξ ( θ ) ≤ I ˜ ξ ( θ ). Hence, I ξ ( θ ) ≤ I ˜ ξ ( θ ) follows if it holdsthat N X i =1 ω i C ( c i ) = ˜ N X i =1 ˜ ω i C (˜ c i ) , N X i =1 ω i C ( c i ) = ˜ N X i =1 ˜ ω i C (˜ c i ) and(2.5) N X i =1 ω i C ( c i ) ≤ ˜ N X i =1 ˜ ω i C (˜ c i ) . This is what we explore in this paper. Note that this is more general thanYang and Stufken (2009), Yang (2010) and Dette and Melas (2011), where p = 1. We develop a theoretical framework for general values of p .
3. Main results.
Following Karlin and Studden (1966) and Dette andMelas (2011), a set of k + 1 real-valued continuous functions u , . . . , u k de-fined on an interval [ A, B ] is called a Chebyshev system on [
A, B ] if (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) u ( z ) u ( z ) · · · u ( z k ) u ( z ) u ( z ) · · · u ( z k )... ... . . . ... u k ( z ) u k ( z ) · · · u k ( z k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (3.1)is strictly positive whenever A ≤ z < z < · · · < z k ≤ B .Along the lines of Yang (2010), we select a maximal set of linearly in-dependent nonconstant functions from the Ψ functions that appear in thefirst p − p columns of the matrix C ( θ, c ) defined in (2.2), and rename theselected functions as Ψ , . . . , Ψ k − . For a given nonzero p × Q , letΨ Qk = Q T C ( c ) Q, (3.2)where C ( c ) is as defined in (2.4).For Ψ = 1, Ψ , . . . , Ψ k − and C ( c ), we will say that a set of n pairs( c i , ω i ) is dominated by a set of n pairs (˜ c i , ˜ ω i ) if X i ω i Ψ l ( c i ) = X i ˜ ω i Ψ l (˜ c i ) , l = 0 , , . . . , k − X i ω i Ψ Qk ( c i ) < X i ˜ ω i Ψ Qk (˜ c i ) for every nonzero vector Q, (3.4) OCALLY OPTIMAL DESIGNS where the summations on the left-hand sides are over the n subscripts forthe pairs ( c i , ω i ) and those on the right-hand sides over the n subscripts forthe pairs (˜ c i , ˜ ω i ).The following two lemmas provide the basic tools for the main results.We point out that the pairs ( c i , ω i ) in these lemmas need not form a design;in particular, the ω i ’s need not add to 1. Lemma 1.
For the functions Ψ = 1 , Ψ , . . . , Ψ k − , Ψ Qk defined on an in-terval [ A, B ] , suppose that either { Ψ , Ψ , . . . , Ψ k − } and { Ψ , Ψ , . . . , Ψ k − , Ψ Qk } (3.5) form Chebyshev systems for every nonzero vector Q or { Ψ , Ψ , . . . , Ψ k − } and { Ψ , Ψ , . . . , Ψ k − , − Ψ Qk } (3.6) form Chebyshev systems for every nonzero vector Q. Then the following conclusions hold: (a)
For k = 2 n − , if (3.5) holds, then for any set S = { ( c i , ω i ) : ω i > ,i = 1 , . . . , n } with A ≤ c < · · · < c n < B , there exists a set S = { (˜ c i , ˜ ω i ) :˜ ω i > , i = 1 , . . . , n } with c < ˜ c < c < · · · < ˜ c n − < c n < ˜ c n = B , such that S is dominated by S . (b) For k = 2 n − , if (3.6) holds, then for any set S = { ( c i , ω i ) : ω i > ,i = 1 , . . . , n } with A < c < · · · < c n ≤ B , there exists a set S = { (˜ c i , ˜ ω i ) :˜ ω i > , i = 0 , . . . , n − } with A = ˜ c < c < ˜ c < c < · · · < ˜ c n − < c n , suchthat S is dominated by S . (c) For k = 2 n , if (3.5) holds, then for any set S = { ( c i , ω i ) : ω i > ,i = 1 , . . . , n } with A < c < · · · < c n < B , there exists a set S = { (˜ c i , ˜ ω i ) :˜ ω i > , i = 0 , . . . , n } with A = ˜ c < c < ˜ c < · · · < c n < ˜ c n = B , such that S is dominated by S . (d) For k = 2 n , if (3.6) holds, then for any set S = { ( c i , ω i ) , ω i > ,i = 1 , . . . , n + 1 with A ≤ c < · · · < c n +1 ≤ B , there exists a set S = { (˜ c i , ˜ ω i ) :˜ ω i > , i = 1 , . . . , n } with c < ˜ c < · · · < c n < ˜ c n < c n +1 , such that S is dom-inated by S . Proof.
Since the proof is similar for all parts, we only provide a prooffor part (a).Let S be as in part (a). First consider the special case that Q =(1 , , . . . , T .By (1a) of Therorem 3.1 in Dette and Melas (2011), there exists a set ofat most n pairs (˜ c i , ˜ ω i ) with one of the points equal to B so that (3.3)and (3.4) hold for this Q . By part (a) of Proposition 1 in the Appendix,the number of distinct points with ˜ ω i > n . Thus wehave ˜ c < · · · < ˜ c n = B , and the c i ’s and ˜ c i ’s must alternate by part (b) of M. YANG AND J. STUFKEN
Proposition 1. The result follows now for an arbitrary nonzero Q by applyingProposition 2 in the Appendix and using (3.5) and (3.4). (cid:3) Lemma 2 partially extends Lemma 1 by observing that larger sets S thanin Lemma 1 are also dominated by sets S as in that lemma. Lemma 2.
With the same notation and assumptions as in Lemma 1, let S = { ( c i , ω i ) : ω i > , A ≤ c i ≤ B, i = 1 , . . . , N } , where N ≥ n for cases ( a ),( b ), and ( c ) of Lemma 1, and N ≥ n + 1 for case ( d ). Then the followingconclusions hold: (a) For k = 2 n − , if (3.5) holds, then S is dominated by a set S ofsize n that includes B as one of the points. (b) For k = 2 n − , if (3.6) holds, then S is dominated by a set S ofsize n that includes A as one of the points. (c) For k = 2 n , if (3.5) holds, then S is dominated by a set S of size n + 1 that includes both A and B as points. (d) For k = 2 n , if (3.6) holds, then S is dominated by a set S of size n . Proof.
The results follow by application of Lemma 1. For example, forcase (a), if N = n , the result follows directly from Lemma 1. If N > n , westart with the points c < c < · · · < c N in S . Using Lemma 1, we obtainpoints c , . . . , c N − n , ˜ c N − n +1 , . . . , ˜ c N = B in a set ˜ S that dominates S . UsingLemma 1 again on the n largest points other than ˜ c N in ˜ S , we move onemore point to B , obtaining a new set with N − S .Continue until the size of the set is reduced to n ; this is the desired set S . (cid:3) The first main result is an immediate consequence of Lemma 2.
Theorem 1.
For a regression model with a single regression variable x ,suppose that the information matrix C ( θ, c ) can be written as in (2.1) for c ∈ [ A, B ] . Partitioning the information matrix as in (2.3), let Ψ , . . . , Ψ k − bea maximum set of linearly independent nonconstant Ψ functions in the first p − p columns of C ( θ, c ) . Define Ψ Qk as in (3.2). Suppose that either (3.5)or (3.6) in Lemma 1 holds. Then the following complete class results hold: (a) For k = 2 n − , if (3.5) holds, the designs with at most n supportpoints, including B , form a complete class. (b) For k = 2 n − , if (3.6) holds, the designs with at most n supportpoints, including A , form a complete class. (c) For k = 2 n , if (3.5) holds, the designs with at most n + 1 supportpoints, including both A and B , form a complete class. (d) For k = 2 n , if (3.6) holds, the designs with at most n support pointsform a complete class. OCALLY OPTIMAL DESIGNS Note that if (3.3) holds for Ψ l ( c ), l = 1 , . . . , k −
1, then the same is trueif we replace one or more of the Ψ l ’s by − Ψ l . Therefore, if (3.5) or (3.6)do not hold for the original Ψ l ’s, conclusions in Theorem 1 would still bevalid if (3.5) and (3.6) hold after multiplying one or more of the Ψ l ’s, l =1 , . . . , k −
1, by − f l,t , 1 ≤ t ≤ k ; t ≤ l ≤ k as follows: f l,t ( c ) = Ψ ′ l ( c ) , if t = 1, l = 1 , . . . , k − C ′ ( c ) , if t = 1, l = k , (cid:18) f l,t − ( c ) f t − ,t − ( c ) (cid:19) ′ , if 2 ≤ t ≤ k , t ≤ l ≤ k .(3.7)The following lower triangular matrix contains all of these functions, andsuggest an order in which to compute them: f , = Ψ ′ f , = Ψ ′ f , = ( f , f , ) ′ f , = Ψ ′ f , = ( f , f , ) ′ f , = ( f , f , ) ′ ... ... ... . . . f k, = C ′ f k, = ( f k, f , ) ′ f k, = ( f k, f , ) ′ ... f k,k = ( f k,k − f k − ,k − ) ′ . (3.8)Note that, for p ≥
2, the functions in the last row are matrix functions,which is a key difference with Yang (2010). The derivatives of matricesin (3.7) are element-wise derivatives. For the next result, we will make thefollowing assumptions:(i) All functions Ψ in the information matrix C ( θ, c ) are at least k thorder differentiable on ( A, B ).(ii) For 1 ≤ l ≤ k −
1, the functions f l,l ( c ) have no roots in [ A, B ].For ease of notation, in the remainder we will write f l,l instead of f l,l ( c ),and f l,l > f l,l ( c ) > c ∈ [ A, B ]. This also applies for l = k , in which case it means that the matrix f k,k is positive definite for all c ∈ [ A, B ]. Theorem 2.
For a regression model with a single regression variable x ,let c ∈ [ A, B ] , C ( θ, c ) , Ψ , . . . , Ψ k − and Ψ Qk be as in Theorem 1. For thefunctions f l,l in (3.7), define F ( c ) = Q kl =1 f l,l , c ∈ [ A, B ] . Suppose that ei-ther F ( c ) or − F ( c ) is positive definite for all c ∈ [ A, B ] . Then the followingcomplete class results hold: M. YANG AND J. STUFKEN (a)
For k = 2 n − , if F ( c ) > , the designs with at most n support points,including B , form a complete class. (b) For k = 2 n − , if − F ( c ) > , the designs with at most n supportpoints, including A , form a complete class. (c) For k = 2 n , if F ( c ) > , the designs with at most n + 1 support points,including both A and B , form a complete class. (d) For k = 2 n , if − F ( c ) > , the designs with at most n support pointsform a complete class. Proof.
We only present the proof for case (a) since the other cases aresimilar. For any nonzero vector Q , Q T F ( c ) Q > c ∈ [ A, B ]. Amongall f l,l , l = 1 , . . . , k −
1, and Q T f k,k Q , suppose that a of them are negative.Let 1 ≤ l < · · · < l a ≤ k denote the subscripts for these negative terms, andnote that a must be even. Note also that the labels l < · · · < l a do notdepend on the choice of the vector Q since f , , . . . , f k − ,k − do not dependon Q . Finally, note that for any l with 1 ≤ l ≤ k −
1, if we replace Ψ l ( c )by − Ψ l ( c ), then the signs of f l,l and f l +1 ,l +1 are switched while all othersremain unchanged.We now change some of the Ψ l ’s to − Ψ l . This is done for those l thatsatisfy l b − ≤ l < l b for some value of b ∈ { , . . . , a/ } . Denote the new Ψ-functions by { , b Ψ , . . . , b Ψ Qk } . Notice that b Ψ Qk = Ψ Qk . From the last observa-tion in the previous paragraph, it is easy to check that f l,l > l = 1 , . . . , k ,for the functions f l,l that correspond to this new set of b Ψ-functions. ByProposition 4 in the Appendix, { , b Ψ , . . . , b Ψ k − } and { , b Ψ , . . . , b Ψ k − , b Ψ Qk } are Chebyshev systems on [ A, B ], regardless of the choice for Q = 0. The re-sult follows now from case (a) of Theorem 1 and the observation immediatelyafter Theorem 1. (cid:3) For case (a) in Theorem 2, the value of A in the interval [ A, B ] is allowed tobe −∞ . In this situation, for any given design ξ , we can choose A = min i c i ,and the conclusion of the theorem holds. Similarly, B can be ∞ in case (b),and the interval can be unbounded at either side for case (d).As noted at the end of Section 2, the results in Yang and Stufken (2009),Yang (2010), and Dette and Melas (2011) correspond to p = 1. The ex-tension in this paper allows the choice of larger values of p where feasible.Larger values of p lead to designs with smaller support sizes. The reasonfor this is that the value of k in Theorems 1 and 2 corresponds to the num-ber of equations in (3.3). For a particular model, this number is smaller forlarger p . Since the support size of the designs is roughly half the value of k ,the support size is smaller for larger values of p .We will provide some examples of the application of Theorems 1 and 2 inthe next section, and will offer some further thoughts on the ease of theirapplication in Section 5. OCALLY OPTIMAL DESIGNS
4. Applications.
Whether the model is for continuous or discrete data,with homogeneous or heterogeneous errors, Theorems 1 and 2 can be appliedas long as the information matrix can be written as in (2.1). As the examplesin this section will show, in many cases the result of the theorem facilitatesthe determination of complete classes with the minimal number of supportpoints.4.1.
Exponential regression models.
Dette, Melas and Wong (2006) stud-ied exponential regression models, which can be written as Y i = L X l =1 a l e − λ l x i + ε i , (4.1)where the ε i ’s are i.i.d. with mean 0 and variance σ , and x i ∈ [ U, V ] is thevalue of the regression variable to be selected by the experimenter. Here θ =( a , . . . , a L , λ , . . . , λ L ) T , with a l = 0, l = 1 , . . . , L , and 0 < λ < · · · < λ L . For L = 2, they showed that there is a D -optimal design for θ = ( a , a , λ , λ ) T based on four points, including the lower limit U . Further, for L = 3 and λ = ( λ + λ ) /
2, they showed that there is a D -optimal design for θ basedon six points, again including the lower limit U . By using Theorem 2, wewill show that similar conclusions are possible for other optimality criteria,including A - and E -optimality, and other functions of interest for manya priori values of θ .For L = 2, the results in Yang (2010) can be used to obtain a com-plete class of designs with at most five points. We can do better withTheorem 2. The information matrix for θ = ( a , a , λ , λ ) T under design { ( x i , ω i ) , i = 1 , . . . , N } can be written in the form of (2.1) with P ( θ ) =diag(1 , , a λ − λ , a λ − λ ) and C ( θ, c ) = c λ c λ +1 c λ +2 log( c ) c λ log( c ) c λ +1 log ( c ) c λ log( c ) c λ +1 log( c ) c λ +2 log ( c ) c λ +1 log ( c ) c λ +2 , (4.2)where c = e − ( λ − λ ) x and λ = λ λ − λ . Let Ψ ( c ) = c λ , Ψ ( c ) = log( c ) c λ ,Ψ ( c ) = c λ +1 , Ψ ( c ) = log( c ) c λ +1 , Ψ ( c ) = c λ +2 , Ψ ( c ) = log( c ) c λ +2 and C ( c ) = (cid:18) log ( c ) c λ log ( c ) c λ +1 log ( c ) c λ +1 log ( c ) c λ +2 (cid:19) . Then f , = λc λ − , f , = c , f , = λ +1 λ , f , = c , f , = λ +2) λ +1 , f , = c and f , ( c ) = λ ( λ + 2) c λ + 12( λ + 2) c λ + 12( λ + 2) c c . M. YANG AND J. STUFKEN
Note that c > λ >
0, so that F ( c ) is positive definite if | f , ( c ) | > λ + 30 λ − >
0, which is satisfied when λ λ < √ √ − .Thus, by (a) of Theorem 2, we have the following result. Theorem 3.
For Model (4.1) with L = 2 , if λ λ < √
960 + 30 √ − ≈ . , then the designs with at most four points, including the lower limit U , forma complete class. For L = 3 and 2 λ = λ + λ , the information matrix for θ = ( a , a , a ,λ , λ , λ ) T under design { ( x i , ω i ) , i = 1 , . . . , N } can be written in the formof (2.1) with P ( θ ) = diag(1 , , , a λ − λ , a λ − λ , a λ − λ ) and C ( θ, c )= c λ c λ +1 c λ +2 c λ +2 c λ +3 c λ +4 log( c ) c λ log( c ) c λ +1 log( c ) c λ +2 log ( c ) c λ log( c ) c λ +1 log( c ) c λ +2 log( c ) c λ +3 log ( c ) c λ +1 log ( c ) c λ +2 log( c ) c λ +2 log( c ) c λ +3 log( c ) c λ +4 log ( c ) c λ +2 log ( c ) c λ +3 log ( c ) c λ +4 , (4.3)where c = e − ( λ − λ ) x and λ = λ λ − λ . Let Ψ l − ( c ) = c λ + l − and Ψ l ( c ) =log( c ) c λ + l − , l = 1 , . . . ,
5, and let C ( c ) = log ( c ) c λ log ( c ) c λ +1 log ( c ) c λ +2 log ( c ) c λ +1 log ( c ) c λ +2 log ( c ) c λ +3 log ( c ) c λ +2 log ( c ) c λ +3 log ( c ) c λ +4 . Then f , = λc λ − , f l, l = c , l = 1 , , , , f l +1 , l +1 = l ( λ + l ) λ + l − , l = 1 , , , f , ( c ) = λ ( λ + 4) c λ + 18( λ + 4) c λ + 218( λ + 4) c λ + 18( λ + 4) c λ + 218( λ + 4) c λ + 38( λ + 4) c λ + 218( λ + 4) c λ + 38( λ + 4) c c . Again, c > λ >
0, so that F ( c ) is positive definite if | ( f , ( c )) | andits leading principal minors are positive. This is equivalent to1505 λ + 9030 λ + 11499 λ − > , λ + 110 λ − > , (4.4) 1295 λ + 5180 λ − > λ + 330 λ + 431 > . OCALLY OPTIMAL DESIGNS Simple computation shows that this holds for λ λ < .
72 (or, equivalently, λ λ < . Theorem 4.
For model (4.1) with L = 3 and λ = λ + λ , if λ λ < . , then the designs with at most six points, including the lower limit U ,form a complete class. LINEXP model.
Demidenko (2006) proposed a model referred toas the LINEXP model to describe tumor growth delay and regrowth. Thenatural logarithm of the tumor volume is modeled as Y i = α + γx i + β ( e − δx i −
1) + ε i , (4.5)with independent ε i ∼ N (0 , σ ) and x i ∈ [ U, V ] as the value of the singleregression variable, which in this case refers to time. Here θ = ( α, γ, β, δ ) T is the parameter vector, where α is the baseline logarithm of the tumorvolume, γ is the final growth rate and δ is the rate at which killed cells getwashed out. The size of the parameter β relative to γ/δ determines whetherregrowth is monotonic ( β < γ/δ ) or not. Li and Balakrishnan (2011) recentlystudied this model and showed that a D -optimal design for θ can be based onfour points, including U and V . We will now show that Theorem 2 extendsthis conclusion to other optimality criteria and functions of interest.The information matrix for θ under design { ( x i , ω i ) , i = 1 , . . . , N } can bewritten in the form of (2.1) with P ( θ ) = − δ δ/β − and(4.6) C ( θ, c ) = e c e c c ce c c ce c ce c c e c c e c , where c = − δx . With a proper choice of Ψ functions, it can be shown thatthe result in Yang (2010) yields a complete class of designs with at most fivepoints, including U and V . We can again do better with Theorem 2.Define Ψ ( c ) = c , Ψ ( c ) = e c , Ψ ( c ) = ce c , Ψ ( c ) = e c , Ψ ( c ) = ce c and C ( c ) = (cid:18) c c e c c e c c e c (cid:19) . This yields f , = 1, f , = e c , f , = 1, f , = 4 e c , f , = 1 and f , ( c ) = (cid:18) e − c e − c / e − c / (cid:19) . M. YANG AND J. STUFKEN
Clearly F ( c ) is a positive definite matrix. Therefore, by part (c) of Theo-rem 2, we reach the following conclusion. Theorem 5.
For the LINEXP model (4.5), the designs with at mostfour points, including U and V , form a complete class. Double-exponential regrowth model.
Demidenko (2004), using a two-compartment model, developed a double-exponential regrowth model to de-scribe the dynamics of post-irradiated tumors. The model can be writtenas Y i = α + ln[ βe νx i + (1 − β ) e − φx i ] + ε ij , (4.7)with independent ε i ∼ N (0 , σ ) and x i ∈ [ U, V ] again as the value for thevariable time. Here θ = ( α, β, ν, φ ) T is the parameter vector, where α isthe logarithm of the initial tumor volume, 0 < β < ν and φ are cell proliferation anddeath rates.Using Chebyshev systems and an equivalence theorem, Li and Balakr-ishnan (2011) showed that a D -optimal design for θ can be based on fourpoints including U and V . Theorem 1 allows us to extend this result toa complete class result, thereby covering many other optimality criteria andany functions of interest.The information matrix for θ under design { ( x i , ω i ) , i = 1 , . . . , N } is of theform (2.1) with P ( θ ) = − β /β
00 0 0 − / (1 − β ) − and with C ( θ, x ) a 4 × = 1, Ψ = e νx /g ( x ),Ψ = e νx /g ( x ), Ψ = xe νx /g ( x ), Ψ = xe νx /g ( x ), Ψ = x e νx /g ( x ),Ψ = xe − φx /g ( x ), Ψ = xe ( ν − φ ) x /g ( x ), Ψ = x e ( ν − φ ) x /g ( x ) and Ψ = x e − φx /g ( x ). Here, g ( x ) = βe νx + (1 − β ) e − φx . Note that Ψ can be writ-ten as a linear combination of Ψ and Ψ . We can apply Theorem 1 ifwe can show that both { , Ψ , Ψ , Ψ , − Ψ , Ψ } and { , Ψ , Ψ , Ψ , − Ψ , Ψ , Q T C ( x ) Q } are Chebyshev systems for any nonzero vector Q ,where C ( x ) = (cid:16) Ψ Ψ Ψ Ψ (cid:17) .Rather than do this directly, we first simplify the problem. We multi-ply each of the Ψ’s by the positive function e φx g ( x ) , which preserves theChebyshev system property. After further simplifications by replacing someof the resulting functions by independent linear combinations of these func-tions, which also preserves the Chebyshev system property, we arrive atthe systems { e ( ν + φ ) x , e ν + φ ) x , x , − xe ( ν + φ ) x , xe ν + φ ) x } and { e ( ν + φ ) x , OCALLY OPTIMAL DESIGNS e ν + φ ) x , x , − xe ( ν + φ ) x , xe ν + φ ) x , g ( x ) e φx Q T C ( x ) Q } . It suffices to showthat these are Chebyshev systems for any nonzero vector Q , which followsfrom Proposition 4 if we show that f l,l > l = 1 , . . . ,
6, for the latter system.It can be shown that f , = f , / f , = f , / ae ax , f , = e − ax and f , = (cid:16) e − ax / e − ax / e − ax (cid:17) , where a = ν + φ . Thus both systems are Chebyshevsystems, and by part (c) of Theorem 1, we reach the following conclusion. Theorem 6.
For the double-exponential regrowth model (4.7), the de-signs with at most four points, including U and V , form a complete class.
5. Discussion.
We have given a powerful extension of the result in Yang(2010) that has potential for providing a small complete class of designswhenever the information matrix can be written as in (2.1). Irrespectiveof the optimality criterion (provided that it does not violate the Loewnerordering) and of the function of θ that is of interest, the search for an optimaldesign can be restricted to the small complete class. As the examples inSection 4 show, the results lead us to conclusions that were not possibleusing the results in Yang (2010) and Dette and Melas (2011).As already pointed out, direct application of Theorem 1 may not be easy.Section 4.3 shows some tricks that can be useful when using Theorem 1.Direct application of Theorem 2 is easier because the condition for the func-tion F ( c ) can be verified with the help of software for symbolic computations.Sometimes it is more convenient to do this after multiplying each of the Ψfunctions by the same positive function (see Section 4.3).There remain, however, some basic questions related the application ofeither Theorem 1 or Theorem 2 that do not have simple general answers. Forexample, what is a good choice for p in forming the matrix C ( c ) in (2.4)?In Section 4, the choice p = p/ p approximatelyequal to p/ θ , we could wind up withdifferent matrices C ( c ), even after fixing p . So what ordering is best? In allof the examples in Section 4, we have used an ordering that makes “higher-order terms” appear in C ( c ), and this may offer the best general strategy.There is still another issue related to ordering: In renaming the independentΨ-functions in the first p − p columns of C ( θ, c ), different orders will resultin different f l,l -functions. In some cases, but not for all, these functions willresult in a function F ( c ) that satisfies the condition in Theorem 2. 
In theexamples, we have tended to associate “lower-order terms” with the earlierΨ-functions, but what order is best may require some trial and error.Whereas we have demonstrated that the main results of the paper arepowerful, regrettably we cannot offer any guarantees that they will alwaysgive results as desired, even when the information matrix can be written inthe form (2.1). M. YANG AND J. STUFKEN
APPENDIX
Proposition 1.
Assume that { Ψ , Ψ , . . . , Ψ k − } is a Chebyshev sys-tem defined on an interval [ A, B ] . Let A ≤ z < z < · · · < z t ≤ B , and let r , . . . , r t be coefficients that satisfy the following k equations: t X i =1 r i Ψ l ( z i ) = 0 , l = 0 , , . . . , k − . (A.1) Then we have: (a) If t ≤ k , then r i = 0 , i = 1 , . . . , t . (b) If t = k + 1 and one r i is not zero, then all are nonzero; moreoverall r i ’s for odd i must then have the same sign, which is opposite to that ofthe r i ’s for even i . Proof.
For part (a), if t < k , we can expand z , . . . , z t to a set of k distinct points, taking r i = 0 for the added points. Thus without loss ofgenerality, take t = k . Consider the matrixΨ( z , z , . . . , z k ) = Ψ ( z ) Ψ ( z ) · · · Ψ ( z k )Ψ ( z ) Ψ ( z ) · · · Ψ ( z k )... ... . . . ...Ψ k − ( z ) Ψ k − ( z ) · · · Ψ k − ( z k ) . (A.2)Then (A.1) can be written asΨ( z , z , . . . , z k ) R = 0 , where R = ( r , . . . , r k ) T . Since { Ψ , Ψ , . . . , Ψ k − } is a Chebyshev system,Ψ( z , z , . . . , z k ) is nonsingular, so that R = 0.For part (b), if one r i is 0, then it follows from part (a) that all r i ’s are 0.Therefore, if at least one r i is nonzero, then all of them must be nonzero.With the notation from the previous paragraph, we can write (A.1) asΨ( z , z , . . . , z k ) R = − r k +1 ψ ( z k +1 ) , where ψ ( z k +1 ) = (Ψ ( z k +1 ) , Ψ ( z k +1 ) , . . . , Ψ k − ( z k +1 )) T . It follows that r i = − r k +1 | Ψ( z , . . . , z i − , z k +1 , z i +1 , . . . , z k ) || Ψ( z , z , . . . , z k ) | , i = 1 , . . . , k. (A.3)By the Chebyshev system assumption, the denominator | Ψ( z , z , . . . , z k ) | in (A.3) is positive, while the numerator | Ψ( z , . . . , z i − , z k +1 , z i +1 , . . . , z k ) | ispositive for i = k, k − , . . . and negative otherwise. The result in (b) follows. (cid:3) Proposition 2.
Let $\{\Psi_0 = 1, \Psi_1, \ldots, \Psi_{k-1}\}$ be a Chebyshev system on an interval $[A, B]$, and suppose that $k = 2n - 1$. Consider $n$ pairs $(c_i, \omega_i)$, $i = 1, \ldots, n$, and $n$ pairs $(\tilde{c}_i, \tilde{\omega}_i)$, $i = 1, \ldots, n$, with $\omega_i > 0$, $\tilde{\omega}_i > 0$ and $A \le c_1 < \tilde{c}_1 < \cdots < c_n < \tilde{c}_n = B$. Suppose further that the following $k$ equations hold:
\[
\sum_i \omega_i \Psi_l(c_i) = \sum_i \tilde{\omega}_i \Psi_l(\tilde{c}_i), \qquad l = 0, 1, \ldots, k-1. \tag{A.4}
\]
Then, for any function $\Psi_k$ on $[A, B]$, we can conclude that
\[
\sum_i \omega_i \Psi_k(c_i) < \sum_i \tilde{\omega}_i \Psi_k(\tilde{c}_i) \tag{A.5}
\]
if $\{\Psi_0 = 1, \Psi_1, \ldots, \Psi_{k-1}, \Psi_k\}$ is also a Chebyshev system.

Proof.
With $R = (\omega_1, -\tilde{\omega}_1, \omega_2, -\tilde{\omega}_2, \ldots, \omega_n)^T$, the $k$ equations in (A.4) can be written as
\[
\Psi(c_1, \tilde{c}_1, \ldots, c_n) R = \tilde{\omega}_n \psi(\tilde{c}_n), \tag{A.6}
\]
where $\Psi$ and $\psi$ are as defined in the proof of Proposition 1. Further, (A.5) is equivalent to
\[
(\Psi_k(c_1), \Psi_k(\tilde{c}_1), \ldots, \Psi_k(c_n)) R < \tilde{\omega}_n \Psi_k(\tilde{c}_n). \tag{A.7}
\]
Using (A.6) to solve for $R$, and using that $\tilde{\omega}_n > 0$, we see that (A.7) is equivalent to
\[
(\Psi_k(c_1), \Psi_k(\tilde{c}_1), \ldots, \Psi_k(c_n)) \Psi^{-1}(c_1, \tilde{c}_1, \ldots, c_n) \psi(\tilde{c}_n) - \Psi_k(\tilde{c}_n) < 0. \tag{A.8}
\]
From an elementary matrix result [see, e.g., Theorem 13.3.8 of Harville (1997)], the left-hand side of (A.8) can be written as
\[
-\frac{|\Psi^*(c_1, \tilde{c}_1, \ldots, c_n, \tilde{c}_n)|}{|\Psi(c_1, \tilde{c}_1, \ldots, c_n)|}, \tag{A.9}
\]
where
\[
\Psi^*(c_1, \tilde{c}_1, \ldots, c_n, \tilde{c}_n) =
\begin{pmatrix}
\Psi_0(c_1) & \Psi_0(\tilde{c}_1) & \cdots & \Psi_0(c_n) & \Psi_0(\tilde{c}_n) \\
\Psi_1(c_1) & \Psi_1(\tilde{c}_1) & \cdots & \Psi_1(c_n) & \Psi_1(\tilde{c}_n) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\Psi_{k-1}(c_1) & \Psi_{k-1}(\tilde{c}_1) & \cdots & \Psi_{k-1}(c_n) & \Psi_{k-1}(\tilde{c}_n) \\
\Psi_k(c_1) & \Psi_k(\tilde{c}_1) & \cdots & \Psi_k(c_n) & \Psi_k(\tilde{c}_n)
\end{pmatrix}. \tag{A.10}
\]
Since both $\{\Psi_0, \Psi_1, \ldots, \Psi_{k-1}\}$ and $\{\Psi_0, \Psi_1, \ldots, \Psi_{k-1}, \Psi_k\}$ are Chebyshev systems and $c_1 < \tilde{c}_1 < \cdots < c_n < \tilde{c}_n$, it follows that (A.9) is negative, which is what had to be shown. □

A similar argument as for Proposition 2 can be used for the next result.
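As a small numerical illustration (not part of the paper), Propositions 1(b) and 2 can be checked with the monomials $1, x, x^2$, which form a Chebyshev system on any interval: the null-space coefficients in (A.1) alternate in sign, and the resulting pair of designs satisfies the strict inequality (A.5) for $\Psi_k(x) = x^k$. The four support points below are arbitrary illustrative choices.

```python
import numpy as np

# Illustration of Propositions 1(b) and 2 with the Chebyshev system
# {1, x, x^2} on [0, 1]; here k = 3, n = 2, and Psi_k(x) = x^3.
# The four support points are arbitrary illustrative choices.
k = 3
pts = np.array([0.15, 0.40, 0.70, 1.00])     # c1 < c~1 < c2 < c~2 = B

# Rows are Psi_l(z) = z^l, l = 0, ..., k-1; columns are the t = k+1 points.
M = np.vstack([pts**l for l in range(k)])    # 3 x 4 matrix as in (A.2)

# Its null space is one-dimensional; M @ r = 0 is (A.1) with t = k + 1.
r = np.linalg.svd(M)[2][-1]
r *= np.sign(r[0])                           # normalize: first entry positive

# Proposition 1(b): the signs strictly alternate (+, -, +, -).
assert np.array_equal(np.sign(r), np.array([1.0, -1.0, 1.0, -1.0]))

# Split into the two designs: R = (omega_1, -omega~_1, omega_2, -omega~_2).
w, wt = r[0::2], -r[1::2]                    # all four weights are positive
c, ct = pts[0::2], pts[1::2]

# The k moment-matching equations (A.4) hold by construction of r.
for l in range(k):
    assert np.isclose(w @ c**l, wt @ ct**l)

# Proposition 2: strict inequality (A.5) for Psi_k(x) = x^3.
assert w @ c**k < wt @ ct**k
print("Propositions 1(b) and 2 verified at the chosen points")
```

Note how the two results interlock: part (b) of Proposition 1 guarantees that the null-space coefficients alternate in sign, which is exactly what allows them to be split into two sets of positive weights satisfying (A.4).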
Proposition 3.
Let $\{\Psi_0 = 1, \Psi_1, \ldots, \Psi_{k-1}\}$ be a Chebyshev system on an interval $[A, B]$, and suppose that $k = 2n$. Consider $n$ pairs $(c_i, \omega_i)$, $i = 1, \ldots, n$, and $n+1$ pairs $(\tilde{c}_i, \tilde{\omega}_i)$, $i = 0, 1, \ldots, n$, with $\omega_i > 0$, $\tilde{\omega}_i > 0$ and $A = \tilde{c}_0 < c_1 < \tilde{c}_1 < \cdots < c_n < \tilde{c}_n = B$. Suppose further that the following $k$ equations hold:
\[
\sum_i \omega_i \Psi_l(c_i) = \sum_i \tilde{\omega}_i \Psi_l(\tilde{c}_i), \qquad l = 0, 1, \ldots, k-1. \tag{A.11}
\]
Then, for any function $\Psi_k$ on $[A, B]$, we can conclude that
\[
\sum_i \omega_i \Psi_k(c_i) < \sum_i \tilde{\omega}_i \Psi_k(\tilde{c}_i) \tag{A.12}
\]
if $\{\Psi_0 = 1, \Psi_1, \ldots, \Psi_{k-1}, \Psi_k\}$ is also a Chebyshev system.

Proposition 4.
Consider functions $\Psi_0 = 1, \Psi_1, \ldots, \Psi_k$ on an interval $[A, B]$. Compute the corresponding functions $f_{l,l}$ as in (3.7), but with $C(c)$ replaced by $\Psi_k$, and suppose that $f_{l,l} > 0$, $l = 1, \ldots, k-1$. Then $\{1, \Psi_1, \ldots, \Psi_k\}$ is a Chebyshev system if $f_{k,k} > 0$, while $\{1, \Psi_1, \ldots, -\Psi_k\}$ is a Chebyshev system if $f_{k,k} < 0$.

Proof.
The conclusion for the case $f_{k,k} < 0$ follows immediately from that for the case $f_{k,k} > 0$, so that we will only focus on the latter. We need to show that
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
\Psi_1(z_0) & \Psi_1(z_1) & \cdots & \Psi_1(z_k) \\
\vdots & \vdots & \ddots & \vdots \\
\Psi_k(z_0) & \Psi_k(z_1) & \cdots & \Psi_k(z_k)
\end{vmatrix} > 0 \tag{A.13}
\]
for any $A \le z_0 < z_1 < \cdots < z_k \le B$. Consider (A.13) as a function of $z_k$. The determinant is 0 if $z_k = z_{k-1}$, so that it suffices to show that the derivative of (A.13) with respect to $z_k$ is positive on $(z_{k-1}, B)$, that is,
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 & 0 \\
\Psi_1(z_0) & \Psi_1(z_1) & \cdots & \Psi_1(z_{k-1}) & f_{1,1}(z_k) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\Psi_k(z_0) & \Psi_k(z_1) & \cdots & \Psi_k(z_{k-1}) & f_{k,1}(z_k)
\end{vmatrix} > 0 \tag{A.14}
\]
for $z_k \in (z_{k-1}, B)$. Now consider (A.14) as a function of $z_{k-1}$, and use a similar argument. It suffices to show that for $z_{k-1} \in (z_{k-2}, z_k)$,
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 & 0 & 0 \\
\Psi_1(z_0) & \Psi_1(z_1) & \cdots & \Psi_1(z_{k-2}) & f_{1,1}(z_{k-1}) & f_{1,1}(z_k) \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\Psi_k(z_0) & \Psi_k(z_1) & \cdots & \Psi_k(z_{k-2}) & f_{k,1}(z_{k-1}) & f_{k,1}(z_k)
\end{vmatrix} > 0. \tag{A.15}
\]
Continuing like this, it suffices to show that
\[
\begin{vmatrix}
f_{1,1}(z_1) & f_{1,1}(z_2) & \cdots & f_{1,1}(z_k) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k,1}(z_1) & f_{k,1}(z_2) & \cdots & f_{k,1}(z_k)
\end{vmatrix} > 0 \tag{A.16}
\]
for any $A \le z_1 < z_2 < \cdots < z_k \le B$. Since $f_{1,1}(c) > 0$ for $c \in [A, B]$, (A.16) is equivalent to
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
f_{2,1}(z_1)/f_{1,1}(z_1) & f_{2,1}(z_2)/f_{1,1}(z_2) & \cdots & f_{2,1}(z_k)/f_{1,1}(z_k) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k,1}(z_1)/f_{1,1}(z_1) & f_{k,1}(z_2)/f_{1,1}(z_2) & \cdots & f_{k,1}(z_k)/f_{1,1}(z_k)
\end{vmatrix} > 0. \tag{A.17}
\]
Recall that the entries in the last $k-1$ rows of (A.17) are antiderivatives of $f_{l,2}$, $l = 2, \ldots, k$. Hence, applying the same arguments used for (A.13) to (A.17) and using that $f_{2,2}(c) > 0$ for $c \in [A, B]$, it is sufficient to show that
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
f_{3,2}(z_2)/f_{2,2}(z_2) & f_{3,2}(z_3)/f_{2,2}(z_3) & \cdots & f_{3,2}(z_k)/f_{2,2}(z_k) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k,2}(z_2)/f_{2,2}(z_2) & f_{k,2}(z_3)/f_{2,2}(z_3) & \cdots & f_{k,2}(z_k)/f_{2,2}(z_k)
\end{vmatrix} > 0. \tag{A.18}
\]
Continuing like this, the ultimate sufficient condition is that $f_{k,k}(c) > 0$ for $c \in [A, B]$, which is precisely our assumption. Thus the conclusion follows. □
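As a quick numerical spot-check (not part of the paper), the determinant condition (A.13) can be verified directly for the monomials $1, x, \ldots, x^k$. For $\Psi_l(x) = x^l$ the functions $f_{l,t}$ of (3.7) reduce to $\bigl(l!/(l-t)!\bigr) x^{l-t}$, so $f_{l,l} = l! > 0$ and Proposition 4 applies; in this case (A.13) is just the positivity of a Vandermonde determinant at ordered points. The interval and sampling scheme below are illustrative choices.

```python
import numpy as np

# Spot-check of the determinant condition (A.13) for the monomials
# 1, x, ..., x^k. For Psi_l(x) = x^l the functions f_{l,l} of (3.7)
# equal l! > 0, so Proposition 4 certifies a Chebyshev system; here
# we verify (A.13) directly at randomly drawn ordered points.
rng = np.random.default_rng(2012)
k = 4
for _ in range(500):
    z = np.sort(rng.uniform(0.0, 1.0, size=k + 1))   # z_0 < ... < z_k
    if np.min(np.diff(z)) < 1e-4:                    # skip near-singular draws
        continue
    # Rows are Psi_l, columns are the points, exactly as in (A.13).
    V = np.vander(z, N=k + 1, increasing=True).T
    assert np.linalg.det(V) > 0                      # (A.13) holds
print("determinant (A.13) positive at all sampled point configurations")
```

Of course, such sampling only probes finitely many configurations; Proposition 4 is what delivers the condition for all ordered point sets at once.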
REFERENCES
Demidenko, E. (2004). Mixed Models: Theory and Applications. Wiley, Hoboken, NJ. MR2077875

Demidenko, E. (2006). The assessment of tumour response to treatment. J. Roy. Statist. Soc. Ser. C.

Dette, H., Melas, V. B. and Wong, W. K. (2006). Locally $D$-optimal designs for exponential regression models. Statist. Sinica.

Dette, H. and Melas, V. B. (2011). A note on the de la Garza phenomenon for locally optimal designs. Ann. Statist.

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer, New York. MR1467237

Karlin, S. and Studden, W. J. (1966). Optimal experimental designs. Ann. Math. Statist.

Khuri, A. I., Mukherjee, B., Sinha, B. K. and Ghosh, M. (2006). Design issues for generalized linear models: A review. Statist. Sci.

Li, G. and Balakrishnan, N. (2011). Optimal designs for tumor regrowth models. J. Statist. Plann. Inference.

Pukelsheim, F. (1989). Complete class results for linear regression designs over the multidimensional cube. In Contributions to Probability and Statistics (L. J. Gleser, M. D. Perlman, S. J. Press and A. R. Sampson, eds.) 349–356. Springer, New York. MR1024342

Yang, M. (2010). On the de la Garza phenomenon. Ann. Statist.

Yang, M. and Stufken, J. (2009). Support points of locally optimal designs for nonlinear models with two parameters. Ann. Statist.

Department of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
Chicago, Illinois 60607
USA
E-mail: [email protected]