Optimal designs for comparing regression curves -- dependence within and between groups
Kirsten Schorning
Technische Universität Dortmund, Fakultät Statistik, 44221 Dortmund, Germany

Holger Dette
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
Abstract
We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data within and between the groups may be dependent. In order to derive efficient designs we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that in general simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.
Keywords and Phrases: optimal design, correlated observations, Gaussian white noise model, comparison of curves
AMS Subject Classification: Primary 62K05; Secondary 62M05
1 Introduction
The application of optimal or efficient designs can improve the accuracy of a statistical analysis substantially, and meanwhile there exists a well established and powerful theory for the construction of (approximate) optimal designs for independent observations; see, for example, the monographs of Pukelsheim (2006) or Fedorov and Leonov (2013). In contrast, the determination of optimal designs for efficient statistical analysis from dependent data is more challenging because the corresponding optimization problems are in general not convex, and therefore the powerful tools of convex analysis are not applicable. Although design problems for correlated data have been discussed for a long time (see, for example, Sacks and Ylvisaker, 1966, 1968; Bickel and Herzberg, 1979; Näther, 1985, who use asymptotic arguments to develop continuous but in general non-convex optimization problems in this context), a large part of the discussion is restricted to models with a small number of parameters, and we refer to Pázman and Müller (2001), Müller and Pázman (2003), Dette et al. (2008), Kiselak and Stehlík (2008), Harman and Štulajter (2010), Rodriguez-Diaz (2017), Campos-Barreiro and López-Fidalgo (2015) and Attia and Constantinescu (2020), among others.

Recently, Dette et al. (2013) suggested a more systematic approach to the problem and determined (asymptotic) optimal designs for least squares estimation under the additional assumption that the regression functions are eigenfunctions of the integral operator associated with the covariance kernel of the error process, which is used to describe the dependencies. For more general models Dette et al. (2016) propose to construct the optimal design and estimator simultaneously. More precisely, they construct a class of estimators and corresponding optimal designs with a variance converging (as the sample size increases) to the optimal variance in the continuous time model. Dette et al. (2017a) propose an alternative strategy for this purpose. They start with the construction of the best linear unbiased estimator (BLUE) in the continuous time model using stochastic calculus and determine in a second step an implementable design which is "close" to the solution in the continuous time model. By this approach these authors are able to provide an easily implementable estimator with a corresponding design which is practically indistinguishable from the weighted least squares estimator (WLSE) with corresponding optimal design. Their results are applicable for a broad class of linear regression models with various covariance kernels and have recently been extended to the situation where also derivatives of the process can be observed (see Dette et al., 2019).

Dette and Schorning (2016) and Dette et al. (2017b) propose designs for the comparison of regression curves from two independent samples, where the latter reference also allows for dependencies within the samples. Their work is motivated by applications in drug development, where a comparison between two regression models that describe the relation between a common response and the same covariates for two groups is used to establish the non-superiority of one model to the other, or to check whether the difference between two regression models can be neglected.
For example, if the similarity between two regression functions describing the dose response relationships in the groups individually has been established, subsequent inference in drug development can be based on the combined samples, such that a more efficient statistical analysis is possible on the basis of the larger population. Because of its importance, several procedures for the comparison of curves have been investigated in linear and nonlinear models (see Liu et al., 2009; Gsteiger et al., 2011; Liu et al., 2011; Dette et al., 2018; Bretz et al., 2018; Möllenhoff et al., 2018, 2020, among others). Designs minimizing the maximal width of a (simultaneous) confidence band for the difference between the regression curves calculated from two independent groups are determined by Dette and Schorning (2016) and Dette et al. (2017b), who also demonstrate that the use of these designs yields substantially narrower confidence bands.

While these results refer to independent groups, it is the purpose of the present paper to investigate designs for the comparison of regression curves corresponding to two groups, where the data within the groups and between the groups may be dependent. It will be demonstrated that in most cases simultaneous estimation of the parameters in the regression models using the data from both groups yields more efficient inference than estimating the parameters in the models corresponding to the different groups separately. Moreover, the simultaneous estimation procedure can never be worse. While this property holds independently of the design under consideration, we subsequently construct efficient designs for the comparison of curves corresponding to not necessarily independent groups and demonstrate their superiority by means of a simulation study.

The remaining part of this paper is organized as follows. In Section 2 we introduce the basics and the design problem. Section 3 is devoted to a continuous time model, which can be interpreted as a limiting experiment of the discrete model if the sample size converges to infinity. In this model we derive an explicit representation of the BLUE if estimation is performed simultaneously in both groups. In Section 4 we develop a discrete approximation of the continuous BLUE by determining the optimal weights for the linear estimator. Finally, the optimal design points are determined such that the maximum width of the confidence band for the difference of the two regression functions is minimal. Section 5 is devoted to a small numerical comparison of the performance of the optimal designs with uniform designs. In particular, it is demonstrated that optimal designs yield substantially narrower confidence bands. In many cases the maximal width of a confidence band from the uniform design is larger by a factor between 2 and 10 than the width of a confidence band from the optimal design.
2 Simultaneous estimation of two regression models
Throughout this paper we consider the situation of two groups of observations $Y_{1,1},\ldots,Y_{1,n}$ and $Y_{2,1},\ldots,Y_{2,n}$ at time points $t_1,\ldots,t_n$, where there may exist dependencies within and between the groups. We assume that the relation between the response and the covariate $t$ in each group is described by a linear regression model
\[
Y_{ij} = Y_i(t_j) = f_i^\top(t_j)\,\theta^{(i)} + \eta_i(t_j), \qquad j = 1,\ldots,n,\; i = 1,2. \tag{2.1}
\]
Thus in each group $n$ observations are taken at the same time points $t_1,\ldots,t_n$, which can be chosen in a compact interval, say $[a,b]$, and observations at different time points and in different groups might be dependent. The vectors of unknown parameters $\theta^{(1)}$ and $\theta^{(2)}$ are assumed to be $p_1$- and $p_2$-dimensional, respectively, and the corresponding vectors of regression functions $f_i(t) = (f_{i,1}(t),\ldots,f_{i,p_i}(t))^\top$, $i=1,2$, have continuously differentiable, linearly independent components.

To address the situation of correlation between the groups, we start with a very simple covariance structure for each group, but we emphasize that all results presented in this paper remain correct for more general covariance structures corresponding to Markov processes; see Remark 3.3 for more details. To be precise, let $\{\varepsilon_1(t)\,|\,t\in[a,b]\}$ and $\{\varepsilon_2(t)\,|\,t\in[a,b]\}$ denote two independent Brownian motions, such that
\[
\mathbb{E}[\varepsilon_i(t_j)] = 0, \qquad K_i(t_j,t_k) = \mathbb{E}[\varepsilon_i(t_j)\varepsilon_i(t_k)] = \min(t_j,t_k) \tag{2.2}
\]
denote the mean and the covariance of the individual process $\varepsilon_i$ at the points $t_j$ and $t_k$, respectively. Let $\sigma_1,\sigma_2 > 0$ and $\varrho \in (-1,1)$, denote by $\Sigma^{1/2}$ the square root of the covariance matrix
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\varrho \\ \sigma_1\sigma_2\varrho & \sigma_2^2 \end{pmatrix}, \tag{2.3}
\]
and define for $t\in[a,b]$ the two-dimensional process $\{\eta(t)\,|\,t\in[a,b]\}$ by
\[
\eta(t) = \begin{pmatrix} \eta_1(t) \\ \eta_2(t) \end{pmatrix} = \Sigma^{1/2}\varepsilon(t), \tag{2.4}
\]
where $\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top$. Note that $\varrho\in(-1,1)$ denotes the correlation between the observations $Y_1(t_j)$ and $Y_2(t_j)$ ($j=1,\ldots,n$), and that in general the correlation between $Y_1(t_j)$ and $Y_2(t_k)$ is given by
\[
\operatorname{Corr}\big(Y_1(t_j), Y_2(t_k)\big) = \varrho\,\Big(\sqrt{t_j/t_k} \wedge \sqrt{t_k/t_j}\Big), \qquad t_j,t_k \in (0,\infty).
\]

The parameters $\theta^{(1)}$ and $\theta^{(2)}$ could be estimated separately from the data of the two individual groups. However, this procedure ignores the correlation between the two groups, and estimating the parameters $\theta^{(1)}$ and $\theta^{(2)}$ simultaneously from the data of both groups might result in more precise estimates. In order to define estimators for the parameters $\theta^{(1)}$ and $\theta^{(2)}$ using the information from both groups, we now consider a more general two-dimensional regression model, which on the one hand contains the situation described in the previous paragraph as a special case, but on the other hand also allows us to consider the case where some of the components of $\theta^{(1)}$ and $\theta^{(2)}$ coincide; see Example 2.2 and Section 3.3 for details. To be precise, we define the regression model
\[
Y(t_j) = \begin{pmatrix} Y_1(t_j) \\ Y_2(t_j) \end{pmatrix} = F^\top(t_j)\theta + \eta(t_j) = F^\top(t_j)\theta + \Sigma^{1/2}\varepsilon(t_j), \qquad j = 1,\ldots,n, \tag{2.5}
\]
where two-dimensional observations $Y(t_1) = (Y_1(t_1),Y_2(t_1))^\top, \ldots, Y(t_n) = (Y_1(t_n),Y_2(t_n))^\top$ are available at time points $t_1,\ldots,t_n \in [a,b]$. In model (2.5) the vector $\theta = (\vartheta_1,\ldots,\vartheta_p)^\top$ is a $p$-dimensional parameter and
\[
F^\top(t) = \begin{pmatrix} F_1^\top(t) \\ F_2^\top(t) \end{pmatrix} = \begin{pmatrix} F_{1,1}(t) & \ldots & F_{1,p}(t) \\ F_{2,1}(t) & \ldots & F_{2,p}(t) \end{pmatrix} \tag{2.6}
\]
is the transpose of a $(p\times 2)$ matrix containing continuously differentiable regression functions, where the two-dimensional functions $(F_{1,1}(t),F_{2,1}(t))^\top, \ldots, (F_{1,p}(t),F_{2,p}(t))^\top$ are assumed to be linearly independent.
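To make the covariance structure above concrete, the following Python sketch simulates the two-dimensional error process (2.4) on a grid and checks the displayed between-group correlation empirically; the grid size, the number of replications and the values of $\sigma_1,\sigma_2,\varrho$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma1, sigma2, rho = 1.0, 1.0, 0.4
Sigma = np.array([[sigma1**2, sigma1*sigma2*rho],
                  [sigma1*sigma2*rho, sigma2**2]])
S_half = np.linalg.cholesky(Sigma)   # any square root of Sigma serves as Sigma^{1/2}

a_, b_, m, reps = 1.0, 10.0, 200, 5000
t = np.linspace(a_, b_, m)
dt = np.diff(t, prepend=0.0)         # first increment runs from time 0 to t_1 = a

# two independent Brownian motions, evaluated at the grid points
eps = np.cumsum(rng.standard_normal((reps, 2, m)) * np.sqrt(dt), axis=-1)
eta = np.einsum('ij,rjm->rim', S_half, eps)   # eta(t) = Sigma^{1/2} eps(t), cf. (2.4)

j, k = 50, 150                        # arbitrary grid indices
emp = np.corrcoef(eta[:, 0, j], eta[:, 1, k])[0, 1]
theo = rho * min(np.sqrt(t[j]/t[k]), np.sqrt(t[k]/t[j]))
print(f"empirical: {emp:.3f}  theoretical: {theo:.3f}")
```

The empirical and theoretical values agree up to Monte Carlo error, which illustrates that the factor $\sqrt{t_j/t_k}\wedge\sqrt{t_k/t_j}$ is inherited from the Brownian covariance $\min(t_j,t_k)$.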
Example 2.1
The individual models defined in (2.1) are contained in this two-dimensional model. More precisely, defining the $p = (p_1+p_2)$-dimensional vector of parameters $\theta$ by $\theta = ((\theta^{(1)})^\top, (\theta^{(2)})^\top)^\top$ and the rows of the regression function $F^\top(t)$ in (2.6) by
\[
F_1^\top(t) = \big(f_1^\top(t),\, 0_{p_2}^\top\big), \qquad F_2^\top(t) = \big(0_{p_1}^\top,\, f_2^\top(t)\big),
\]
it follows that model (2.5) coincides with model (2.1). Moreover, this composite model takes the correlation between the groups into account. In this case the models describing the relation between the variable $t$ and the responses $Y_1(t)$ and $Y_2(t)$ do not share any parameters.

Example 2.2
In this example we consider the situation where some of the parameters of the individual models in (2.1) coincide. This situation occurs, for example, if $Y_1(t)$ and $Y_2(t)$ represent clinical parameters (depending on time) before and after a treatment, where it can be assumed that the effect at time $a$ coincides before and after the treatment. In this case a reasonable model for the average effect in the two groups is given by
\[
\mathbb{E}[Y_\ell(t)] = \theta^{(0)} + (\tilde\theta^{(\ell)})^\top \tilde f_\ell(t), \qquad \ell = 1,2.
\]
More generally, we consider the situation where the vectors of parameters are given by
\[
\theta^{(1)} = \big({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top\big)^\top, \qquad \theta^{(2)} = \big({\theta^{(0)}}^\top, {\tilde\theta^{(2)}}^\top\big)^\top,
\]
where $\theta^{(0)} \in \mathbb{R}^{p_0}$ denotes the vector of common parameters in both models and the vectors $\tilde\theta^{(1)} \in \mathbb{R}^{p_1-p_0}$ and $\tilde\theta^{(2)} \in \mathbb{R}^{p_2-p_0}$ contain the different parameters of the two individual models. The corresponding regression functions are given by
\[
f_1^\top(t) = \big(f_0^\top(t),\, \tilde f_1^\top(t)\big), \qquad f_2^\top(t) = \big(f_0^\top(t),\, \tilde f_2^\top(t)\big), \tag{2.7}
\]
where the vector $f_0^\top(t)$ contains the regression functions corresponding to the common parameters in the two models, and $\tilde f_1^\top(t)$ and $\tilde f_2^\top(t)$ denote the vectors of regression functions corresponding to the different parameters $\tilde\theta^{(1)}$ and $\tilde\theta^{(2)}$, respectively. Defining the $p = (p_1+p_2-p_0)$-dimensional vector of parameters $\theta$ by $\theta = ({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top, {\tilde\theta^{(2)}}^\top)^\top$ and the rows of the regression function $F^\top(t)$ in (2.6) by
\[
F_1^\top(t) = \big(f_0^\top(t),\, \tilde f_1^\top(t),\, 0_{p_2-p_0}^\top\big), \qquad F_2^\top(t) = \big(f_0^\top(t),\, 0_{p_1-p_0}^\top,\, \tilde f_2^\top(t)\big),
\]
it follows that the model (2.5) contains the individual models in (2.1), where the regression functions are given by (2.7) and the parameters $\theta^{(1)}$ and $\theta^{(2)}$ share the parameter $\theta^{(0)}$. Moreover, this composite model takes the potential correlation between the groups into account.
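For illustration, the following sketch assembles the matrix $F^\top(t)$ of Example 2.2 from its blocks; the particular component functions $f_0$, $\tilde f_1$, $\tilde f_2$ are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def F_shared(t, f0, f1_tilde, f2_tilde):
    """Rows F_1^T(t), F_2^T(t) of the composite model in Example 2.2.

    f0, f1_tilde, f2_tilde return the common and group-specific
    regression functions as 1-d arrays (placeholder choices below)."""
    f0t, f1t, f2t = f0(t), f1_tilde(t), f2_tilde(t)
    row1 = np.concatenate([f0t, f1t, np.zeros_like(f2t)])
    row2 = np.concatenate([f0t, np.zeros_like(f1t), f2t])
    return np.vstack([row1, row2])   # shape (2, p), p = p0 + (p1-p0) + (p2-p0)

# hypothetical choices: common intercept, different trends in the two groups
F = F_shared(2.0,
             f0=lambda t: np.array([1.0]),              # f_0(t) = 1
             f1_tilde=lambda t: np.array([t, np.sin(t)]),
             f2_tilde=lambda t: np.array([t ** 2]))
print(F)   # (2, 4) matrix; theta = (theta^(0), theta~^(1), theta~^(2))
```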
3 The BLUE in the continuous time model

It was demonstrated by Dette et al. (2017a) that efficient designs for dependent data in regression problems can be derived by first considering the estimation problem in a continuous time model. In this model there is no optimal design problem, as the data can be observed over the full interval $[a,b]$. However, efficient designs can be determined in two steps. First, one derives the best linear unbiased estimator (BLUE) in the continuous time model and, secondly, one determines design points (and an estimator) such that the resulting estimator from the discrete data provides a good approximation of the optimal solution in the continuous time model. In this paper we will use this strategy to develop optimal designs for the comparison of regression curves from two (possibly) dependent groups. In the present section we discuss a continuous time model corresponding to the discrete model (2.5), while the second step, the determination of an "optimal" approximation, is postponed to Section 4.

3.1 The best linear unbiased estimator

To be precise, we consider the continuous time version of the linear regression model in (2.5), that is,
\[
Y(t) = \begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} = F^\top(t)\theta + \Sigma^{1/2}\varepsilon(t), \qquad t\in[a,b], \tag{3.1}
\]
where we assume $0 < a < b$ and that the full trajectory of the process $\{Y(t)\,|\,t\in[a,b]\}$ is observed, $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ is a vector of two independent Brownian motions as defined in (2.2), and the matrix $\Sigma^{1/2}$ is the square root of the covariance matrix $\Sigma$ defined in (2.3). Note that we restrict ourselves to an interval on the positive line because in this case the notation is slightly simpler; we emphasize, however, that the theory developed in this section can also be applied for $a=0$, see Remark 3.1 for more details. We further assume that the $(p\times p)$ matrix
\[
M = \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \tag{3.2}
\]
is non-singular.
Theorem 3.1 Consider the continuous time linear regression model (3.1) on the interval $[a,b]$, $a>0$, with a continuously differentiable matrix of regression functions $F$, a vector $\{\varepsilon(t)=(\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ of two independent Brownian motions and a covariance matrix $\Sigma$ defined by (2.3). The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\,dY(t) + \frac{1}{a}\, F(a)\,\Sigma^{-1}\, Y(a) \Big). \tag{3.3}
\]
Moreover, the minimum variance is given by
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1} = \Big( \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big)^{-1}. \tag{3.4}
\]
Proof. Multiplying $Y$ by the matrix $\Sigma^{-1/2}$ yields the transformed regression model
\[
\tilde Y(t) = \begin{pmatrix} \tilde Y_1(t) \\ \tilde Y_2(t) \end{pmatrix} = \Sigma^{-1/2}\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} = \Sigma^{-1/2} F^\top(t)\theta + \varepsilon(t), \tag{3.5}
\]
where $\Sigma^{-1/2}$ is the inverse of $\Sigma^{1/2}$, the square root of the covariance matrix $\Sigma$ defined in (2.3). Note that the components of the vector $\tilde Y$ are independent and, consequently, the joint likelihood function can be obtained as the product over the individual components. Next we rewrite the components of the continuous time model (3.5) in terms of two stochastic differential equations, that is,
\[
d\tilde Y_i(t) = \mathbb{1}_{[a,b]}(t)\,\Sigma_i^{-1/2}\dot F^\top(t)\theta\,dt + d\varepsilon_i(t), \qquad t\in[0,b], \tag{3.6}
\]
\[
\tilde Y_i(a) = \Sigma_i^{-1/2} F^\top(a)\theta + \varepsilon_i(a), \tag{3.7}
\]
where $\mathbb{1}_A$ is the indicator function of the set $A$ and $\Sigma_i^{-1/2}$ denotes the $i$-th row of the matrix $\Sigma^{-1/2}$ ($i=1,2$). As $\{\varepsilon_i(t)\,|\,t\in[a,b]\}$ is a Brownian motion, its increments are independent. Consequently, the process $\{\tilde Y_i(t)\,|\,t\in[0,b]\}$ and the random variable $\tilde Y_i(a)$ are independent. To obtain the joint density of the processes defined by (3.6) and (3.7) it is therefore sufficient to derive the individual densities.

Let $\mathbb{P}^{(i)}_\theta$ and $\mathbb{P}^{(i)}_0$ denote the measures on $C([0,b])$ associated with the process $\tilde Y_i = \{\tilde Y_i(t)\,|\,t\in[0,b]\}$ and $\{\varepsilon_i(t)\,|\,t\in[0,b]\}$, respectively. It follows from Theorem 1 in Appendix II of Ibragimov and Has'minskii (1981) that $\mathbb{P}^{(i)}_\theta$ is absolutely continuous with respect to $\mathbb{P}^{(i)}_0$ with Radon–Nikodym density
\[
\frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) = \exp\Big\{ \int_a^b \Sigma_i^{-1/2}\dot F^\top(t)\theta\, d\tilde Y_i(t) - \frac{1}{2}\int_a^b \big(\Sigma_i^{-1/2}\dot F^\top(t)\theta\big)^2\, dt \Big\}.
\]
Similarly, if $\mathbb{Q}^{(i)}_\theta$ denotes the distribution of the random variable $\tilde Y_i(a) \sim \mathcal{N}\big(\Sigma_i^{-1/2} F^\top(a)\theta,\, a\big)$ in (3.7), then the Radon–Nikodym density of $\mathbb{Q}^{(i)}_\theta$ with respect to $\mathbb{Q}^{(i)}_0$ is given by
\[
\frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) = \exp\Big\{ \frac{\tilde Y_i(a)\,\Sigma_i^{-1/2} F^\top(a)\theta}{a} - \frac{1}{2}\,\frac{\big(\Sigma_i^{-1/2} F^\top(a)\theta\big)^2}{a} \Big\}.
\]
Consequently, because of independence, the joint density of $(\mathbb{P}^{(i)}_\theta, \mathbb{Q}^{(i)}_\theta)$ with respect to $(\mathbb{P}^{(i)}_0, \mathbb{Q}^{(i)}_0)$ is obtained as
\[
\frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) \times \frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) = \exp\bigg\{ \Big( \int_a^b \Sigma_i^{-1/2}\dot F^\top(t)\theta\, d\tilde Y_i(t) + \frac{\tilde Y_i(a)\,\Sigma_i^{-1/2} F^\top(a)\theta}{a} \Big) - \frac{1}{2}\Big( \int_a^b \big(\Sigma_i^{-1/2}\dot F^\top(t)\theta\big)^2\, dt + \frac{\big(\Sigma_i^{-1/2} F^\top(a)\theta\big)^2}{a} \Big) \bigg\}.
\]
As the processes $\tilde Y_1$ and $\tilde Y_2$ are independent by construction, the maximum likelihood estimator in the model (3.1) can be determined by solving the equation
\[
\frac{\partial}{\partial\theta} \log \prod_{i=1}^2 \Big\{ \frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) \times \frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) \Big\} = \sum_{i=1}^2 \Big\{ \int_a^b \dot F(t)\,\big(\Sigma_i^{-1/2}\big)^\top d\tilde Y_i(t) + \frac{F(a)\big(\Sigma_i^{-1/2}\big)^\top \tilde Y_i(a)}{a} - \Big( \int_a^b \dot F(t)\big(\Sigma_i^{-1/2}\big)^\top \Sigma_i^{-1/2}\dot F^\top(t)\,dt + \frac{F(a)\big(\Sigma_i^{-1/2}\big)^\top \Sigma_i^{-1/2} F^\top(a)}{a} \Big)\theta \Big\} = 0
\]
with respect to $\theta$. The solution coincides with the linear estimator defined in (3.3), and a straightforward calculation, using Ito's formula and the fact that the random variables $\int_a^b \dot F(t)\, d\varepsilon_t$ and $\varepsilon_a$ are independent, gives
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1}\operatorname{Cov}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\,dY(t) + \frac{1}{a} F(a)\,\Sigma^{-1} Y(a) \Big)\, M^{-1} = M^{-1}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a} F(a)\,\Sigma^{-1} F^\top(a) \Big) M^{-1} = M^{-1},
\]
where the matrix $M$ is defined in (3.2). Since the covariance matrix $M^{-1}$ is the inverse of the information matrix in the continuous time regression model (3.1) (see Ibragimov and Has'minskii, 1981, p. 81), the linear estimator (3.3) is the BLUE, which completes the proof of Theorem 3.1. □
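The covariance (3.4) is straightforward to evaluate numerically. The sketch below does this for a hypothetical instance of Example 2.1 with $f_1(t) = f_2(t) = (1,t)^\top$ on $[1,10]$, approximating the integral in (3.2) by quadrature; all concrete numbers are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))

def F(t):      # (p x 2) matrix of Example 2.1 with f1(t) = f2(t) = (1, t)^T
    return np.array([[1.0, 0.0], [t, 0.0], [0.0, 1.0], [0.0, t]])

def Fdot(t):   # element-wise derivative of F
    return np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

p = F(a_).shape[0]
M = np.zeros((p, p))
for r in range(p):
    for c in range(p):
        M[r, c] = quad(lambda t: Fdot(t)[r] @ Sigma_inv @ Fdot(t)[c], a_, b_)[0]
M += F(a_) @ Sigma_inv @ F(a_).T / a_      # boundary term of (3.2)
print(np.linalg.inv(M))                    # Cov(theta_BLUE), cf. (3.4)
```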
Remark 3.1 The proof of Theorem 3.1 can easily be modified to obtain the BLUE for the continuous time model on the interval $[0,b]$. More precisely, for $a=0$ equation (3.7) becomes a deterministic equation, namely
\[
Y(0) = F^\top(0)\theta, \tag{3.8}
\]
and we have to distinguish three cases.

(1) If the regression function $F$ satisfies $F(0) = 0_{p\times 2}$ (that is, $\operatorname{rank}(F(0)) = 0$), the deterministic equation (3.8) does not contain any further information about the parameter $\theta$, and the maximum likelihood estimator in model (3.1) is given by
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\Big( \int_0^b \dot F(t)\,\Sigma^{-1}\,dY(t) \Big),
\]
where the minimum variance is given by
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1} = \Big( \int_0^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt \Big)^{-1}.
\]

(2) If the rank of the matrix $F(0)$ satisfies $\operatorname{rank}(F(0)) = 1$, the deterministic equation (3.8) contains one informative equation about $\theta$. In that case, we assume without loss of generality that $F_{1,1}(0) \neq 0$, and it follows from (3.8) that $\theta_1$ can be expressed through $\theta_2,\ldots,\theta_p$ by
\[
\theta_1 = \frac{Y_1(0) - \sum_{j=2}^p \theta_j F_{1,j}(0)}{F_{1,1}(0)}. \tag{3.9}
\]
Using (3.9) in combination with model (3.1), we obtain the reduced model
\[
Z(t) = Y(t) - \frac{Y_1(0)}{F_{1,1}(0)} \begin{pmatrix} F_{1,1}(t) \\ F_{2,1}(t) \end{pmatrix} = \tilde F^\top(t)\tilde\theta + \Sigma^{1/2}\varepsilon(t), \tag{3.10}
\]
where the matrix valued function $\tilde F(t)$ is defined by
\[
\tilde F^\top(t) = \Big( F_{i,j}(t) - \frac{F_{i,1}(t)}{F_{1,1}(0)}\, F_{1,j}(0) \Big)_{i=1,2;\; j=2,\ldots,p} \tag{3.11}
\]
and the reduced $(p-1)$-dimensional parameter is given by $\tilde\theta = (\theta_2,\ldots,\theta_p)^\top$. It follows from $\operatorname{rank}(F(0)) = 1$ that the matrix valued function $\tilde F(t)$ defined in (3.11) satisfies $\tilde F^\top(0) = 0_{2\times(p-1)}$. Consequently, the modified model (3.10) satisfies the condition of case (1), and the best linear unbiased estimator of the reduced parameter $\tilde\theta$ is obtained as
\[
\hat{\tilde\theta}_{\mathrm{BLUE}} = \tilde M^{-1}\Big( \int_0^b \dot{\tilde F}(t)\,\Sigma^{-1}\,dZ(t) \Big), \tag{3.12}
\]
where the process $\{Z(t)\,|\,t\in[0,b]\}$ is defined by (3.10), the matrix $\tilde F(t)$ is given by (3.11), and the minimum variance is given by
\[
\operatorname{Cov}(\hat{\tilde\theta}_{\mathrm{BLUE}}) = \tilde M^{-1} = \Big( \int_0^b \dot{\tilde F}(t)\,\Sigma^{-1}\dot{\tilde F}^\top(t)\,dt \Big)^{-1}.
\]
The best linear unbiased estimator of the remaining parameter $\theta_1$ is then obtained as
\[
\hat\theta_1 = \frac{Y_1(0) - \sum_{j=2}^p \hat{\tilde\theta}_{\mathrm{BLUE},j} F_{1,j}(0)}{F_{1,1}(0)}.
\]

(3) If the rank of the matrix $F(0)$ satisfies $\operatorname{rank}(F(0)) = 2$, equation (3.8) contains two informative equations about $\theta$. Let
\[
A(t) = \begin{pmatrix} F_{1,1}(t) & F_{1,2}(t) \\ F_{2,1}(t) & F_{2,2}(t) \end{pmatrix} \tag{3.13}
\]
be the submatrix which contains the first two columns of $F^\top(t)$. Without loss of generality, we assume that $A(0)$ is non-singular (as $\operatorname{rank}(F(0)) = 2$). Then it follows from (3.8) that
\[
\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = A^{-1}(0)\Big( Y(0) - \Big( \sum_{j=3}^p F_{i,j}(0)\theta_j \Big)_{i=1,2} \Big). \tag{3.14}
\]
Using (3.14) in combination with (3.1), we obtain the reduced model
\[
Z(t) = Y(t) - A(t)A^{-1}(0)\,Y(0) = \tilde F^\top(t)\tilde\theta + \Sigma^{1/2}\varepsilon(t), \tag{3.15}
\]
where the matrix valued function $A(t)$ is given by (3.13), the matrix valued function $\tilde F^\top(t)$ is of the form
\[
\tilde F^\top(t) = \big( F_{i,j}(t) \big)_{i=1,2;\; j=3,\ldots,p} - A(t)A^{-1}(0)\,\big( F_{i,j}(0) \big)_{i=1,2;\; j=3,\ldots,p}, \tag{3.16}
\]
and the reduced $(p-2)$-dimensional parameter is given by $\tilde\theta = (\theta_3,\ldots,\theta_p)^\top$. The matrix valued function $\tilde F(t)$ defined in (3.16) satisfies $\tilde F^\top(0) = 0_{2\times(p-2)}$. Consequently, the modified model (3.15) satisfies the condition of case (1), and the best linear unbiased estimator $\hat{\tilde\theta}_{\mathrm{BLUE}}$ of the reduced $(p-2)$-dimensional parameter $\tilde\theta$ is obtained by (3.12), using the process $\{Z(t)\,|\,t\in[0,b]\}$ defined in (3.15) and the matrix valued function $\tilde F(t)$ given by (3.16).
The best linear unbiased estimator of the remaining parameter $(\theta_1,\theta_2)^\top$ is then obtained as
\[
\begin{pmatrix} \hat\theta_1 \\ \hat\theta_2 \end{pmatrix} = A^{-1}(0)\Big( Y(0) - \Big( \sum_{j=3}^p F_{i,j}(0)\,\hat{\tilde\theta}_{\mathrm{BLUE},j} \Big)_{i=1,2} \Big).
\]

3.2 Comparing curves without common parameters

Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.1 that this case is a special case of model (2.5), where the matrix $F^\top$ is given by
\[
F^\top(t) = \begin{pmatrix} f_1^\top(t) & 0_{p_2}^\top \\ 0_{p_1}^\top & f_2^\top(t) \end{pmatrix} \tag{3.17}
\]
and $\theta = ({\theta^{(1)}}^\top, {\theta^{(2)}}^\top)^\top$. Considering both components of the vector $Y$ separately, we obtain continuous versions of the two models introduced in (2.1), that is,
\[
Y_i(t) = f_i^\top(t)\,\theta^{(i)} + \eta_i(t), \qquad i = 1,2, \tag{3.18}
\]
where the error process $\{\eta(t)\,|\,t\in[a,b]\}$ is defined by (2.4). An application of Theorem 3.1 yields the following BLUE.
Corollary 3.1 Consider the continuous time linear regression model (2.5) on the interval $[a,b]$, with continuously differentiable matrix (3.17), a vector $\{\varepsilon(t)=(\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ of independent Brownian motions and a matrix $\Sigma$ defined by (2.3). The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(1)}_{\mathrm{BLUE}} \\ \hat\theta^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)}\, M^{-1} \bigg[ \int_a^b \begin{pmatrix} \sigma_2^2\,\dot f_1(t) & -\sigma_1\sigma_2\varrho\,\dot f_1(t) \\ -\sigma_1\sigma_2\varrho\,\dot f_2(t) & \sigma_1^2\,\dot f_2(t) \end{pmatrix} d\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} + \frac{1}{a}\begin{pmatrix} \sigma_2^2\, f_1(a) & -\sigma_1\sigma_2\varrho\, f_1(a) \\ -\sigma_1\sigma_2\varrho\, f_2(a) & \sigma_1^2\, f_2(a) \end{pmatrix} \begin{pmatrix} Y_1(a) \\ Y_2(a) \end{pmatrix} \bigg]. \tag{3.19}
\]
The minimum variance is given by $M^{-1}$, where
\[
M = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)} \begin{pmatrix} \sigma_2^2\, M_{11} & -\sigma_1\sigma_2\varrho\, M_{12} \\ -\sigma_1\sigma_2\varrho\, M_{21} & \sigma_1^2\, M_{22} \end{pmatrix} \quad\text{and}\quad M_{ij} = \int_a^b \dot f_i(t)\,\dot f_j^\top(t)\,dt + \frac{1}{a}\, f_i(a)\, f_j^\top(a), \quad i,j = 1,2. \tag{3.20}
\]

It is of interest to compare the estimator (3.19) with the estimator $\hat\theta_{\mathrm{mar}} = ((\hat\theta^{(1)}_{\mathrm{mar}})^\top, (\hat\theta^{(2)}_{\mathrm{mar}})^\top)^\top$, which is obtained by estimating the parameters in the two models (3.18) separately. It follows from Theorem 2.1 in Dette et al. (2017a) that the best linear unbiased estimators in these models are given by
\[
\hat\theta^{(\ell)}_{\mathrm{mar}} = M_{\ell\ell}^{-1}\Big( \int_a^b \dot f_\ell(t)\,dY_\ell(t) + \frac{1}{a}\, f_\ell(a)\,Y_\ell(a) \Big), \qquad \ell = 1,2, \tag{3.21}
\]
where the matrices $M_{\ell\ell}$ are defined by
\[
M_{\ell\ell} = \int_a^b \dot f_\ell(t)\,\dot f_\ell^\top(t)\,dt + \frac{1}{a}\, f_\ell(a)\, f_\ell^\top(a), \qquad \ell = 1,2.
\]
Moreover, the covariance matrices of the estimators $\hat\theta^{(1)}_{\mathrm{mar}}$ and $\hat\theta^{(2)}_{\mathrm{mar}}$ are the inverses of the Fisher information matrices in the individual models, that is,
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \sigma_\ell^2\, M_{\ell\ell}^{-1}, \qquad \ell = 1,2. \tag{3.22}
\]
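The block structure in Corollary 3.1 is easy to evaluate numerically. The following sketch, under hypothetical choices $f_1(t) = (t,\sin t)^\top$ and $f_2(t) = (t^2,\cos t)^\top$, assembles $M$ from the blocks (3.20) and prints the covariance $M^{-1}$ together with the marginal covariance $\sigma_1^2 M_{11}^{-1}$ from (3.22) for comparison.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
s1, s2, rho = 1.0, 1.0, 0.4

f1d = lambda t: np.array([1.0, np.cos(t)])    # derivative of f1(t) = (t, sin t)
f2d = lambda t: np.array([2*t, -np.sin(t)])   # derivative of f2(t) = (t^2, cos t)
f1a = np.array([a_, np.sin(a_)])
f2a = np.array([a_**2, np.cos(a_)])

def block(fid, fjd, fia, fja):
    """Block M_ij of (3.20), computed entry-wise by quadrature."""
    G = np.array([[quad(lambda t: fid(t)[r] * fjd(t)[c], a_, b_)[0]
                   for c in range(fja.size)] for r in range(fia.size)])
    return G + np.outer(fia, fja) / a_

M11 = block(f1d, f1d, f1a, f1a)
M12 = block(f1d, f2d, f1a, f2a)
M22 = block(f2d, f2d, f2a, f2a)
c = 1.0 / (s1**2 * s2**2 * (1 - rho**2))
M = c * np.block([[s2**2 * M11, -s1*s2*rho * M12],
                  [-s1*s2*rho * M12.T, s1**2 * M22]])
print(np.linalg.inv(M))             # Cov(theta_BLUE), Corollary 3.1
print(s1**2 * np.linalg.inv(M11))   # Cov(theta_mar^(1)), cf. (3.22)
```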
The following result compares the variances of the two estimators (3.19) and (3.21).

Theorem 3.2 If the assumptions of Corollary 3.1 are satisfied, we have (with respect to the Loewner ordering)
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}}) \le \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}), \qquad \ell = 1,2,
\]
for all $\varrho\in(-1,1)$, where $\hat\theta^{(\ell)}_{\mathrm{BLUE}}$ and $\hat\theta^{(\ell)}_{\mathrm{mar}}$ are the best linear unbiased estimators of the parameter $\theta^{(\ell)}$ obtained by simultaneous estimation (see (3.19)) and separate estimation in the two groups (see (3.21)), respectively.
Proof. Without loss of generality we consider the case $\ell = 1$; the proof for the index $\ell = 2$ follows by the same arguments. Let $K^\top = (I_{p_1}, 0_{p_1\times p_2})$ be a $p_1\times(p_1+p_2)$ matrix, where $I_{p_1}$ and $0_{p_1\times p_2}$ denote the $p_1$-dimensional identity matrix and a $(p_1\times p_2)$ matrix of zeros, respectively. Then $\operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{BLUE}}) = (C_K(M))^{-1}$, where
\[
C_K(M) = (K^\top M^{-1} K)^{-1} = \frac{1}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - \varrho^2 M_{12} M_{22}^{-1} M_{12}^\top \big) \tag{3.23}
\]
is the Schur complement of the block $M_{22}$ of the information matrix $M$ (see p. 74 in Pukelsheim, 2006). Observing (3.22), we now compare $C_K(M)$ and $\sigma_1^{-2} M_{11}$ and obtain
\[
C_K(M) - \frac{1}{\sigma_1^2} M_{11} = \frac{1}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - \varrho^2 M_{12} M_{22}^{-1} M_{12}^\top \big) - \frac{1}{\sigma_1^2} M_{11} = \frac{\varrho^2}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - M_{12} M_{22}^{-1} M_{12}^\top \big) =: \frac{\varrho^2}{\sigma_1^2(1-\varrho^2)}\, C_K(\tilde M), \tag{3.24}
\]
where $C_K(\tilde M)$ is the Schur complement of the block $M_{22}$ of the matrix
\[
\tilde M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}.
\]
Note that the matrix $\tilde M$ is nonnegative definite. An application of Lemma 3.12 of Pukelsheim (2006) shows that the Schur complement $C_K(\tilde M)$ is also nonnegative definite, so that
\[
\big( \operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{BLUE}}) \big)^{-1} = C_K(M) \ge \frac{1}{\sigma_1^2}\, M_{11} = \big( \operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{mar}}) \big)^{-1},
\]
and the statement of the theorem follows. □

Remark 3.2 If $\varrho = 0$ we have $C_K(M) = \sigma_1^{-2} M_{11}$, and it follows from (3.23) that separate estimation in the individual groups does not yield less precise estimates, that is, $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ ($\ell = 1,2$). If $\varrho \neq 0$, then $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) \ge \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$. Moreover, the inequality is strict in most cases, which means that simultaneous estimation of the parameters $\theta^{(1)}$ and $\theta^{(2)}$ yields more precise estimators. A necessary condition for strict inequality (i.e., for the matrix $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) - \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ to be positive definite) is $\varrho \neq 0$. The following result shows that this condition is not sufficient. It considers the important case where the regression functions $f_1$ and $f_2$ in (3.17) are the same, and shows that in this case the two estimators $\hat\theta_{\mathrm{BLUE}}$ and $\hat\theta_{\mathrm{mar}}$ coincide.
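The gap (3.24) between simultaneous and separate estimation can be inspected numerically. Under the same hypothetical model choices as in the previous sketch (and $\sigma_1 = \sigma_2 = 1$), the following code compares the information $C_K(M)$ with $\sigma_1^{-2}M_{11}$ for several correlations; the smallest eigenvalue of the difference is nonnegative, in line with Theorem 3.2.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
f1d = lambda t: np.array([1.0, np.cos(t)])    # f1(t) = (t, sin t)
f2d = lambda t: np.array([2*t, -np.sin(t)])   # f2(t) = (t^2, cos t)
f1a = np.array([a_, np.sin(a_)])
f2a = np.array([a_**2, np.cos(a_)])

def block(fid, fjd, fia, fja):                # M_ij of (3.20)
    G = np.array([[quad(lambda t: fid(t)[r] * fjd(t)[c], a_, b_)[0]
                   for c in range(fja.size)] for r in range(fia.size)])
    return G + np.outer(fia, fja) / a_

M11 = block(f1d, f1d, f1a, f1a)
M12 = block(f1d, f2d, f1a, f2a)
M22 = block(f2d, f2d, f2a, f2a)
for rho in (0.0, 0.4, 0.7):
    CK = (M11 - rho**2 * M12 @ np.linalg.inv(M22) @ M12.T) / (1 - rho**2)
    gap = CK - M11    # = rho^2/(1-rho^2) * C_K(M~), cf. (3.24), with sigma_1 = 1
    print(rho, np.linalg.eigvalsh(gap).min())   # >= 0 up to rounding
```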
Corollary 3.2 If the assumptions of Corollary 3.1 hold and additionally the regression functions in model (2.5) satisfy $f_1 = f_2$, the best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(1)}_{\mathrm{BLUE}} \\ \hat\theta^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \int_a^b \big( I_2 \otimes M_{11}^{-1}\dot f_1(t) \big)\, dY(t) + \frac{1}{a}\big( I_2 \otimes M_{11}^{-1} f_1(a) \big)\, Y(a),
\]
where $I_2$ denotes the $2\times 2$ identity matrix and the matrix $M_{11}$ is defined in (3.20). Moreover, the minimum variance is given by $\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = \Sigma \otimes M_{11}^{-1}$, and $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ ($\ell = 1,2$).

3.3 Comparing curves with common parameters

Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.2 that this case is a special case of model (2.5), where the matrix of regression functions is given by
\[
F^\top(t) = \begin{pmatrix} f_0^\top(t) & \tilde f_1^\top(t) & 0^\top_{p_2-p_0} \\ f_0^\top(t) & 0^\top_{p_1-p_0} & \tilde f_2^\top(t) \end{pmatrix} \tag{3.25}
\]
and the vector of parameters is defined by $\theta = ({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top, {\tilde\theta^{(2)}}^\top)^\top$. An application of Theorem 3.1 yields the BLUE in model (2.5) with the matrix $F^\top$ defined by (3.25).
Corollary 3.3 Consider the continuous time linear regression model (2.5) on the interval $[a,b]$, where the matrix of regression functions $F^\top$ in (3.25) is continuously differentiable. The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(0)}_{\mathrm{BLUE}} \\ \hat{\tilde\theta}^{(1)}_{\mathrm{BLUE}} \\ \hat{\tilde\theta}^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)}\, M^{-1}\bigg[ \int_a^b \begin{pmatrix} (\sigma_2^2 - \sigma_1\sigma_2\varrho)\,\dot f_0(t) & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\,\dot f_0(t) \\ \sigma_2^2\,\dot{\tilde f}_1(t) & -\sigma_1\sigma_2\varrho\,\dot{\tilde f}_1(t) \\ -\sigma_1\sigma_2\varrho\,\dot{\tilde f}_2(t) & \sigma_1^2\,\dot{\tilde f}_2(t) \end{pmatrix} d\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} + \frac{1}{a}\begin{pmatrix} (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, f_0(a) & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, f_0(a) \\ \sigma_2^2\,\tilde f_1(a) & -\sigma_1\sigma_2\varrho\,\tilde f_1(a) \\ -\sigma_1\sigma_2\varrho\,\tilde f_2(a) & \sigma_1^2\,\tilde f_2(a) \end{pmatrix} \begin{pmatrix} Y_1(a) \\ Y_2(a) \end{pmatrix} \bigg]. \tag{3.26}
\]
The minimum variance is
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1},
\]
where
\[
M = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)} \begin{pmatrix} (\sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\varrho)\, M_{00} & (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, M_{01} & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, M_{02} \\ (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, M_{10} & \sigma_2^2\, M_{11} & -\sigma_1\sigma_2\varrho\, M_{12} \\ (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, M_{20} & -\sigma_1\sigma_2\varrho\, M_{21} & \sigma_1^2\, M_{22} \end{pmatrix}
\]
and the individual blocks of this matrix are given by
\[
M_{ij} = \int_a^b \dot g_i(t)\,\dot g_j^\top(t)\,dt + \frac{1}{a}\, g_i(a)\, g_j^\top(a) \tag{3.27}
\]
for $i,j = 0,1,2$, where $g_0(t) = f_0(t)$ and $g_i(t) = \tilde f_i(t)$ for $i = 1,2$.

It is again of interest to compare the estimator (3.26) with the estimator $\hat\theta_{\mathrm{mar}} = ((\hat\theta^{(1)}_{\mathrm{mar}})^\top, (\hat\theta^{(2)}_{\mathrm{mar}})^\top)^\top$, which is obtained by estimating the parameters $\theta^{(\ell)} = ({\theta^{(0)}}^\top, {\tilde\theta^{(\ell)}}^\top)^\top$, $\ell = 1,2$, in the two models (3.18) separately using (3.21). The corresponding covariance matrices of the estimators $\hat\theta^{(1)}_{\mathrm{mar}}$ and $\hat\theta^{(2)}_{\mathrm{mar}}$ are given by (3.22). The following result compares the variances of the two estimators (3.26) and (3.21). Its proof is similar to the proof of Theorem 3.2 and is therefore omitted.
Theorem 3.3 If the assumptions of Corollary 3.3 are satisfied, we have (with respect to the Loewner ordering)
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}}) \le \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}), \qquad \ell = 1,2,
\]
for all $\varrho\in(-1,1)$, where $\hat\theta^{(\ell)}_{\mathrm{BLUE}}$ and $\hat\theta^{(\ell)}_{\mathrm{mar}}$ are the best linear unbiased estimators of the parameter $\theta^{(\ell)}$ obtained by simultaneous and separate estimation, respectively.

Remark 3.3 The results presented so far have been derived for the case where the error process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ in (2.5) consists of two independent Brownian motions. This assumption has been made to simplify the notation. Similar results can be obtained for Markov processes, and in this remark we indicate the essential arguments. To be precise, assume that the error process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ in model (2.5) consists of two independent centered Gaussian processes with continuous covariance kernel given by
\[
K(s,t) = \mathbb{E}[\varepsilon_i(s)\varepsilon_i(t)] = u(s\wedge t)\, v(s\vee t) = v(s)\,v(t)\,\min\{q(s), q(t)\}, \qquad s,t\in[a,b], \tag{3.28}
\]
where $u(\cdot)$ and $v(\cdot)$ are functions defined on the interval $[a,b]$ such that the function $q(\cdot) = u(\cdot)/v(\cdot)$ is positive and strictly increasing. Kernels of the form (3.28) are called triangular kernels, and a famous result of Doob (1949) essentially shows that a Gaussian process is a Markov process if and only if its covariance kernel is triangular (see also Mehr and McFadden, 1965). In this case model (2.5) can be transformed into a model with an error process consisting of two independent Brownian motions, using the arguments given in Appendix B of Dette et al. (2016). More precisely, define $q(t) = u(t)/v(t)$ and consider the stochastic process
\[
\varepsilon(t) = v(t)\,\tilde\varepsilon(q(t)),
\]
where $\{\tilde\varepsilon(\tilde t) = (\tilde\varepsilon_1(\tilde t), \tilde\varepsilon_2(\tilde t))^\top\,|\,\tilde t\in[\tilde a,\tilde b]\}$ consists of two independent Brownian motions on the interval $[\tilde a,\tilde b] = [q(a), q(b)]$. It now follows from Doob (1949) that the process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ consists of two independent centered Gaussian processes on the interval $[a,b]$ with covariance kernel (3.28). Consequently, if we consider the model
\[
\tilde Y(\tilde t) = \begin{pmatrix} \tilde Y_1(\tilde t) \\ \tilde Y_2(\tilde t) \end{pmatrix} = \tilde F^\top(\tilde t)\theta + \Sigma^{1/2}\tilde\varepsilon(\tilde t), \qquad \tilde t\in[q(a), q(b)], \tag{3.29}
\]
with
\[
\tilde F(\tilde t) = \frac{F(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))}, \qquad \tilde\varepsilon(\tilde t) = \frac{\varepsilon(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))}, \qquad \tilde Y(\tilde t) = \frac{Y(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))},
\]
the results obtained so far are applicable. Thus a "good" estimator of the parameter $\theta$ in model (3.29) is also a "good" estimator of the parameter $\theta$ in model (3.1) with an error process consisting of two Gaussian processes with covariance kernel (3.28). Consequently, we can derive the optimal estimator of the parameter $\theta$ in the continuous time model (3.1) with covariance kernel (3.28) from the best linear unbiased estimator in the model (3.29) with Brownian motions by an application of Theorem 3.1. The resulting best linear unbiased estimator of $\theta$ in model (3.1) with triangular kernel (3.28) is of the form
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\bigg\{ \int_a^b \frac{\dot F(t)\,v(t) - F(t)\,\dot v(t)}{\dot u(t)\,v(t) - u(t)\,\dot v(t)}\,\Sigma^{-1}\, d\Big( \frac{Y(t)}{v(t)} \Big) + \frac{F(a)\,\Sigma^{-1}\, Y(a)}{u(a)\,v(a)} \bigg\},
\]
where the minimum variance is given by
\[
M^{-1} = \bigg\{ \int_a^b \frac{\big( \dot F(t)v(t) - F(t)\dot v(t) \big)\,\Sigma^{-1}\,\big( \dot F(t)v(t) - F(t)\dot v(t) \big)^\top}{v^2(t)\,\big[ \dot u(t)v(t) - u(t)\dot v(t) \big]}\, dt + \frac{F(a)\,\Sigma^{-1}\, F^\top(a)}{u(a)\,v(a)} \bigg\}^{-1}.
\]
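As a concrete instance of Remark 3.3, the exponential kernel $K(s,t) = e^{-|t-s|}$ (of Ornstein–Uhlenbeck type) is triangular with $u(t) = e^t$, $v(t) = e^{-t}$ and $q(t) = e^{2t}$; this particular example is our illustration and is not part of the remark itself. A minimal numerical check:

```python
import numpy as np

u = lambda t: np.exp(t)
v = lambda t: np.exp(-t)
q = lambda t: u(t) / v(t)          # q(t) = e^{2t}, positive and strictly increasing

s, t = 1.3, 2.7                    # arbitrary points with s < t
K_exp = np.exp(-abs(t - s))        # exponential kernel K(s,t) = e^{-|t-s|}
K_tri = v(s) * v(t) * min(q(s), q(t))
print(K_exp, K_tri)                # identical values: the kernel is triangular

# time transform mapping the process to Brownian motion:
# eps(t) = v(t) * eps~(q(t)), with eps~ a Brownian motion on [q(a), q(b)]
a_, b_ = 1.0, 10.0
print(q(a_), q(b_))                # transformed design interval [q(a), q(b)]
```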
4 Optimal designs for comparing regression curves

In this section we derive optimal designs for comparing curves. The first part is devoted to a discretization of the BLUE in the continuous time model. In the second part we develop an optimality criterion to obtain efficient designs for the comparison of curves based on the discretized estimators.

4.1 A discrete approximation of the BLUE
To obtain a discrete design for $n$ observations at the points $a = t_1, \ldots, t_n$ from the continuous design derived in Section 3, we use a similar approach as in Dette et al. (2017a) and construct a discrete approximation of the stochastic integral in (3.3). For this purpose we consider the linear estimator
\[
\hat\theta_n = M^{-1}\Big\{ \sum_{i=2}^n \Omega_i\,\dot F(t_{i-1})\,\Sigma^{-1}\big( Y(t_i) - Y(t_{i-1}) \big) + \frac{1}{a}\, F(a)\,\Sigma^{-1} Y(a) \Big\} = M^{-1}\Big\{ \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\big( Y(t_i) - Y(t_{i-1}) \big) + \frac{1}{a}\, F(a)\,\Sigma^{-1} Y(a) \Big\}, \tag{4.1}
\]
where $a = t_1 < t_2 < \ldots < t_{n-1} < t_n = b$, $\Omega_2,\ldots,\Omega_n$ are $p\times p$ weight matrices, $\Phi_2 = \Omega_2\dot F(t_1), \ldots, \Phi_n = \Omega_n\dot F(t_{n-1})$ are $p\times 2$ matrices, and the matrix $M^{-1}$ is given in (3.4). To determine these weights in an "optimal" way, we first derive a representation of the mean squared error between the best linear estimator (3.3) in the continuous time model and its discrete approximation (4.1). The following result is a direct consequence of Ito's formula.

Lemma 4.1 Consider the continuous time model (3.1). If the assumptions of Theorem 3.1 are satisfied, we have
\[
\mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big] = M^{-1}\Big\{ \sum_{i=2}^n \int_{t_{i-1}}^{t_i} \big[ \dot F(s) - \Phi_i \big]\,\Sigma^{-1}\big[ \dot F(s) - \Phi_i \big]^\top ds + \sum_{i,j=2}^n \int_{t_{i-1}}^{t_i} \big[ \dot F(s) - \Phi_i \big]\,\Sigma^{-1}\dot F^\top(s)\, ds\;\, \theta\theta^\top \int_{t_{j-1}}^{t_j} \dot F(s)\,\Sigma^{-1}\big[ \dot F(s) - \Phi_j \big]^\top ds \Big\}\, M^{-1}. \tag{4.2}
\]

In the following we choose optimal $p\times 2$ matrices $\Phi_i = \Omega_i\dot F(t_{i-1})$ and design points $t_2,\ldots,t_{n-1}$ ($t_1 = a$, $t_n = b$) such that the linear estimator (4.1) is unbiased and the mean squared error matrix in (4.2) "becomes small". An alternative criterion is to replace the mean squared error $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ by the mean squared error $\mathbb{E}_\theta[(\hat\theta_n - \theta)(\hat\theta_n - \theta)^\top]$ between the estimator $\hat\theta_n$ defined in (4.1) and the "true" vector of parameters. The following result shows that in the class of unbiased estimators both optimization problems yield the same solution. The proof is similar to the proof of Theorem 3.1 in Dette et al. (2017a).
Theorem 4.1 The estimator $\hat\theta_n$ defined in (4.1) is unbiased if and only if the identity
\[
M_1 := \int_a^b \dot F(s)\,\Sigma^{-1}\dot F^\top(s)\, ds = \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\int_{t_{i-1}}^{t_i} \dot F^\top(s)\, ds = \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\big( F(t_i) - F(t_{i-1}) \big)^\top \tag{4.3}
\]
is satisfied (note that $M = M_1 + \frac{1}{a}F(a)\Sigma^{-1}F^\top(a)$ by (3.2)). Moreover, for any linear unbiased estimator of the form $\tilde\theta_n = \int_a^b G(s)\, dY_s$ we have
\[
\mathbb{E}_\theta\big[ (\tilde\theta_n - \theta)(\tilde\theta_n - \theta)^\top \big] = \mathbb{E}_\theta\big[ (\tilde\theta_n - \hat\theta_{\mathrm{BLUE}})(\tilde\theta_n - \hat\theta_{\mathrm{BLUE}})^\top \big] + M^{-1}.
\]

In order to describe a solution in terms of optimal "weights" $\Phi_i^*$ and design points $t_i^*$, we recall that the condition of unbiasedness of the estimator $\hat\theta_n$ in (4.1) is given by (4.3) and introduce the notation
\[
B_i = \frac{\big[ F(t_i) - F(t_{i-1}) \big]\,\Sigma^{-1/2}}{\sqrt{t_i - t_{i-1}}}, \qquad A_i = \Phi_i\,\Sigma^{-1/2}\,\sqrt{t_i - t_{i-1}}. \tag{4.4}
\]
It follows from Lemma 4.1 and Theorem 4.1 that for an unbiased estimator $\hat\theta_n$ of the form (4.1) the mean squared error has the representation
\[
\mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big] = -\, M^{-1} M_1 M^{-1} + \sum_{i=2}^n M^{-1} A_i A_i^\top M^{-1}, \tag{4.5}
\]
which has to be "minimized" subject to the constraint
\[
M_1 = \int_a^b \dot F(s)\,\Sigma^{-1}\dot F^\top(s)\, ds = \sum_{i=2}^n A_i B_i^\top. \tag{4.6}
\]
The following result shows that a minimization with respect to the weights $\Phi_i$ (or equivalently $A_i$) can actually be carried out with respect to the Loewner ordering.
Theorem 4.2 Assume that the assumptions of Theorem 3.1 are satisfied and that the matrix
\[
B = \sum_{i=2}^n B_i B_i^\top = \sum_{i=2}^n \frac{\big[ F(t_i) - F(t_{i-1}) \big]\,\Sigma^{-1}\big[ F(t_i) - F(t_{i-1}) \big]^\top}{t_i - t_{i-1}} \tag{4.7}
\]
is non-singular. Let $\Phi_2^*, \ldots, \Phi_n^*$ denote $(p\times 2)$ matrices satisfying the equations
\[
\Phi_i^* = M_1\, B^{-1}\, \frac{F(t_i) - F(t_{i-1})}{t_i - t_{i-1}}, \qquad i = 2,\ldots,n; \tag{4.8}
\]
then $\Phi_2^*, \ldots, \Phi_n^*$ are optimal weight matrices minimizing $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ with respect to the Loewner ordering among all unbiased estimators of the form (4.1). Moreover, the variance of the resulting estimator $\hat\theta_n^*$ is given by
\[
\operatorname{Cov}(\hat\theta_n^*) = M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}.
\]
Proof. Let $v$ denote a $p$-dimensional vector and consider the problem of minimizing the criterion
\[
v^\top\, \mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big]\, v \tag{4.9}
\]
subject to the constraint (4.6). Observing (4.5), this yields the Lagrange function
\[
G_v(A_2,\ldots,A_n) = -\, v^\top M^{-1} M_1 M^{-1} v + \sum_{i=2}^n \big( v^\top M^{-1} A_i A_i^\top M^{-1} v \big) - \operatorname{tr}\Big\{ \Lambda\Big( M_1 - \sum_{i=2}^n A_i B_i^\top \Big) \Big\}, \tag{4.10}
\]
where $A_2,\ldots,A_n$ are $(p\times 2)$ matrices and $\Lambda = (\lambda_{k,\ell})_{k,\ell=1}^p$ is a $(p\times p)$ matrix of Lagrange multipliers. The function $G_v$ is convex with respect to $A_2,\ldots,A_n$. Therefore, taking derivatives with respect to $A_i$ yields as a necessary and sufficient condition for the extremum (here we use matrix differential calculus)
\[
2\,(M^{-1}v)^\top A_i \otimes (M^{-1}v)^\top + \operatorname{vec}\{\Lambda B_i\}^\top = 0_p^\top, \qquad i = 2,\ldots,n.
\]
Rewriting this system of linear equations in $(p\times 2)$ matrix form gives
\[
2\, M^{-1} v v^\top M^{-1} A_i = -\,\Lambda B_i, \qquad i = 2,\ldots,n.
\]
Multiplying these equations by $B_i^\top$ from the right, summing over $i$ and using the definitions of $M_1$ and $B$ yields for the matrix of Lagrange multipliers $\Lambda = -2\, M^{-1} v v^\top M^{-1} M_1 B^{-1}$, which gives
\[
2\, M^{-1} v v^\top M^{-1} A_i = 2\, M^{-1} v v^\top M^{-1} M_1 B^{-1} B_i, \qquad i = 2,\ldots,n. \tag{4.11}
\]
Note that one solution of (4.11) is given by
\[
A_i^* = M_1 B^{-1} B_i, \qquad i = 2,\ldots,n,
\]
which does not depend on the vector $v$. Therefore, the tuple of matrices $(A_2^*,\ldots,A_n^*)$ minimizes the convex function $G_v$ in (4.10) for all $v\in\mathbb{R}^p$. Observing the notation in (4.4) shows that the optimal matrix weights are given by (4.8). Moreover, the weights in (4.8) do not depend on the vector $v$ either and provide a simultaneous minimizer of the criterion defined in (4.9) for all $v\in\mathbb{R}^p$. Consequently, the weights defined in (4.8) minimize $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ under the unbiasedness constraint (4.6) with respect to the Loewner ordering. □
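For fixed design points, the optimal weights (4.8) and the covariance of $\hat\theta_n^*$ can be computed directly. The sketch below does this for a hypothetical instance of Example 2.1 with $f_1(t)=(t,\sin t)^\top$, $f_2(t)=(t^2,\cos t)^\top$ and a uniform design with $n=4$; all model choices are illustrative.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))

# Example 2.1 with hypothetical f1(t) = (t, sin t), f2(t) = (t^2, cos t)
F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])

p = 4
M1 = np.array([[quad(lambda t: Fd(t)[r] @ Sigma_inv @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
M = M1 + F(a_) @ Sigma_inv @ F(a_).T / a_

tpts = np.array([1.0, 4.0, 7.0, 10.0])                      # a uniform design, n = 4
dF = [F(tpts[i]) - F(tpts[i-1]) for i in range(1, 4)]
dt = np.diff(tpts)
B = sum(d @ Sigma_inv @ d.T / h for d, h in zip(dF, dt))    # matrix (4.7)

Phi_star = [M1 @ np.linalg.solve(B, d) / h for d, h in zip(dF, dt)]   # weights (4.8)
Minv = np.linalg.inv(M)
Cov = Minv @ (M1 @ np.linalg.solve(B, M1) + F(a_) @ Sigma_inv @ F(a_).T / a_) @ Minv
print(np.diag(Cov))     # variances of the discretized estimator theta_n^*
```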
Remark 4.1 If the matrix $B$ in Theorem 4.2 is singular, the optimal weights are not uniquely determined, and we propose to replace the inverse $B^{-1}$ by the Moore–Penrose inverse of $B$.

Note that for fixed design points $t_1,\ldots,t_n$ Theorem 4.2 yields universally optimal weights $\Phi_2^*,\ldots,\Phi_n^*$ (with respect to the Loewner ordering) for estimators of the form (4.1) satisfying (4.3). On the other hand, a further optimization with respect to the Loewner ordering regarding the choice of the points $t_2,\ldots,t_{n-1}$ ($t_1 = a$, $t_n = b$) is not possible, and we have to apply a real valued optimality criterion for this purpose. In the following section we derive such a criterion which explicitly addresses the comparison of the regression curves from the two groups introduced in Section 2.

4.2 Optimal designs for the comparison of curves

We return to the practical scenario of the two groups introduced in (2.1), where we now focus on the comparison of these groups on the interval $[a,b]$. More precisely, consider the model introduced in (2.5) and let $\hat\theta_n^*$ be the estimator (4.1) with optimal weights defined by (4.8) from $n$ observations taken at the time points $a = t_1 < t_2 < \ldots < t_{n-1} < t_n = b$. This estimator is normally distributed with mean $\mathbb{E}[\hat\theta_n^*] = \theta$ and covariance matrix
\[
\operatorname{Cov}(\hat\theta_n^*) = M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1},
\]
where the matrices $M$, $M_1$ and $B$ are given by (3.2), (4.6) and (4.7), respectively. Note that the covariance matrix depends on the time points $t_1,\ldots,t_n$ through the matrix $B^{-1}$. Moreover, using the estimator $\hat\theta_n^*$, the prediction of the difference of the two curves at a fixed time point $t\in[a,b]$ satisfies
\[
(1,-1)\, F^\top(t)\,\hat\theta_n^* - (1,-1)\, F^\top(t)\,\theta \sim \mathcal{N}\big( 0,\; h(t; t_1,\ldots,t_n) \big),
\]
where
\[
h(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1} F(t)\, (1,-1)^\top.
\]
We now use this result and the results of Gsteiger et al. (2011) to obtain a simultaneous confidence band for the difference of the two curves. More precisely, if the interval $[a,b]$ is the range where the two curves are to be compared, the simultaneous confidence band is defined as follows. Consider the statistic
\[
\hat T = \sup_{t\in[a,b]} \frac{\big| (1,-1)\, F^\top(t)\,\hat\theta_n^* - (1,-1)\, F^\top(t)\,\theta \big|}{\{ h(t; t_1,\ldots,t_n) \}^{1/2}},
\]
and define $D$ as the $(1-\alpha)$-quantile of the corresponding distribution, that is, $\mathbb{P}(\hat T \le D) = 1-\alpha$. Note that Gsteiger et al. (2011) propose the parametric bootstrap for choosing the critical value $D$. Define
\[
u(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\,\hat\theta_n^* + D\cdot \{ h(t; t_1,\ldots,t_n) \}^{1/2}, \qquad l(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\,\hat\theta_n^* - D\cdot \{ h(t; t_1,\ldots,t_n) \}^{1/2};
\]
then the confidence band for the difference of the two regression functions is defined by
\[
\mathcal{C}_{1-\alpha} = \big\{ g : [a,b]\to\mathbb{R} \;\big|\; l(t; t_1,\ldots,t_n) \le g(t) \le u(t; t_1,\ldots,t_n) \text{ for all } t\in[a,b] \big\}. \tag{4.12}
\]
Consequently, good time points $t_1 = a < t_2 < \ldots < t_{n-1} < t_n = b$ should minimize the width
\[
u(t; t_1,\ldots,t_n) - l(t; t_1,\ldots,t_n) = 2\, D\, \{ h(t; t_1,\ldots,t_n) \}^{1/2}
\]
of this band at each $t\in[a,b]$. As this is only possible in rare circumstances, we propose to minimize an $L_p$-norm of the function $h(\cdot\,; t_1,\ldots,t_n)$ as a design criterion, that is,
\[
\Phi_p(t_1,\ldots,t_n) = \| h(\cdot\,; t_1,\ldots,t_n) \|_p := \Big( \int_a^b \big[ h(t; t_1,\ldots,t_n) \big]^p\, dt \Big)^{1/p}, \qquad 1 \le p \le \infty, \tag{4.13}
\]
where the case $p = \infty$ corresponds to the maximal deviation
\[
\| h(\cdot\,; t_1,\ldots,t_n) \|_\infty = \sup_{t\in[a,b]} | h(t; t_1,\ldots,t_n) |.
\]
The $\Phi_p$-optimal design points $a = t_1^* < t_2^* < \ldots < t_n^* = b$ (minimizing (4.13)) and the corresponding weights derived in Theorem 4.2 then provide the optimal linear unbiased estimator of the form (4.1) (with the corresponding optimal design); a numerical sketch of this optimization is given below.
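The following sketch evaluates $h(t;t_1,\ldots,t_n)$ on a grid and minimizes its maximal value (the $\Phi_\infty$-criterion from (4.13)) over the interior design points. The paper employs particle swarm optimization for this step (see Section 5); the use of scipy's differential evolution here, like the model choices, is an illustrative substitution. Following Remark 4.1, a Moore–Penrose inverse is used for $B$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import differential_evolution

a_, b_ = 1.0, 10.0
Si = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))
F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])
p = 4
M1 = np.array([[quad(lambda t: Fd(t)[r] @ Si @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
F0 = F(a_) @ Si @ F(a_).T / a_
Minv = np.linalg.inv(M1 + F0)
grid = np.linspace(a_, b_, 200)
cvec = np.array([1.0, -1.0])

def phi_inf(interior):
    """Phi_infinity criterion as a function of the interior design points."""
    t = np.concatenate(([a_], np.sort(interior), [b_]))
    if np.any(np.diff(t) < 1e-6):
        return np.inf                               # degenerate design
    dF = [F(t[i]) - F(t[i-1]) for i in range(1, len(t))]
    B = sum(d @ Si @ d.T / h for d, h in zip(dF, np.diff(t)))
    C = Minv @ (M1 @ np.linalg.pinv(B) @ M1 + F0) @ Minv     # Cov(theta_n^*)
    return max(cvec @ F(s).T @ C @ F(s) @ cvec for s in grid)

res = differential_evolution(phi_inf, bounds=[(a_, b_)] * 2, maxiter=60, seed=1)
print(np.concatenate(([a_], np.sort(res.x), [b_])), res.fun)   # design, criterion
```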
Example 4.1 We now conclude this section by considering the cases of no common and common parameters, respectively.

(a) If we are in the situation of Example 2.1 (no common parameters), the regression function $F^\top(t)$ is of the form (3.17), and the variance of the prediction of the difference at a fixed point $t\in[a,b]$ reduces to
\[
h(t; t_1,\ldots,t_n) = \big(f_1^\top(t), -f_2^\top(t)\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(f_1^\top(t), -f_2^\top(t)\big)^\top.
\]
The corresponding design criterion is given by
\[
\Phi_p(t_1,\ldots,t_n) = \Big\| \big(f_1^\top, -f_2^\top\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(f_1^\top, -f_2^\top\big)^\top \Big\|_p. \tag{4.14}
\]

(b) If we are in the situation of Example 2.2 (common parameters), the regression function $F^\top(t)$ is given by (3.25), and the variance of the prediction of the difference at a fixed point $t\in[a,b]$ reduces to
\[
h(t; t_1,\ldots,t_n) = \big(0^\top_{p_0}, \tilde f_1^\top(t), -\tilde f_2^\top(t)\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(0^\top_{p_0}, \tilde f_1^\top(t), -\tilde f_2^\top(t)\big)^\top.
\]
The corresponding design criterion is given by
\[
\Phi_p(t_1,\ldots,t_n) = \Big\| \big(0^\top_{p_0}, \tilde f_1^\top, -\tilde f_2^\top\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(0^\top_{p_0}, \tilde f_1^\top, -\tilde f_2^\top\big)^\top \Big\|_p.
\]

5 A simulation study

In this section the methodology is illustrated by means of a simulation study. To be precise, we consider the regression model (2.5), where the matrix $F(t)$ is given by (3.17), corresponding to the case that the regression functions do not share common parameters; see Section 3.2 for more details. In this case the bounds of the confidence band (4.12) are given by
\[
u(t; t_1,\ldots,t_n) = (\hat\theta_n^{*(1)})^\top f_1(t) - (\hat\theta_n^{*(2)})^\top f_2(t) + D\cdot\{ h(t; t_1,\ldots,t_n) \}^{1/2}, \qquad l(t; t_1,\ldots,t_n) = (\hat\theta_n^{*(1)})^\top f_1(t) - (\hat\theta_n^{*(2)})^\top f_2(t) - D\cdot\{ h(t; t_1,\ldots,t_n) \}^{1/2},
\]
where $\hat\theta_n^* = ((\hat\theta_n^{*(1)})^\top, (\hat\theta_n^{*(2)})^\top)^\top$ is the estimator (4.1) with optimal weights defined in (4.8). The design space is given by the interval $[a,b] = [1,10]$, and we consider the following combinations of the regression functions $f_1$ and $f_2$ in the matrix (3.17):
\[
f_A(t) = (t, \sin(t), \cos(t))^\top, \qquad f_B(t) = (t^2, \cos(t), \cos(2t))^\top, \qquad f_C(t) = (t, \log(t), t^2)^\top. \tag{4.15}
\]
To model the dependence between the two groups we use the covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}
\]
in (2.5), where the correlations are chosen as $\varrho = 0.1, 0.4, 0.7$.
Following the discussion in Section 4.1, we focus on the comparison of the regression curves for the two groups and derive optimal designs minimizing the criterion $\Phi_\infty$ defined in (4.14). As a result, we obtain simultaneous confidence bands with a smaller maximal width for the difference of the curves describing the relation in the two groups. Similar results can be obtained for other values $p \in (0,\infty)$ in (4.14), but for the sake of brevity we concentrate on the criterion $\Phi_\infty$, which is probably also the easiest to interpret for practitioners.

We denote by $\hat\theta_n^*$ the linear unbiased estimator derived in Section 4. For each combination of two different regression functions defined in (4.15), the optimal weights have been found by Theorem 4.2, and the optimal design points $t_i^*$ are determined by minimizing the criterion $\Phi_\infty$ defined in (4.14). For the numerical optimization the particle swarm optimization (PSO) algorithm is used (see, for example, Clerc, 2006), assuming a sample size of four observations in each group, that is, $n = 4$. Furthermore, the uniform design used in the following calculations is the design with four equally spaced design points in the interval $[1,10]$. The $\Phi_\infty$-optimal design points minimizing the criterion (4.14) are given in Table 1 for all combinations of models and correlations under consideration. Note that for each model the optimal design points change with the value of the correlation $\varrho$.

Table 1: Optimal design points on the interval $[1,10]$ for the estimator $\hat\theta_n^*$ in (4.1) minimizing the criterion $\Phi_\infty$ in (4.14). Different correlations $\varrho = 0.1, 0.4, 0.7$ and the regression functions defined in (4.15) are considered.

  models              | rho = 0.1            | rho = 0.4            | rho = 0.7
  --------------------|----------------------|----------------------|---------------------
  f1 = fA & f2 = fB   | [1, 1.59, 3.93, 10]  | [1, 1.62, 3.91, 10]  | [1, 1.74, 7.99, 10]
  f1 = fA & f2 = fC   | [1, 3.46, 9.60, 10]  | [1, 2.86, 8.83, 10]  | [1, 2.61, 3.52, 10]
  f1 = fB & f2 = fC   | [1, 2.20, 6.25, 10]  | [1, 1.62, 3.98, 10]  | [1, 2.85, 6.29, 10]

In order to investigate the impact of the optimal design on the structure of the confidence bands, we have performed a small simulation study, simulating confidence bands for the difference of the regression functions. The vector of parameter values used for each model is $\theta = ({\theta^{(1)}}^\top, {\theta^{(2)}}^\top)^\top = (1,1,1,1,1,1)^\top$. In Figure 1 we display the averages of the uniform confidence bands defined in (4.12) under the uniform and the optimal design, calculated from 100 simulation runs. The left, middle and right columns show the results for the correlations $\varrho = 0.1$, $\varrho = 0.4$ and $\varrho = 0.7$, respectively, while the rows correspond to the different combinations of the functions $f_1$ and $f_2$ (first row: $f_1 = f_A$, $f_2 = f_B$; middle row: $f_1 = f_A$, $f_2 = f_C$; last row: $f_1 = f_B$, $f_2 = f_C$). In each graph, the confidence bands from the $\Phi_\infty$-optimal and the uniform design are plotted separately using solid and dashed lines, respectively, along with the true difference $f_1^\top(t)\theta^{(1)} - f_2^\top(t)\theta^{(2)}$ (solid grey lines).

Figure 1: Confidence bands for the difference of the regression functions (solid grey line) on the basis of an optimal (solid lines) and a uniform design (dashed lines). Left panel: $\varrho = 0.1$. Middle panel: $\varrho = 0.4$. Right panel: $\varrho = 0.7$. First row: model with $f_1 = f_A$ and $f_2 = f_B$. Second row: model with $f_1 = f_A$ and $f_2 = f_C$. Third row: model with $f_1 = f_B$ and $f_2 = f_C$.

We observe that in all cases under consideration the use of $\Phi_\infty$-optimal designs yields a clearly visible improvement compared to the uniform design. The maximal width of the confidence band is reduced substantially. Moreover, the bands from the $\Phi_\infty$-optimal designs are nearly uniformly narrower than the bands based on the uniform design. Even more importantly, the confidence bands based on the $\Phi_\infty$-optimal design show a similar structure to the true differences, while the confidence bands from the uniform design oscillate. A comparison of the left, middle and right columns in Figure 1 shows that the maximum width of the confidence bands based on the optimal design decreases with increasing (absolute) correlation $\varrho$. This effect is not visible for the confidence bands based on the uniform design. For example, in the middle row of Figure 1, which corresponds to the case $f_1 = f_A$ and $f_2 = f_C$, the maximum width of the confidence bands based on the equally spaced design points even seems to increase.

Table 2 presents the values of the criterion $\Phi_\infty$ in (4.14) for the different scenarios and confirms the conclusions drawn from the visual inspection of the confidence band plots. We observe that the use of the optimal design points reduces the maximum width of the confidence bands substantially. Moreover, for the optimal design the maximum width becomes smaller with increasing (absolute) correlation. On the other hand, this monotonicity cannot be observed in all cases for the uniform designs.

Summarizing, the use of the proposed $\Phi_\infty$-optimal design improves statistical inference substantially by reducing the maximum variance of the difference of the two estimated regression curves. Moreover, simultaneous estimation in combination with a $\Phi_\infty$-optimal design yields a further reduction of the maximum width of the confidence bands, thus providing a more precise inference for the difference of the curves describing the relation between $t$ and the responses in the two groups.
Table 2: Values of the criterion $\Phi_\infty$ for the optimal and the uniform design with four observations in each group in the interval $[1,10]$. The error process is given by two independent Brownian motions with correlation $\varrho = 0.1, 0.4, 0.7$ between the groups, respectively.

  models              | design  | rho = 0.1 | rho = 0.4 | rho = 0.7
  --------------------|---------|-----------|-----------|----------
  f1 = fA & f2 = fB   | optimal |   14.79   |    9.44   |    6.09
                      | uniform |  141.87   |  142.59   |  148.74
  f1 = fA & f2 = fC   | optimal |   16.00   |   10.00   |    6.60
                      | uniform |   33.32   |   29.10   |   25.66
  f1 = fB & f2 = fC   | optimal |   14.71   |    9.53   |    5.99
                      | uniform |  147.27   |  127.19   |  115.07
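To indicate how one run of such a simulation can be organized, the sketch below generates discrete observations from model (2.5) at a design, forms the optimally weighted estimator (4.1) and evaluates the plug-in band half-width $D\{h(t)\}^{1/2}$ on a few points. The critical value `D` is a placeholder for the bootstrap quantile of Gsteiger et al. (2011), and the model choices repeat the illustrative assumptions of the earlier sketches.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)
a_, b_, rho = 1.0, 10.0, 0.4
Sigma = np.array([[1.0, rho], [rho, 1.0]]); Si = np.linalg.inv(Sigma)
S_half = np.linalg.cholesky(Sigma)

F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])
theta = np.ones(4)
p = 4

M1 = np.array([[quad(lambda t: Fd(t)[r] @ Si @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
F0 = F(a_) @ Si @ F(a_).T / a_
Minv = np.linalg.inv(M1 + F0)

tpts = np.array([1.0, 2.0, 5.0, 10.0])          # some design with n = 4
# one realization of the two correlated error processes at the design points
incr = rng.standard_normal((len(tpts), 2)) * np.sqrt(np.diff(tpts, prepend=0.0))[:, None]
eps = np.cumsum(incr, axis=0)
Y = np.array([F(t).T @ theta for t in tpts]) + eps @ S_half.T

dF = [F(tpts[i]) - F(tpts[i-1]) for i in range(1, 4)]
dt = np.diff(tpts)
B = sum(d @ Si @ d.T / h for d, h in zip(dF, dt))
Phi = [M1 @ np.linalg.solve(B, d) / h for d, h in zip(dF, dt)]        # weights (4.8)
theta_hat = Minv @ (sum(P @ Si @ (Y[i+1] - Y[i]) for i, P in enumerate(Phi))
                    + F(a_) @ Si @ Y[0] / a_)

C = Minv @ (M1 @ np.linalg.solve(B, M1) + F0) @ Minv
cvec = np.array([1.0, -1.0]); D = 3.0           # placeholder critical value
for t in np.linspace(a_, b_, 5):
    half = D * np.sqrt(cvec @ F(t).T @ C @ F(t) @ cvec)
    print(t, cvec @ F(t).T @ theta_hat, half)   # estimated difference +/- half-width
```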
Acknowledgements This research was partially supported by the Collaborative Research Center "Statistical modeling of nonlinear dynamic processes" (Sonderforschungsbereich 823, Teilprojekt C2). On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
Attia, A. and Constantinescu, E. (2020). Optimal experimental design for inverse problems in the presence of observation correlations. arXiv:2007.14476.

Bickel, P. J. and Herzberg, A. M. (1979). Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression. The Annals of Statistics, 7(1):77–95.

Bretz, F., Möllenhoff, K., Dette, H., Liu, W., and Trampisch, M. (2018). Assessing the similarity of dose response and target doses in two non-overlapping subgroups. Statistics in Medicine, 37:722–738.

Campos-Barreiro, S. and López-Fidalgo, J. (2015). D-optimal experimental designs for a growth model applied to a Holstein-Friesian dairy farm. Statistical Methods & Applications, 24:491–505.

Clerc, M. (2006). Particle Swarm Optimization. Iste Publishing Company, London.

Dette, H., Konstantinou, M., and Zhigljavsky, A. (2017a). A new approach to optimal designs for correlated observations. The Annals of Statistics, 45(4):1579–1608.

Dette, H., Kunert, J., and Pepelyshev, A. (2008). Exact optimal designs for weighted least squares analysis with correlated errors. Statistica Sinica, 18(1):135–154.

Dette, H., Möllenhoff, K., Volgushev, S., and Bretz, F. (2018). Equivalence of regression curves. Journal of the American Statistical Association, 113:711–729.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2013). Optimal design for linear models with correlated observations. The Annals of Statistics, 41(1):143–176.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2016). Optimal designs in regression with correlated errors. The Annals of Statistics, 44(1):113–152.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2019). The BLUE in continuous-time regression models with correlated errors. The Annals of Statistics, 47(4):1928–1959.

Dette, H. and Schorning, K. (2016). Optimal designs for comparing curves. The Annals of Statistics, 44(3):1103–1130.

Dette, H., Schorning, K., and Konstantinou, M. (2017b). Optimal designs for comparing regression models with correlated observations. Computational Statistics & Data Analysis, 113:273–286.

Doob, J. L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 20(3):393–403.

Fedorov, V. V. and Leonov, S. L. (2013). Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton, FL, USA.

Gsteiger, S., Bretz, F., and Liu, W. (2011). Simultaneous confidence bands for nonlinear regression models with application to population pharmacokinetic analyses. Journal of Biopharmaceutical Statistics, 21(4):708–725.

Harman, R. and Štulajter, F. (2010). Optimal prediction designs in finite discrete spectrum linear regression models. Metrika, 72(2):281–294.

Ibragimov, I. A. and Has'minskii, R. Z. (1981). Statistical Estimation. Springer-Verlag, New York-Berlin.

Kiselak, J. and Stehlík, M. (2008). Equidistant D-optimal designs for parameters of Ornstein-Uhlenbeck process. Statistics and Probability Letters, 78:1388–1396.

Liu, W., Bretz, F., Hayter, A. J., and Wynn, H. P. (2009). Assessing non-superiority, non-inferiority or equivalence when comparing two regression models over a restricted covariate region. Biometrics, 65:1279–1287.

Liu, W., Jamshidian, M., and Zhang, Y. (2011). Multiple comparison of several linear regression models. Journal of the American Statistical Association, 99(466):395–403.

Mehr, C. B. and McFadden, J. (1965). Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society, Series B, 27(3):505–522.

Möllenhoff, K., Bretz, F., and Dette, H. (2020). Equivalence of regression curves sharing common parameters. Biometrics, 76:518–529.

Möllenhoff, K., Dette, H., Kotzagiorgis, E., Volgushev, S., and Collignon, O. (2018). Regulatory assessment of drug dissolution profiles comparability via maximum deviation. Statistics in Medicine, 37:2968–2981.

Müller, W. G. and Pázman, A. (2003). Measures for designs in experiments with correlated errors. Biometrika, 90:423–434.

Näther, W. (1985). Effective Observation of Random Fields. Teubner Verlagsgesellschaft, Leipzig.

Pázman, A. and Müller, W. G. (2001). Optimal design of experiments subject to correlated errors. Statistics & Probability Letters, 52(1):29–34.

Pukelsheim, F. (2006). Optimal Design of Experiments. SIAM, Philadelphia.

Rodriguez-Diaz, J. M. (2017). Computation of c-optimal designs for models with correlated observations. Computational Statistics & Data Analysis, 113:287–296.

Sacks, J. and Ylvisaker, N. D. (1966). Designs for regression problems with correlated errors. The Annals of Mathematical Statistics, 37:66–89.

Sacks, J. and Ylvisaker, N. D. (1968). Designs for regression problems with correlated errors; many parameters. The Annals of Mathematical Statistics, 39:49–69.