Optimal designs for comparing regression curves -- dependence within and between groups
Kirsten Schorning
Technische Universität Dortmund, Fakultät Statistik, 44221 Dortmund, Germany

Holger Dette
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
Abstract
We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data within and between the groups may be dependent. In order to derive efficient designs we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that in general simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.
Keywords and Phrases: optimal design, correlated observations, Gaussian white noise model, comparison of curves
AMS Subject Classification: Primary 62K05; Secondary 62M05
1 Introduction
The application of optimal or efficient designs can improve the accuracy of a statistical analysis substantially, and meanwhile there exists a well established and powerful theory for the construction of (approximate) optimal designs for independent observations; see, for example, the monographs of Pukelsheim (2006) or Fedorov and Leonov (2013). In contrast, the determination of optimal designs for efficient statistical analysis from dependent data is more challenging because the corresponding optimization problems are in general not convex, and therefore the powerful tools of convex analysis are not applicable. Although design problems for correlated data have been discussed for a long time (see, for example, Sacks and Ylvisaker, 1966, 1968; Bickel and Herzberg, 1979; Näther, 1985, who use asymptotic arguments to develop continuous but in general non-convex optimization problems in this context), a large part of the discussion is restricted to models with a small number of parameters, and we refer to Pázman and Müller (2001), Müller and Pázman (2003), Dette et al. (2008), Kiselak and Stehlík (2008), Harman and Štulajter (2010), Rodriguez-Diaz (2017), Campos-Barreiro and López-Fidalgo (2015) and Attia and Constantinescu (2020), among others.

Recently, Dette et al. (2013) suggested a more systematic approach to the problem and determined (asymptotic) optimal designs for least squares estimation under the additional assumption that the regression functions are eigenfunctions of the integral operator associated with the covariance kernel of the error process, which is used to describe the dependencies. For more general models Dette et al. (2016) propose to construct the optimal design and estimator simultaneously. More precisely, they construct a class of estimators and corresponding optimal designs with a variance converging (as the sample size increases) to the optimal variance in the continuous time model. Dette et al. (2017a) propose an alternative strategy for this purpose. They start with the construction of the best linear unbiased estimator (BLUE) in the continuous time model using stochastic calculus and determine in a second step an implementable design which is "close" to the solution in the continuous time model. By this approach these authors are able to provide an easily implementable estimator with a corresponding design which is practically indistinguishable from the weighted least squares estimator (WLSE) with corresponding optimal design. Their results are applicable for a broad class of linear regression models with various covariance kernels and have recently been extended to the situation where also derivatives of the process can be observed (see Dette et al., 2019).

Dette and Schorning (2016) and Dette et al. (2017b) propose designs for the comparison of regression curves from two independent samples, where the latter reference also allows for dependencies within the samples. Their work is motivated by applications in drug development, where a comparison between two regression models that describe the relation between a common response and the same covariates for two groups is used to establish the non-superiority of one model to the other, or to check whether the difference between two regression models can be neglected.
For example, if the similarity between two regression functions describing the dose response relationships in the groups individually has been established, subsequent inference in drug development can be based on the combined samples, such that a more efficient statistical analysis is possible on the basis of the larger population. Because of its importance, several procedures for the comparison of curves have been investigated in linear and nonlinear models (see Liu et al., 2009; Gsteiger et al., 2011; Liu et al., 2011; Dette et al., 2018; Bretz et al., 2018; Möllenhoff et al., 2018, 2020, among others). Designs minimizing the maximal width of a (simultaneous) confidence band for the difference between the regression curves calculated from two independent groups are determined by Dette and Schorning (2016) and Dette et al. (2017b), who also demonstrate that the use of these designs yields substantially narrower confidence bands.

While these results refer to independent groups, it is the purpose of the present paper to investigate designs for the comparison of regression curves corresponding to two groups, where the data within the groups and between the groups may be dependent. It will be demonstrated that in most cases simultaneous estimation of the parameters in the regression models using the data from both groups yields more efficient inference than estimating the parameters in the models corresponding to the different groups separately. Moreover, the simultaneous estimation procedure can never be worse. While this property holds independently of the design under consideration, we subsequently construct efficient designs for the comparison of curves corresponding to not necessarily independent groups and demonstrate their superiority by means of a simulation study.

The remaining part of this paper is organized as follows. In Section 2 we introduce the basics and the design problem. Section 3 is devoted to a continuous time model, which can be interpreted as a limiting experiment of the discrete model if the sample size converges to infinity. In this model we derive an explicit representation of the BLUE if estimation is performed simultaneously in both groups. In Section 4 we develop a discrete approximation of the continuous BLUE by determining the optimal weights for the linear estimator. Finally, the optimal design points are determined such that the maximum width of the confidence band for the difference of the two regression functions is minimal. Section 5 is devoted to a small numerical comparison of the performance of the optimal designs with uniform designs. In particular, it is demonstrated that optimal designs yield substantially narrower confidence bands. In many cases the maximal width of a confidence band from the uniform design is larger by a factor between 2 and 10 than the width of a confidence band from the optimal design.
2 Simultaneous estimation of two regression models
Throughout this paper we consider the situation of two groups of observations $Y_{1,1},\ldots,Y_{1,n}$ and $Y_{2,1},\ldots,Y_{2,n}$ at time points $t_1,\ldots,t_n$, where there may exist dependencies within and between the groups. We assume that the relation between the response and the covariate $t$ in each group is described by a linear regression model
\[
Y_{ij} = Y_i(t_j) = f_i^\top(t_j)\,\theta^{(i)} + \eta_i(t_j), \qquad j = 1,\ldots,n,\; i = 1,2. \tag{2.1}
\]
Thus in each group $n$ observations are taken at the same time points $t_1,\ldots,t_n$, which can be chosen in a compact interval, say $[a,b]$, and observations at different time points and in different groups might be dependent. The vectors of unknown parameters $\theta^{(1)}$ and $\theta^{(2)}$ are assumed to be $p_1$- and $p_2$-dimensional, respectively, and the corresponding vectors of regression functions $f_i(t) = (f_{i,1}(t),\ldots,f_{i,p_i}(t))^\top$, $i=1,2$, have continuously differentiable, linearly independent components.

To address the situation of correlation between the groups, we start with a very simple covariance structure for each group, but we emphasize that all results presented in this paper remain correct for more general covariance structures corresponding to Markov processes; see Remark 3.3 for more details. To be precise, let $\{\varepsilon_1(t)\,|\,t\in[a,b]\}$ and $\{\varepsilon_2(t)\,|\,t\in[a,b]\}$ denote two independent Brownian motions, such that
\[
\mathbb{E}[\varepsilon_i(t_j)] = 0, \qquad K_i(t_j,t_k) = \mathbb{E}[\varepsilon_i(t_j)\varepsilon_i(t_k)] = \min(t_j,t_k) \tag{2.2}
\]
denote the mean and the covariance of the individual process $\varepsilon_i$ at the points $t_j$ and $t_k$, respectively. Let $\sigma_1,\sigma_2 > 0$ and $\varrho \in (-1,1)$, denote by $\Sigma^{1/2}$ the square root of the covariance matrix
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\varrho \\ \sigma_1\sigma_2\varrho & \sigma_2^2 \end{pmatrix}, \tag{2.3}
\]
and define for $t\in[a,b]$ the two-dimensional process $\{\eta(t)\,|\,t\in[a,b]\}$ by
\[
\eta(t) = \begin{pmatrix} \eta_1(t) \\ \eta_2(t) \end{pmatrix} = \Sigma^{1/2}\varepsilon(t), \tag{2.4}
\]
where $\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top$. Note that $\varrho\in(-1,1)$ denotes the correlation between the observations $Y_1(t_j)$ and $Y_2(t_j)$ ($j=1,\ldots,n$), and that in general the correlation between $Y_1(t_j)$ and $Y_2(t_k)$ is given by
\[
\operatorname{Corr}\big(Y_1(t_j), Y_2(t_k)\big) = \varrho\,\Big(\sqrt{t_j/t_k} \wedge \sqrt{t_k/t_j}\Big), \qquad t_j,t_k \in (0,\infty).
\]

The parameters $\theta^{(1)}$ and $\theta^{(2)}$ could be estimated separately from the data of the two individual groups. However, this procedure ignores the correlation between the two groups, and estimating the parameters $\theta^{(1)}$ and $\theta^{(2)}$ simultaneously from the data of both groups might result in more precise estimates. In order to define estimators for the parameters $\theta^{(1)}$ and $\theta^{(2)}$ using the information from both groups, we now consider a more general two-dimensional regression model, which on the one hand contains the situation described in the previous paragraph as a special case, but on the other hand also allows us to consider the case where some of the components of $\theta^{(1)}$ and $\theta^{(2)}$ coincide; see Example 2.2 and Section 3.3 for details. To be precise, we define the regression model
\[
Y(t_j) = \begin{pmatrix} Y_1(t_j) \\ Y_2(t_j) \end{pmatrix} = F^\top(t_j)\theta + \eta(t_j) = F^\top(t_j)\theta + \Sigma^{1/2}\varepsilon(t_j), \qquad j = 1,\ldots,n, \tag{2.5}
\]
where two-dimensional observations $Y(t_1) = (Y_1(t_1),Y_2(t_1))^\top, \ldots, Y(t_n) = (Y_1(t_n),Y_2(t_n))^\top$ are available at time points $t_1,\ldots,t_n \in [a,b]$. In model (2.5) the vector $\theta = (\vartheta_1,\ldots,\vartheta_p)^\top$ is a $p$-dimensional parameter and
\[
F^\top(t) = \begin{pmatrix} F_1^\top(t) \\ F_2^\top(t) \end{pmatrix} = \begin{pmatrix} F_{1,1}(t) & \ldots & F_{1,p}(t) \\ F_{2,1}(t) & \ldots & F_{2,p}(t) \end{pmatrix} \tag{2.6}
\]
is the transpose of a $(p\times 2)$ matrix containing continuously differentiable regression functions, where the two-dimensional functions $(F_{1,1}(t),F_{2,1}(t))^\top, \ldots, (F_{1,p}(t),F_{2,p}(t))^\top$ are assumed to be linearly independent.
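To make the covariance structure above concrete, the following Python sketch simulates the two-dimensional error process (2.4) on a grid and checks the displayed between-group correlation empirically; the grid size, the number of replications and the values of $\sigma_1,\sigma_2,\varrho$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma1, sigma2, rho = 1.0, 1.0, 0.4
Sigma = np.array([[sigma1**2, sigma1*sigma2*rho],
                  [sigma1*sigma2*rho, sigma2**2]])
S_half = np.linalg.cholesky(Sigma)   # any square root of Sigma serves as Sigma^{1/2}

a_, b_, m, reps = 1.0, 10.0, 200, 5000
t = np.linspace(a_, b_, m)
dt = np.diff(t, prepend=0.0)         # first increment runs from time 0 to t_1 = a

# two independent Brownian motions, evaluated at the grid points
eps = np.cumsum(rng.standard_normal((reps, 2, m)) * np.sqrt(dt), axis=-1)
eta = np.einsum('ij,rjm->rim', S_half, eps)   # eta(t) = Sigma^{1/2} eps(t), cf. (2.4)

j, k = 50, 150                        # arbitrary grid indices
emp = np.corrcoef(eta[:, 0, j], eta[:, 1, k])[0, 1]
theo = rho * min(np.sqrt(t[j]/t[k]), np.sqrt(t[k]/t[j]))
print(f"empirical: {emp:.3f}  theoretical: {theo:.3f}")
```

The empirical and theoretical values agree up to Monte Carlo error, which illustrates that the factor $\sqrt{t_j/t_k}\wedge\sqrt{t_k/t_j}$ is inherited from the Brownian covariance $\min(t_j,t_k)$.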
Example 2.1
The individual models defined in (2.1) are contained in this two-dimensional model. More precisely, defining the $p = (p_1+p_2)$-dimensional vector of parameters $\theta$ by $\theta = ((\theta^{(1)})^\top, (\theta^{(2)})^\top)^\top$ and the rows of the regression function $F^\top(t)$ in (2.6) by
\[
F_1^\top(t) = \big(f_1^\top(t),\, 0_{p_2}^\top\big), \qquad F_2^\top(t) = \big(0_{p_1}^\top,\, f_2^\top(t)\big),
\]
it follows that model (2.5) coincides with model (2.1). Moreover, this composite model takes the correlation between the groups into account. In this case the models describing the relation between the variable $t$ and the responses $Y_1(t)$ and $Y_2(t)$ do not share any parameters.

Example 2.2
In this example we consider the situation where some of the parameters of the individual models in (2.1) coincide. This situation occurs, for example, if $Y_1(t)$ and $Y_2(t)$ represent clinical parameters (depending on time) before and after a treatment, where it can be assumed that the effect at time $a$ coincides before and after the treatment. In this case a reasonable model for the average effect in the two groups is given by
\[
\mathbb{E}[Y_\ell(t)] = \theta^{(0)} + (\tilde\theta^{(\ell)})^\top \tilde f_\ell(t), \qquad \ell = 1,2.
\]
More generally, we consider the situation where the vectors of parameters are given by
\[
\theta^{(1)} = \big({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top\big)^\top, \qquad \theta^{(2)} = \big({\theta^{(0)}}^\top, {\tilde\theta^{(2)}}^\top\big)^\top,
\]
where $\theta^{(0)} \in \mathbb{R}^{p_0}$ denotes the vector of common parameters in both models and the vectors $\tilde\theta^{(1)} \in \mathbb{R}^{p_1-p_0}$ and $\tilde\theta^{(2)} \in \mathbb{R}^{p_2-p_0}$ contain the different parameters of the two individual models. The corresponding regression functions are given by
\[
f_1^\top(t) = \big(f_0^\top(t),\, \tilde f_1^\top(t)\big), \qquad f_2^\top(t) = \big(f_0^\top(t),\, \tilde f_2^\top(t)\big), \tag{2.7}
\]
where the vector $f_0^\top(t)$ contains the regression functions corresponding to the common parameters in the two models, and $\tilde f_1^\top(t)$ and $\tilde f_2^\top(t)$ denote the vectors of regression functions corresponding to the different parameters $\tilde\theta^{(1)}$ and $\tilde\theta^{(2)}$, respectively. Defining the $p = (p_1+p_2-p_0)$-dimensional vector of parameters $\theta$ by $\theta = ({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top, {\tilde\theta^{(2)}}^\top)^\top$ and the rows of the regression function $F^\top(t)$ in (2.6) by
\[
F_1^\top(t) = \big(f_0^\top(t),\, \tilde f_1^\top(t),\, 0_{p_2-p_0}^\top\big), \qquad F_2^\top(t) = \big(f_0^\top(t),\, 0_{p_1-p_0}^\top,\, \tilde f_2^\top(t)\big),
\]
it follows that the model (2.5) contains the individual models in (2.1), where the regression functions are given by (2.7) and the parameters $\theta^{(1)}$ and $\theta^{(2)}$ share the parameter $\theta^{(0)}$. Moreover, this composite model takes the potential correlation between the groups into account.
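For illustration, the following sketch assembles the matrix $F^\top(t)$ of Example 2.2 from its blocks; the particular component functions $f_0$, $\tilde f_1$, $\tilde f_2$ are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def F_shared(t, f0, f1_tilde, f2_tilde):
    """Rows F_1^T(t), F_2^T(t) of the composite model in Example 2.2.

    f0, f1_tilde, f2_tilde return the common and group-specific
    regression functions as 1-d arrays (placeholder choices below)."""
    f0t, f1t, f2t = f0(t), f1_tilde(t), f2_tilde(t)
    row1 = np.concatenate([f0t, f1t, np.zeros_like(f2t)])
    row2 = np.concatenate([f0t, np.zeros_like(f1t), f2t])
    return np.vstack([row1, row2])   # shape (2, p), p = p0 + (p1-p0) + (p2-p0)

# hypothetical choices: common intercept, different trends in the two groups
F = F_shared(2.0,
             f0=lambda t: np.array([1.0]),              # f_0(t) = 1
             f1_tilde=lambda t: np.array([t, np.sin(t)]),
             f2_tilde=lambda t: np.array([t ** 2]))
print(F)   # (2, 4) matrix; theta = (theta^(0), theta~^(1), theta~^(2))
```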
3 The BLUE in the continuous time model

It was demonstrated by Dette et al. (2017a) that efficient designs for dependent data in regression problems can be derived by first considering the estimation problem in a continuous time model. In this model there is no optimal design problem, as the data can be observed over the full interval $[a,b]$. However, efficient designs can be determined in two steps. First, one derives the best linear unbiased estimator (BLUE) in the continuous time model and, secondly, one determines design points (and an estimator) such that the resulting estimator from the discrete data provides a good approximation of the optimal solution in the continuous time model. In this paper we will use this strategy to develop optimal designs for the comparison of regression curves from two (possibly) dependent groups. In the present section we discuss a continuous time model corresponding to the discrete model (2.5), while the second step, the determination of an "optimal" approximation, is postponed to Section 4.

3.1 The best linear unbiased estimator

To be precise, we consider the continuous time version of the linear regression model in (2.5), that is,
\[
Y(t) = \begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} = F^\top(t)\theta + \Sigma^{1/2}\varepsilon(t), \qquad t\in[a,b], \tag{3.1}
\]
where we assume $0 < a < b$ and that the full trajectory of the process $\{Y(t)\,|\,t\in[a,b]\}$ is observed, $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ is a vector of two independent Brownian motions as defined in (2.2), and the matrix $\Sigma^{1/2}$ is the square root of the covariance matrix $\Sigma$ defined in (2.3). Note that we restrict ourselves to an interval on the positive line because in this case the notation is slightly simpler; we emphasize, however, that the theory developed in this section can also be applied for $a=0$, see Remark 3.1 for more details. We further assume that the $(p\times p)$ matrix
\[
M = \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \tag{3.2}
\]
is non-singular.
Theorem 3.1 Consider the continuous time linear regression model (3.1) on the interval $[a,b]$, $a>0$, with a continuously differentiable matrix of regression functions $F$, a vector $\{\varepsilon(t)=(\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ of two independent Brownian motions and a covariance matrix $\Sigma$ defined by (2.3). The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\,dY(t) + \frac{1}{a}\, F(a)\,\Sigma^{-1}\, Y(a) \Big). \tag{3.3}
\]
Moreover, the minimum variance is given by
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1} = \Big( \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big)^{-1}. \tag{3.4}
\]
Proof. Multiplying $Y$ by the matrix $\Sigma^{-1/2}$ yields the transformed regression model
\[
\tilde Y(t) = \begin{pmatrix} \tilde Y_1(t) \\ \tilde Y_2(t) \end{pmatrix} = \Sigma^{-1/2}\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} = \Sigma^{-1/2} F^\top(t)\theta + \varepsilon(t), \tag{3.5}
\]
where $\Sigma^{-1/2}$ is the inverse of $\Sigma^{1/2}$, the square root of the covariance matrix $\Sigma$ defined in (2.3). Note that the components of the vector $\tilde Y$ are independent and, consequently, the joint likelihood function can be obtained as the product over the individual components. Next we rewrite the components of the continuous time model (3.5) in terms of two stochastic differential equations, that is,
\[
d\tilde Y_i(t) = \mathbb{1}_{[a,b]}(t)\,\Sigma_i^{-1/2}\dot F^\top(t)\theta\,dt + d\varepsilon_i(t), \qquad t\in[0,b], \tag{3.6}
\]
\[
\tilde Y_i(a) = \Sigma_i^{-1/2} F^\top(a)\theta + \varepsilon_i(a), \tag{3.7}
\]
where $\mathbb{1}_A$ is the indicator function of the set $A$ and $\Sigma_i^{-1/2}$ denotes the $i$-th row of the matrix $\Sigma^{-1/2}$ ($i=1,2$). As $\{\varepsilon_i(t)\,|\,t\in[a,b]\}$ is a Brownian motion, its increments are independent. Consequently, the process $\{\tilde Y_i(t)\,|\,t\in[0,b]\}$ and the random variable $\tilde Y_i(a)$ are independent. To obtain the joint density of the processes defined by (3.6) and (3.7) it is therefore sufficient to derive the individual densities.

Let $\mathbb{P}^{(i)}_\theta$ and $\mathbb{P}^{(i)}_0$ denote the measures on $C([0,b])$ associated with the process $\tilde Y_i = \{\tilde Y_i(t)\,|\,t\in[0,b]\}$ and $\{\varepsilon_i(t)\,|\,t\in[0,b]\}$, respectively. It follows from Theorem 1 in Appendix II of Ibragimov and Has'minskii (1981) that $\mathbb{P}^{(i)}_\theta$ is absolutely continuous with respect to $\mathbb{P}^{(i)}_0$ with Radon–Nikodym density
\[
\frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) = \exp\Big\{ \int_a^b \Sigma_i^{-1/2}\dot F^\top(t)\theta\, d\tilde Y_i(t) - \frac{1}{2}\int_a^b \big(\Sigma_i^{-1/2}\dot F^\top(t)\theta\big)^2\, dt \Big\}.
\]
Similarly, if $\mathbb{Q}^{(i)}_\theta$ denotes the distribution of the random variable $\tilde Y_i(a) \sim \mathcal{N}\big(\Sigma_i^{-1/2} F^\top(a)\theta,\, a\big)$ in (3.7), then the Radon–Nikodym density of $\mathbb{Q}^{(i)}_\theta$ with respect to $\mathbb{Q}^{(i)}_0$ is given by
\[
\frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) = \exp\Big\{ \frac{\tilde Y_i(a)\,\Sigma_i^{-1/2} F^\top(a)\theta}{a} - \frac{1}{2}\,\frac{\big(\Sigma_i^{-1/2} F^\top(a)\theta\big)^2}{a} \Big\}.
\]
Consequently, because of independence, the joint density of $(\mathbb{P}^{(i)}_\theta, \mathbb{Q}^{(i)}_\theta)$ with respect to $(\mathbb{P}^{(i)}_0, \mathbb{Q}^{(i)}_0)$ is obtained as
\[
\frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) \times \frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) = \exp\bigg\{ \Big( \int_a^b \Sigma_i^{-1/2}\dot F^\top(t)\theta\, d\tilde Y_i(t) + \frac{\tilde Y_i(a)\,\Sigma_i^{-1/2} F^\top(a)\theta}{a} \Big) - \frac{1}{2}\Big( \int_a^b \big(\Sigma_i^{-1/2}\dot F^\top(t)\theta\big)^2\, dt + \frac{\big(\Sigma_i^{-1/2} F^\top(a)\theta\big)^2}{a} \Big) \bigg\}.
\]
As the processes $\tilde Y_1$ and $\tilde Y_2$ are independent by construction, the maximum likelihood estimator in the model (3.1) can be determined by solving the equation
\[
\frac{\partial}{\partial\theta} \log \prod_{i=1}^2 \Big\{ \frac{d\mathbb{P}^{(i)}_\theta}{d\mathbb{P}^{(i)}_0}(\tilde Y_i) \times \frac{d\mathbb{Q}^{(i)}_\theta}{d\mathbb{Q}^{(i)}_0}(\tilde Y_i(a)) \Big\} = \sum_{i=1}^2 \Big\{ \int_a^b \dot F(t)\,\big(\Sigma_i^{-1/2}\big)^\top d\tilde Y_i(t) + \frac{F(a)\big(\Sigma_i^{-1/2}\big)^\top \tilde Y_i(a)}{a} - \Big( \int_a^b \dot F(t)\big(\Sigma_i^{-1/2}\big)^\top \Sigma_i^{-1/2}\dot F^\top(t)\,dt + \frac{F(a)\big(\Sigma_i^{-1/2}\big)^\top \Sigma_i^{-1/2} F^\top(a)}{a} \Big)\theta \Big\} = 0
\]
with respect to $\theta$. The solution coincides with the linear estimator defined in (3.3), and a straightforward calculation, using Ito's formula and the fact that the random variables $\int_a^b \dot F(t)\, d\varepsilon_t$ and $\varepsilon_a$ are independent, gives
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1}\operatorname{Cov}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\,dY(t) + \frac{1}{a} F(a)\,\Sigma^{-1} Y(a) \Big)\, M^{-1} = M^{-1}\Big( \int_a^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt + \frac{1}{a} F(a)\,\Sigma^{-1} F^\top(a) \Big) M^{-1} = M^{-1},
\]
where the matrix $M$ is defined in (3.2). Since the covariance matrix $M^{-1}$ is the inverse of the information matrix in the continuous time regression model (3.1) (see Ibragimov and Has'minskii, 1981, p. 81), the linear estimator (3.3) is the BLUE, which completes the proof of Theorem 3.1. □
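The covariance (3.4) is straightforward to evaluate numerically. The sketch below does this for a hypothetical instance of Example 2.1 with $f_1(t) = f_2(t) = (1,t)^\top$ on $[1,10]$, approximating the integral in (3.2) by quadrature; all concrete numbers are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))

def F(t):      # (p x 2) matrix of Example 2.1 with f1(t) = f2(t) = (1, t)^T
    return np.array([[1.0, 0.0], [t, 0.0], [0.0, 1.0], [0.0, t]])

def Fdot(t):   # element-wise derivative of F
    return np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

p = F(a_).shape[0]
M = np.zeros((p, p))
for r in range(p):
    for c in range(p):
        M[r, c] = quad(lambda t: Fdot(t)[r] @ Sigma_inv @ Fdot(t)[c], a_, b_)[0]
M += F(a_) @ Sigma_inv @ F(a_).T / a_      # boundary term of (3.2)
print(np.linalg.inv(M))                    # Cov(theta_BLUE), cf. (3.4)
```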
Remark 3.1 The proof of Theorem 3.1 can easily be modified to obtain the BLUE for the continuous time model on the interval $[0,b]$. More precisely, for $a=0$ equation (3.7) becomes a deterministic equation, namely
\[
Y(0) = F^\top(0)\theta, \tag{3.8}
\]
and we have to distinguish three cases.

(1) If the regression function $F$ satisfies $F(0) = 0_{p\times 2}$ (that is, $\operatorname{rank}(F(0)) = 0$), the deterministic equation (3.8) does not contain any further information about the parameter $\theta$, and the maximum likelihood estimator in model (3.1) is given by
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\Big( \int_0^b \dot F(t)\,\Sigma^{-1}\,dY(t) \Big),
\]
where the minimum variance is given by
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1} = \Big( \int_0^b \dot F(t)\,\Sigma^{-1}\dot F^\top(t)\,dt \Big)^{-1}.
\]

(2) If the rank of the matrix $F(0)$ satisfies $\operatorname{rank}(F(0)) = 1$, the deterministic equation (3.8) contains one informative equation about $\theta$. In that case, we assume without loss of generality that $F_{1,1}(0) \neq 0$, and it follows from (3.8) that $\theta_1$ can be expressed through $\theta_2,\ldots,\theta_p$ by
\[
\theta_1 = \frac{Y_1(0) - \sum_{j=2}^p \theta_j F_{1,j}(0)}{F_{1,1}(0)}. \tag{3.9}
\]
Using (3.9) in combination with model (3.1), we obtain the reduced model
\[
Z(t) = Y(t) - \frac{Y_1(0)}{F_{1,1}(0)} \begin{pmatrix} F_{1,1}(t) \\ F_{2,1}(t) \end{pmatrix} = \tilde F^\top(t)\tilde\theta + \Sigma^{1/2}\varepsilon(t), \tag{3.10}
\]
where the matrix valued function $\tilde F(t)$ is defined by
\[
\tilde F^\top(t) = \Big( F_{i,j}(t) - \frac{F_{i,1}(t)}{F_{1,1}(0)}\, F_{1,j}(0) \Big)_{i=1,2;\; j=2,\ldots,p} \tag{3.11}
\]
and the reduced $(p-1)$-dimensional parameter is given by $\tilde\theta = (\theta_2,\ldots,\theta_p)^\top$. It follows from $\operatorname{rank}(F(0)) = 1$ that the matrix valued function $\tilde F(t)$ defined in (3.11) satisfies $\tilde F^\top(0) = 0_{2\times(p-1)}$. Consequently, the modified model (3.10) satisfies the condition of case (1), and the best linear unbiased estimator of the reduced parameter $\tilde\theta$ is obtained as
\[
\hat{\tilde\theta}_{\mathrm{BLUE}} = \tilde M^{-1}\Big( \int_0^b \dot{\tilde F}(t)\,\Sigma^{-1}\,dZ(t) \Big), \tag{3.12}
\]
where the process $\{Z(t)\,|\,t\in[0,b]\}$ is defined by (3.10), the matrix $\tilde F(t)$ is given by (3.11), and the minimum variance is given by
\[
\operatorname{Cov}(\hat{\tilde\theta}_{\mathrm{BLUE}}) = \tilde M^{-1} = \Big( \int_0^b \dot{\tilde F}(t)\,\Sigma^{-1}\dot{\tilde F}^\top(t)\,dt \Big)^{-1}.
\]
The best linear unbiased estimator of the remaining parameter $\theta_1$ is then obtained as
\[
\hat\theta_1 = \frac{Y_1(0) - \sum_{j=2}^p \hat{\tilde\theta}_{\mathrm{BLUE},j} F_{1,j}(0)}{F_{1,1}(0)}.
\]

(3) If the rank of the matrix $F(0)$ satisfies $\operatorname{rank}(F(0)) = 2$, equation (3.8) contains two informative equations about $\theta$. Let
\[
A(t) = \begin{pmatrix} F_{1,1}(t) & F_{1,2}(t) \\ F_{2,1}(t) & F_{2,2}(t) \end{pmatrix} \tag{3.13}
\]
be the submatrix which contains the first two columns of $F^\top(t)$. Without loss of generality, we assume that $A(0)$ is non-singular (as $\operatorname{rank}(F(0)) = 2$). Then it follows from (3.8) that
\[
\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = A^{-1}(0)\Big( Y(0) - \Big( \sum_{j=3}^p F_{i,j}(0)\theta_j \Big)_{i=1,2} \Big). \tag{3.14}
\]
Using (3.14) in combination with (3.1), we obtain the reduced model
\[
Z(t) = Y(t) - A(t)A^{-1}(0)\,Y(0) = \tilde F^\top(t)\tilde\theta + \Sigma^{1/2}\varepsilon(t), \tag{3.15}
\]
where the matrix valued function $A(t)$ is given by (3.13), the matrix valued function $\tilde F^\top(t)$ is of the form
\[
\tilde F^\top(t) = \big( F_{i,j}(t) \big)_{i=1,2;\; j=3,\ldots,p} - A(t)A^{-1}(0)\,\big( F_{i,j}(0) \big)_{i=1,2;\; j=3,\ldots,p}, \tag{3.16}
\]
and the reduced $(p-2)$-dimensional parameter is given by $\tilde\theta = (\theta_3,\ldots,\theta_p)^\top$. The matrix valued function $\tilde F(t)$ defined in (3.16) satisfies $\tilde F^\top(0) = 0_{2\times(p-2)}$. Consequently, the modified model (3.15) satisfies the condition of case (1), and the best linear unbiased estimator $\hat{\tilde\theta}_{\mathrm{BLUE}}$ of the reduced $(p-2)$-dimensional parameter $\tilde\theta$ is obtained by (3.12), using the process $\{Z(t)\,|\,t\in[0,b]\}$ defined in (3.15) and the matrix valued function $\tilde F(t)$ given by (3.16).
The best linear unbiased estimator of the remaining parameter $(\theta_1,\theta_2)^\top$ is then obtained as
\[
\begin{pmatrix} \hat\theta_1 \\ \hat\theta_2 \end{pmatrix} = A^{-1}(0)\Big( Y(0) - \Big( \sum_{j=3}^p F_{i,j}(0)\,\hat{\tilde\theta}_{\mathrm{BLUE},j} \Big)_{i=1,2} \Big).
\]

3.2 Comparing curves without common parameters

Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.1 that this case is a special case of model (2.5), where the matrix $F^\top$ is given by
\[
F^\top(t) = \begin{pmatrix} f_1^\top(t) & 0_{p_2}^\top \\ 0_{p_1}^\top & f_2^\top(t) \end{pmatrix} \tag{3.17}
\]
and $\theta = ({\theta^{(1)}}^\top, {\theta^{(2)}}^\top)^\top$. Considering both components of the vector $Y$ separately, we obtain continuous versions of the two models introduced in (2.1), that is,
\[
Y_i(t) = f_i^\top(t)\,\theta^{(i)} + \eta_i(t), \qquad i = 1,2, \tag{3.18}
\]
where the error process $\{\eta(t)\,|\,t\in[a,b]\}$ is defined by (2.4). An application of Theorem 3.1 yields the following BLUE.
Corollary 3.1 Consider the continuous time linear regression model (2.5) on the interval $[a,b]$, with continuously differentiable matrix (3.17), a vector $\{\varepsilon(t)=(\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ of independent Brownian motions and a matrix $\Sigma$ defined by (2.3). The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(1)}_{\mathrm{BLUE}} \\ \hat\theta^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)}\, M^{-1} \bigg[ \int_a^b \begin{pmatrix} \sigma_2^2\,\dot f_1(t) & -\sigma_1\sigma_2\varrho\,\dot f_1(t) \\ -\sigma_1\sigma_2\varrho\,\dot f_2(t) & \sigma_1^2\,\dot f_2(t) \end{pmatrix} d\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} + \frac{1}{a}\begin{pmatrix} \sigma_2^2\, f_1(a) & -\sigma_1\sigma_2\varrho\, f_1(a) \\ -\sigma_1\sigma_2\varrho\, f_2(a) & \sigma_1^2\, f_2(a) \end{pmatrix} \begin{pmatrix} Y_1(a) \\ Y_2(a) \end{pmatrix} \bigg]. \tag{3.19}
\]
The minimum variance is given by $M^{-1}$, where
\[
M = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)} \begin{pmatrix} \sigma_2^2\, M_{11} & -\sigma_1\sigma_2\varrho\, M_{12} \\ -\sigma_1\sigma_2\varrho\, M_{21} & \sigma_1^2\, M_{22} \end{pmatrix} \quad\text{and}\quad M_{ij} = \int_a^b \dot f_i(t)\,\dot f_j^\top(t)\,dt + \frac{1}{a}\, f_i(a)\, f_j^\top(a), \quad i,j = 1,2. \tag{3.20}
\]

It is of interest to compare the estimator (3.19) with the estimator $\hat\theta_{\mathrm{mar}} = ((\hat\theta^{(1)}_{\mathrm{mar}})^\top, (\hat\theta^{(2)}_{\mathrm{mar}})^\top)^\top$, which is obtained by estimating the parameters in the two models (3.18) separately. It follows from Theorem 2.1 in Dette et al. (2017a) that the best linear unbiased estimators in these models are given by
\[
\hat\theta^{(\ell)}_{\mathrm{mar}} = M_{\ell\ell}^{-1}\Big( \int_a^b \dot f_\ell(t)\,dY_\ell(t) + \frac{1}{a}\, f_\ell(a)\,Y_\ell(a) \Big), \qquad \ell = 1,2, \tag{3.21}
\]
where the matrices $M_{\ell\ell}$ are defined by
\[
M_{\ell\ell} = \int_a^b \dot f_\ell(t)\,\dot f_\ell^\top(t)\,dt + \frac{1}{a}\, f_\ell(a)\, f_\ell^\top(a), \qquad \ell = 1,2.
\]
Moreover, the covariance matrices of the estimators $\hat\theta^{(1)}_{\mathrm{mar}}$ and $\hat\theta^{(2)}_{\mathrm{mar}}$ are the inverses of the Fisher information matrices in the individual models, that is,
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \sigma_\ell^2\, M_{\ell\ell}^{-1}, \qquad \ell = 1,2. \tag{3.22}
\]
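The block structure in Corollary 3.1 is easy to evaluate numerically. The following sketch, under hypothetical choices $f_1(t) = (t,\sin t)^\top$ and $f_2(t) = (t^2,\cos t)^\top$, assembles $M$ from the blocks (3.20) and prints the covariance $M^{-1}$ together with the marginal covariance $\sigma_1^2 M_{11}^{-1}$ from (3.22) for comparison.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
s1, s2, rho = 1.0, 1.0, 0.4

f1d = lambda t: np.array([1.0, np.cos(t)])    # derivative of f1(t) = (t, sin t)
f2d = lambda t: np.array([2*t, -np.sin(t)])   # derivative of f2(t) = (t^2, cos t)
f1a = np.array([a_, np.sin(a_)])
f2a = np.array([a_**2, np.cos(a_)])

def block(fid, fjd, fia, fja):
    """Block M_ij of (3.20), computed entry-wise by quadrature."""
    G = np.array([[quad(lambda t: fid(t)[r] * fjd(t)[c], a_, b_)[0]
                   for c in range(fja.size)] for r in range(fia.size)])
    return G + np.outer(fia, fja) / a_

M11 = block(f1d, f1d, f1a, f1a)
M12 = block(f1d, f2d, f1a, f2a)
M22 = block(f2d, f2d, f2a, f2a)
c = 1.0 / (s1**2 * s2**2 * (1 - rho**2))
M = c * np.block([[s2**2 * M11, -s1*s2*rho * M12],
                  [-s1*s2*rho * M12.T, s1**2 * M22]])
print(np.linalg.inv(M))             # Cov(theta_BLUE), Corollary 3.1
print(s1**2 * np.linalg.inv(M11))   # Cov(theta_mar^(1)), cf. (3.22)
```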
The following result compares the variances of the two estimators (3.19) and (3.21).

Theorem 3.2 If the assumptions of Corollary 3.1 are satisfied, we have (with respect to the Loewner ordering)
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}}) \le \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}), \qquad \ell = 1,2,
\]
for all $\varrho\in(-1,1)$, where $\hat\theta^{(\ell)}_{\mathrm{BLUE}}$ and $\hat\theta^{(\ell)}_{\mathrm{mar}}$ are the best linear unbiased estimators of the parameter $\theta^{(\ell)}$ obtained by simultaneous estimation (see (3.19)) and separate estimation in the two groups (see (3.21)), respectively.
Proof. Without loss of generality we consider the case $\ell = 1$; the proof for the index $\ell = 2$ follows by the same arguments. Let $K^\top = (I_{p_1}, 0_{p_1\times p_2})$ be a $p_1\times(p_1+p_2)$ matrix, where $I_{p_1}$ and $0_{p_1\times p_2}$ denote the $p_1$-dimensional identity matrix and a $(p_1\times p_2)$ matrix of zeros, respectively. Then $\operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{BLUE}}) = (C_K(M))^{-1}$, where
\[
C_K(M) = (K^\top M^{-1} K)^{-1} = \frac{1}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - \varrho^2 M_{12} M_{22}^{-1} M_{12}^\top \big) \tag{3.23}
\]
is the Schur complement of the block $M_{22}$ of the information matrix $M$ (see p. 74 in Pukelsheim, 2006). Observing (3.22), we now compare $C_K(M)$ and $\sigma_1^{-2} M_{11}$ and obtain
\[
C_K(M) - \frac{1}{\sigma_1^2} M_{11} = \frac{1}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - \varrho^2 M_{12} M_{22}^{-1} M_{12}^\top \big) - \frac{1}{\sigma_1^2} M_{11} = \frac{\varrho^2}{\sigma_1^2(1-\varrho^2)}\big( M_{11} - M_{12} M_{22}^{-1} M_{12}^\top \big) =: \frac{\varrho^2}{\sigma_1^2(1-\varrho^2)}\, C_K(\tilde M), \tag{3.24}
\]
where $C_K(\tilde M)$ is the Schur complement of the block $M_{22}$ of the matrix
\[
\tilde M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}.
\]
Note that the matrix $\tilde M$ is nonnegative definite. An application of Lemma 3.12 of Pukelsheim (2006) shows that the Schur complement $C_K(\tilde M)$ is also nonnegative definite, so that
\[
\big( \operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{BLUE}}) \big)^{-1} = C_K(M) \ge \frac{1}{\sigma_1^2}\, M_{11} = \big( \operatorname{Cov}(\hat\theta^{(1)}_{\mathrm{mar}}) \big)^{-1},
\]
and the statement of the theorem follows. □

Remark 3.2 If $\varrho = 0$ we have $C_K(M) = \sigma_1^{-2} M_{11}$, and it follows from (3.23) that separate estimation in the individual groups does not yield less precise estimates, that is, $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ ($\ell = 1,2$). If $\varrho \neq 0$, then $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) \ge \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$. Moreover, the inequality is strict in most cases, which means that simultaneous estimation of the parameters $\theta^{(1)}$ and $\theta^{(2)}$ yields more precise estimators. A necessary condition for strict inequality (i.e., for the matrix $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) - \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ to be positive definite) is $\varrho \neq 0$. The following result shows that this condition is not sufficient. It considers the important case where the regression functions $f_1$ and $f_2$ in (3.17) are the same, and shows that in this case the two estimators $\hat\theta_{\mathrm{BLUE}}$ and $\hat\theta_{\mathrm{mar}}$ coincide.
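The gap (3.24) between simultaneous and separate estimation can be inspected numerically. Under the same hypothetical model choices as in the previous sketch (and $\sigma_1 = \sigma_2 = 1$), the following code compares the information $C_K(M)$ with $\sigma_1^{-2}M_{11}$ for several correlations; the smallest eigenvalue of the difference is nonnegative, in line with Theorem 3.2.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
f1d = lambda t: np.array([1.0, np.cos(t)])    # f1(t) = (t, sin t)
f2d = lambda t: np.array([2*t, -np.sin(t)])   # f2(t) = (t^2, cos t)
f1a = np.array([a_, np.sin(a_)])
f2a = np.array([a_**2, np.cos(a_)])

def block(fid, fjd, fia, fja):                # M_ij of (3.20)
    G = np.array([[quad(lambda t: fid(t)[r] * fjd(t)[c], a_, b_)[0]
                   for c in range(fja.size)] for r in range(fia.size)])
    return G + np.outer(fia, fja) / a_

M11 = block(f1d, f1d, f1a, f1a)
M12 = block(f1d, f2d, f1a, f2a)
M22 = block(f2d, f2d, f2a, f2a)
for rho in (0.0, 0.4, 0.7):
    CK = (M11 - rho**2 * M12 @ np.linalg.inv(M22) @ M12.T) / (1 - rho**2)
    gap = CK - M11    # = rho^2/(1-rho^2) * C_K(M~), cf. (3.24), with sigma_1 = 1
    print(rho, np.linalg.eigvalsh(gap).min())   # >= 0 up to rounding
```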
Corollary 3.2 If the assumptions of Corollary 3.1 hold and additionally the regression functions in model (2.5) satisfy $f_1 = f_2$, the best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(1)}_{\mathrm{BLUE}} \\ \hat\theta^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \int_a^b \big( I_2 \otimes M_{11}^{-1}\dot f_1(t) \big)\, dY(t) + \frac{1}{a}\big( I_2 \otimes M_{11}^{-1} f_1(a) \big)\, Y(a),
\]
where $I_2$ denotes the $2\times 2$ identity matrix and the matrix $M_{11}$ is defined in (3.20). Moreover, the minimum variance is given by $\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = \Sigma \otimes M_{11}^{-1}$, and $\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}) = \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}})$ ($\ell = 1,2$).

3.3 Comparing curves with common parameters

Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.2 that this case is a special case of model (2.5), where the matrix of regression functions is given by
\[
F^\top(t) = \begin{pmatrix} f_0^\top(t) & \tilde f_1^\top(t) & 0^\top_{p_2-p_0} \\ f_0^\top(t) & 0^\top_{p_1-p_0} & \tilde f_2^\top(t) \end{pmatrix} \tag{3.25}
\]
and the vector of parameters is defined by $\theta = ({\theta^{(0)}}^\top, {\tilde\theta^{(1)}}^\top, {\tilde\theta^{(2)}}^\top)^\top$. An application of Theorem 3.1 yields the BLUE in model (2.5) with the matrix $F^\top$ defined by (3.25).
Corollary 3.3 Consider the continuous time linear regression model (2.5) on the interval $[a,b]$, where the matrix of regression functions $F^\top$ in (3.25) is continuously differentiable. The best linear unbiased estimator of the parameter $\theta$ is given by
\[
\hat\theta_{\mathrm{BLUE}} = \begin{pmatrix} \hat\theta^{(0)}_{\mathrm{BLUE}} \\ \hat{\tilde\theta}^{(1)}_{\mathrm{BLUE}} \\ \hat{\tilde\theta}^{(2)}_{\mathrm{BLUE}} \end{pmatrix} = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)}\, M^{-1}\bigg[ \int_a^b \begin{pmatrix} (\sigma_2^2 - \sigma_1\sigma_2\varrho)\,\dot f_0(t) & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\,\dot f_0(t) \\ \sigma_2^2\,\dot{\tilde f}_1(t) & -\sigma_1\sigma_2\varrho\,\dot{\tilde f}_1(t) \\ -\sigma_1\sigma_2\varrho\,\dot{\tilde f}_2(t) & \sigma_1^2\,\dot{\tilde f}_2(t) \end{pmatrix} d\begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} + \frac{1}{a}\begin{pmatrix} (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, f_0(a) & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, f_0(a) \\ \sigma_2^2\,\tilde f_1(a) & -\sigma_1\sigma_2\varrho\,\tilde f_1(a) \\ -\sigma_1\sigma_2\varrho\,\tilde f_2(a) & \sigma_1^2\,\tilde f_2(a) \end{pmatrix} \begin{pmatrix} Y_1(a) \\ Y_2(a) \end{pmatrix} \bigg]. \tag{3.26}
\]
The minimum variance is
\[
\operatorname{Cov}(\hat\theta_{\mathrm{BLUE}}) = M^{-1},
\]
where
\[
M = \frac{1}{\sigma_1^2\sigma_2^2(1-\varrho^2)} \begin{pmatrix} (\sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\varrho)\, M_{00} & (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, M_{01} & (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, M_{02} \\ (\sigma_2^2 - \sigma_1\sigma_2\varrho)\, M_{10} & \sigma_2^2\, M_{11} & -\sigma_1\sigma_2\varrho\, M_{12} \\ (\sigma_1^2 - \sigma_1\sigma_2\varrho)\, M_{20} & -\sigma_1\sigma_2\varrho\, M_{21} & \sigma_1^2\, M_{22} \end{pmatrix}
\]
and the individual blocks of this matrix are given by
\[
M_{ij} = \int_a^b \dot g_i(t)\,\dot g_j^\top(t)\,dt + \frac{1}{a}\, g_i(a)\, g_j^\top(a) \tag{3.27}
\]
for $i,j = 0,1,2$, where $g_0(t) = f_0(t)$ and $g_i(t) = \tilde f_i(t)$ for $i = 1,2$.

It is again of interest to compare the estimator (3.26) with the estimator $\hat\theta_{\mathrm{mar}} = ((\hat\theta^{(1)}_{\mathrm{mar}})^\top, (\hat\theta^{(2)}_{\mathrm{mar}})^\top)^\top$, which is obtained by estimating the parameters $\theta^{(\ell)} = ({\theta^{(0)}}^\top, {\tilde\theta^{(\ell)}}^\top)^\top$, $\ell = 1,2$, in the two models (3.18) separately using (3.21). The corresponding covariance matrices of the estimators $\hat\theta^{(1)}_{\mathrm{mar}}$ and $\hat\theta^{(2)}_{\mathrm{mar}}$ are given by (3.22). The following result compares the variances of the two estimators (3.26) and (3.21). Its proof is similar to the proof of Theorem 3.2 and is therefore omitted.
Theorem 3.3 If the assumptions of Corollary 3.3 are satisfied, we have (with respect to the Loewner ordering)
\[
\operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{BLUE}}) \le \operatorname{Cov}(\hat\theta^{(\ell)}_{\mathrm{mar}}), \qquad \ell = 1,2,
\]
for all $\varrho\in(-1,1)$, where $\hat\theta^{(\ell)}_{\mathrm{BLUE}}$ and $\hat\theta^{(\ell)}_{\mathrm{mar}}$ are the best linear unbiased estimators of the parameter $\theta^{(\ell)}$ obtained by simultaneous and separate estimation, respectively.

Remark 3.3 The results presented so far have been derived for the case where the error process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ in (2.5) consists of two independent Brownian motions. This assumption has been made to simplify the notation. Similar results can be obtained for Markov processes, and in this remark we indicate the essential arguments. To be precise, assume that the error process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ in model (2.5) consists of two independent centered Gaussian processes with continuous covariance kernel given by
\[
K(s,t) = \mathbb{E}[\varepsilon_i(s)\varepsilon_i(t)] = u(s\wedge t)\, v(s\vee t) = v(s)\,v(t)\,\min\{q(s), q(t)\}, \qquad s,t\in[a,b], \tag{3.28}
\]
where $u(\cdot)$ and $v(\cdot)$ are functions defined on the interval $[a,b]$ such that the function $q(\cdot) = u(\cdot)/v(\cdot)$ is positive and strictly increasing. Kernels of the form (3.28) are called triangular kernels, and a famous result of Doob (1949) essentially shows that a Gaussian process is a Markov process if and only if its covariance kernel is triangular (see also Mehr and McFadden, 1965). In this case model (2.5) can be transformed into a model with an error process consisting of two independent Brownian motions, using the arguments given in Appendix B of Dette et al. (2016). More precisely, define $q(t) = u(t)/v(t)$ and consider the stochastic process
\[
\varepsilon(t) = v(t)\,\tilde\varepsilon(q(t)),
\]
where $\{\tilde\varepsilon(\tilde t) = (\tilde\varepsilon_1(\tilde t), \tilde\varepsilon_2(\tilde t))^\top\,|\,\tilde t\in[\tilde a,\tilde b]\}$ consists of two independent Brownian motions on the interval $[\tilde a,\tilde b] = [q(a), q(b)]$. It now follows from Doob (1949) that the process $\{\varepsilon(t) = (\varepsilon_1(t),\varepsilon_2(t))^\top\,|\,t\in[a,b]\}$ consists of two independent centered Gaussian processes on the interval $[a,b]$ with covariance kernel (3.28). Consequently, if we consider the model
\[
\tilde Y(\tilde t) = \begin{pmatrix} \tilde Y_1(\tilde t) \\ \tilde Y_2(\tilde t) \end{pmatrix} = \tilde F^\top(\tilde t)\theta + \Sigma^{1/2}\tilde\varepsilon(\tilde t), \qquad \tilde t\in[q(a), q(b)], \tag{3.29}
\]
with
\[
\tilde F(\tilde t) = \frac{F(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))}, \qquad \tilde\varepsilon(\tilde t) = \frac{\varepsilon(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))}, \qquad \tilde Y(\tilde t) = \frac{Y(q^{-1}(\tilde t))}{v(q^{-1}(\tilde t))},
\]
the results obtained so far are applicable. Thus a "good" estimator of the parameter $\theta$ in model (3.29) is also a "good" estimator of the parameter $\theta$ in model (3.1) with an error process consisting of two Gaussian processes with covariance kernel (3.28). Consequently, we can derive the optimal estimator of the parameter $\theta$ in the continuous time model (3.1) with covariance kernel (3.28) from the best linear unbiased estimator in the model (3.29) with Brownian motions by an application of Theorem 3.1. The resulting best linear unbiased estimator of $\theta$ in model (3.1) with triangular kernel (3.28) is of the form
\[
\hat\theta_{\mathrm{BLUE}} = M^{-1}\bigg\{ \int_a^b \frac{\dot F(t)\,v(t) - F(t)\,\dot v(t)}{\dot u(t)\,v(t) - u(t)\,\dot v(t)}\,\Sigma^{-1}\, d\Big( \frac{Y(t)}{v(t)} \Big) + \frac{F(a)\,\Sigma^{-1}\, Y(a)}{u(a)\,v(a)} \bigg\},
\]
where the minimum variance is given by
\[
M^{-1} = \bigg\{ \int_a^b \frac{\big( \dot F(t)v(t) - F(t)\dot v(t) \big)\,\Sigma^{-1}\,\big( \dot F(t)v(t) - F(t)\dot v(t) \big)^\top}{v^2(t)\,\big[ \dot u(t)v(t) - u(t)\dot v(t) \big]}\, dt + \frac{F(a)\,\Sigma^{-1}\, F^\top(a)}{u(a)\,v(a)} \bigg\}^{-1}.
\]
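As a concrete instance of Remark 3.3, the exponential kernel $K(s,t) = e^{-|t-s|}$ (of Ornstein–Uhlenbeck type) is triangular with $u(t) = e^t$, $v(t) = e^{-t}$ and $q(t) = e^{2t}$; this particular example is our illustration and is not part of the remark itself. A minimal numerical check:

```python
import numpy as np

u = lambda t: np.exp(t)
v = lambda t: np.exp(-t)
q = lambda t: u(t) / v(t)          # q(t) = e^{2t}, positive and strictly increasing

s, t = 1.3, 2.7                    # arbitrary points with s < t
K_exp = np.exp(-abs(t - s))        # exponential kernel K(s,t) = e^{-|t-s|}
K_tri = v(s) * v(t) * min(q(s), q(t))
print(K_exp, K_tri)                # identical values: the kernel is triangular

# time transform mapping the process to Brownian motion:
# eps(t) = v(t) * eps~(q(t)), with eps~ a Brownian motion on [q(a), q(b)]
a_, b_ = 1.0, 10.0
print(q(a_), q(b_))                # transformed design interval [q(a), q(b)]
```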
4 Optimal designs for comparing regression curves

In this section we derive optimal designs for comparing curves. The first part is devoted to a discretization of the BLUE in the continuous time model. In the second part we develop an optimality criterion to obtain efficient designs for the comparison of curves based on the discretized estimators.

4.1 A discrete approximation of the BLUE
To obtain a discrete design for $n$ observations at the points $a = t_1, \ldots, t_n$ from the continuous design derived in Section 3, we use a similar approach as in Dette et al. (2017a) and construct a discrete approximation of the stochastic integral in (3.3). For this purpose we consider the linear estimator
\[
\hat\theta_n = M^{-1}\Big\{ \sum_{i=2}^n \Omega_i\,\dot F(t_{i-1})\,\Sigma^{-1}\big( Y(t_i) - Y(t_{i-1}) \big) + \frac{1}{a}\, F(a)\,\Sigma^{-1} Y(a) \Big\} = M^{-1}\Big\{ \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\big( Y(t_i) - Y(t_{i-1}) \big) + \frac{1}{a}\, F(a)\,\Sigma^{-1} Y(a) \Big\}, \tag{4.1}
\]
where $a = t_1 < t_2 < \ldots < t_{n-1} < t_n = b$, $\Omega_2,\ldots,\Omega_n$ are $p\times p$ weight matrices, $\Phi_2 = \Omega_2\dot F(t_1), \ldots, \Phi_n = \Omega_n\dot F(t_{n-1})$ are $p\times 2$ matrices, and the matrix $M^{-1}$ is given in (3.4). To determine these weights in an "optimal" way, we first derive a representation of the mean squared error between the best linear estimator (3.3) in the continuous time model and its discrete approximation (4.1). The following result is a direct consequence of Ito's formula.

Lemma 4.1 Consider the continuous time model (3.1). If the assumptions of Theorem 3.1 are satisfied, we have
\[
\mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big] = M^{-1}\Big\{ \sum_{i=2}^n \int_{t_{i-1}}^{t_i} \big[ \dot F(s) - \Phi_i \big]\,\Sigma^{-1}\big[ \dot F(s) - \Phi_i \big]^\top ds + \sum_{i,j=2}^n \int_{t_{i-1}}^{t_i} \big[ \dot F(s) - \Phi_i \big]\,\Sigma^{-1}\dot F^\top(s)\, ds\;\, \theta\theta^\top \int_{t_{j-1}}^{t_j} \dot F(s)\,\Sigma^{-1}\big[ \dot F(s) - \Phi_j \big]^\top ds \Big\}\, M^{-1}. \tag{4.2}
\]

In the following we choose optimal $p\times 2$ matrices $\Phi_i = \Omega_i\dot F(t_{i-1})$ and design points $t_2,\ldots,t_{n-1}$ ($t_1 = a$, $t_n = b$) such that the linear estimator (4.1) is unbiased and the mean squared error matrix in (4.2) "becomes small". An alternative criterion is to replace the mean squared error $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ by the mean squared error $\mathbb{E}_\theta[(\hat\theta_n - \theta)(\hat\theta_n - \theta)^\top]$ between the estimator $\hat\theta_n$ defined in (4.1) and the "true" vector of parameters. The following result shows that in the class of unbiased estimators both optimization problems yield the same solution. The proof is similar to the proof of Theorem 3.1 in Dette et al. (2017a).
Theorem 4.1 The estimator $\hat\theta_n$ defined in (4.1) is unbiased if and only if the identity
\[
M_1 := \int_a^b \dot F(s)\,\Sigma^{-1}\dot F^\top(s)\, ds = \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\int_{t_{i-1}}^{t_i} \dot F^\top(s)\, ds = \sum_{i=2}^n \Phi_i\,\Sigma^{-1}\big( F(t_i) - F(t_{i-1}) \big)^\top \tag{4.3}
\]
is satisfied (note that $M = M_1 + \frac{1}{a}F(a)\Sigma^{-1}F^\top(a)$ by (3.2)). Moreover, for any linear unbiased estimator of the form $\tilde\theta_n = \int_a^b G(s)\, dY_s$ we have
\[
\mathbb{E}_\theta\big[ (\tilde\theta_n - \theta)(\tilde\theta_n - \theta)^\top \big] = \mathbb{E}_\theta\big[ (\tilde\theta_n - \hat\theta_{\mathrm{BLUE}})(\tilde\theta_n - \hat\theta_{\mathrm{BLUE}})^\top \big] + M^{-1}.
\]

In order to describe a solution in terms of optimal "weights" $\Phi_i^*$ and design points $t_i^*$, we recall that the condition of unbiasedness of the estimator $\hat\theta_n$ in (4.1) is given by (4.3) and introduce the notation
\[
B_i = \frac{\big[ F(t_i) - F(t_{i-1}) \big]\,\Sigma^{-1/2}}{\sqrt{t_i - t_{i-1}}}, \qquad A_i = \Phi_i\,\Sigma^{-1/2}\,\sqrt{t_i - t_{i-1}}. \tag{4.4}
\]
It follows from Lemma 4.1 and Theorem 4.1 that for an unbiased estimator $\hat\theta_n$ of the form (4.1) the mean squared error has the representation
\[
\mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big] = -\, M^{-1} M_1 M^{-1} + \sum_{i=2}^n M^{-1} A_i A_i^\top M^{-1}, \tag{4.5}
\]
which has to be "minimized" subject to the constraint
\[
M_1 = \int_a^b \dot F(s)\,\Sigma^{-1}\dot F^\top(s)\, ds = \sum_{i=2}^n A_i B_i^\top. \tag{4.6}
\]
The following result shows that a minimization with respect to the weights $\Phi_i$ (or equivalently $A_i$) can actually be carried out with respect to the Loewner ordering.
Theorem 4.2 Assume that the assumptions of Theorem 3.1 are satisfied and that the matrix
\[
B = \sum_{i=2}^n B_i B_i^\top = \sum_{i=2}^n \frac{\big[ F(t_i) - F(t_{i-1}) \big]\,\Sigma^{-1}\big[ F(t_i) - F(t_{i-1}) \big]^\top}{t_i - t_{i-1}} \tag{4.7}
\]
is non-singular. Let $\Phi_2^*, \ldots, \Phi_n^*$ denote $(p\times 2)$ matrices satisfying the equations
\[
\Phi_i^* = M_1\, B^{-1}\, \frac{F(t_i) - F(t_{i-1})}{t_i - t_{i-1}}, \qquad i = 2,\ldots,n; \tag{4.8}
\]
then $\Phi_2^*, \ldots, \Phi_n^*$ are optimal weight matrices minimizing $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ with respect to the Loewner ordering among all unbiased estimators of the form (4.1). Moreover, the variance of the resulting estimator $\hat\theta_n^*$ is given by
\[
\operatorname{Cov}(\hat\theta_n^*) = M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}.
\]
Proof. Let $v$ denote a $p$-dimensional vector and consider the problem of minimizing the criterion
\[
v^\top\, \mathbb{E}_\theta\big[ (\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top \big]\, v \tag{4.9}
\]
subject to the constraint (4.6). Observing (4.5), this yields the Lagrange function
\[
G_v(A_2,\ldots,A_n) = -\, v^\top M^{-1} M_1 M^{-1} v + \sum_{i=2}^n \big( v^\top M^{-1} A_i A_i^\top M^{-1} v \big) - \operatorname{tr}\Big\{ \Lambda\Big( M_1 - \sum_{i=2}^n A_i B_i^\top \Big) \Big\}, \tag{4.10}
\]
where $A_2,\ldots,A_n$ are $(p\times 2)$ matrices and $\Lambda = (\lambda_{k,\ell})_{k,\ell=1}^p$ is a $(p\times p)$ matrix of Lagrange multipliers. The function $G_v$ is convex with respect to $A_2,\ldots,A_n$. Therefore, taking derivatives with respect to $A_i$ yields as a necessary and sufficient condition for the extremum (here we use matrix differential calculus)
\[
2\,(M^{-1}v)^\top A_i \otimes (M^{-1}v)^\top + \operatorname{vec}\{\Lambda B_i\}^\top = 0_p^\top, \qquad i = 2,\ldots,n.
\]
Rewriting this system of linear equations in $(p\times 2)$ matrix form gives
\[
2\, M^{-1} v v^\top M^{-1} A_i = -\,\Lambda B_i, \qquad i = 2,\ldots,n.
\]
Multiplying these equations by $B_i^\top$ from the right, summing over $i$ and using the definitions of $M_1$ and $B$ yields for the matrix of Lagrange multipliers $\Lambda = -2\, M^{-1} v v^\top M^{-1} M_1 B^{-1}$, which gives
\[
2\, M^{-1} v v^\top M^{-1} A_i = 2\, M^{-1} v v^\top M^{-1} M_1 B^{-1} B_i, \qquad i = 2,\ldots,n. \tag{4.11}
\]
Note that one solution of (4.11) is given by
\[
A_i^* = M_1 B^{-1} B_i, \qquad i = 2,\ldots,n,
\]
which does not depend on the vector $v$. Therefore, the tuple of matrices $(A_2^*,\ldots,A_n^*)$ minimizes the convex function $G_v$ in (4.10) for all $v\in\mathbb{R}^p$. Observing the notation in (4.4) shows that the optimal matrix weights are given by (4.8). Moreover, the weights in (4.8) do not depend on the vector $v$ either and provide a simultaneous minimizer of the criterion defined in (4.9) for all $v\in\mathbb{R}^p$. Consequently, the weights defined in (4.8) minimize $\mathbb{E}_\theta[(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)(\hat\theta_{\mathrm{BLUE}} - \hat\theta_n)^\top]$ under the unbiasedness constraint (4.6) with respect to the Loewner ordering. □
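For fixed design points, the optimal weights (4.8) and the covariance of $\hat\theta_n^*$ can be computed directly. The sketch below does this for a hypothetical instance of Example 2.1 with $f_1(t)=(t,\sin t)^\top$, $f_2(t)=(t^2,\cos t)^\top$ and a uniform design with $n=4$; all model choices are illustrative.

```python
import numpy as np
from scipy.integrate import quad

a_, b_ = 1.0, 10.0
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))

# Example 2.1 with hypothetical f1(t) = (t, sin t), f2(t) = (t^2, cos t)
F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])

p = 4
M1 = np.array([[quad(lambda t: Fd(t)[r] @ Sigma_inv @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
M = M1 + F(a_) @ Sigma_inv @ F(a_).T / a_

tpts = np.array([1.0, 4.0, 7.0, 10.0])                      # a uniform design, n = 4
dF = [F(tpts[i]) - F(tpts[i-1]) for i in range(1, 4)]
dt = np.diff(tpts)
B = sum(d @ Sigma_inv @ d.T / h for d, h in zip(dF, dt))    # matrix (4.7)

Phi_star = [M1 @ np.linalg.solve(B, d) / h for d, h in zip(dF, dt)]   # weights (4.8)
Minv = np.linalg.inv(M)
Cov = Minv @ (M1 @ np.linalg.solve(B, M1) + F(a_) @ Sigma_inv @ F(a_).T / a_) @ Minv
print(np.diag(Cov))     # variances of the discretized estimator theta_n^*
```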
Remark 4.1 If the matrix $B$ in Theorem 4.2 is singular, the optimal weights are not uniquely determined, and we propose to replace the inverse $B^{-1}$ by the Moore–Penrose inverse of $B$.

Note that for fixed design points $t_1,\ldots,t_n$ Theorem 4.2 yields universally optimal weights $\Phi_2^*,\ldots,\Phi_n^*$ (with respect to the Loewner ordering) for estimators of the form (4.1) satisfying (4.3). On the other hand, a further optimization with respect to the Loewner ordering regarding the choice of the points $t_2,\ldots,t_{n-1}$ ($t_1 = a$, $t_n = b$) is not possible, and we have to apply a real valued optimality criterion for this purpose. In the following section we derive such a criterion which explicitly addresses the comparison of the regression curves from the two groups introduced in Section 2.

4.2 Optimal designs for the comparison of curves

We return to the practical scenario of the two groups introduced in (2.1), where we now focus on the comparison of these groups on the interval $[a,b]$. More precisely, consider the model introduced in (2.5) and let $\hat\theta_n^*$ be the estimator (4.1) with optimal weights defined by (4.8) from $n$ observations taken at the time points $a = t_1 < t_2 < \ldots < t_{n-1} < t_n = b$. This estimator is normally distributed with mean $\mathbb{E}[\hat\theta_n^*] = \theta$ and covariance matrix
\[
\operatorname{Cov}(\hat\theta_n^*) = M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1},
\]
where the matrices $M$, $M_1$ and $B$ are given by (3.2), (4.6) and (4.7), respectively. Note that the covariance matrix depends on the time points $t_1,\ldots,t_n$ through the matrix $B^{-1}$. Moreover, using the estimator $\hat\theta_n^*$, the prediction of the difference of the two curves at a fixed time point $t\in[a,b]$ satisfies
\[
(1,-1)\, F^\top(t)\,\hat\theta_n^* - (1,-1)\, F^\top(t)\,\theta \sim \mathcal{N}\big( 0,\; h(t; t_1,\ldots,t_n) \big),
\]
where
\[
h(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1} F(t)\, (1,-1)^\top.
\]
We now use this result and the results of Gsteiger et al. (2011) to obtain a simultaneous confidence band for the difference of the two curves. More precisely, if the interval $[a,b]$ is the range where the two curves are to be compared, the simultaneous confidence band is defined as follows. Consider the statistic
\[
\hat T = \sup_{t\in[a,b]} \frac{\big| (1,-1)\, F^\top(t)\,\hat\theta_n^* - (1,-1)\, F^\top(t)\,\theta \big|}{\{ h(t; t_1,\ldots,t_n) \}^{1/2}},
\]
and define $D$ as the $(1-\alpha)$-quantile of the corresponding distribution, that is, $\mathbb{P}(\hat T \le D) = 1-\alpha$. Note that Gsteiger et al. (2011) propose the parametric bootstrap for choosing the critical value $D$. Define
\[
u(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\,\hat\theta_n^* + D\cdot \{ h(t; t_1,\ldots,t_n) \}^{1/2}, \qquad l(t; t_1,\ldots,t_n) = (1,-1)\, F^\top(t)\,\hat\theta_n^* - D\cdot \{ h(t; t_1,\ldots,t_n) \}^{1/2};
\]
then the confidence band for the difference of the two regression functions is defined by
\[
\mathcal{C}_{1-\alpha} = \big\{ g : [a,b]\to\mathbb{R} \;\big|\; l(t; t_1,\ldots,t_n) \le g(t) \le u(t; t_1,\ldots,t_n) \text{ for all } t\in[a,b] \big\}. \tag{4.12}
\]
Consequently, good time points $t_1 = a < t_2 < \ldots < t_{n-1} < t_n = b$ should minimize the width
\[
u(t; t_1,\ldots,t_n) - l(t; t_1,\ldots,t_n) = 2\, D\, \{ h(t; t_1,\ldots,t_n) \}^{1/2}
\]
of this band at each $t\in[a,b]$. As this is only possible in rare circumstances, we propose to minimize an $L_p$-norm of the function $h(\cdot\,; t_1,\ldots,t_n)$ as a design criterion, that is,
\[
\Phi_p(t_1,\ldots,t_n) = \| h(\cdot\,; t_1,\ldots,t_n) \|_p := \Big( \int_a^b \big[ h(t; t_1,\ldots,t_n) \big]^p\, dt \Big)^{1/p}, \qquad 1 \le p \le \infty, \tag{4.13}
\]
where the case $p = \infty$ corresponds to the maximal deviation
\[
\| h(\cdot\,; t_1,\ldots,t_n) \|_\infty = \sup_{t\in[a,b]} | h(t; t_1,\ldots,t_n) |.
\]
The $\Phi_p$-optimal design points $a = t_1^* < t_2^* < \ldots < t_n^* = b$ (minimizing (4.13)) and the corresponding weights derived in Theorem 4.2 then provide the optimal linear unbiased estimator of the form (4.1) (with the corresponding optimal design); a numerical sketch of this optimization is given below.
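The following sketch evaluates $h(t;t_1,\ldots,t_n)$ on a grid and minimizes its maximal value (the $\Phi_\infty$-criterion from (4.13)) over the interior design points. The paper employs particle swarm optimization for this step (see Section 5); the use of scipy's differential evolution here, like the model choices, is an illustrative substitution. Following Remark 4.1, a Moore–Penrose inverse is used for $B$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import differential_evolution

a_, b_ = 1.0, 10.0
Si = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))
F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])
p = 4
M1 = np.array([[quad(lambda t: Fd(t)[r] @ Si @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
F0 = F(a_) @ Si @ F(a_).T / a_
Minv = np.linalg.inv(M1 + F0)
grid = np.linspace(a_, b_, 200)
cvec = np.array([1.0, -1.0])

def phi_inf(interior):
    """Phi_infinity criterion as a function of the interior design points."""
    t = np.concatenate(([a_], np.sort(interior), [b_]))
    if np.any(np.diff(t) < 1e-6):
        return np.inf                               # degenerate design
    dF = [F(t[i]) - F(t[i-1]) for i in range(1, len(t))]
    B = sum(d @ Si @ d.T / h for d, h in zip(dF, np.diff(t)))
    C = Minv @ (M1 @ np.linalg.pinv(B) @ M1 + F0) @ Minv     # Cov(theta_n^*)
    return max(cvec @ F(s).T @ C @ F(s) @ cvec for s in grid)

res = differential_evolution(phi_inf, bounds=[(a_, b_)] * 2, maxiter=60, seed=1)
print(np.concatenate(([a_], np.sort(res.x), [b_])), res.fun)   # design, criterion
```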
Example 4.1 We now conclude this section by considering the cases of no common and common parameters, respectively.

(a) If we are in the situation of Example 2.1 (no common parameters), the regression function $F^\top(t)$ is of the form (3.17), and the variance of the prediction of the difference at a fixed point $t\in[a,b]$ reduces to
\[
h(t; t_1,\ldots,t_n) = \big(f_1^\top(t), -f_2^\top(t)\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(f_1^\top(t), -f_2^\top(t)\big)^\top.
\]
The corresponding design criterion is given by
\[
\Phi_p(t_1,\ldots,t_n) = \Big\| \big(f_1^\top, -f_2^\top\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(f_1^\top, -f_2^\top\big)^\top \Big\|_p. \tag{4.14}
\]

(b) If we are in the situation of Example 2.2 (common parameters), the regression function $F^\top(t)$ is given by (3.25), and the variance of the prediction of the difference at a fixed point $t\in[a,b]$ reduces to
\[
h(t; t_1,\ldots,t_n) = \big(0^\top_{p_0}, \tilde f_1^\top(t), -\tilde f_2^\top(t)\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(0^\top_{p_0}, \tilde f_1^\top(t), -\tilde f_2^\top(t)\big)^\top.
\]
The corresponding design criterion is given by
\[
\Phi_p(t_1,\ldots,t_n) = \Big\| \big(0^\top_{p_0}, \tilde f_1^\top, -\tilde f_2^\top\big)\, M^{-1}\Big\{ M_1 B^{-1} M_1 + \frac{1}{a}\, F(a)\,\Sigma^{-1} F^\top(a) \Big\}\, M^{-1}\, \big(0^\top_{p_0}, \tilde f_1^\top, -\tilde f_2^\top\big)^\top \Big\|_p.
\]

5 A simulation study

In this section the methodology is illustrated by means of a simulation study. To be precise, we consider the regression model (2.5), where the matrix $F(t)$ is given by (3.17), corresponding to the case that the regression functions do not share common parameters; see Section 3.2 for more details. In this case the bounds of the confidence band (4.12) are given by
\[
u(t; t_1,\ldots,t_n) = (\hat\theta_n^{*(1)})^\top f_1(t) - (\hat\theta_n^{*(2)})^\top f_2(t) + D\cdot\{ h(t; t_1,\ldots,t_n) \}^{1/2}, \qquad l(t; t_1,\ldots,t_n) = (\hat\theta_n^{*(1)})^\top f_1(t) - (\hat\theta_n^{*(2)})^\top f_2(t) - D\cdot\{ h(t; t_1,\ldots,t_n) \}^{1/2},
\]
where $\hat\theta_n^* = ((\hat\theta_n^{*(1)})^\top, (\hat\theta_n^{*(2)})^\top)^\top$ is the estimator (4.1) with optimal weights defined in (4.8). The design space is given by the interval $[a,b] = [1,10]$, and we consider the following combinations of the regression functions $f_1$ and $f_2$ in the matrix (3.17):
\[
f_A(t) = (t, \sin(t), \cos(t))^\top, \qquad f_B(t) = (t^2, \cos(t), \cos(2t))^\top, \qquad f_C(t) = (t, \log(t), t^2)^\top. \tag{4.15}
\]
To model the dependence between the two groups we use the covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}
\]
in (2.5), where the correlations are chosen as $\varrho = 0.1, 0.4, 0.7$.
Following the discussion in Section 4.1, we focus on the comparison of the regression curves for the two groups and derive optimal designs minimizing the criterion $\Phi_\infty$ defined in (4.14). As a result, we obtain simultaneous confidence bands with a smaller maximal width for the difference of the curves describing the relation in the two groups. Similar results can be obtained for other values $p \in (0,\infty)$ in (4.14), but for the sake of brevity we concentrate on the criterion $\Phi_\infty$, which is probably also the easiest to interpret for practitioners.

We denote by $\hat\theta_n^*$ the linear unbiased estimator derived in Section 4. For each combination of two different regression functions defined in (4.15), the optimal weights have been found by Theorem 4.2, and the optimal design points $t_i^*$ are determined by minimizing the criterion $\Phi_\infty$ defined in (4.14). For the numerical optimization the particle swarm optimization (PSO) algorithm is used (see, for example, Clerc, 2006), assuming a sample size of four observations in each group, that is, $n = 4$. Furthermore, the uniform design used in the following calculations is the design with four equally spaced design points in the interval $[1,10]$. The $\Phi_\infty$-optimal design points minimizing the criterion (4.14) are given in Table 1 for all combinations of models and correlations under consideration. Note that for each model the optimal design points change with the value of the correlation $\varrho$.

Table 1: Optimal design points on the interval $[1,10]$ for the estimator $\hat\theta_n^*$ in (4.1) minimizing the criterion $\Phi_\infty$ in (4.14). Different correlations $\varrho = 0.1, 0.4, 0.7$ and the regression functions defined in (4.15) are considered.

  models              | rho = 0.1            | rho = 0.4            | rho = 0.7
  --------------------|----------------------|----------------------|---------------------
  f1 = fA & f2 = fB   | [1, 1.59, 3.93, 10]  | [1, 1.62, 3.91, 10]  | [1, 1.74, 7.99, 10]
  f1 = fA & f2 = fC   | [1, 3.46, 9.60, 10]  | [1, 2.86, 8.83, 10]  | [1, 2.61, 3.52, 10]
  f1 = fB & f2 = fC   | [1, 2.20, 6.25, 10]  | [1, 1.62, 3.98, 10]  | [1, 2.85, 6.29, 10]

In order to investigate the impact of the optimal design on the structure of the confidence bands, we have performed a small simulation study, simulating confidence bands for the difference of the regression functions. The vector of parameter values used for each model is $\theta = ({\theta^{(1)}}^\top, {\theta^{(2)}}^\top)^\top = (1,1,1,1,1,1)^\top$. In Figure 1 we display the averages of the uniform confidence bands defined in (4.12) under the uniform and the optimal design, calculated from 100 simulation runs. The left, middle and right columns show the results for the correlations $\varrho = 0.1$, $\varrho = 0.4$ and $\varrho = 0.7$, respectively, while the rows correspond to the different combinations of the functions $f_1$ and $f_2$ (first row: $f_1 = f_A$, $f_2 = f_B$; middle row: $f_1 = f_A$, $f_2 = f_C$; last row: $f_1 = f_B$, $f_2 = f_C$). In each graph, the confidence bands from the $\Phi_\infty$-optimal and the uniform design are plotted separately using solid and dashed lines, respectively, along with the true difference $f_1^\top(t)\theta^{(1)} - f_2^\top(t)\theta^{(2)}$ (solid grey lines).

Figure 1: Confidence bands for the difference of the regression functions (solid grey line) on the basis of an optimal (solid lines) and a uniform design (dashed lines). Left panel: $\varrho = 0.1$. Middle panel: $\varrho = 0.4$. Right panel: $\varrho = 0.7$. First row: model with $f_1 = f_A$ and $f_2 = f_B$. Second row: model with $f_1 = f_A$ and $f_2 = f_C$. Third row: model with $f_1 = f_B$ and $f_2 = f_C$.

We observe that in all cases under consideration the use of $\Phi_\infty$-optimal designs yields a clearly visible improvement compared to the uniform design. The maximal width of the confidence band is reduced substantially. Moreover, the bands from the $\Phi_\infty$-optimal designs are nearly uniformly narrower than the bands based on the uniform design. Even more importantly, the confidence bands based on the $\Phi_\infty$-optimal design show a similar structure to the true differences, while the confidence bands from the uniform design oscillate. A comparison of the left, middle and right columns in Figure 1 shows that the maximum width of the confidence bands based on the optimal design decreases with increasing (absolute) correlation $\varrho$. This effect is not visible for the confidence bands based on the uniform design. For example, in the middle row of Figure 1, which corresponds to the case $f_1 = f_A$ and $f_2 = f_C$, the maximum width of the confidence bands based on the equally spaced design points even seems to increase.

Table 2 presents the values of the criterion $\Phi_\infty$ in (4.14) for the different scenarios and confirms the conclusions drawn from the visual inspection of the confidence band plots. We observe that the use of the optimal design points reduces the maximum width of the confidence bands substantially. Moreover, for the optimal design the maximum width becomes smaller with increasing (absolute) correlation. On the other hand, this monotonicity cannot be observed in all cases for the uniform designs.

Summarizing, the use of the proposed $\Phi_\infty$-optimal design improves statistical inference substantially by reducing the maximum variance of the difference of the two estimated regression curves. Moreover, simultaneous estimation in combination with a $\Phi_\infty$-optimal design yields a further reduction of the maximum width of the confidence bands, thus providing a more precise inference for the difference of the curves describing the relation between $t$ and the responses in the two groups.
Table 2: Values of the criterion $\Phi_\infty$ for the optimal and the uniform design with four observations in each group in the interval $[1,10]$. The error process is given by two independent Brownian motions with correlation $\varrho = 0.1, 0.4, 0.7$ between the groups, respectively.

  models              | design  | rho = 0.1 | rho = 0.4 | rho = 0.7
  --------------------|---------|-----------|-----------|----------
  f1 = fA & f2 = fB   | optimal |   14.79   |    9.44   |    6.09
                      | uniform |  141.87   |  142.59   |  148.74
  f1 = fA & f2 = fC   | optimal |   16.00   |   10.00   |    6.60
                      | uniform |   33.32   |   29.10   |   25.66
  f1 = fB & f2 = fC   | optimal |   14.71   |    9.53   |    5.99
                      | uniform |  147.27   |  127.19   |  115.07
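To indicate how one run of such a simulation can be organized, the sketch below generates discrete observations from model (2.5) at a design, forms the optimally weighted estimator (4.1) and evaluates the plug-in band half-width $D\{h(t)\}^{1/2}$ on a few points. The critical value `D` is a placeholder for the bootstrap quantile of Gsteiger et al. (2011), and the model choices repeat the illustrative assumptions of the earlier sketches.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)
a_, b_, rho = 1.0, 10.0, 0.4
Sigma = np.array([[1.0, rho], [rho, 1.0]]); Si = np.linalg.inv(Sigma)
S_half = np.linalg.cholesky(Sigma)

F  = lambda t: np.array([[t, 0], [np.sin(t), 0], [0, t**2], [0, np.cos(t)]])
Fd = lambda t: np.array([[1, 0], [np.cos(t), 0], [0, 2*t], [0, -np.sin(t)]])
theta = np.ones(4)
p = 4

M1 = np.array([[quad(lambda t: Fd(t)[r] @ Si @ Fd(t)[c], a_, b_)[0]
                for c in range(p)] for r in range(p)])
F0 = F(a_) @ Si @ F(a_).T / a_
Minv = np.linalg.inv(M1 + F0)

tpts = np.array([1.0, 2.0, 5.0, 10.0])          # some design with n = 4
# one realization of the two correlated error processes at the design points
incr = rng.standard_normal((len(tpts), 2)) * np.sqrt(np.diff(tpts, prepend=0.0))[:, None]
eps = np.cumsum(incr, axis=0)
Y = np.array([F(t).T @ theta for t in tpts]) + eps @ S_half.T

dF = [F(tpts[i]) - F(tpts[i-1]) for i in range(1, 4)]
dt = np.diff(tpts)
B = sum(d @ Si @ d.T / h for d, h in zip(dF, dt))
Phi = [M1 @ np.linalg.solve(B, d) / h for d, h in zip(dF, dt)]        # weights (4.8)
theta_hat = Minv @ (sum(P @ Si @ (Y[i+1] - Y[i]) for i, P in enumerate(Phi))
                    + F(a_) @ Si @ Y[0] / a_)

C = Minv @ (M1 @ np.linalg.solve(B, M1) + F0) @ Minv
cvec = np.array([1.0, -1.0]); D = 3.0           # placeholder critical value
for t in np.linspace(a_, b_, 5):
    half = D * np.sqrt(cvec @ F(t).T @ C @ F(t) @ cvec)
    print(t, cvec @ F(t).T @ theta_hat, half)   # estimated difference +/- half-width
```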
Acknowledgements This research was partially supported by the Collaborative Research Center "Statistical modeling of nonlinear dynamic processes" (Sonderforschungsbereich 823, Teilprojekt C2). On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
Attia, A. and Constantinescu, E. (2020). Optimal experimental design for inverse problems in the presence of observation correlations. arXiv:2007.14476.

Bickel, P. J. and Herzberg, A. M. (1979). Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression. The Annals of Statistics, 7(1):77–95.

Bretz, F., Möllenhoff, K., Dette, H., Liu, W., and Trampisch, M. (2018). Assessing the similarity of dose response and target doses in two non-overlapping subgroups. Statistics in Medicine, 37:722–738.

Campos-Barreiro, S. and López-Fidalgo, J. (2015). D-optimal experimental designs for a growth model applied to a Holstein-Friesian dairy farm. Statistical Methods & Applications, 24:491–505.

Clerc, M. (2006). Particle Swarm Optimization. Iste Publishing Company, London.

Dette, H., Konstantinou, M., and Zhigljavsky, A. (2017a). A new approach to optimal designs for correlated observations. The Annals of Statistics, 45(4):1579–1608.

Dette, H., Kunert, J., and Pepelyshev, A. (2008). Exact optimal designs for weighted least squares analysis with correlated errors. Statistica Sinica, 18(1):135–154.

Dette, H., Möllenhoff, K., Volgushev, S., and Bretz, F. (2018). Equivalence of regression curves. Journal of the American Statistical Association, 113:711–729.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2013). Optimal design for linear models with correlated observations. The Annals of Statistics, 41(1):143–176.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2016). Optimal designs in regression with correlated errors. The Annals of Statistics, 44(1):113–152.

Dette, H., Pepelyshev, A., and Zhigljavsky, A. (2019). The BLUE in continuous-time regression models with correlated errors. The Annals of Statistics, 47(4):1928–1959.

Dette, H. and Schorning, K. (2016). Optimal designs for comparing curves. The Annals of Statistics, 44(3):1103–1130.

Dette, H., Schorning, K., and Konstantinou, M. (2017b). Optimal designs for comparing regression models with correlated observations. Computational Statistics & Data Analysis, 113:273–286.

Doob, J. L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 20(3):393–403.

Fedorov, V. V. and Leonov, S. L. (2013). Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton, FL, USA.

Gsteiger, S., Bretz, F., and Liu, W. (2011). Simultaneous confidence bands for nonlinear regression models with application to population pharmacokinetic analyses. Journal of Biopharmaceutical Statistics, 21(4):708–725.

Harman, R. and Štulajter, F. (2010). Optimal prediction designs in finite discrete spectrum linear regression models. Metrika, 72(2):281–294.

Ibragimov, I. A. and Has'minskii, R. Z. (1981). Statistical Estimation. Springer-Verlag, New York-Berlin.

Kiselak, J. and Stehlík, M. (2008). Equidistant D-optimal designs for parameters of Ornstein-Uhlenbeck process. Statistics and Probability Letters, 78:1388–1396.

Liu, W., Bretz, F., Hayter, A. J., and Wynn, H. P. (2009). Assessing non-superiority, non-inferiority or equivalence when comparing two regression models over a restricted covariate region. Biometrics, 65:1279–1287.

Liu, W., Jamshidian, M., and Zhang, Y. (2011). Multiple comparison of several linear regression models. Journal of the American Statistical Association, 99(466):395–403.

Mehr, C. B. and McFadden, J. (1965). Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society, Series B, 27(3):505–522.

Möllenhoff, K., Bretz, F., and Dette, H. (2020). Equivalence of regression curves sharing common parameters. Biometrics, 76:518–529.

Möllenhoff, K., Dette, H., Kotzagiorgis, E., Volgushev, S., and Collignon, O. (2018). Regulatory assessment of drug dissolution profiles comparability via maximum deviation. Statistics in Medicine, 37:2968–2981.

Müller, W. G. and Pázman, A. (2003). Measures for designs in experiments with correlated errors. Biometrika, 90:423–434.

Näther, W. (1985). Effective Observation of Random Fields. Teubner Verlagsgesellschaft, Leipzig.

Pázman, A. and Müller, W. G. (2001). Optimal design of experiments subject to correlated errors. Statistics & Probability Letters, 52(1):29–34.

Pukelsheim, F. (2006). Optimal Design of Experiments. SIAM, Philadelphia.

Rodriguez-Diaz, J. M. (2017). Computation of c-optimal designs for models with correlated observations. Computational Statistics & Data Analysis, 113:287–296.

Sacks, J. and Ylvisaker, N. D. (1966). Designs for regression problems with correlated errors. The Annals of Mathematical Statistics, 37:66–89.

Sacks, J. and Ylvisaker, N. D. (1968). Designs for regression problems with correlated errors; many parameters. The Annals of Mathematical Statistics, 39:49–69.