D-optimal designs for Poisson regression with synergetic interaction effect
Fritjof Freise, Ulrike Graßhoff, Frank Röttger, Rainer Schwabe
Abstract.
We characterize $D$-optimal designs in the two-dimensional Poisson regression model with synergetic interaction and provide an explicit proof. The proof is based on the idea of reparameterization of the design region in terms of contours of constant intensity. This approach leads to a substantial reduction of complexity as properties of the sensitivity can be treated along and across the contours separately. Furthermore, some extensions of this result to higher dimensions are presented.

Keywords. $D$-optimal design, Poisson regression, Interaction, Synergy effect, Minimally supported design

1. Introduction
Count data play an important role in medical and pharmaceutical development, marketing, and psychological research. For example, Vives, Losilla, and Rodrigo [21] reviewed articles published in psychological journals in the period from 2002 to 2006. They found that a substantial part of these articles dealt with count data for which the mean was quite low (for details we refer to the discussion in Graßhoff et al. [8]). In these situations, standard linear models are not applicable because they cannot account for the inherent heteroscedasticity. Instead, Poisson regression models are often more appropriate to describe such data. As an early source in psychological research we may refer to the Rasch Poisson counts model introduced by Rasch [15] in 1960 to predict person ability in an item response setup.

The Poisson regression model can be considered as a particular generalized linear model (see McCullagh and Nelder [13]). For the analysis of count data in the Poisson regression model there is a variety of literature (see e.g. Cameron and Trivedi [3]), and the statistical analysis is implemented in the main standard statistical software packages (cf. "glm" in R, "GENLIN" in SPSS, "proc genmod" in SAS). However, only little work has been done on the design of such experiments. Ford, Torsney and Wu derived optimal designs for the one-dimensional Poisson regression model in their pioneering paper on canonical transformations [7]. Wang et al. [22] obtained numerical solutions for optimal designs in two-dimensional Poisson regression models both for the main effects only (additive) model and for the model with interaction term. For the main effects only model the optimality of their design was proven analytically by Russell et al. [17], even for larger dimensions. Rodríguez-Torreblanca and Rodríguez-Díaz [16] extended the result by Ford et al. for one-dimensional Poisson regression to overdispersed data specified by a negative binomial regression model, and Schmidt and Schwabe [18] generalized the result by Russell et al. for higher-dimensional Poisson regression to a much broader class of additive regression models. Graßhoff et al. [8] gave a complete characterization of optimal designs in an ANOVA-type setting for Poisson regression with binary predictors, and Kahle et al. [9] indicated how interactions could be incorporated in this particular situation.

In this paper, we find $D$-optimal designs for the two-dimensional Poisson regression model with synergetic interaction, which was previously considered numerically by Wang et al. [22]. We show the $D$-optimality by reparametrizing the design space via hyperbolic coordinates, such that the inequalities in the Kiefer–Wolfowitz equivalence theorem only need to be checked on the boundary and the diagonal of the design region. This allows us to find an analytical proof for the $D$-optimality of the proposed design. Furthermore, we extend this result in various ways to higher-dimensional Poisson regression. First, we find $D$-optimal designs for first-order and second-order interactions, given that the prespecified interaction parameters are zero. Second, we present a $D$-optimal design for Poisson regression with first-order synergetic interaction where the design space is restricted to the union of the two-dimensional faces of the positive orthant.

The paper is organized as follows. In the next section we introduce the basic notation for Poisson regression models, and we specify the corresponding concepts of information and design in Section 3. Results for two-dimensional Poisson regression with interaction are established in Section 4. In Section 5, we present some extensions to higher-dimensional Poisson regression models. Further extensions are discussed in Section 6. Technical proofs have been deferred to an Appendix. We note that most of the inequalities there were first detected by using the computer algebra system Mathematica [23], but analytical proofs are provided in the Appendix for the readers' convenience.

Corresponding author: Frank Röttger. arXiv preprint [math.ST].

2. Model Specification
We consider the Poisson regression model where observations $Y$ are Poisson distributed with intensity $E(Y) = \lambda(\mathbf{x})$ which depends on one or more explanatory variables $\mathbf{x} = (x_1, \dots, x_k)$ in terms of a generalized linear model. In particular, we assume a log link which relates the mean $\lambda(\mathbf{x})$ to a linear component $\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta}$ by $\lambda(\mathbf{x}) = \exp(\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta})$, where $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_p(\mathbf{x}))^\top$ is a vector of $p$ known regression functions and $\boldsymbol{\beta}$ is a $p$-dimensional vector of unknown parameters. For example, if $\mathbf{x} = x$ is one-dimensional ($k = 1$), then simple Poisson regression is given by $\mathbf{f}(x) = (1, x)^\top$ with $p = 2$, $\boldsymbol{\beta} = (\beta_0, \beta_1)^\top$ and intensity $\lambda(x) = \exp(\beta_0 + \beta_1 x)$. For two explanatory variables $\mathbf{x} = (x_1, x_2)$ ($k = 2$) multiple Poisson regression without interaction is given by $\mathbf{f}(\mathbf{x}) = (1, x_1, x_2)^\top$ with $p = 3$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2)^\top$ and intensity $\lambda(\mathbf{x}) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$.

In what follows we will focus on the two-dimensional multiple regression ($\mathbf{x} = (x_1, x_2)$, $k = 2$) with interaction term, where $p = 4$, $\mathbf{f}(\mathbf{x}) = (1, x_1, x_2, x_1 x_2)^\top$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_{12})^\top$ and intensity
\[
\lambda(\mathbf{x}) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2). \tag{2.1}
\]
Here $\beta_0$ is an intercept term such that the mean is $\exp(\beta_0)$ when the explanatory variables are equal to 0. The quantities $\beta_1$ and $\beta_2$ denote the direct effects of each single explanatory variable, and $\beta_{12}$ describes the amount of the interaction effect when both explanatory variables are active (non-zero).

Typically the explanatory variables describe non-negative quantities ($x_1, x_2 \geq 0$). We consider the situation of decreasing intensities ($\beta_1, \beta_2 < 0$) with a non-positive interaction effect ($\beta_{12} \leq 0$), which is synergetic when $\beta_{12} < 0$.

3. Information and Design
In experimental situations the setting $\mathbf{x}$ of the explanatory variables may be chosen by the experimenter from some experimental region $\mathcal{X}$. As the explanatory variables describe non-negative quantities, and if there are no further restrictions on these quantities, it is natural to assume that the design region $\mathcal{X}$ is the non-negative half-axis $[0, \infty)$ or the closure of quadrant I in the Cartesian plane, $[0, \infty)^2$, in one- or two-dimensional Poisson regression, respectively.

To measure the contribution of an observation $Y$ at setting $\mathbf{x}$ the corresponding information can be used: With the log link the Poisson regression model constitutes a generalized linear model with canonical link [13]. Furthermore, for Poisson distributed observations $Y$ the variance and the mean coincide, $\mathrm{Var}(Y) = E(Y) = \lambda(\mathbf{x})$. Hence, according to [2] the elemental (Fisher) information for an observation $Y$ at a setting $\mathbf{x}$ is a $p \times p$ matrix given by
\[
\mathbf{M}_{\boldsymbol{\beta}}(\mathbf{x}) = \lambda(\mathbf{x})\, \mathbf{f}(\mathbf{x}) \mathbf{f}(\mathbf{x})^\top .
\]
Note that on the right-hand side the intensity $\lambda(\mathbf{x}) = \exp(\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta})$ depends on the linear component $\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta}$ and, hence, on the parameter vector $\boldsymbol{\beta}$. Consequently, the information also depends on $\boldsymbol{\beta}$, as indicated by the notation $\mathbf{M}_{\boldsymbol{\beta}}$.

For $N$ independent observations $Y_1, \dots, Y_N$ at settings $\mathbf{x}_1, \dots, \mathbf{x}_N$ the joint Fisher information matrix is obtained as the sum of the elemental information matrices,
\[
\mathbf{M}_{\boldsymbol{\beta}}(\mathbf{x}_1, \dots, \mathbf{x}_N) = \sum_{i=1}^{N} \lambda(\mathbf{x}_i)\, \mathbf{f}(\mathbf{x}_i) \mathbf{f}(\mathbf{x}_i)^\top .
\]
The collection $\mathbf{x}_1, \dots, \mathbf{x}_N$ of settings is called an exact design, and the aim of design optimization is to choose these settings such that the statistical analysis is improved. The quality of a design can be measured in terms of the information matrix because its inverse is proportional to the asymptotic covariance matrix of the maximum-likelihood estimator of $\boldsymbol{\beta}$, see Fahrmeir and Kaufmann [4]. Hence, larger information means higher precision. However, matrices are not comparable in general. Therefore one has to confine oneself to some real-valued criterion function applied to the information matrix. In accordance with the literature we will use the most popular $D$-criterion which aims at maximizing the determinant of the information matrix. This criterion has nice analytical properties and can be interpreted in terms of minimization of the volume of the asymptotic confidence ellipsoid for $\boldsymbol{\beta}$ based on the maximum-likelihood estimator. The optimal design will depend on the parameter vector $\boldsymbol{\beta}$ and is, hence, only locally optimal.

Finding an optimal exact design is a discrete optimization problem which is often too hard for analytical solutions. Therefore we adopt the concept of approximate designs in the spirit of Kiefer [10]. An approximate design $\xi$ is defined as a collection $\mathbf{x}_0, \dots, \mathbf{x}_{n-1}$ of $n$ mutually distinct settings in the design region $\mathcal{X}$ with corresponding weights $w_0, \dots, w_{n-1} \geq 0$ satisfying $\sum_{i=0}^{n-1} w_i = 1$. Then an exact design can be written as an approximate design, where $\mathbf{x}_0, \dots, \mathbf{x}_{n-1}$ are the mutually distinct settings in the exact design with corresponding numbers $N_0, \dots, N_{n-1}$ of replications, $\sum_{i=0}^{n-1} N_i = N$, and frequencies $w_i = N_i / N$, $i = 0, \dots, n-1$. However, in an approximate design the weights are relaxed from multiples of $1/N$ to non-negative real numbers which allow for continuous optimization.

For an approximate design $\xi$ the information matrix is defined as
\[
\mathbf{M}_{\boldsymbol{\beta}}(\xi) = \sum_{i=0}^{n-1} w_i\, \lambda(\mathbf{x}_i)\, \mathbf{f}(\mathbf{x}_i) \mathbf{f}(\mathbf{x}_i)^\top ,
\]
which, for an exact design, coincides with the standardized (per observation) information matrix $\frac{1}{N} \mathbf{M}_{\boldsymbol{\beta}}(\mathbf{x}_1, \dots, \mathbf{x}_N)$. An approximate design $\xi^*$ will be called locally $D$-optimal at $\boldsymbol{\beta}$ if it maximizes the determinant of the information matrix $\mathbf{M}_{\boldsymbol{\beta}}(\xi)$.

4. Optimal Designs
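Before turning to specific designs, the information matrix of an approximate design and the $D$-criterion from Section 3 can be sketched numerically. The following is a minimal illustration for the interaction model (2.1), not part of the original paper: the function names, the parameter value and the two four-point designs compared at the end are illustrative assumptions.

```python
import numpy as np

def f(x):
    """Regression vector f(x) = (1, x1, x2, x1*x2) of model (2.1)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

def information_matrix(points, weights, beta):
    """M_beta(xi) = sum_i w_i * lambda(x_i) * f(x_i) f(x_i)^T (log link)."""
    beta = np.asarray(beta, dtype=float)
    M = np.zeros((beta.size, beta.size))
    for x, w in zip(points, weights):
        fx = f(x)
        lam = np.exp(fx @ beta)          # intensity lambda(x)
        M += w * lam * np.outer(fx, fx)  # weighted elemental information
    return M

def log_det_information(points, weights, beta):
    """log det M_beta(xi), or -inf if the matrix is singular."""
    sign, logdet = np.linalg.slogdet(information_matrix(points, weights, beta))
    return logdet if sign > 0 else -np.inf

# Illustrative parameter value and two hypothetical four-point designs.
beta = [0.0, -1.0, -1.0, 0.0]
weights = [0.25] * 4
design_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
design_b = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(log_det_information(design_a, weights, beta))
print(log_det_information(design_b, weights, beta))
```

Working with the logarithm of the determinant avoids underflow for small intensities and leaves the ranking of designs under the $D$-criterion unchanged.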
We start by quoting results from the literature for one-dimensional and two-dimensional regression without interaction: In the case of one-dimensional Poisson regression the design $\xi^*_{\beta_1}$ which assigns equal weights $w_0^* = w_1^* = 1/2$ to the two settings $x_0^* = 0$ and $x_1^* = 2/|\beta_1|$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)$ for $\beta_1 < 0$, see Rodríguez-Torreblanca and Rodríguez-Díaz [16].

In the case of two-dimensional Poisson regression without interaction the design $\xi^*_{\beta_1, \beta_2}$ which assigns equal weights $w_0^* = w_1^* = w_2^* = 1/3$ to the three settings $\mathbf{x}_0^* = (0, 0)$, $\mathbf{x}_1^* = (2/|\beta_1|, 0)$, and $\mathbf{x}_2^* = (0, 2/|\beta_2|)$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^2$ for $\beta_1, \beta_2 < 0$, see Russell et al. [17]. Note that the optimal coordinates on the axes coincide with the optimal values in the one-dimensional case, see Schmidt and Schwabe [18].

In both cases the optimal design is minimally supported, i.e. the number $n$ of support points of the design is equal to the number $p$ of parameters. It is well-known that for $D$-optimal minimally supported designs the optimal weights are all equal, $w_i^* = 1/p$, see Silvey [20]. Such optimal designs are attractive as they can be realized as exact designs when the sample size $N$ is a multiple of the number of parameters $p$.

Further note that these optimal designs always include the setting $x_0 = 0$ or $\mathbf{x}_0 = (0, 0)$, respectively, where the intensity $\lambda$ attains its largest value.

The above findings coincide with the numerical results obtained by Wang et al. [22], who also numerically found minimally supported $D$-optimal designs for the case of two-dimensional Poisson regression with interaction. In what follows we will give explicit formulae for these designs and establish rigorous analytical proofs of their optimality.

We start with the special situation of vanishing interaction ($\beta_{12} = 0$). In this case standard methods of factorization can be applied to establish the optimal design, see Schwabe [19], Section 4.

Theorem 4.1. If $\beta_1, \beta_2 < 0$ and $\beta_{12} = 0$, then the design $\xi^*_{\beta_1} \otimes \xi^*_{\beta_2}$ which assigns equal weights $w_0^* = w_1^* = w_2^* = w_3^* = 1/4$ to the four settings $\mathbf{x}_0^* = (0, 0)$, $\mathbf{x}_1^* = (2/|\beta_1|, 0)$, $\mathbf{x}_2^* = (0, 2/|\beta_2|)$, and $\mathbf{x}_3^* = (2/|\beta_1|, 2/|\beta_2|)$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^2$.

Proof. The regression function $\mathbf{f}(\mathbf{x}) = (1, x_1, x_2, x_1 x_2)^\top$ is the Kronecker product of the regression functions $\mathbf{f}_1(x_1) = (1, x_1)^\top$ and $\mathbf{f}_2(x_2) = (1, x_2)^\top$ in the corresponding marginal one-dimensional Poisson regression models, and the design region $\mathcal{X}$ is the Cartesian product of the marginal design regions $\mathcal{X}_1 = \mathcal{X}_2 = [0, \infty)$.
Also the intensity $\lambda(\mathbf{x}) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$ factorizes into the marginal intensities $\lambda_1(x_1) = \exp(\beta_0 + \beta_1 x_1)$ and $\lambda_2(x_2) = \exp(\beta_2 x_2)$ for the marginal parameters $\boldsymbol{\beta}_1 = (\beta_0, \beta_1)^\top$ and $\boldsymbol{\beta}_2 = (0, \beta_2)^\top$, respectively. As mentioned before, the designs $\xi^*_{\beta_j}$ which assign equal weights $1/2$ to the settings $x_j = 0$ and $x_j = 2/|\beta_j|$ are locally $D$-optimal at $\boldsymbol{\beta}_j$ on $\mathcal{X}_j$, $j = 1, 2$. Hence, the design $\xi^*_{\beta_1} \otimes \xi^*_{\beta_2}$, which is defined as the measure theoretic product of the marginals, is locally $D$-optimal at $\boldsymbol{\beta}$ by an application of Theorem 4.2 in [19]. □

In contrast to the situation of Theorem 4.1 the intensity fails to factorize in the case of a non-vanishing interaction ($\beta_{12} \neq 0$). Thus a different approach has to be chosen. As a prerequisite we mention that in the above cases the optimal designs can be derived from those for the standard parameter values $\beta_0 = 0$ and $\beta_1 = -1$ in one dimension and $\beta_1 = \beta_2 = -1$ in two dimensions. Accordingly, we first treat the standardized case $\beta_0 = 0$ and $\beta_1 = \beta_2 = -1$ and denote by $\rho = -\beta_{12} \geq 0$ the strength of the synergy effect.

Figure 1. Value of the optimal $t$ in Lemma 4.2 as a function of $\rho$ (horizontal axis: $\rho$; vertical axis: $t$).

4.1. Standardized case.
Throughout this subsection we assume the standardized situation with $\boldsymbol{\beta} = (0, -1, -1, -\rho)^\top$ for some $\rho \geq 0$. Motivated by Theorem 4.1 and the numerical results in Wang et al. [22] we consider a class $\Xi$ of minimally supported designs as potential candidates for being optimal. In the class $\Xi$ the designs have one setting at the origin $\mathbf{x}_0 = (0, 0)$, one setting $\mathbf{x}_1 = (x_1, 0)$ and $\mathbf{x}_2 = (0, x_2)$ on each of the bounding axes of the design region, as for the optimal design in the model without interaction, and an additional setting $\mathbf{x}_3 = (t, t)$ on the diagonal of the design region, where the effects of the two components are equal. The following result is due to Könner [12].

Lemma 4.2. Let $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\rho > 0$ and $t = 2$ for $\rho = 0$. Then the design $\xi_t$ which assigns equal weights $1/4$ to $\mathbf{x}_0 = (0, 0)$, $\mathbf{x}_1 = (2, 0)$, $\mathbf{x}_2 = (0, 2)$, and $\mathbf{x}_3 = (t, t)$ is locally $D$-optimal within the class $\Xi$.

Note that $t = 2$ for $\rho = 0$ is in accordance with the optimal product-type design in Theorem 4.1, $t$ is continuously decreasing in $\rho$, and $t$ tends to 0 when the strength of synergy $\rho$ gets arbitrarily large. Figure 1 shows the value of $t$ in dependence on $\rho$.

To establish that $\xi_t$ is locally $D$-optimal within the class of all designs on $\mathcal{X}$ we will make use of the Kiefer–Wolfowitz equivalence theorem [11] in its extended version incorporating intensities, see Fedorov [6]. For this we introduce the sensitivity function
\[
\psi(\mathbf{x}; \xi) = \lambda(\mathbf{x})\, \mathbf{f}(\mathbf{x})^\top \mathbf{M}(\xi)^{-1} \mathbf{f}(\mathbf{x}),
\]
where we suppress the dependence on $\boldsymbol{\beta}$ in the notation. Then by the equivalence theorem a design $\xi^*$ is (locally) $D$-optimal if (and only if) the sensitivity function $\psi(\mathbf{x}; \xi^*)$ does not exceed the number $p$ of parameters uniformly on the design region $\mathcal{X}$. Equivalently, as $\lambda(\mathbf{x}) > 0$, we may consider the deduced sensitivity function
\[
d(\mathbf{x}; \xi) = \mathbf{f}(\mathbf{x})^\top \mathbf{M}(\xi)^{-1} \mathbf{f}(\mathbf{x}) / p - 1/\lambda(\mathbf{x}).
\]
Then $\xi_t$ is $D$-optimal if $d(\mathbf{x}; \xi_t) \leq 0$ for all $\mathbf{x} \in \mathcal{X}$. To establish this condition we need some preparatory results on the shape of the (deduced) sensitivity function. Figure 2 shows $d(\mathbf{x}; \xi_t)$ for $t = 2$ ($\rho = 0$), i.e. for the standardized setting in Theorem 4.1.

Figure 2. Deduced sensitivity function for $t = 2$ ($\rho = 0$).

Lemma 4.3. If $\xi$ is invariant under permutation of $x_1$ and $x_2$, then $d(\mathbf{x}; \xi)$ attains its maximum on the boundary or on the diagonal of $\mathcal{X}$.

Lemma 4.4. $d((x, 0); \xi_t) = d((0, x); \xi_t) \leq 0$ for all $x \geq 0$.

Lemma 4.5. $d((x, x); \xi_t) \leq 0$ for all $x \geq 0$.

Note that $\xi_t$ is invariant with respect to the permutation of $x_1$ and $x_2$. Then, combining Lemmas 4.3 to 4.5, we obtain $d(\mathbf{x}; \xi_t) \leq 0$ for all $\mathbf{x} \in \mathcal{X}$, which establishes the $D$-optimality of $\xi_t$ in view of the equivalence theorem.
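The condition of the equivalence theorem can also be checked numerically. The sketch below evaluates the deduced sensitivity function $d(\mathbf{x}; \xi_t)$ of the design from Lemma 4.2 on a finite grid; it is an illustration only and no substitute for the analytical proof. The function names, the grid range $[0, 8]^2$ and the selected values of $\rho$ are assumptions made for this example.

```python
import numpy as np

def f(x1, x2):
    """Regression vector of the model with interaction."""
    return np.array([1.0, x1, x2, x1 * x2])

def optimal_t(rho):
    """Diagonal coordinate t of Lemma 4.2 (rho >= 0)."""
    if rho == 0:
        return 2.0
    return (np.sqrt(1.0 + 8.0 * rho) - 1.0) / (2.0 * rho)

def deduced_sensitivity(x1, x2, rho):
    """d(x; xi_t) = f(x)^T M(xi_t)^{-1} f(x) / p - 1 / lambda(x), p = 4."""
    beta = np.array([0.0, -1.0, -1.0, -rho])
    t = optimal_t(rho)
    support = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (t, t)]
    M = np.zeros((4, 4))
    for s in support:
        fs = f(*s)
        M += 0.25 * np.exp(fs @ beta) * np.outer(fs, fs)
    fx = f(x1, x2)
    return fx @ np.linalg.solve(M, fx) / 4.0 - np.exp(-(fx @ beta))

def max_on_grid(rho, upper=8.0, steps=81):
    """Largest value of d(x; xi_t) on an equidistant grid over [0, upper]^2."""
    grid = np.linspace(0.0, upper, steps)
    return max(deduced_sensitivity(a, b, rho) for a in grid for b in grid)

for rho in (0.0, 0.5, 2.0):
    print(rho, optimal_t(rho), max_on_grid(rho))
```

Up to rounding error the grid maximum is zero and is attained at support points of $\xi_t$, in line with the equivalence theorem.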
Theorem 4.6. In the two-dimensional Poisson regression model with interaction the design $\xi_t$ which assigns equal weights $1/4$ to the settings $\mathbf{x}_0 = (0, 0)$, $\mathbf{x}_1 = (2, 0)$, $\mathbf{x}_2 = (0, 2)$, and $\mathbf{x}_3 = (t, t)$, where $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\rho > 0$ and $t = 2$ for $\rho = 0$, is locally $D$-optimal at $\boldsymbol{\beta} = (0, -1, -1, -\rho)^\top$ on $\mathcal{X} = [0, \infty)^2$.

4.2. General case.

For the general situation of decreasing intensities ($\beta_1, \beta_2 < 0$) with synergetic interaction ($\beta_{12} < 0$) the optimal design can be obtained by simultaneous scaling of the settings $\mathbf{x} = (x_1, x_2) \to \tilde{\mathbf{x}} = (x_1/|\beta_1|, x_2/|\beta_2|)$ and of the parameters $\boldsymbol{\beta} = (0, -1, -1, -\rho)^\top \to \tilde{\boldsymbol{\beta}} = (0, \beta_1, \beta_2, -\rho \beta_1 \beta_2)^\top$ by equivariance, see Radloff and Schwabe [14]. This simultaneous scaling leaves the linear component and, hence, the intensity unchanged, $\mathbf{f}(\tilde{\mathbf{x}})^\top \tilde{\boldsymbol{\beta}} = \mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta}$. If the scaling of $\mathbf{x}$ is applied to the settings in $\xi_t$ of Theorem 4.6, then the resulting rescaled design will be locally $D$-optimal at $\tilde{\boldsymbol{\beta}}$ on $\mathcal{X}$ as the design region is invariant with respect to scaling. Furthermore, the design optimization is not affected by the value $\beta_0$ of the intercept term because this term contributes to the intensity and, hence, to the information matrix only by a multiplicative factor, $\lambda(\mathbf{x}) = \exp(\beta_0) \exp(\beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2)$. We thus obtain the following result from Theorem 4.6.

Theorem 4.7. Assume the two-dimensional Poisson regression model with interaction and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_{12})^\top$ with $\beta_1, \beta_2 < 0$ and $\beta_{12} \leq 0$. Let $\rho = -\beta_{12}/(\beta_1 \beta_2)$, $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\beta_{12} < 0$ and $t = 2$ for $\beta_{12} = 0$. Then the design which assigns equal weights $1/4$ to the settings $\mathbf{x}_0 = (0, 0)$, $\mathbf{x}_1 = (2/|\beta_1|, 0)$, $\mathbf{x}_2 = (0, 2/|\beta_2|)$, and $\mathbf{x}_3 = (t/|\beta_1|, t/|\beta_2|)$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^2$.

Note that the settings $\mathbf{x}_0$, $\mathbf{x}_1$, and $\mathbf{x}_2$ of the locally $D$-optimal design in the model with interaction coincide with those of the optimal design for the model without interaction. Only a fourth setting $\mathbf{x}_3 = (t/|\beta_1|, t/|\beta_2|)$ has been added in the interior of the design region.

5. Higher-dimensional Models
In the present section on $k$-dimensional Poisson regression with $k$ explanatory variables ($\mathbf{x} = (x_1, x_2, \dots, x_k)$, $k \geq 3$) we restrict to the standardized case with zero intercept ($\beta_0 = 0$) and all main effects $\beta_1 = \dots = \beta_k$ equal to $-1$. The case of a general intercept $\beta_0$ and general main effects $\beta_1, \dots, \beta_k < 0$ can be reduced to this case by scaling as in Subsection 4.2.

For the $k$-dimensional Poisson regression without interactions,
\[
\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\]
Russell et al. [17] showed that the minimally supported design which assigns equal weights $1/(k+1)$ to the origin $\mathbf{x}_0 = (0, \dots, 0)$ and the $k$ axial settings $\mathbf{x}_1 = (2, 0, \dots, 0)$, $\mathbf{x}_2 = (0, 2, 0, \dots, 0)$, ..., $\mathbf{x}_k = (0, \dots, 0, 2)$ is locally $D$-optimal at $\boldsymbol{\beta} = (0, -1, \dots, -1)^\top$. Schmidt and Schwabe [18] more generally proved that in models without interactions the locally $D$-optimal design points coincide with their counterparts in the marginal one-dimensional models. This approach will be extended in Theorems 5.2 and 5.4 to two- and three-dimensional marginals with interactions.

In what follows we mainly consider the particular situation that all interactions occurring in the models have values equal to 0 and that the design region is the full orthant $\mathcal{X} = [0, \infty)^k$. Setting the interactions to zero does not mean that we presume to know that there are no interactions in the model. Instead we are going to determine locally optimal designs in models with interactions which are locally optimal at such $\boldsymbol{\beta}$ for which all interaction terms attain the value 0.

We start with a generalization of Theorem 4.1 to a $k$-dimensional Poisson regression model with complete interactions,
\[
\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \sum_{i < j} \beta_{ij} x_i x_j + \dots + \beta_{12 \dots k}\, x_1 x_2 \cdots x_k .
\]

Theorem 5.1. In the $k$-dimensional Poisson regression model with complete interactions the minimally supported design $\xi^*_{-1} \otimes \dots \otimes \xi^*_{-1}$ which assigns equal weights $1/p$ to the $p = 2^k$ settings of the full factorial on $\{0, 2\}^k$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^k$, when $\beta_1 = \dots = \beta_k = -1$ and all interactions $\beta_{ij}, \dots, \beta_{12 \dots k}$ are equal to 0.

The proof of Theorem 5.1 follows the lines of the proof of Theorem 4.1 as all of the design region $\mathcal{X}$, the vector of regression functions $\mathbf{f}$, and the intensity function $\lambda$ factorize to their one-dimensional counterparts.
Hence, details will be omitted.

Now we come back to the Poisson regression model with first-order interactions,
\[
\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \sum_{i < j} \beta_{ij} x_i x_j .
\]

Theorem 5.2. In the $k$-dimensional Poisson regression model with first-order interactions the minimally supported design which assigns equal weights $1/p$ to the $p = 1 + k + k(k-1)/2$ settings $\mathbf{x}_0 = (0, 0, \dots, 0)$, $\mathbf{x}_1 = (2, 0, \dots, 0)$, $\mathbf{x}_2 = (0, 2, 0, \dots, 0)$, ..., $\mathbf{x}_k = (0, \dots, 0, 2)$, and $\mathbf{x}_{ij} = \mathbf{x}_i + \mathbf{x}_j$, $1 \leq i < j \leq k$, is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^k$, when $\beta_1 = \dots = \beta_k = -1$ and $\beta_{ij} = 0$, $1 \leq i < j \leq k$.

For illustrative purposes we specify this result for $k = 3$ components.

Corollary 5.3. In the three-dimensional Poisson regression model with first-order interactions,
\[
\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 ,
\]
the minimally supported design which assigns equal weights $1/7$ to the settings $\mathbf{x}_0 = (0, 0, 0)$, $\mathbf{x}_1 = (2, 0, 0)$, $\mathbf{x}_2 = (0, 2, 0)$, $\mathbf{x}_3 = (0, 0, 2)$, $\mathbf{x}_4 = (2, 2, 0)$, $\mathbf{x}_5 = (2, 0, 2)$, and $\mathbf{x}_6 = (0, 2, 2)$ is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^3$, when $\beta_1 = \beta_2 = \beta_3 = -1$ and $\beta_{12} = \beta_{13} = \beta_{23} = 0$.

Figure 3. Design points in Corollary 5.3.

The optimal design points of Corollary 5.3 are visualized in Figure 3. Note that in the Poisson regression model with first-order interactions the locally $D$-optimal design has only support points on the axes and on the diagonals of the faces, but none in the interior of the design region, and that the support points on each face coincide with the optimal settings for the corresponding two-dimensional marginal model. Thus only those settings are included from the full factorial $\{0, 2\}^k$ of the complete interaction case (Theorem 5.1) which have, at most, two non-zero components, and the locally $D$-optimal design concentrates on settings with higher intensity. This is in accordance with the findings for the Poisson regression model without interactions, where only those settings will be used which have, at most, one non-zero component, and carries over to higher-order interactions.
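The support of the design in Theorem 5.2 is easy to enumerate, and the bound $\psi(\mathbf{x}; \xi) \leq p$ of the equivalence theorem can be spot-checked on random points. The sketch below does this for $k = 3$ (Corollary 5.3); the helper names, the sampling box $[0, 6]^3$ and the random seed are assumptions chosen for illustration, and a random sample is of course no proof.

```python
from itertools import combinations
import numpy as np

def first_order_support(k):
    """Support of Theorem 5.2: origin, axial points 2*e_j, and sums x_i + x_j."""
    axial = [2.0 * row for row in np.eye(k)]
    pairs = [axial[i] + axial[j] for i, j in combinations(range(k), 2)]
    return np.array([np.zeros(k), *axial, *pairs])

def f_first_order(x):
    """f(x) = (1, x_1, ..., x_k, x_i * x_j for i < j)."""
    x = np.asarray(x, dtype=float)
    inter = [x[i] * x[j] for i, j in combinations(range(x.size), 2)]
    return np.concatenate(([1.0], x, inter))

def sensitivity(x, support, beta):
    """psi(x; xi) = lambda(x) f(x)^T M(xi)^{-1} f(x) for equal weights."""
    F = np.array([f_first_order(s) for s in support])
    lam = np.exp(F @ beta)                 # intensities at the support points
    M = (F.T * lam) @ F / len(support)     # information matrix of the design
    fx = f_first_order(x)
    return np.exp(fx @ beta) * (fx @ np.linalg.solve(M, fx))

k = 3
support = first_order_support(k)
beta = np.array([0.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 6.0, size=(2000, k))
print(len(support), max(sensitivity(x, support, beta) for x in sample))
```

At the support points themselves the sensitivity equals $p = 7$, which is the equality case of the equivalence theorem.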
In particular, for the Poisson regression model with second-order interactions,
\[
\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \sum_{i < j} \beta_{ij} x_i x_j + \sum_{i < j < \ell} \beta_{ij\ell}\, x_i x_j x_\ell ,
\]
we obtain a similar result.

Theorem 5.4. In the $k$-dimensional Poisson regression model with second-order interactions the minimally supported design which assigns equal weights $1/p$ to the $p = 1 + k + k(k-1)/2 + k(k-1)(k-2)/6$ settings $\mathbf{x}_0 = (0, 0, \dots, 0)$, $\mathbf{x}_1 = (2, 0, \dots, 0)$, $\mathbf{x}_2 = (0, 2, 0, \dots, 0)$, ..., $\mathbf{x}_k = (0, \dots, 0, 2)$, $\mathbf{x}_{ij} = \mathbf{x}_i + \mathbf{x}_j$, $1 \leq i < j \leq k$, and $\mathbf{x}_{ij\ell} = \mathbf{x}_i + \mathbf{x}_j + \mathbf{x}_\ell$, $1 \leq i < j < \ell \leq k$, is locally $D$-optimal at $\boldsymbol{\beta}$ on $\mathcal{X} = [0, \infty)^k$, when $\beta_1 = \dots = \beta_k = -1$, $\beta_{ij} = 0$, $1 \leq i < j \leq k$, and $\beta_{ij\ell} = 0$, $1 \leq i < j < \ell \leq k$.

The proofs of Theorems 5.2 and 5.4 are based on symmetry properties which get lost if one or more of the interaction terms are non-zero. However, if only few components of $\mathbf{x}$ may be active (non-zero), then locally $D$-optimal designs may be obtained in the spirit of the proof of Lemma 4.4 for synergetic interaction effects. We demonstrate this in the setting of first-order interactions $\rho_{ij} = -\beta_{ij} \geq 0$, when the design region $\mathcal{X}$ consists of the union of the two-dimensional faces of the orthant, i.e. when, at most, two components of $\mathbf{x}$ can be active.

Theorem 5.5. Consider the $k$-dimensional Poisson regression model with first-order interactions on $\mathcal{X} = \bigcup_{i < j}$ […]

6. Discussion

The main purpose of the present paper is to characterize locally $D$-optimal designs explicitly for the two-dimensional Poisson regression model with interaction on the unbounded design region of quadrant I when both main effects as well as the interaction effect are negative, and to present a rigorous proof for their optimality. Obviously the designs specified in Theorem 4.7 remain optimal on design regions which are subsets of quadrant I and cover the support points of the respective design.
For example, if the design region is a rectangle, $\mathcal{X} = [0, b_1] \times [0, b_2]$, then the design of Theorem 4.7 is optimal as long as $b_1 \geq 2/|\beta_1|$ and $b_2 \geq 2/|\beta_2|$ for the two components. Furthermore, if the design region is shifted, $\mathcal{X} = [a_1, \infty) \times [a_2, \infty)$ or a sufficiently large subregion of that, then also the locally $D$-optimal design is shifted accordingly and assigns equal weights $1/4$ to $\mathbf{x}_0 = (a_1, a_2)$, $\mathbf{x}_1 = (a_1 + 2/|\beta_1|, a_2)$, $\mathbf{x}_2 = (a_1, a_2 + 2/|\beta_2|)$, and $\mathbf{x}_3 = (a_1 + t/|\beta_1|, a_2 + t/|\beta_2|)$, where $t$ is defined as in Theorem 4.7.

Figure 4. Efficiency of $\xi_x$ for $x = 2$ (solid line), $x = 1$ (dashed) and $x = 1/2$, plotted against $\rho$.

Although the locally $D$-optimal designs only differ in the location of the support point on the diagonal, if the main effects are kept fixed, they are quite sensitive with respect to the strength $\rho$ of the synergy parameter in their performance. The quality of their performance can be measured in terms of the local $D$-efficiency which is defined as
\[
\mathrm{eff}_D(\xi, \boldsymbol{\beta}) = \left( \frac{\det(\mathbf{M}_{\boldsymbol{\beta}}(\xi))}{\det(\mathbf{M}_{\boldsymbol{\beta}}(\xi^*_{\boldsymbol{\beta}}))} \right)^{1/p}
\]
for a design $\xi$, where $\xi^*_{\boldsymbol{\beta}}$ denotes the locally $D$-optimal design at $\boldsymbol{\beta}$. This efficiency can be interpreted as the asymptotic proportion of observations required for the locally $D$-optimal $\xi^*_{\boldsymbol{\beta}}$ to obtain the same precision as for the competing design $\xi$ of interest. For example, in the standardized case of Subsection 4.1 the design $\xi_x$ would be locally $D$-optimal if the strength of synergy were $(2 - x)/x^2$. Its local $D$-efficiency can be calculated as
\[
\mathrm{eff}_D(\xi_x, \boldsymbol{\beta}) = (x/t) \exp\bigl( (2t + \rho t^2 - 2x - \rho x^2)/4 \bigr)
\]
when $\rho$ is the true strength of synergy and $t$ is the corresponding optimal coordinate on the diagonal ($t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\rho > 0$ and $t = 2$ for $\rho = 0$). For selected values of $x$ the local $D$-efficiencies are depicted in Figure 4. The appealing product-type design $\xi_2$ of Theorem 4.1 rapidly loses efficiency if the strength $\rho$ of synergy substantially increases.
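The closed-form efficiency can be evaluated directly. The following sketch tabulates it for the three designs shown in Figure 4; the function names and the chosen values of $\rho$ are illustrative assumptions ($\rho = 1$ and $\rho = 6$ are the values $(2 - x)/x^2$ at which $\xi_1$ and $\xi_{1/2}$, respectively, are themselves optimal).

```python
import math

def optimal_t(rho):
    """Optimal diagonal coordinate t of Lemma 4.2 (rho >= 0)."""
    if rho == 0:
        return 2.0
    return (math.sqrt(1.0 + 8.0 * rho) - 1.0) / (2.0 * rho)

def efficiency(x, rho):
    """Local D-efficiency of xi_x when rho is the true strength of synergy."""
    t = optimal_t(rho)
    return (x / t) * math.exp((2 * t + rho * t**2 - 2 * x - rho * x**2) / 4.0)

# Rows: designs xi_2, xi_1, xi_{1/2}; columns: true rho = 0, 1, 6.
for x in (2.0, 1.0, 0.5):
    print(x, [round(efficiency(x, rho), 3) for rho in (0.0, 1.0, 6.0)])
```

Each design attains efficiency 1 exactly at the value of $\rho$ for which it is locally $D$-optimal and stays below 1 elsewhere.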
The triangular design $\xi_1$ seems to be rather robust over a wide range of strength parameters, while for smaller $x$ the design $\xi_x$ loses efficiency when there is no synergy effect ($\rho = 0$). Hence, it would be desirable to determine robust designs like maximin $D$-efficient or weighted ("Bayesian") optimal designs (see e.g. Atkinson et al. [1]), but this would go beyond the scope of the present paper.

If, in contrast to the situation of Theorems 4.6 and 4.7, there is an antagonistic interaction effect, which means that $\beta_{12}$ is positive ($\rho < 0$), then in the standardized case ($\beta_1 = \beta_2 = -1$) on a square design region Lemma 4.2 may be extended as follows.

Lemma 6.1. Let $b \geq 2$, $\rho < 0$, and $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\rho > -1/8$.
(a) If $\rho > -1/8$, $t \leq b$ and $t^4 \exp(-2t - \rho t^2) \geq b^4 \exp(-2b - \rho b^2)$, then the design $\xi_t$ is locally $D$-optimal within the class $\Xi$ on $\mathcal{X} = [0, b]^2$.
(b) If $\rho \leq -1/8$ or $b < t$ or $t^4 \exp(-2t - \rho t^2) < b^4 \exp(-2b - \rho b^2)$, then the design $\xi_b$ is locally $D$-optimal within the class $\Xi$ on $\mathcal{X} = [0, b]^2$.

Moreover, Lemma 4.4 does not depend on $\rho$ and, if, additionally, $b \leq 1/|\rho|$, then the argumentation in the proof of Lemma 4.3 can be adopted, where now the hyperbolic coordinate system is centered at $(1/|\rho|, 1/|\rho|)$ and $v$ is negative (cf. the proof below). However, the inequalities of Lemma 4.5 are no longer valid, in general. In particular, for $\rho$ greater than, but close to, $-1/8$ the deduced sensitivity function $d((x, x); \xi_t)$ shows a local minimum at $x = t$ rather than a maximum, which disproves the optimality of $\xi_t$ within the class of all designs on $\mathcal{X} = [0, b]^2$. In that case an additional fifth support point is required on the diagonal, and also the weights have to be optimized.
So, in the case of an antagonistic interaction effect no general analytic solution can be expected, and the numerically obtained optimal designs may be difficult to realize as exact designs.

For even smaller design regions ($b < 2$) design points on the adverse boundaries ($x_1 = b$ or $x_2 = b$) may occur in the optimal designs, but not in the interior besides the diagonal, both in the synergetic as well as in the antagonistic case.

It seems more promising to extend the present results to negative binomial (Poisson-Gamma) regression, a popular generalization of Poisson regression which can cope with overdispersion, as in Rodríguez-Torreblanca and Rodríguez-Díaz [16] for one-dimensional regression or in Schmidt and Schwabe [18] for multidimensional regression without interaction. This will be the object of further investigation.

Acknowledgements. We acknowledge that the statement of Lemma 4.2 was originally derived by Dörte Schnur in her thesis [12]. Part of this work was supported by grants HO 1286/6, SCHW 531/15 and 314838170, GRK 2297 MathCoRe of the Deutsche Forschungsgemeinschaft DFG.

Appendix A. Proofs

Proof of Lemma 4.2. For a design $\xi$ with settings $\mathbf{x}_i$ and corresponding weights $w_i$, $i = 0, \dots, n-1$, denote by $\mathbf{F} = (\mathbf{f}(\mathbf{x}_0), \dots, \mathbf{f}(\mathbf{x}_{n-1}))^\top$ the $(n \times p)$-dimensional essential design matrix and by the $(n \times n)$-dimensional diagonal matrices $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda(\mathbf{x}_0), \dots, \lambda(\mathbf{x}_{n-1}))$ and $\mathbf{W} = \mathrm{diag}(w_0, \dots, w_{n-1})$ the intensity and the weight matrix, respectively. Then the information matrix can be written as
\[
\mathbf{M}(\xi) = \mathbf{F}^\top \mathbf{W} \boldsymbol{\Lambda} \mathbf{F} .
\]
For minimally supported designs the matrices $\mathbf{F}$, $\mathbf{W}$ and $\boldsymbol{\Lambda}$ are square ($p \times p$) and the determinant of the information matrix factorizes,
\[
\det(\mathbf{M}(\xi)) = \det(\mathbf{W}) \det(\boldsymbol{\Lambda}) \det(\mathbf{F})^2 .
\]
As $\mathbf{W}$ and $\boldsymbol{\Lambda}$ are diagonal and
\[
\mathbf{F} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & x_1 & 0 & 0 \\ 1 & 0 & x_2 & 0 \\ 1 & t & t & t^2 \end{pmatrix}
\]
is a triangular matrix for $\xi \in \Xi$, the determinants of these matrices are the products of their entries on the diagonal.
Hence,
\[
\det(\mathbf{M}(\xi)) = w_0 w_1 w_2 w_3\, x_1^2 \exp(-x_1)\, x_2^2 \exp(-x_2)\, t^4 \exp(-2t - \rho t^2),
\]
and the weights as well as the single settings can be optimized separately. As for all minimally supported designs, the optimal weights are all equal to $1/p$, which is here $1/4$. The contribution $x_j^2 \exp(-x_j)$ of the axial points is the same as in the corresponding marginal one-dimensional Poisson regression model with $\boldsymbol{\beta}_j = (0, -1)^\top$ and is optimized by $x_j = 2$, $j = 1, 2$. Finally, $t^4 \exp(-2t - \rho t^2)$ is maximized by $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ for $\rho > 0$ and $t = 2$ for $\rho = 0$. □

Proof of Lemma 4.3. The main idea behind this proof is to consider the deduced sensitivity function on contours of equal intensities. For this we reparametrize the design region and use shifted and rescaled hyperbolic coordinates,
\[
x_1 = (v \exp(u) - 1)/\rho \quad \text{and} \quad x_2 = (v \exp(-u) - 1)/\rho ,
\]
where $v = \sqrt{(1 + \rho x_1)(1 + \rho x_2)}$ is the (shifted and scaled) hyperbolic distance and $u = \log\bigl(\sqrt{(1 + \rho x_1)/(1 + \rho x_2)}\bigr)$ is the (shifted and scaled) hyperbolic angle in the case $\rho > 0$. The design region $\mathcal{X} = [0, \infty)^2$ is covered by $v \geq 1$ and $|u| \leq \log(v)$. With these coordinates, fixing $v > 1$ yields a path parametrized by $u$ which intersects the diagonal at $u = 0$. On each of these paths the intensity function $\lambda(\mathbf{x})$ is constant.

Figure 5. Lines of constant intensity for two values of $\rho$ (the second being $\rho = 2$) with optimal design points.

Because $\xi_t$ is invariant under permutation of $x_1$ and $x_2$, i.e. sign change of $u$, the deduced sensitivity function $d(\mathbf{x}; \xi_t)$ is symmetric in $u$, and we only have to consider the non-negative branch, $0 \leq u \leq \log(v)$. Using $\cosh(2u) = 2\cosh^2(u) - 1$, we observe that $d(\mathbf{x}; \xi_t)$ is a quadratic polynomial in $\cosh(u) = (\exp(u) + \exp(-u))/2$. By the symmetry of $\xi_t$, the information matrix and, hence, its inverse is invariant with respect to simultaneous exchange of the second and third columns and rows, respectively.
The leading coefficient of the quadratic polynomial can be written as $c(v)\, a^\top M(\xi_t)^{-1} a$, where $a = (0, -\rho, 0, 1)^\top$ and $c(v)$ is a positive constant depending on $v$. Since $M(\xi_t)^{-1}$ is positive-definite, the leading coefficient is positive. Now, any quadratic polynomial with positive leading coefficient attains its maximum over an interval at one of the end-points of that interval. This continues to hold if we compose the polynomial with a strictly monotonic function like $\cosh(u)$ on $[0, \log(v)]$. Hence, on each path the maximum occurs at the diagonal ($u = 0$, i.e. $x_1 = x_2$) or on the boundary ($|u| = \log(v)$, i.e. $x_1 = 0$ or $x_2 = 0$). As the paths cover the whole design region, the statement of the Lemma follows for $\rho > 0$.

In the case $\rho = 0$ the contours of equal intensities degenerate to straight lines, where $x_1 + x_2$ is constant. Then the design region can be reparametrized by $x_1 = v + u$ and $x_2 = v - u$, where $v = (x_1 + x_2)/2 \geq 0$ is the (scaled $\ell_1$) distance from the origin and $u = (x_1 - x_2)/2$ is the (scaled $\ell_1$) distance from the diagonal, $|u| \leq v$. Using similar arguments as for the case $\rho > 0$, the deduced sensitivity function for $v$ fixed is a symmetric polynomial in $u$ of degree 4 with positive leading term. Hence, also in the case $\rho = 0$ the maximum of the sensitivity function can only be attained on the diagonal ($u = 0$) or on the boundary ($|u| = v$), which completes the proof. □

Proof of Lemma 4.4. With the notation in the Proof of Lemma 4.2 and equal weights $1/p$, we have $M(\xi_t)^{-1} = p\, F^{-1} \Lambda^{-1} (F^{-1})^\top$, so that the deduced sensitivity function can be written as
\[ d(\mathbf{x}; \xi_t) = f(\mathbf{x})^\top F^{-1} \Lambda^{-1} (F^{-1})^\top f(\mathbf{x}) - 1/\lambda(\mathbf{x}) , \tag{A.1} \]
where
\[ F^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1/2 & 1/2 & 0 & 0 \\ -1/2 & 0 & 1/2 & 0 \\ 1/t - 1/t^2 & -1/(2t) & -1/(2t) & 1/t^2 \end{pmatrix} , \]
and similarly for the deduced sensitivity function $d(x; \xi^*_{-1})$ of the locally $D$-optimal design $\xi^*_{-1}$ in the one-dimensional marginal model when $\beta = (0, -1)^\top$. For settings $\mathbf{x} = (x_1, 0)$ we then obtain $d(\mathbf{x}; \xi_t) = d(x_1; \xi^*_{-1})$ by the relation between the quantities and matrices in both models and their special structure.
As $\xi^*_{-1}$ is $D$-optimal in the marginal model, its deduced sensitivity function is bounded by zero by the equivalence theorem. Hence, we obtain $d((x_1, 0); \xi_t) \leq 0$ for all $x_1 \geq 0$ and, by symmetry, $d((0, x_2); \xi_t) \leq 0$ for all $x_2 \geq 0$. □

Proof of Lemma 4.5. First note that the relation between $\rho$ and $t = (\sqrt{1 + 8\rho} - 1)/(2\rho)$ is one-to-one such that conversely $\rho = (2 - t)/t^2$. Then, with the transformation $q = x/t$, the inequality to show in Lemma 4.5 for $\mathbf{x} = (x, x)$ can be equivalently reformulated to
\[ d(\mathbf{x}; \xi_t) = (q - 1)^2 (q(t - 1) - 1)^2 + \tfrac{1}{2} \exp(2)\, t^2 (q - 1)^2 q^2 + \exp(t + 2)\, q^4 - \exp(2tq + (2 - t) q^2) \leq 0 \tag{A.2} \]
for all $0 \leq t \leq 2$ and $q \geq 0$. The idea is to split the left-hand side into a polynomial
\[ h_1(q, t) = \tfrac{1}{2} \exp(2)\, t^2 (q - 1)^2 q^2 + (q - 1)^2 (q(t - 1) - 1)^2 \]
in $t$ and $q$ and a function
\[ h_2(q, t) = \exp(2qt + (2 - t) q^2) - \exp(t + 2)\, q^4 \]
involving the exponential terms such that $d(\mathbf{x}; \xi_t) = h_1(q, t) - h_2(q, t)$, and to find a suitable separating function $h_0(q, t)$ such that the inequalities $h_1(q, t) \leq h_0(q, t)$ and $h_0(q, t) \leq h_2(q, t)$ are easier to handle, where essentially methods for polynomials can be used for the former inequality while in the latter properties of exponential functions can be employed.

This function $h_0(q, t)$ will be defined piecewise in $q$ by
\[ h_0(q, t) = \begin{cases} 1 & \text{for } q \leq q_0 , \\ \exp(t + 2)\,(q - 1)^2 q^2 & \text{for } q > q_0 , \end{cases} \]
where $q_0 = 3/5$, and the proof will be performed case-by-case. Figure 6 visualizes this approach for selected values of $t$.

[Figure 6. The three functions $h_i(q, t)$ (shown in blue, orange and green) for $t = 0, 1/2, 1, 2$; panels $h_i(q, 0)$, $h_i(q, 0.5)$, $h_i(q, 1)$ and $h_i(q, 2)$, each plotted against $q$.]

We start with the case $q \leq q_0$: The function $h_1(q, t)$ is a quadratic polynomial in $t$ with positive leading term. Therefore its maximum over $0 \leq t \leq 2$ is attained at one of the end-points $t = 0$ or $t = 2$ of the interval.
Now, for $t = 0$ we obtain $h_1(q, 0) = (1 - q^2)^2 \leq 1$ for $q \leq q_0$. For $t = 2$,
\[ h_1(q, 2) = (1 - q)^2 \bigl( 2\exp(2)\, q^2 + (1 - q)^2 \bigr) \]
is a polynomial of degree 4 in $q$ with positive leading term, $h_1(0, 2) = 1$ and $h_1(1, 2) = 0$. The polynomial has a local maximum $h_1(q_1, 2) \approx 0.997$ at
\[ q_1 = \bigl( \exp(2) + 2 + \sqrt{\exp(4) - 4\exp(2)} \bigr) / (4\exp(2) + 2) \approx 0.456 . \]
This implies that $h_1(q, t) \leq 1$ for all $q \leq q_0$ and all $t \in [0, 2]$.

Next, consider $h_2(q, t)$ as a function of $t$. Its partial derivative with respect to $t$ is given by
\[ \frac{\partial}{\partial t} h_2(q, t) = (2 - q)\, q \exp\bigl(q(2t + (2 - t) q)\bigr) - q^4 \exp(t + 2) . \tag{A.3} \]
If we compare the exponential terms, we see that
\[ q(2t + (2 - t) q) - (t + 2) = -t (q - 1)^2 + 2 (q^2 - 1) \geq 4 (q - 1) \tag{A.4} \]
for all $0 \leq t \leq 2$ and all $q$. Hence, the partial derivative (A.3) is non-negative if
\[ q^{-3} (2 - q) \exp(4(q - 1)) \geq 1 . \tag{A.5} \]
To see this we notice
\[ \frac{\partial}{\partial q}\, q^{-3} (2 - q) \exp(4(q - 1)) = -2 q^{-4} (2q^2 - 5q + 3) \exp(4(q - 1)) \leq 0 \]
for $0 < q \leq 1$, so that the left-hand side of (A.5) attains its minimum over $q \leq 1$ at $q = 1$, where it is equal to 1. Combining the above results we obtain that $h_2(q, t)$ attains its minimum at $t = 0$ for all $q \leq 1$. It remains to show that $h_2(q, 0) = \exp(2q^2) - \exp(2)\, q^4 \geq 1$ for all $q \leq q_0$. For this we check the derivative
\[ \frac{\partial}{\partial q} h_2(q, 0) = 4q \bigl( \exp(2q^2) - \exp(2)\, q^2 \bigr) \]
with respect to $q$, which is positive for $0 < q < q_2$ and negative for $q_2 < q \leq q_0$, where $q_2 \approx 0.451$. Thus $h_2(q, 0)$ attains its minimum over the relevant interval at one of the end-points, where $h_2(0, 0) = 1$ and $h_2(q_0, 0) \approx 1.097$. Hence, $h_2(q, 0) \geq 1$, and we obtain $h_1(q, t) \leq 1 \leq h_2(q, t)$ for all $q \leq q_0$ and all $0 \leq t \leq 2$.

For $q > q_0$ the condition $h_1(q, t) \leq h_0(q, t)$ is equivalent to
\[ (q(t - 1) - 1)^2 \leq q^2 \exp(2) \bigl( \exp(t) - t^2/2 \bigr) . \tag{A.6} \]
By the exponential series expansion, $\exp(t) \geq 1 + t + t^2/2$ for $t \geq 0$, the right hand side is bounded from below by $(t + 1)\, q^2 \exp(2)$, and for (A.6) to hold it is sufficient to show
\[ \bigl( \exp(2)(t + 1) - (t - 1)^2 \bigr) q^2 + 2(t - 1) q - 1 \geq 0 . \tag{A.7} \]
The derivative of this expression with respect to $q$ equals
\[ 2 \bigl( \exp(2)(t + 1) - (t - 1)^2 \bigr) q + 2(t - 1) \geq \exp(2)(t + 1) - t^2 + 4t - 3 \geq 0 \]
for all $q \geq 1/2$ and $0 \leq t \leq 2$.
Hence, the expression in (A.7) is non-decreasing in $q$ for $q \geq 1/2$ and is therefore, for $q > q_0$, bounded from below by its value at $q = 3/5$. This value is increasing in $t$ and equals approximately 0.1001 at $t = 0$. This establishes $h_1(q, t) \leq h_0(q, t)$ for all $q > q_0$ and all $0 \leq t \leq 2$.

Finally, for $q > q_0$ the condition $h_0(q, t) \leq h_2(q, t)$ is equivalent to
\[ (1 - q)^2 q^2 + q^4 \leq \exp\bigl( q(2t + (2 - t) q) - (t + 2) \bigr) . \]
Again, by $q(2t + (2 - t) q) - (t + 2) \geq 4(q - 1)$ for all $0 \leq t \leq 2$, see (A.4), it is sufficient to show
\[ \bigl( (1 - q)^2 q^2 + q^4 \bigr) \exp(4(1 - q)) \leq 1 \]
for all $q \geq 0$. The derivative of this expression equals
\[ 2 (1 - q)(1 - 2q)^2\, q \exp(4(1 - q)) . \]
Hence, for $q \geq 0$ the expression attains its maximum at $q = 1$, where it is equal to 1. This implies $h_0(q, t) \leq h_2(q, t)$ for all $q > q_0$ and all $0 \leq t \leq 2$, which completes the proof. □

Proof of Theorem 5.2. Here we only give a sketch of the proof. As in the Proof of Lemma 4.3 we see that the paths of equal intensity constitute hyper-planes intersecting the design region at equilateral simplices. On each straight line within these simplices the sensitivity function is a polynomial of degree four with positive leading term. Hence, following the idea of the proofs in Farrell et al. [5], we can conclude by symmetry considerations with respect to permutation of the entries in $\mathbf{x}$ that the sensitivity function may attain a maximum in the interior of the design region only at the diagonal, where all entries in $\mathbf{x}$ are equal ($x_1 = x_2 = \dots = x_k = x$), and in the relative interior of each $j$-dimensional face of the design region on the respective diagonal, where all the $j$ non-zero entries of $\mathbf{x}$ are equal to some $x$, $2 \leq j \leq k$.

Similar to the Proof of Lemma 4.4, on each face the deduced sensitivity function is equal to its counterpart for the $D$-optimal design in the two-dimensional marginal model on that face and is, thus, bounded by 0.

Finally, to derive the deduced sensitivity function on the diagonals we specify the essential design matrix $F$ and its inverse
\[ F = \begin{pmatrix} 1 & 0 & 0 \\ \mathbf{1}_k & I_k & 0 \\ \mathbf{1}_{C(k,2)} & S_2 & I_{C(k,2)} \end{pmatrix} A \quad\text{and}\quad F^{-1} = A^{-1} \begin{pmatrix} 1 & 0 & 0 \\ -\mathbf{1}_k & I_k & 0 \\ \mathbf{1}_{C(k,2)} & -S_2 & I_{C(k,2)} \end{pmatrix} , \]
where $A = \mathrm{diag}(1, 2 \cdot \mathbf{1}_k^\top, 4 \cdot \mathbf{1}_{C(k,2)}^\top)$ is a diagonal matrix related to the product of the non-zero coordinates of the design points, $\mathbf{1}_m$ is an $m$-dimensional vector with all entries equal to 1, $I_m$ is the $m \times m$ identity matrix, $C(m, n)$ denotes the binomial coefficient $\binom{m}{n}$, and $S_2$ is the incidence matrix of a balanced incomplete block design (BIBD) for $k$ varieties and all $C(k, 2)$ blocks of size 2. Then by (A.1) the deduced sensitivity function equals
\[ \bigl( C(j,2)\, q^2 - jq + 1 \bigr)^2 + j \exp(2) \bigl( (j - 1) q^2 - q \bigr)^2 + C(j,2) \exp(4)\, q^4 - \exp(2jq) \]
on the diagonals of all $j$-dimensional faces, $j < k$, and the interior diagonal for $j = k$, where $q = x/2$. By using Mathematica and a power series expansion of order 5 for the term $\exp(2jq)$, the above expression can be seen not to exceed 0 for all $q \geq 0$, which proves the $D$-optimality in view of the equivalence theorem. □

Proof of Theorem 5.4. The proof goes along the lines of the Proof of Theorem 5.2.
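Before turning to the details, the diagonal expression from the proof of Theorem 5.2 above can be probed on a grid. This is only a plausibility sketch and does not replace the power series argument; the function names `diagonal_sensitivity` and `grid_maximum`, the grid range $0 \leq q \leq 3$ and the chosen values of $j$ are assumptions made for illustration.

```python
import math


def diagonal_sensitivity(j, q):
    """Deduced sensitivity on the diagonal of a j-dimensional face, q = x/2.

    Implements (C(j,2) q^2 - j q + 1)^2 + j e^2 ((j-1) q^2 - q)^2
               + C(j,2) e^4 q^4 - exp(2 j q).
    """
    c2 = math.comb(j, 2)  # number of pairs among the j non-zero coordinates
    origin_part = (c2 * q**2 - j * q + 1.0) ** 2
    axial_part = j * math.exp(2.0) * ((j - 1) * q**2 - q) ** 2
    pair_part = c2 * math.exp(4.0) * q**4
    return origin_part + axial_part + pair_part - math.exp(2.0 * j * q)


def grid_maximum(j, q_max=3.0, steps=3000):
    """Largest value of the expression on an equidistant grid in [0, q_max]."""
    return max(diagonal_sensitivity(j, q_max * i / steps) for i in range(steps + 1))


if __name__ == "__main__":
    for j in range(1, 6):
        print(j, grid_maximum(j))
```

For $j = 2$ the expression vanishes at $q = 1$, i.e. at the design point $(2, 2)$, which is where the bound of the equivalence theorem is attained.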
The essential design matrix $F$ and its inverse are specified as
\[ F = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \mathbf{1}_k & I_k & 0 & 0 \\ \mathbf{1}_{C(k,2)} & S_2 & I_{C(k,2)} & 0 \\ \mathbf{1}_{C(k,3)} & S_3 & S_{32} & I_{C(k,3)} \end{pmatrix} A , \qquad F^{-1} = A^{-1} \begin{pmatrix} 1 & 0 & 0 & 0 \\ -\mathbf{1}_k & I_k & 0 & 0 \\ \mathbf{1}_{C(k,2)} & -S_2 & I_{C(k,2)} & 0 \\ -\mathbf{1}_{C(k,3)} & S_3 & -S_{32} & I_{C(k,3)} \end{pmatrix} , \]
where now $A = \mathrm{diag}(1, 2 \cdot \mathbf{1}_k^\top, 4 \cdot \mathbf{1}_{C(k,2)}^\top, 8 \cdot \mathbf{1}_{C(k,3)}^\top)$, $S_2$ is the incidence matrix of a BIBD for $k$ varieties and all $C(k, 2)$ blocks of size 2, $S_3$ is the incidence matrix of a BIBD for $k$ varieties and all $C(k, 3)$ blocks of size 3, and $S_{32}$ is the (generalized) $C(k, 3) \times C(k, 2)$ incidence matrix relating the blocks of size 3 to the blocks of size 2. Then the deduced sensitivity function equals
\[ \bigl( C(j,3)\, q^3 - C(j,2)\, q^2 + jq - 1 \bigr)^2 + j \exp(2) \bigl( (C(j,2) - j + 1)\, q^3 - (j - 1) q^2 + q \bigr)^2 + C(j,2) \exp(4) \bigl( (j - 2) q^3 - q^2 \bigr)^2 + C(j,3) \exp(6)\, q^6 - \exp(2jq) \]
on the diagonals, where $q = x/2$. By using Mathematica and a power series expansion of order 9 for the term $\exp(2jq)$, the above expression can be seen not to exceed 0 for all $q \geq 0$, which proves the $D$-optimality. □

References

[1] A. C. Atkinson, A. Donev, and R. Tobias. Optimum Experimental Designs, With SAS. OUP Oxford, 2007.
[2] A. C. Atkinson, V. V. Fedorov, A. M. Herzberg, and R. Zhang. Elemental information matrices and optimal experimental design for generalized regression models. Journal of Statistical Planning and Inference, 144:81–91, 2014.
[3] A. C. Cameron and P. K. Trivedi. Regression analysis of count data. Cambridge University Press, 2013.
[4] L. Fahrmeir and H. Kaufmann. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. The Annals of Statistics, 13(1):342–368, 1985.
[5] R. H. Farrell, J. Kiefer, and A. Walbran. Optimum multivariate designs. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 113–138, Berkeley, Calif., 1967. University of California Press.
[6] V. V. Fedorov. Theory of optimal experiments. Academic Press, New York–London, 1972.
[7] I. Ford, B. Torsney, and C.-F. J. Wu. The use of a canonical form in the construction of locally optimal designs for nonlinear problems. Journal of the Royal Statistical Society. Series B. Methodological, 54(2):569–583, 1992.
[8] U. Graßhoff, H. Holling, and R. Schwabe. D-optimal design for the Rasch counts model with multiple binary predictors. British Journal of Mathematical and Statistical Psychology, in press.
[9] T. Kahle, K. Oelbermann, and R. Schwabe. Algebraic geometry of Poisson regression. Journal of Algebraic Statistics, 7:29–44, 2016.
[10] J. Kiefer. General equivalence theory for optimum designs (approximate theory). The Annals of Statistics, 2(5):849–879, 1974.
[11] J. Kiefer and J. Wolfowitz. The equivalence of two extremum problems. Canadian Journal of Mathematics, 12:363–366, 1960.
[12] D. Könner. Optimale Designs für Poisson-Regression. Fakultät für Mathematik, Otto-von-Guericke-Universität Magdeburg, 2011. Unpublished Manuscript.
[13] P. McCullagh and J. A. Nelder. Generalized Linear Models, Second Edition. Chapman & Hall/CRC, 1989.
[14] M. Radloff and R. Schwabe. Invariance and equivariance in experimental design for nonlinear models. In J. Kunert, C. H. Müller, and A. C. Atkinson, editors, mODa 11 - Advances in Model-Oriented Design and Analysis, pages 217–224. Springer International Publishing, Cham, 2016.
[15] G. Rasch. Probabilistic Models for Some Intelligence and Attainment Tests. Danmarks Paedagogiske Institut, 1960.
[16] C. Rodríguez-Torreblanca and J. M. Rodríguez-Díaz. Locally D- and c-optimal designs for Poisson and negative binomial regression models. Metrika, 66(2):161–172, 2007.
[17] K. G. Russell, D. C. Woods, S. M. Lewis, and J. A. Eccleston. D-optimal designs for Poisson regression models. Statistica Sinica, 19(2):721–730, 2009.
[18] D. Schmidt and R. Schwabe. Optimal design for multiple regression with information driven by the linear predictor. Statistica Sinica, 27(3):1371–1384, 2017.
[19] R. Schwabe. Optimum designs for multi-factor models. Springer-Verlag, New York, 1996.
[20] S. D. Silvey. Optimal design. Chapman & Hall, London-New York, 1980.
[21] J. Vives, J.-M. Losilla, and M.-F. Rodrigo. Count data in psychological applied research. Psychological Reports, 98(3):821–835, 2006. PMID: 16933680.
[22] Y. Wang, R. H. Myers, E. P. Smith, and K. Ye. D-optimal designs for Poisson regression models. Journal of Statistical Planning and Inference, 136(8):2831–2845, 2006.
[23] Wolfram Research, Inc. Mathematica, Version 12.1. Champaign, IL, 2020.

Department of Biometry, Epidemiology and Information Processing, University of Veterinary Medicine Hannover, Bünteweg 2, 30559 Hannover, Germany

School of Business and Economics, Humboldt-University Berlin, Unter den Linden 6, 10099 Berlin, Germany

MPI MiS Leipzig, Inselstraße 22, 04103 Leipzig, Germany
URL: https://sites.google.com/view/roettger

Institute for Mathematical Stochastics, Otto-von-Guericke-University Magdeburg, Universitätsplatz 2, 39106 Magdeburg, Germany