Statistical Approaches for Modelling Cancer Bioassays
Christos P. Kitsos, Nikolaos K. Tavoularis, Thomas L. Toulias, George Lolas
Technological Educational Institute of Athens, Department of Mathematics
CCRA3 Proceedings
Abstract
This paper discusses possible ways to analyse bioassay data, adopting the matrix notation so often used in bioassays. The paper also reviews the Multistage Models (MM), a class of models applied for extrapolation to the low-dose region. The effect of covariates in experimental carcinogenesis is introduced and the relative efficiency is evaluated. The case discussed concerns uncorrelated covariates, so multicollinear predictive covariates remain an open problem. Various nonlinear models are discussed, with more emphasis on the Michaelis-Menten model, and their Fisher information is evaluated.
Keywords:
Multistage Models, Covariates, Michaelis-Menten, Nonlinear models, Fisher's Information.
1. Introduction
This paper reviews the Multistage Models (MM) and presents some nonlinear models applicable to cancer bioassays. The MM class of models is applied for extrapolation to the low-dose region (Kitsos, 1997). The general prediction problem has been discussed under a different approach by Kitsos (1993), with some applications to carcinogenesis. The effect of covariate omission in experimental carcinogenesis is introduced and the relative efficiency is evaluated. The Michaelis-Menten model and some extensions of it are discussed in the Appendix. A number of graphs of various curves used in bioassays are presented in Appendix B, while the Fisher information for some models is evaluated in Section 5.
2. Descriptive Summary Tables and Analysis
A common way of presenting summary data in all statistical problems, and therefore in cancer epidemiological studies, is through a data matrix. Following Kendall's suggestion, the data matrix is used to create a table that groups the data. However, a table must be formally constructed so that it models the logical design underlying the statistical data set of an epidemiological study or bioassay. A summary statistic can eventually be viewed as a mathematical function. The so-called “dependent” variable of the function is a numerical variable, referred to as the summary variable (elsewhere called summary attribute, analysis variable, or response variable). A summary variable is defined by a name and a type (e.g., real, integer, nonnegative real, nonnegative integer). The “independent” variables are nominal or range variables, referred to as category attributes (elsewhere called categorical variables, classification variables, or input variables). In biological studies, and especially in cancer bioassays, a category attribute is defined by a name and a domain. The domain usually consists of a few values (usually called “codes”); that is, the cardinality of the domain is a small positive integer. For example, sex is a two-valued category attribute that is often used in epidemiological studies and acts as a covariate in some of them. The categorical variable Cancer (“yes” or “no”) is treated in the same way. The value “yes” needs further investigation, through another categorical variable, concerning the type of cancer. So if we consider the set $C = \{C_1, \ldots, C_r\}$ of all category attributes in a given population Ω, we can assign to each $C_j$ a mapping (set function) $\mu_j$ from the domain of $C_j$ to the power set of Ω, $j = 1, \ldots, r$.
Footnote: Died in a car accident on March 17, 2009.
Given a category value $c_j$, the set $\mu_j(c_j)$ consists of the units of observation matching the condition $C_j = c_j$. It is called the category associated with the category value $c_j$. Two main assumptions have to be considered, even though they are not stated explicitly in cancer bioassay problems:
• Assumption 1. Partitioning attributes. Each unit of observation in Ω falls into exactly one elementary category.
• Assumption 2. Data additivity. The summary variable X provides additive total information.
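The mapping $\mu_j$ and Assumption 1 can be made concrete with a short Python sketch. Everything in it is hypothetical: the toy population, the attribute names `sex` and `cancer`, and their codes are invented for illustration, and a missing code is marked by `None`. The sketch does four things:

• builds each $\mu_j$;
• intersects the categories into elementary categories;
• checks that these partition Ω;
• counts the units in each cell, which gives a summary table with X = number of units.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy population Omega: unit id -> codes of two category attributes.
# Attribute names and data are illustrative only; None marks a missing code.
population = {
    1: {"sex": "M", "cancer": "yes"},
    2: {"sex": "F", "cancer": "no"},
    3: {"sex": "F", "cancer": "yes"},
    4: {"sex": "M", "cancer": "no"},
    5: {"sex": "F", "cancer": "no"},
}


def category_map(pop, attribute):
    """mu_j: map each code c_j of attribute C_j to the set of units with C_j = c_j."""
    mu = defaultdict(set)
    for unit, codes in pop.items():
        if codes.get(attribute) is not None:
            mu[codes[attribute]].add(unit)
    return dict(mu)


def elementary_categories(mus):
    """Intersect one category per attribute, giving one cell per combination of codes."""
    names = list(mus)
    return {
        combo: set.intersection(*(mus[n][c] for n, c in zip(names, combo)))
        for combo in product(*(mus[n] for n in names))
    }


def is_partition(cells, universe):
    """Assumption 1: the cells are pairwise disjoint and together cover Omega."""
    covered = set()
    for members in cells.values():
        if covered & members:
            return False
        covered |= members
    return covered == set(universe)


mus = {a: category_map(population, a) for a in ("sex", "cancer")}
cells = elementary_categories(mus)
table = {combo: len(units) for combo, units in cells.items()}  # summary variable X: counts
print(table)
print(is_partition(cells, population))
```

Suppose a unit whose `cancer` code is missing is added. That unit falls into no elementary category, so `is_partition` returns `False`.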
Assumption 1 simply declares that the elementary categories are mutually disjoint and cover Ω. In other words, the set of category attributes C acts as a classification scheme for the units of observation in Ω. This assumption is usually tested with an n × m contingency table and the appropriate χ² test. The 2 × 2 table is the best known and is sometimes linked with the logit model (Kitsos, 2007a). Assumption 2 declares that the total information for the variable X is the sum of the information provided by each observation of X. In principle, “information” here means Fisher information. Augmenting the data provides “more information” additively, in the sense that the total information is the sum of the information added by the n observations. This certainly requires the independence assumption. Experimentalists usually take it for granted, although it does not always hold. But even in such cases the likelihood function can be evaluated for all the observations by “pretending” that they are independent (Ford and Silvey, 1980; Kitsos, 1992). It is widely accepted that the first stage of a statistical analysis, especially of biological data, is based on a compact “table form” of the data. For a statistical analysis that contains multiple summary tables over the same population, the following definition is given.
Definition 2.1.
Two or more summary tables $T_1, \ldots, T_s$ are homogeneous if they contain data on the same summary variable X for the same population Ω but use different classification schemes. The collection of the $T_i$'s is referred to as a polyptych (a diptych for $s = 2$, a triptych for $s = 3$) of summary tables on X in Ω. A problem arises in managing the statistical analysis of the collected data when it contains a polyptych of summary tables.
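Assumption 2 can be checked numerically in the simplest setting. Assume n independent Bernoulli(p) responses (e.g. tumour present or absent) with a common p; the values n = 10 and p = 0.2 are arbitrary. The sketch computes the expected information $-E[\partial^2 l/\partial p^2]$ of the whole sample by exact summation over the binomial distribution. It then compares this with n times the information of a single observation, $1/(p(1-p))$.

```python
from math import comb


def bernoulli_info(p):
    """Fisher information about p carried by a single Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))


def binomial_info_exact(n, p):
    """Expected information -E[d^2 l / dp^2] for n independent Bernoulli(p) trials.

    l(p) = k log p + (n - k) log(1 - p), where k ~ Binomial(n, p);
    the expectation is computed by summing over k = 0, ..., n.
    """
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1.0 - p) ** (n - k)
        second_derivative = -k / p**2 - (n - k) / (1.0 - p) ** 2
        total += prob * (-second_derivative)
    return total


n, p = 10, 0.2
print(binomial_info_exact(n, p), n * bernoulli_info(p))  # agree up to rounding: n / (p (1 - p))
```

The two numbers agree because, under independence, the log-likelihood is a sum over observations, and so is its expected second derivative.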
When a statistical analysis contains a polyptych, data integration is needed. It consists of viewing the tables of a polyptych as “projections” of a higher-dimensional summary table, called the universal table. For example, projecting the EU cancer data set onto a particular country XX rarely reproduces that country's exact summary statistics. For a universal-scheme interface to be practical, the polyptych under examination has to be consistent. This means that universal tables exist, that is, summary tables with classification scheme U whose “projections” on the $R_i$'s return the tables of the polyptych. This requires a certain organization of the data sets, which rarely happens in practice, although various cancer research centres attempt it. In order to state the notions of a universal table and of consistency precisely, we introduce two notions: that of a marginal of a summary table and that of a universal category relation.
Definition 2.2.
A universal category relation is a category relation over the universal classification scheme U such that its projections restore the category relations $R_i$ of the tables of the polyptych. An interesting case occurs whenever two or more category attributes refer to a common, but not identical, categorization criterion and are therefore more or less tightly connected. This is a weak point of the statistical analysis of bioassays, which is why we recommend a pilot study before any statistical data analysis. Based on this knowledge, the universal category relation is taken to be a proper subset U of the natural join U*. U is obtained by removing from U* the tuples that may be inconsistent with the semantic constraints known to the statistical and medical analysts. Notice that a tuple in U may refer to an empty category (an accidentally empty category); such a tuple is called a dummy tuple. The distinction between structurally empty and accidentally empty categories often arises in the analysis of statistical data and must be kept in mind.
Definition 2.3.
A universal table for a polyptych of summary tables is a summary table with classification scheme U and category relation U such that its marginals over the $C_i$'s coincide with the summary tables of the polyptych.
Definition 2.4.
A polyptych is consistent if it admits a universal table. Owing to Assumptions 1 and 2, in any bioassay the universal tables can be represented as the solutions of a system of linear constraints.
A typical case in which a polyptych of summary tables turns out to be consistent occurs when the tables come from a single data source. When summary tables are taken from distinct data sources, the requirement of consistency is very unlikely to be fulfilled. Experimentalists may not often consider these points, but statistical techniques are strengthened by such considerations.
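In the simplest case, the linear-constraint view of Definition 2.4 can be sketched directly. Assume a hypothetical diptych with two single-attribute classification schemes, counts by sex ($T_1$) and counts by cancer status ($T_2$), and no structurally empty cells. Such a diptych is consistent exactly when the two tables have the same grand total. One universal table can then be built with the north-west corner rule from the transportation problem. This construction is an illustrative choice, not a method stated in the text, and in general many universal tables exist.

```python
def universal_table(row_totals, col_totals):
    """One nonnegative integer table whose margins are the two tables of a diptych.

    Returns None when the diptych is inconsistent (different grand totals).
    Uses the north-west corner rule: fill cell (i, j) with as much as both
    remaining totals allow, then move right or down.
    """
    if sum(row_totals) != sum(col_totals):
        return None
    rows, cols = list(row_totals), list(col_totals)
    table = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        x = min(rows[i], cols[j])
        table[i][j] = x
        rows[i] -= x
        cols[j] -= x
        if rows[i] == 0:
            i += 1
        else:
            j += 1
    return table


def margins(table):
    """Marginals of a two-way table over its rows and over its columns."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


by_sex = [40, 60]      # hypothetical T1: counts for M, F
by_cancer = [30, 70]   # hypothetical T2: counts for cancer yes, no
U = universal_table(by_sex, by_cancer)
print(U, margins(U))
```

Projecting the returned table back onto each classification scheme with `margins` reproduces both tables of the diptych. Changing one grand total makes `universal_table` return `None`.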
3. The covariates as extra categories
In most bioassays, and in experimental carcinogenesis as well, the target is to compare two different therapies/factors or to evaluate prognostic factors. For the comparison of two therapies, in the notation of Section 2, $C = \{C_1, C_2\}$. But in principle the population under study, Ω, is rather heterogeneous with respect to prognosis, so the effect of the covariates describing this heterogeneity has to be adjusted for (Cox and Snell, 1989). Let $x_1$ be the factor of interest, $x_2$ the covariate, and $\beta_1$, $\beta_2$ the corresponding regression parameters. Fitting the full model with link function Link (McCullagh and Nelder, 1989),
$$E(Y \mid x_1, x_2) = \mathrm{Link}^{-1}(\beta_0 + \beta_1 x_1 + \beta_2 x_2), \qquad (3.1)$$
while the restricted model, with estimate $\hat\beta_1^{*}$, is
$$E(Y \mid x_1) = \mathrm{Link}^{-1}(\beta_0^{*} + \beta_1^{*} x_1). \qquad (3.2)$$
Notice that models (3.1) and (3.2), although nonlinear, are intrinsically linear. The corresponding variances for models (3.1) and (3.2) are
$$\mathrm{var}(Y \mid x_1, x_2) = \sigma^2_{Y|12}, \qquad \mathrm{var}(Y \mid x_1) = \sigma^2_{Y|1}. \qquad (3.3)$$
Therefore the (relative) efficiency of $\hat\beta_1$ to $\hat\beta_1^{*}$ can be defined as
$$\mathrm{eff}(\hat\beta_1, \hat\beta_1^{*}) = \frac{\sigma^2_{Y|1}\,(1-\rho^2)}{\sigma^2_{Y|12}} = \frac{1-\rho^2}{1-\rho_Y^2}, \qquad (3.4)$$
with $\rho = \mathrm{Corr}(x_1, x_2)$ and $\rho_Y$ the effect of $x_2$ on Y, measured as the correlation of Y and $x_2$ adjusted for $x_1$. From (3.4) it is easy to see that
$$\mathrm{eff}(\hat\beta_1, \hat\beta_1^{*}) \begin{cases} = 1, & \text{if } |\rho| = |\rho_Y|,\\ < 1, & \text{if } |\rho| > |\rho_Y|,\\ > 1, & \text{if } |\rho| < |\rho_Y|. \end{cases} \qquad (3.5)$$
In principle, interest is concentrated on a randomised treatment effect, i.e. $\rho = 0$. Emphasis is then given to adjustment, since in that case $\mathrm{eff}(\hat\beta_1, \hat\beta_1^{*}) \ge 1$. The question whether $\beta_2 = 0$ (a statistical null hypothesis) or $\beta_2 \neq 0$ is crucial for misspecification by omitting or including $x_2$. Indeed, suppose $x_2$ is adjusted for. Then the assumed correct model (3.1) is fitted provided $\beta_2 \neq 0$, and if $\beta_2 = 0$ this leads to overspecification of the model. If $x_2$ is not included, model (3.2) is correct if $\beta_2 = 0$, while if $\beta_2 \neq 0$ model (3.2) is underspecified.
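Equations (3.4) and (3.5) are easy to explore numerically. The sketch below simply evaluates the second expression in (3.4), assuming the linear-model setting in which it holds. There, `rho` = Corr($x_1$, $x_2$) and `rho_y` is the correlation of Y and $x_2$ adjusted for $x_1$. The function names are hypothetical.

```python
def relative_efficiency(rho, rho_y):
    """eff(beta1_hat, beta1_star_hat) = (1 - rho^2) / (1 - rho_y^2), as in (3.4).

    rho   : Corr(x1, x2), correlation between the factor of interest and the covariate
    rho_y : correlation of Y with x2 adjusted for x1
    """
    if not (abs(rho) < 1 and abs(rho_y) < 1):
        raise ValueError("correlations must lie strictly inside (-1, 1)")
    return (1 - rho**2) / (1 - rho_y**2)


def classify(rho, rho_y, tol=1e-12):
    """Return which branch of (3.5) applies."""
    eff = relative_efficiency(rho, rho_y)
    if abs(eff - 1) <= tol:
        return "eff = 1"
    return "eff > 1" if eff > 1 else "eff < 1"


for rho, rho_y in [(0.0, 0.5), (0.5, 0.5), (0.6, 0.3)]:
    print(rho, rho_y, round(relative_efficiency(rho, rho_y), 4), classify(rho, rho_y))
```

For a randomised factor (`rho = 0`), any covariate with `rho_y != 0` gives eff > 1. This agrees with the remark that adjustment is then worthwhile.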
If the link function is the logistic function, which remains invariant under certain transformations (Kitsos, 2007a), models (3.1) and (3.2) reduce to
$$\log\frac{P}{1-P} = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \qquad (3.6)$$
$$\log\frac{P^{*}}{1-P^{*}} = \beta_0^{*} + \beta_1^{*} x_1, \qquad (3.7)$$
with $P = P[Y = 1 \mid x_1, x_2]$ and $P^{*} = P[Y = 1 \mid x_1]$. If $\beta_1^{*} = \beta_1$, the plane (3.6) and the line (3.7) are parallel (Figure 1). Statistically, this means that the Relative Risk (RR) for $x_1$ estimated from (3.6) and from (3.7) is the same, since $RR = e^{\beta_1} = e^{\beta_1^{*}}$. Otherwise the estimated relative risks differ, and this is a matter for investigation. Notice that in both cases (3.6) and (3.7) are linear. If a second-order model were considered, a test for curvature would be a necessity, at least for (3.7).
Figure 1. (3.6) plane and (3.7) line.
Notice that for the logistic model the curvature of the inverse link $\mathrm{Link}^{-1}(\cdot)$ leads to a downward bias of $\hat\beta_1^{*}$, i.e. $\lim |\hat\beta_1^{*}| < |\beta_1|$. The bias tends to zero only when $\beta_2 = 0$, that is, when the Relative Risk for $x_2$ equals unity, $RR = e^{\beta_2} = 1$. There is a similarity between the logistic and the Cox model (Prentice and Kalbfleisch, 1979; Schumacher et al., 1987; Lagakos and Schoenfeld, 1984). The behaviour of the variances of the adjusted and unadjusted estimators needs, we think, more investigation in this particular problem. We emphasize that $\mathrm{var}(\hat\beta_0^{*})$ and $\mathrm{var}(\hat\beta_1^{*})$ in (3.7) can be reduced if a D-optimal or $D_s$-optimal design approach is adopted (Ford et al., 1989; Kitsos, 2007a).
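The attenuation of $\hat\beta_1^{*}$ can be seen without simulation in a deliberately simple setting, which is an assumption rather than the paper's. Here $x_1$ is a binary (0/1) factor, so the reduced model (3.7) is saturated in $x_1$. The covariate $x_2$ is independent of $x_1$ (ρ = 0) and takes the values −1 and +1 with equal probability. Averaging the full model (3.6) over $x_2$ gives $P[Y=1 \mid x_1]$. The large-sample limit of $\hat\beta_1^{*}$ is then the difference of the two marginal logits.

```python
from math import exp, log


def expit(z):
    return 1.0 / (1.0 + exp(-z))


def logit(p):
    return log(p / (1.0 - p))


def omitted_covariate_slope(b0, b1, b2, x2_values=(-1.0, 1.0)):
    """Large-sample limit of beta1* in (3.7) when x2 is left out of (3.6).

    Illustrative assumptions: x1 is binary (0/1), so (3.7) is saturated in x1,
    and x2 is independent of x1, uniformly distributed over x2_values.
    """
    def marginal(x1):
        # P[Y = 1 | x1]: the full logistic model (3.6) averaged over x2
        return sum(expit(b0 + b1 * x1 + b2 * v) for v in x2_values) / len(x2_values)

    return logit(marginal(1)) - logit(marginal(0))


for b2 in (0.0, 1.0, 2.0):
    print(b2, round(omitted_covariate_slope(-1.0, 1.0, b2), 4))
```

Take $\beta_0 = -1$ and $\beta_1 = 1$. The limit of $\hat\beta_1^{*}$ equals 1 when $\beta_2 = 0$ and shrinks towards zero as $|\beta_2|$ grows (about 0.45 for $\beta_2 = 2$), even though $x_2$ is uncorrelated with $x_1$.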
4. Low-Dose models
There are different statistical models describing the process by which a normal cell becomes malignant through at least one transformation. When the malignancy refers to a tumour, we speak of cancer. Interest is then focused on the “growth rate” of the affected tissues, as malignant tumours are capable of floating away and forming new malignant growths in other sites. Humans are certainly exposed to carcinogens, and quantifying the resulting risk is difficult for two main reasons:
• there are no reliable estimates of “safe doses”, and
• epidemiological methods are insensitive to small increases in cancer.
In quantitative toxicology the following definitions are adopted.
Definition 4.1.
By “dose” we define the amount of chemical, or of energy in a radiological situation, administered to or received by the exposed subject.
Definition 4.2.
By “effect” we define an action as a result of a stimulus received through a receptor.
Definition 4.3.
By “response” we mean any detectable change (which is assumed to be approached through a statistical model). A number of extra definitions related to the term “dose” are considered, and we refer to them briefly. A “safe dose” is assumed not to increase the current cancer incidence rate by more than an “acceptable” low risk level; see the early work of Hartley and Sielken (1977). The “virtually safe dose” (VSD) was based on confidence intervals and was rather mechanistic (Crump et al., 1977; Armitage, 1982). The upper confidence limit for the proportion of tumours was calculated and the dose-response curve was extrapolated towards zero. Certainly one of the problems in cancer risk assessment is the extrapolation from experimental results to humans (Zapponi et al., 1989). Different scale parameters are also discussed in comparisons of $LD_{10} \equiv L_{0.1}$ (Brown and Howel, 1983). These parameters are based on different species and a given set of chemicals. Interest is thus still focused on low doses. This is equivalent to calculating the percentile $L_p$, $p \in (0,1)$, i.e. the dose at which the response probability equals p, for “small” p, $0 < p \le p_L$. Indeed, it makes no sense to perform experiments below an unknown dose level. The effect of covariates has to be considered through the assumed correct model. The low-dose effects in a risk assessment also have to be within the scope of the study. Different nonlinear models have been developed and applied under the name multistage models (MM). The MM class has mainly been applied to the analysis of a large number of epidemiological data sets (Armitage, 1985). The crucial issue when fitting a model from the MM class is prediction, that is, extrapolation downwards to the neighbourhood of zero. In statistics, by contrast, prediction is usually related to forward extrapolation.
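To make the low-dose percentile concrete: if F is the fitted dose-response curve, $L_p$ solves $F(L_p) = p$. The sketch below assumes the multistage form (4.5) introduced later in this section, with hypothetical coefficients that are not fitted to any data. It finds $L_p$ by bisection. For the one-hit special case $\theta = (0, \theta_1)$, the closed form $L_p = -\ln(1-p)/\theta_1$ serves as a check.

```python
from math import expm1, log


def multistage_cdf(x, theta):
    """Multistage model (4.5): F(x) = 1 - exp(-(theta_0 + theta_1 x + ... + theta_k x^k)).

    expm1 keeps F accurate when the exponent is tiny, i.e. in the low-dose region.
    """
    eta = sum(t * x**i for i, t in enumerate(theta))
    return -expm1(-eta)


def percentile(F, p, lo=0.0, hi=1.0, n_iter=200):
    """Dose L_p with F(L_p) = p for an increasing F, found by bisection."""
    if F(lo) >= p:
        raise ValueError("the response at dose lo already reaches p")
    for _ in range(100):  # widen the bracket until it contains L_p
        if F(hi) >= p:
            break
        hi *= 2.0
    else:
        raise ValueError("F does not reach p on the search range")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p = 1e-3
theta = (0.0, 0.01, 0.5)  # hypothetical multistage coefficients, not fitted to data
L_p = percentile(lambda x: multistage_cdf(x, theta), p)
one_hit = percentile(lambda x: multistage_cdf(x, (0.0, 0.5)), p)
print(L_p, one_hit, -log(1 - p) / 0.5)  # the last two should agree
```

Bisection is chosen here because it only needs F to be increasing, which holds for (4.5) with nonnegative coefficients.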
Downward extrapolation is an interesting exception in general statistical modelling theory. There, prediction relates mainly to forward extrapolation and, to a lesser extent, to values within the domain. Toxicologists and environmentalists take a rather empirical approach to these models, without reference to an explicit dose-response model, as tumour incidence data are tied to a prescribed dose-response relationship. These experimental approaches are still valuable and might provide interesting results (Angelopoulou et al., 2008a, 2008b). The dose-response curve, F(·) say, results from a binary response problem (recall Section 2), i.e. there are only two categories $C_1$ and $C_2$; see Kitsos (1998) for details. From the statistical point of view, F(x) is the cumulative distribution function of the underlying probability model describing the phenomenon, where x denotes the dose level. However, a number of factors make this dose-response curve rather problematic for biologists and toxicologists. These include the temporal variability of animal population characteristics and the difficulty of identifying a specific animal breed. Moreover, F(x) is an assumed approximation rather than a known deterministic mechanism for the phenomenon it describes. It therefore needs to be estimated, and we would strongly recommend checking the fit with a Kolmogorov-Smirnov test.
When it is assumed that cancer is the result of a single event (or "hit") in a single cell, the one-parameter exponential model
$$F(x) = 1 - \exp(-\theta x), \qquad \theta > 0, \qquad (4.1)$$
is known as the one-hit model. When a fixed number, say k, of (identical) "hits" occur in a tissue, the multi-hit model is assumed to describe the phenomenon. The corresponding F(x) is approximated by the assumed correct model
$$F(x) = \frac{1}{(k-1)!}\int_0^{x} u^{k-1} e^{-u}\,du. \qquad (4.2)$$
When a supralinear relationship is assumed, the one-hit model is transformed into the Weibull model with shape parameter, s say,
$$F(x) = 1 - e^{-(\theta x)^{s}}.$$
Maximum likelihood estimation (MLE) of the Weibull parameters $(\theta, s)$, both assumed unknown, is based on the likelihood $L = L(\theta, s)$ and the log-likelihood $l = l(\theta, s)$. Recall that the Weibull distribution does not belong to the exponential family. Consider a sample of times $t_1, \ldots, t_n$. The set D contains the d observed (uncensored) responses, and the remaining times are censored. The first derivatives give the score functions
$$U_\theta = \frac{\partial l}{\partial \theta} = \frac{sd}{\theta} - s\theta^{s-1}\sum_{i=1}^{n} t_i^{s}, \qquad (4.3)$$
$$U_s = \frac{\partial l}{\partial s} = \frac{d}{s} + d\log\theta + \sum_{i\in D}\log t_i - \sum_{i=1}^{n}(\theta t_i)^{s}\log(\theta t_i). \qquad (4.4)$$
When s is given, the MLE $\theta^{*}$ of θ can be found explicitly by solving $U_\theta = 0$, as
$$\theta^{*} = \left(\frac{d}{\sum_{i=1}^{n} t_i^{s}}\right)^{1/s}.$$
The second derivatives of the log-likelihood l are
$$I_{\theta\theta} = \frac{\partial^2 l}{\partial\theta^2} = -\frac{sd}{\theta^2} - s(s-1)\theta^{s-2}\sum_{i=1}^{n} t_i^{s},$$
$$I_{\theta s} = \frac{\partial^2 l}{\partial\theta\,\partial s} = \frac{d}{\theta} - \theta^{s-1}\sum_{i=1}^{n} t_i^{s} - s\theta^{s-1}\sum_{i=1}^{n} t_i^{s}\log(\theta t_i),$$
$$I_{ss} = \frac{\partial^2 l}{\partial s^2} = -\frac{d}{s^2} - \sum_{i=1}^{n}(\theta t_i)^{s}\left[\log(\theta t_i)\right]^2.$$
Therefore the (observed) Fisher information matrix can be evaluated as $-(I_{ij})$, $i, j = 1, 2$, with $I_{11} = I_{\theta\theta}$, $I_{22} = I_{ss}$, $I_{12} = I_{21} = I_{\theta s}$.

When the susceptible cell is assumed to pass through k distinct stages before becoming malignant, the phenomenon is described by the multistage model of Armitage and Doll (1954). The main assumption was that the transformation rate from each stage to the next one is linear. Eventually the cdf of developing cancer from exposure to a dose x, within a fixed time period, is given by
$$F(x) = 1 - \exp[-(\theta_0 + \theta_1 x + \cdots + \theta_k x^{k})], \qquad (4.5)$$
where $\theta_i$, $i = 0, 1, \ldots, k$, are defined through the coefficients of the linear transformations assumed between stages (Crump et al., 1977), i.e. $\theta_i = \theta_i(t)$. The most usual models are the multistage linear model and the multistage model. Notice that model (4.5) was developed from a completely different biological insight, and not as a general mathematical form of the previous models. The Logit and Probit models (McCullagh and Nelder, 1989) are known as tolerance distribution models in cancer risk assessment. They are also useful in toxicology and are included in the MM class. In pharmacokinetics for cancer risk assessment, the Michaelis-Menten metabolic process is usually considered. It is assumed to lead to a concentration of the active metabolite in the target tissue, considered as a function of x; see the Appendix for more details. The MM class appeared first (Armitage and Doll, 1954). It is based on the assumption that a single normal cell may become fully malignant after a sequence of, say, k irreversible heritable mutation-like changes. A further assumption is that the intermediate cells are subject to a stochastic birth-death process for cell proliferation and cell differentiation. With this assumption and k = 2, Moolgavkar and his associates created the Biologically Based Models (BBM); see Moolgavkar and Venzon (1979). The BBM were developed further in a series of papers by Luebeck and Moolgavkar (1989, 1991, 1992).
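As a check on the Weibull derivatives, the sketch below codes four things:

• the log-likelihood;
• the score functions (4.3)-(4.4);
• the second derivatives;
• the closed-form $\theta^{*}$ for fixed s.

It assumes right-censored data. `times` holds $t_1, \ldots, t_n$, `observed[i]` is `False` for a censored time, and d is the number of observed responses. The data are hypothetical, and the functions can be verified against finite differences of `weibull_loglik`.

```python
import math


def weibull_loglik(theta, s, times, observed):
    """l(theta, s) for F(t) = 1 - exp(-(theta t)^s) with right-censored times."""
    d = sum(observed)
    return (d * math.log(s) + d * s * math.log(theta)
            + (s - 1) * sum(math.log(t) for t, o in zip(times, observed) if o)
            - sum((theta * t) ** s for t in times))


def weibull_score(theta, s, times, observed):
    """Score functions (U_theta, U_s) of (4.3)-(4.4)."""
    d = sum(observed)
    sum_ts = sum(t ** s for t in times)
    u_theta = s * d / theta - s * theta ** (s - 1) * sum_ts
    u_s = (d / s + d * math.log(theta)
           + sum(math.log(t) for t, o in zip(times, observed) if o)
           - sum((theta * t) ** s * math.log(theta * t) for t in times))
    return u_theta, u_s


def weibull_hessian(theta, s, times, observed):
    """Matrix of second derivatives [[I_tt, I_ts], [I_ts, I_ss]] of l."""
    d = sum(observed)
    sum_ts = sum(t ** s for t in times)
    i_tt = -s * d / theta**2 - s * (s - 1) * theta ** (s - 2) * sum_ts
    i_ts = (d / theta - theta ** (s - 1) * sum_ts
            - s * theta ** (s - 1) * sum(t ** s * math.log(theta * t) for t in times))
    i_ss = -d / s**2 - sum((theta * t) ** s * math.log(theta * t) ** 2 for t in times)
    return [[i_tt, i_ts], [i_ts, i_ss]]


def theta_mle_given_s(s, times, observed):
    """Closed-form solution of U_theta = 0 for fixed s: (d / sum t_i^s)^(1/s)."""
    return (sum(observed) / sum(t ** s for t in times)) ** (1.0 / s)


times = [0.5, 1.2, 2.0, 3.1, 4.0]            # hypothetical response times
observed = [True, True, False, True, False]  # False marks a censored time
s = 1.5
theta_star = theta_mle_given_s(s, times, observed)
observed_info = [[-v for v in row] for row in weibull_hessian(theta_star, s, times, observed)]
print(theta_star, weibull_score(theta_star, s, times, observed), observed_info)
```

The observed information is minus the matrix of second derivatives. At $\theta^{*}$ for fixed s, the first score component vanishes by construction.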
The two families of models, MM and BBM, are based on different hazard functions. Indeed, if the mutation rates are very small and independent of time, the hazard function of cancer for the Armitage-Doll model is
$$\lambda(t) = c\,(t - t_0)^{k-1}, \qquad c > 0, \qquad (4.6)$$
where k is the number of stages and $t_0$ is a fixed positive number accounting for the growth of the tumour. Suppose interest is focused on identifying etiological agents of cancer and developing the appropriate statistics for risk assessment of environmental agents. Then the most appropriate hazard function is the one defined by Cox (1972) as
$$\lambda(t) = \lambda_0(t)\,S(W, \beta), \qquad (4.7)$$
with $\lambda_0(t) > 0$ known as the baseline hazard function. $S(W, \beta)$ is the risk function, which relates the environmental factors W, i.e. the covariates, to the vector of unknown parameters β. This model is known as the proportional-hazards model. Pargament et al. (2001) discussed an interesting application of proportional-hazards models, studying religious struggle as a predictor of mortality among ill patients. Their data set was based on 576 Baptists and Methodists aged over 55 and hospitalised in a particular hospital. The patients were followed up for two years, with 176 deaths and 152 subjects lost to follow-up. Adopting a proportional-hazards model, the authors present an interesting analysis that is social rather than medical. A well-known technique for cancer is screening. If screening speeds up detection, it will increase the time from detection to death (the so-called "lead time"). The lead time for lung cancer screening was discussed by Patz et al. (2000) and Welch et al. (2007). Proportional-hazards models are mainly applied in clinical trials. In principle, a clinical trial needs one survival curve for the treatment group and another for the control group, each estimated by the Kaplan-Meier estimator.
If the treatment delays failure, the corresponding survival curve will fall off more slowly, while if the treatment has no effect the two curves will statistically coincide. The above discussion only tries to emphasize that the mathematical formulation does not solve the problem; it describes it. A careful analysis is needed, as the conclusions are rather sensitive and concern human lives.
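The Kaplan-Meier (product-limit) estimator mentioned above can be sketched in a few lines of Python. The follow-up times and event indicators below are invented for illustration. The sketch assumes the usual convention that subjects censored at a failure time are still at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times  : follow-up times
    events : 1 if failure (e.g. death) was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs at the distinct failure times.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # censored subjects leave after time t
    return curve


# Hypothetical treatment and control groups
treatment = kaplan_meier([3, 5, 6, 8, 9, 12], [1, 0, 1, 1, 0, 0])
control = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(treatment)
print(control)
```

Plotting the two step functions side by side gives the treatment and control curves whose separation the trial aims to assess.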
The BBM are based on a Poisson process from stage to stage. For example, Moolgavkar and Venzon (1979) assumed a Poisson process with birth rate $b_i(t) = ib$ and death rate $d_i(t) = id$ when i cells are present, i.e. a homogeneous birth-death process.
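The homogeneous birth-death process can be simulated with the standard Gillespie (stochastic simulation) algorithm. The sketch below assumes constant per-cell rates b and d. With i cells, the next event then occurs after an exponential waiting time with rate $i(b+d)$ and is a birth with probability $b/(b+d)$. The sketch models only the growth of one clone of intermediate cells, not the full two-stage BBM, and the parameter values are illustrative.

```python
import random


def simulate_birth_death(n0, b, d, t_max, seed=None):
    """Gillespie simulation of a linear (homogeneous) birth-death process.

    With i cells, births occur at rate i*b and deaths at rate i*d.
    Returns the path as a list of (time, number of cells), starting at (0.0, n0),
    stopping at t_max or when the clone becomes extinct (0 is absorbing).
    """
    if b < 0 or d < 0 or b + d == 0:
        raise ValueError("need nonnegative rates with b + d > 0")
    rng = random.Random(seed)
    t, i = 0.0, n0
    path = [(t, i)]
    while i > 0:
        t += rng.expovariate(i * (b + d))  # waiting time to the next event
        if t > t_max:
            break
        i += 1 if rng.random() < b / (b + d) else -1  # birth or death
        path.append((t, i))
    return path


path = simulate_birth_death(n0=10, b=1.0, d=0.9, t_max=5.0, seed=1)
print(len(path) - 1, "events; final number of cells:", path[-1][1])
```

Repeating the simulation with different seeds and averaging the final counts approximates the expected clone size, which grows like $n_0 e^{(b-d)t}$ for this process.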
5. Nonlinear Models
In this section we briefly discuss typical nonlinear models which, for the same parameter vector, might provide response curves with no significant difference between them. But when the parameter vector takes different values, the same nonlinear model might appear close to a line. That is why we review these models. Their graphs are given in Appendix B, where, within an MS Excel environment, each curve can easily be reproduced by changing the initial guesses for the estimators. For a number of these models the Fisher information matrix for one observation is evaluated as $i(\theta) = \sigma^{-2}\,\nabla f(\theta)\,\nabla f(\theta)^{T}$. In other cases, the partial derivatives are evaluated so as to form $i(\theta)$; see Kitsos (2007b). The models considered are (MODEL : NAME):
$f_G(u,\theta) = \theta_0 \exp(\theta_1 e^{\theta_2 u})$ : Gompertz model.
$f_J(u,\theta) = \theta_0 + \theta_1 \exp(\theta_2 u^{\theta_3})$ : Janoschek model.
$f_L(u,\theta) = \theta_0 / (1 + \theta_1 e^{\theta_2 u})$ : Logistic model.
$f_B(u,\theta) = [\theta_0 + \theta_1 e^{\theta_2 u}]^{3}$ : Bertalanffy model.
$f_{\tanh}(u,\theta) = \theta_0 + \theta_1 \tanh(\theta_2 (u - \theta_3))$ : tanh model.
$f(u,\theta) = \frac{\theta_0}{2}\left[1 + \frac{2}{\pi}\arctan(\theta_1 (u - \theta_2))\right]$ : 3-tanh-model.
$f(u,\theta) = \theta_0 + \frac{2\theta_1}{\pi}\arctan(\theta_2 (u - \theta_3))$ : 4-tanh-model.
$f_{\exp}(u,\theta) = \theta_0 u^{\theta_1} = \theta_0 e^{\theta_1 \ln u}$ : Exponential time-power model.
$f(u,\theta) = \theta_0 - \theta_1 e^{-\theta_2 u}$ : Reparametrized Exponential time-power model.
$f_W(u,\theta) = \theta_0 - (\theta_0 - \theta_1)\exp(-(\theta_2 u)^{\theta_3})$ : Reconstructed Weibull model.
$f_{GL}(u,\theta) = \theta_0 / [1 + e^{\theta_1 + g(u,\theta)}]$ : Generalized Logistic model.
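The one-observation information $i(\theta) = \sigma^{-2}\nabla f(\theta)\nabla f(\theta)^{T}$ can also be evaluated numerically, which is a useful check on hand-derived partial derivatives. The sketch below assumes the reconstructed Weibull model has the form $\theta_0 - (\theta_0 - \theta_1)\exp(-(\theta_2 u)^{\theta_3})$ and the tanh model the form $\theta_0 + \theta_1\tanh(\theta_2(u - \theta_3))$. The parameter values and design point u are hypothetical. It approximates $\nabla f$ by central differences and forms the outer product, with σ² = 1 by default.

```python
import math


def f_weibull(u, th):
    """Reconstructed Weibull model: theta0 - (theta0 - theta1) exp(-(theta2 u)^theta3)."""
    t0, t1, t2, t3 = th
    return t0 - (t0 - t1) * math.exp(-((t2 * u) ** t3))


def f_tanh(u, th):
    """tanh model: theta0 + theta1 tanh(theta2 (u - theta3))."""
    t0, t1, t2, t3 = th
    return t0 + t1 * math.tanh(t2 * (u - t3))


def gradient(f, u, th, h=1e-6):
    """Central-difference approximation of the gradient of f(u, theta) in theta."""
    g = []
    for k in range(len(th)):
        up, dn = list(th), list(th)
        up[k] += h
        dn[k] -= h
        g.append((f(u, up) - f(u, dn)) / (2.0 * h))
    return g


def info_one_obs(f, u, th, sigma2=1.0):
    """i(theta) = sigma^-2 grad f grad f^T for a single observation at design point u."""
    g = gradient(f, u, th)
    return [[gk * gl / sigma2 for gl in g] for gk in g]


def weibull_partials(u, th):
    """Analytic partial derivatives of f_weibull (u > 0), for comparison."""
    t0, t1, t2, t3 = th
    z = (t2 * u) ** t3
    e = math.exp(-z)
    return [1.0 - e, e, (t0 - t1) * (t3 / t2) * z * e, (t0 - t1) * z * math.log(t2 * u) * e]


theta_w = [10.0, 2.0, 0.5, 1.8]  # hypothetical parameter values
print(gradient(f_weibull, 3.0, theta_w))
print(weibull_partials(3.0, theta_w))
print(info_one_obs(f_weibull, 3.0, theta_w))
```

Evaluating `info_one_obs(f_tanh, u, theta)` for two values of `theta[0]` gives the same matrix, up to rounding. This illustrates that an additive parameter does not enter the information matrix.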
For various models we evaluate Fisher's information matrix for one observation, $i(\theta) = \sigma^{-2}\,\nabla f(\theta)\,\nabla f(\theta)^{T}$. Indeed:
a. For the Gompertz model, $\sigma^2 i(\theta) = \nabla f_G\,\nabla f_G^{T}$, with
$$\nabla f_G = e^{\theta_1 e^{\theta_2 u}}\left(1,\; \theta_0 e^{\theta_2 u},\; \theta_0\theta_1 u\, e^{\theta_2 u}\right)^{T}.$$
b. For the reparametrized Exponential time-power model, writing $E = e^{-\theta_2 u}$,
$$\sigma^2 i(\theta) = \begin{pmatrix} 1 & -E & \theta_1 u E \\ -E & E^2 & -\theta_1 u E^2 \\ \theta_1 u E & -\theta_1 u E^2 & \theta_1^2 u^2 E^2 \end{pmatrix}.$$
c. For the reconstructed Weibull model, write $z = (\theta_2 u)^{\theta_3}$ with $u > 0$. The partial derivatives are
$$\frac{\partial f_W}{\partial\theta_0} = 1 - e^{-z}, \qquad \frac{\partial f_W}{\partial\theta_1} = e^{-z},$$
$$\frac{\partial f_W}{\partial\theta_2} = (\theta_0-\theta_1)\,\frac{\theta_3}{\theta_2}\, z\, e^{-z}, \qquad \frac{\partial f_W}{\partial\theta_3} = (\theta_0-\theta_1)\, z \ln(\theta_2 u)\, e^{-z}.$$
Therefore $i(\theta) = \sigma^{-2}[i_{kl}]$, $k, l = 1, \ldots, 4$ (index k corresponding to $\theta_{k-1}$), where
$i_{11} = (1 - e^{-z})^2$,
$i_{22} = e^{-2z}$,
$i_{33} = \left[(\theta_0-\theta_1)\frac{\theta_3}{\theta_2} z\, e^{-z}\right]^2$,
$i_{44} = (\theta_0-\theta_1)^2 z^2 [\ln(\theta_2 u)]^2 e^{-2z}$,
$i_{12} = i_{21} = e^{-z}(1 - e^{-z})$,
$i_{13} = i_{31} = (\theta_0-\theta_1)\frac{\theta_3}{\theta_2} z\, e^{-z}(1 - e^{-z})$,
$i_{14} = i_{41} = (\theta_0-\theta_1)\, z \ln(\theta_2 u)\, e^{-z}(1 - e^{-z})$,
$i_{23} = i_{32} = (\theta_0-\theta_1)\frac{\theta_3}{\theta_2} z\, e^{-2z}$,
$i_{24} = i_{42} = (\theta_0-\theta_1)\, z \ln(\theta_2 u)\, e^{-2z}$,
$i_{34} = i_{43} = (\theta_0-\theta_1)^2 \frac{\theta_3}{\theta_2} z^2 \ln(\theta_2 u)\, e^{-2z}$.
d. For the Generalized Logistic model, write $h(u,\theta) = \theta_1 + g(u,\theta)$, so that $f_{GL} = \theta_0/(1 + e^{h})$. We have the following cases.
Case (i): $g(u,\theta) = \theta_2 u + \theta_3 u^2 + \theta_4 u^3$, and therefore
$$f_{GL}(u,\theta) = \frac{\theta_0}{1 + e^{\theta_1 + \theta_2 u + \theta_3 u^2 + \theta_4 u^3}}.$$
For this case the partial derivatives are
$$\frac{\partial f_{GL}}{\partial\theta_0} = \frac{1}{1+e^{h}}, \qquad \frac{\partial f_{GL}}{\partial\theta_j} = -\frac{\theta_0\, u^{\,j-1} e^{h}}{(1+e^{h})^2}, \quad j = 1, 2, 3, 4.$$
Case (ii): $g(u,\theta) = \theta_2\,\dfrac{u^{\theta_3} - 1}{\theta_3}$, and therefore
$$f_{GL}(u,\theta) = \frac{\theta_0}{1 + \exp\left(\theta_1 + \theta_2\,\dfrac{u^{\theta_3}-1}{\theta_3}\right)}.$$
For this case the partial derivatives are
$$\frac{\partial f_{GL}}{\partial\theta_0} = \frac{1}{1+e^{h}}, \qquad \frac{\partial f_{GL}}{\partial\theta_1} = -\frac{\theta_0\, e^{h}}{(1+e^{h})^2},$$
$$\frac{\partial f_{GL}}{\partial\theta_2} = -\frac{\theta_0\,(u^{\theta_3}-1)\, e^{h}}{\theta_3\,(1+e^{h})^2}, \qquad \frac{\partial f_{GL}}{\partial\theta_3} = -\frac{\theta_0\theta_2\left[\theta_3 u^{\theta_3}\ln u - (u^{\theta_3} - 1)\right] e^{h}}{\theta_3^2\,(1+e^{h})^2}.$$
Then we can evaluate the Fisher information matrix $i(\theta)$. More generally, consider a model $f(u,\theta) = \theta_0 + g(u,\theta)$, in which $\theta_0$ enters only as an added term. Its information matrix does not depend on $\theta_0$, since $\partial f/\partial\theta_0 = 1$. The estimated Fisher information matrix therefore needs prior information only on the parameters involved in $g(u,\theta)$, not on $\theta_0$. An estimate $s^2$ of $\sigma^2$ is also needed, so as to obtain the estimate $\hat i(\theta) = s^{-2}\,\nabla f(\theta)\,\nabla f(\theta)^{T}\big|_{\theta=\hat\theta}$.
Conclusions
There is a theoretical background that covers the performance of any bioassay, and thus of a cancer bioassay. This background concerns the appropriate formulation for grouping the data set in tables, as far as descriptive statistics are concerned. Interest was also in how to use the appropriate model, which is usually nonlinear. This model can be either a binary response model or any other nonlinear model in the continuous case. We provided a critical view of this analysis and, we believe, we offer the appropriate background to experimentalists, so that the statistical/mathematical model and the cancer bioassay are better bridged.
References
Angelopoulou, R., Bala, M., Lavranos, G., Chalikias,M., Kitsos, C . , Baka, S., Kittas, C. (2008a). Evaluation of immunohistochemical markers of germ cells’ proliferation in the developing rat testis: A comparative study. Tissue and Cell , 40(1), pg 43-50. Angelopoulou, R., Bala, M., Lavranos, G., Chalikias,M., Kitsos, C . , Baka, S., Kittas, C. (2008a). Sertoli cell proliferation in the fatal and neofatal rat testis: A continous phenomenon? Acta Histochemica . Armitage, P. (1982). The Assessment of Low Dose Carcinogenicity.
Biometrics,
28 (sup.), 119-129. Armitage, P. (1985). Multistage Models of Carcinogenesis.
Environmental Health Perspectives,
63, 195-201. Armitage, P., Doll, R. (1954). The Age Distribution of Cancer and a Multi-Stage Theory of Carcinogenesis.
Brit. J. Cancer,
8, 1-12. Baker, I. (1936). Analytic Studies in plant respiration.
Proc. R. Soc. B , 119, pg. 453-473. Begg, M.D., Legakos, S. (1993). Loss in efficiency caused by omitting covariates and misspecifying exposure in logistic regression models.
JASA,
88, 166-170. Benzecri, J. P. (1980).
L’ analyse des donnees . Dunod. Bishop, Y. M. M., Fienberg, S.E., and Holland, P. W. (1975).
Discrete multivariate analysis: theory and practice . MIT press, Cambridge, mass., . Bieler, G.S., Williams, R.L. (1993). Ratio Estimates, the Delta Method and Quantal Response Tests for Increased Carcinogenity.
Biometrics,
49, 793-801. Bowman, D., Chen, J.J., George, E.O. (1995). Estimating Variance Function in Developmental Toxicity Studies.
Biometrics , 51, 1523-1528. Bucchi, A.R., Gabrielle, M., Lupi, C. and Zapponi, G.A. (1988). Dose-response Relationships in Rodents of Promoter Carcinogens: A Tentative Intrepretation of Some Downward Trends.
Biomedical and Environmental Sciences,
1, 184-193. Chen, C., Gibb, H., Moini, A. (1991). Models for Analyzing Data in Initiation-Promotion Studies.
Environmental Health Respectives,
90, 287-292. Cox, D.R. (1972). Regression Models and Life Tables. (with discussion).
JRSS, B , 74, 187-220. Cox, D.R., Snell, E. J. (1989).
Analysis of binary data . Chapman and Hall.
CCRA3 Proceedings 17
Cogliano, V. J, C. Kitsos (2001). Modeling I.,
Folia Histochemica Et Cytobiologica , 39, pg11. Consonni, G., Marin, J.M (2007). Mean-field variational approximate Bayesian inference for latent variable models.
CSDA , 52, 790-798. Crump, K.S., Guess, H.A. and Deal, K.L. (1977). Confidence Intervals and Tests of Hypotheses Concerning Dose Response Relations Inferred from Animal Carcinogenicity Data.
Biometrics,
33, 437-451. Dean, A.C.R., Hinshelwood, C. (1960). Growth, Function and Regulation in Bacterial Cells . University Press . Oxford. Dixon, M., Webb, E.C. (1964).
Enzymes. University Press . Cambridge. Doll, R. (1971). The Age Distribution of Cancer: Implications for Models of Carcinogenesis.
JRSS,
A134, 133-166. Doll, R. (1978). An Epidemiological Perspective on the Biology of Cancer.
Cancer Res.
38, 3573-3583. Ford, I., Kitsos, C.P., Titterington, D.M. (1989). Recent Advances in Nonlinear Experimental Design.
Technometrics , Vol. 31, pg. 49-60. Ford, I., Silvey, S.D. (1980). A sequentially constructed design for estimating a non-linear parameter function.
Biometrika , 67, 381-388. Goldberg, A. V., and Tarjan, R. E. (1998). A new approach to the maximum flow problem.
J. ACM 3
Statistical Science , Vol. 1, 297-318. Hartley, H.O., Sielken, R.L. (1977). Estimation of ‘Safe Dose’ in Carcinogenic Hearron, J.Z. (1952). Rate behaviour of metabolic systems.
Physiol. Rev.
32, pg 499-523. Henschke, C.I., Yankelevitz, D.F., Libby, D.M. et al. (2006). The International Early Lung Cancer Action Program Investigators. Survival of Patients with Stage I Lung Cancer Detected on CT Sreening.
New England Journal of Medicine.
Kafadar, K., Tukey, J.W. (1993). U.S. Cancer Death Rates: A Simple Adjustment for Urbanization.
International Statistical Review,
61, 257-281. Kitsos, C.P. (1992). Adopting Sequential Procedures for Biological Experiments. In
Model Oriented Data Analysis,
Muller, Wynn, Zhigljavsky (Eds), p. 3-9, Physica-Verlag. Ι CCRA3 Proceedings
Kitsos, C.P. (1993). An Algorithm for Constructing the Best Predictive Model. In Advances in Statistical Software, F. Faulbau (Ed.), pp. 535-539, ZUMA Publication.
Kitsos, C.P. (1995). Sequential Assays for Experimental Carcinogenesis. ISI 50th Session, Book 1, 625-626, Beijing, 21-29 Aug. 1995.
Kitsos, C.P. (1997). Optimal Designs for Percentiles at Multistage Models in Carcinogenesis. Biometrical Journal, 41(1), 33-43.
Kitsos, C.P. (1998). The Role of Covariates in Experimental Carcinogenesis. Biometrical Letters, 35(2), 95-106.
Kitsos, C.P. (2007a). On The Logit Methods for Ca Problems (Design and Fit). Communications in Dependability and Quality Management, 10(2), 88-95.
Kitsos, C.P. (2007b). Applying Nonlinear Models in Cancer. In: 56th Session of the ISI, Lisbon, Portugal, 22-29 Aug. 2007, e-proceedings volume.
Kitsos, C.P., Tavoularis, K.N. (2009). Non-Linear Models for Biological Cancer Bioassays. In: 2nd Greek Statistical Conference, Chania, Crete, 22-26 April 2009.
Lagakos, S.W., Schoenfeld, D.A. (1984). Properties of Proportional-Hazard Score Tests under Misspecified Regression Models. Biometrics, 40, 1037-1048.
Luebeck, G.E., Moolgavkar, S.H. (1989). Two-Event Model for Carcinogenesis: Biological, Mathematical and Statistical Considerations. Risk Analysis, 10, 323-341.
Luebeck, G.E., Moolgavkar, S.H. (1991). Stochastic Analysis of Intermediate Lesions in Carcinogenesis Experiments. Risk Analysis, 11, 149-157.
Luebeck, G.E., Moolgavkar, S.H. (1992). Multistage Carcinogenesis: Population-Based Model for Colon Cancer. Journal of the National Cancer Institute, 610-618.
McCullagh, P., Nelder, J.A. (1989). Generalized Linear Models. Chapman and Hall, London.
Moolgavkar, S., Venzon, D. (1979). Two-Event Models for Carcinogenesis: Incidence Curves for Childhood and Adult Tumors. Mathematical Biosciences, 47, 55-77.
Patz, E.F. Jr., Goodman, P.C., Bepler, G. (2000). Screening for Lung Cancer. New England Journal of Medicine.
Appl. Statist., 35, 281-288.
Prentice, R.L., Gloeckler, L.A. (1978). Regression analysis of grouped survival data with application to breast cancer data. Biometrics, 34, 56-67.
Prentice, R.L., Kalbfleisch, J.D. (1979). Hazard Rate Models with Covariates. Biometrics,
Statistics in Medicine, 6, 773-784.
Seber, G.A.F. (1977). Linear Regression Analysis. John Wiley and Sons.
Seber, G.A.F., Wild, C.J. (1989). Nonlinear Regression. John Wiley and Sons.
Shore, J.E., Johnson, R.W. (1980). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory, 26, 26-40.
Travis, C.C., White, R.K. (1988). Interspecific Scaling for Toxicity Data. Risk Analysis, 8, 119-125.
Travis, C.C., White, R.K., Ward, R.C. (1990). Interspecies Extrapolation of Pharmacokinetics. J. Theor. Biol., 42, 285-304.
Welch, H.G., Woloshin, S., Schwartz, L.M. et al. (2007). Overstating the Evidence for Lung Cancer Screening: The International Early Lung Cancer Action Program (I-ELCAP) Study. Archives of Internal Medicine, 167, 2289-2295.
Yemm, E.W. (1965). The respiration of plants and their organs. In “Plant Physiology IV.A”, F.C. Steward (Ed.), pp. 231-310. Academic Press, New York.
Zapponi, G.A., Loizzo, A., Valente, P. (1989). Carcinogenic Risk Assessment: Some Comparisons Between Risk Estimates Derived from Human and Animal Data. Experimental Pathology, 37, 210-218.
Appendix A
• More about the Michaelis-Menten model
The biochemical model for a simple enzyme-substrate reaction, derived by Michaelis and Menten, has been extended in various forms.
I. Consider the reaction scheme
$$E + S \underset{k_2}{\overset{k_1}{\rightleftharpoons}} ES \overset{k_3}{\longrightarrow} E + P, \qquad (A1.1)$$
with E: enzyme, S: its substrate, P: product of the reaction, and $k_1, k_2, k_3$: rate constants. In the steady state the concentration of ES (concentrations are denoted by $[\cdot]$) is constant, so that
$$k_1[E][S] = (k_2 + k_3)[ES]. \qquad (A1.2)$$
If $E_0$ is the total concentration of enzyme present, which is independent of time, then
$$E_0 = [E] + [ES]. \qquad (A1.3)$$
Substituting (A1.3) into (A1.2) gives
$$k_1[S]\,(E_0 - [ES]) = (k_2 + k_3)[ES] \;\Rightarrow\; [ES] = \frac{k_1 E_0 [S]}{k_1[S] + k_2 + k_3}. \qquad (A1.4)$$
Equation (A1.4) is a rectangular hyperbola in $[S]$ and gives the concentration of substrate molecules that are combined with enzyme molecules. Langmuir's adsorption isotherm, which predates the MM model, has the same form; it describes the adsorption of gas molecules on solid surfaces. Let us denote by $\nu$ the speed of the steady-state reaction, $\nu = k_3[ES]$. Then from (A1.4) we get
$$\nu = \frac{k_3 E_0 [S]}{[S] + (k_2 + k_3)/k_1} = \frac{V_{\max}[S]}{K + [S]}, \qquad (A1.5)$$
with
$$V_{\max} = k_3 E_0, \qquad K := K_M = \frac{k_2 + k_3}{k_1}. \qquad (A1.5a)$$
The MM constant $K \equiv K_M$ is the substrate concentration at which the velocity is half-maximal, $\nu = V_{\max}/2$. $V_{\max}$ denotes the maximum velocity of the reaction, attained when all the active sites on the enzyme molecules are occupied by substrate molecules. From (A1.5) it is easy to see that the curve passes through the origin, since $\nu = 0$ when $[S] = 0$, and that its slope there is
$$\left.\frac{d\nu}{d[S]}\right|_{[S]=0} = \frac{V_{\max}}{K}.$$
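As a numerical illustration of (A1.2)-(A1.5a), the Python sketch below computes $V_{\max}$ and $K_M$ from the rate constants, checks the half-maximal velocity at $[S] = K_M$ and the initial slope $V_{\max}/K_M$, and integrates the full mass-action equations of (A1.1) (with no steady-state assumption) to show that, after a short transient, $k_3[ES]$ tracks the MM velocity (A1.5). The rate constants, enzyme and substrate levels, and helper names (`mm_constants`, `simulate`, etc.) are hypothetical choices for this sketch, not values from the paper; the integrator is a plain fourth-order Runge-Kutta scheme chosen for self-containment.

```python
"""Sketch: Michaelis-Menten quantities from the rate constants of (A1.1).

All numerical values are arbitrary illustrations (assumptions), not estimates
from any bioassay; the helper names are hypothetical.
"""
from typing import Tuple

State = Tuple[float, float, float]  # ([S], [ES], [P])


def mm_constants(k1: float, k2: float, k3: float, e0: float) -> Tuple[float, float]:
    """(A1.5a): V_max = k3*E0 and K_M = (k2 + k3)/k1."""
    return k3 * e0, (k2 + k3) / k1


def mm_rate(s: float, v_max: float, k_m: float) -> float:
    """(A1.5): nu = V_max*[S] / (K_M + [S])."""
    return v_max * s / (k_m + s)


def mass_action(state: State, k1: float, k2: float, k3: float, e0: float) -> State:
    """Time derivatives of E + S <=> ES -> E + P without the steady-state assumption."""
    s, es, _ = state
    free_e = e0 - es                     # conservation law (A1.3)
    binding = k1 * free_e * s - k2 * es  # net formation of ES
    catalysis = k3 * es                  # ES -> E + P
    return (-binding, binding - catalysis, catalysis)


def simulate(state: State, dt: float, steps: int,
             k1: float, k2: float, k3: float, e0: float) -> State:
    """Integrate the mass-action equations with the classical RK4 scheme."""
    def shifted(base: State, slope: State, h: float) -> State:
        return tuple(x + h * d for x, d in zip(base, slope))

    for _ in range(steps):
        a = mass_action(state, k1, k2, k3, e0)
        b = mass_action(shifted(state, a, dt / 2), k1, k2, k3, e0)
        c = mass_action(shifted(state, b, dt / 2), k1, k2, k3, e0)
        d = mass_action(shifted(state, c, dt), k1, k2, k3, e0)
        state = tuple(x + dt / 6.0 * (ka + 2 * kb + 2 * kc + kd)
                      for x, ka, kb, kc, kd in zip(state, a, b, c, d))
    return state


if __name__ == "__main__":
    k1, k2, k3, e0 = 2.0, 1.0, 3.0, 0.01
    v_max, k_m = mm_constants(k1, k2, k3, e0)
    # half-maximal velocity at [S] = K_M, and initial slope V_max/K_M
    print(mm_rate(k_m, v_max, k_m), v_max / 2)
    print(mm_rate(1e-8, v_max, k_m) / 1e-8, v_max / k_m)
    # after the fast transient, k3*[ES] from the full ODEs tracks (A1.5)
    s, es, p = simulate((5.0, 0.0, 0.0), 1e-3, 1000, k1, k2, k3, e0)
    print(k3 * es, mm_rate(s, v_max, k_m))
```

The enzyme level is kept small relative to the substrate, which is the regime in which the steady-state argument behind (A1.2) is expected to hold.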
Figure A.1.
The Michaelis-Menten model.
Notice that, if we let $x = [S]$, then
$$\frac{d\nu}{dx} = \frac{V_{\max} K_M}{(K_M + x)^2}, \qquad \frac{d^2\nu}{dx^2} = -\frac{2 V_{\max} K_M}{(K_M + x)^3}. \qquad (A1.7)$$
The rectangular hyperbola (MM) needs two parameters to be specified, $V_{\max}$ and $K_M$: those which define the asymptotes $\nu = V_{\max}$ and $x = -K_M$, or equivalently the parameters of the model.
• Applications of the rectangular hyperbola (single substrate):
• The specific growth rates of micro-organisms [Dean-Hinshelwood, pg. 80-81]
• The respiration rate of mature leaves [Yemm, pg. 275]
• The respiration in potatoes [Barker]
It is interesting to notice that leaf photosynthesis, among other physiological problems, is considered a non-rectangular hyperbolic response with a single substrate, while two-substrate responses are described by rectangular hyperbolas of the form
$$\nu = \frac{k x_1 x_2}{C_0 + C_1 x_1 + C_2 x_2 + x_1 x_2}. \qquad (A1.8)$$
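The derivative formulas (A1.7) and the relation between the two-substrate hyperbola (A1.8) and the MM form can be checked numerically. The Python sketch below compares (A1.7) with central finite differences and shows that, for a very large second substrate $x_2$, a two-substrate hyperbola behaves like an MM curve in $x_1$ with $V = k$ and $K = C_2$. It assumes (A1.8) has the form $k x_1 x_2/(C_0 + C_1 x_1 + C_2 x_2 + x_1 x_2)$; the constant names and all numerical values are illustrative.

```python
"""Sketch: numerical checks of the derivatives (A1.7) and of the large-x2
limit of the two-substrate hyperbola (A1.8).

The form assumed for (A1.8), k*x1*x2 / (c0 + c1*x1 + c2*x2 + x1*x2), and all
numerical values are illustrative assumptions.
"""
from typing import Callable


def mm(x: float, v: float, k: float) -> float:
    """Rectangular hyperbola nu = V*x / (K + x)."""
    return v * x / (k + x)


def dmm(x: float, v: float, k: float) -> float:
    """First derivative (A1.7): V*K / (K + x)**2, positive when V, K > 0."""
    return v * k / (k + x) ** 2


def d2mm(x: float, v: float, k: float) -> float:
    """Second derivative (A1.7): -2*V*K / (K + x)**3, negative for x > -K when V, K > 0."""
    return -2.0 * v * k / (k + x) ** 3


def central_diff(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def two_substrate(x1: float, x2: float, k: float,
                  c0: float, c1: float, c2: float) -> float:
    """Assumed form of (A1.8)."""
    return k * x1 * x2 / (c0 + c1 * x1 + c2 * x2 + x1 * x2)


if __name__ == "__main__":
    v, k, x = 2.0, 1.5, 0.8
    print(central_diff(lambda t: mm(t, v, k), x), dmm(x, v, k))
    print(central_diff(lambda t: dmm(t, v, k), x), d2mm(x, v, k))
    # for very large x2, (A1.8) behaves like an MM curve in x1 with V = k, K = c2
    print(two_substrate(0.8, 1e9, 3.0, 0.4, 0.7, 1.2), mm(0.8, 3.0, 1.2))
```

The sign pattern of the two derivatives (positive, then negative) is what makes the MM curve increasing and concave for $x > 0$.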
Equation (A1.5) also describes two-substrate enzyme kinetics under certain conditions; see Dixon-Webb, pg. 100. Hence (A1.8) is a more general form of (A1.5), or of any rectangular hyperbola of the form $ax/(b + x)$ with $a, b$ constants.
II.
Now, modify the MM scheme to the form
$$E + 2X \rightleftharpoons EX_2 \longrightarrow E + \text{product},$$
where E: enzyme and X: substrate. In such a case molecules of X can combine with E only two at a time. It can be shown that the utilization rate U of substrate X is given by
$$U = \frac{V X^2}{K^2 + X^2}. \qquad (A1.9)$$
Sometimes a response occurs only when a factor X rises above a critical value $K_c$. The response can be derived from molecular models and can be of the form
$$U = \frac{V X^n}{K_c^n + X^n}, \qquad (A1.10)$$
with $K_c$ the threshold (the value of X for half-maximal response), U the response, X the density (concentration) level of some substance, n usually a positive integer and V a constant. Dividing numerator and denominator by $K_c^n$ gives the equivalent form
$$U = \frac{V (X/K_c)^n}{1 + (X/K_c)^n}. \qquad (A1.11)$$
Another model can be
$$U = \frac{V}{1 + (X/K_c)^n}. \qquad (A1.11a)$$
Model (A1.11) is of the same form as the Morgan-Mercer-Flodin (MMF) family of models
$$f_{MMF}(x, \theta) = \frac{\theta_1 \theta_2 + \theta_3 x^{\theta_4}}{\theta_2 + x^{\theta_4}}, \qquad (A1.11b)$$
with $\theta_1 = 0$, $\theta_2 = K_c^n$, $\theta_3 = V$, $\theta_4 = n$; see Seber and Wild (1989, pg. 342) for details.
• MM processes in PARALLEL
The overall behavior of two MM transport processes working independently in parallel is described as follows. The total flux density M is assumed to be
$$M = MM_1 + MM_2 = \frac{V_1[S]}{K_1 + [S]} + \frac{V_2[S]}{K_2 + [S]},$$
with $MM_1$, $MM_2$ the contributions of the two MM processes and $V_1, V_2, K_1, K_2$ constants. Then
$$\frac{dM}{d[S]} = \frac{V_1 K_1}{(K_1 + [S])^2} + \frac{V_2 K_2}{(K_2 + [S])^2}, \qquad \frac{d^2M}{d[S]^2} = -\frac{2V_1K_1}{(K_1 + [S])^3} - \frac{2V_2K_2}{(K_2 + [S])^3},$$
so that
$$\left.\frac{dM}{d[S]}\right|_{[S]=0} = \frac{V_1}{K_1} + \frac{V_2}{K_2}, \qquad M \approx V_1 + V_2 \quad \text{as } [S] \to \infty.$$
• MM processes in SERIES
$$E + S \underset{k_2}{\overset{k_1}{\rightleftharpoons}} ES \underset{k_4}{\overset{k_3}{\rightleftharpoons}} E + I, \qquad (A1.13)$$
where E, S and I denote the enzyme, the substrate and the intermediate product, respectively. The net rate is
$$MM = \frac{E_0\,(k_1 k_3 [S] - k_2 k_4 [I])}{k_2 + k_3 + k_1[S] + k_4[I]},$$
where $E_0$ is the total concentration of enzyme present.
• Photosynthetic response to light and CO$_2$: $P = \dfrac{\eta C P_u}{\eta C + P_u}$.
• Leaf response to light flux density $I_t$: $P = \dfrac{a I_t P_{\max}}{a I_t + P_{\max}} - R_d$.
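The rate laws of part II can likewise be checked numerically. The Python sketch below assumes the forms $U = VX^n/(K_c^n + X^n)$ for (A1.10), $U = V/(1 + (X/K_c)^n)$ for (A1.11a), the MMF parametrization $(\theta_1\theta_2 + \theta_3 x^{\theta_4})/(\theta_2 + x^{\theta_4})$, the parallel flux $M = V_1[S]/(K_1 + [S]) + V_2[S]/(K_2 + [S])$ and, for the series scheme, the net rate $E_0(k_1k_3[S] - k_2k_4[I])/(k_2 + k_3 + k_1[S] + k_4[I])$, with $k_3, k_4$ taken as the forward and reverse constants of the second step (an assumption about the labelling). It checks the half-maximal response at $X = K_c$, that the MMF curve with $\theta_1 = 0$, $\theta_2 = K_c^n$, $\theta_3 = V$, $\theta_4 = n$ coincides with (A1.10), the initial slope and plateau of the parallel flux, and that the series rate reduces to the MM law when $[I] = 0$ and vanishes at $[I]/[S] = k_1k_3/(k_2k_4)$. Function names and parameter values are illustrative.

```python
"""Sketch: numerical checks of the rate laws in part II of Appendix A.

Functional forms follow the equations as written in the text; the MMF
parametrization, (A1.11a) and the k3/k4 labelling of the series scheme
are assumptions. All numerical values are arbitrary illustrations.
"""


def hill(x: float, v: float, kc: float, n: float) -> float:
    """Threshold response (A1.10): V*X**n / (Kc**n + X**n)."""
    return v * x**n / (kc**n + x**n)


def hill_decreasing(x: float, v: float, kc: float, n: float) -> float:
    """Model (A1.11a) as assumed here: V / (1 + (X/Kc)**n)."""
    return v / (1.0 + (x / kc) ** n)


def mmf(x: float, t1: float, t2: float, t3: float, t4: float) -> float:
    """MMF family: (t1*t2 + t3*x**t4) / (t2 + x**t4)."""
    return (t1 * t2 + t3 * x**t4) / (t2 + x**t4)


def parallel_flux(s: float, v1: float, km1: float, v2: float, km2: float) -> float:
    """Two MM processes in parallel: V1*S/(K1 + S) + V2*S/(K2 + S)."""
    return v1 * s / (km1 + s) + v2 * s / (km2 + s)


def parallel_slope(s: float, v1: float, km1: float, v2: float, km2: float) -> float:
    """dM/dS = V1*K1/(K1 + S)**2 + V2*K2/(K2 + S)**2."""
    return v1 * km1 / (km1 + s) ** 2 + v2 * km2 / (km2 + s) ** 2


def series_rate(s: float, i: float, e0: float,
                k1: float, k2: float, k3: float, k4: float) -> float:
    """Net rate of E + S <=> ES <=> E + I:
    E0*(k1*k3*S - k2*k4*I) / (k2 + k3 + k1*S + k4*I)."""
    return e0 * (k1 * k3 * s - k2 * k4 * i) / (k2 + k3 + k1 * s + k4 * i)


if __name__ == "__main__":
    # threshold model: half-maximal response at X = Kc; MMF with t1 = 0 reproduces it
    v, kc, n = 4.0, 2.5, 3
    print(hill(kc, v, kc, n), v / 2)
    print(hill(6.0, v, kc, n), mmf(6.0, 0.0, kc**n, v, n))
    # (A1.10) and (A1.11a) add up to the constant V
    print(hill(6.0, v, kc, n) + hill_decreasing(6.0, v, kc, n), v)

    # parallel processes: initial slope V1/K1 + V2/K2 and plateau V1 + V2
    v1, km1, v2, km2 = 3.0, 0.5, 1.0, 8.0
    print(parallel_slope(0.0, v1, km1, v2, km2), v1 / km1 + v2 / km2)
    print(parallel_flux(1e9, v1, km1, v2, km2), v1 + v2)

    # series scheme: no product gives the MM law; zero net rate at I/S = k1*k3/(k2*k4)
    e0, k1, k2, k3, k4, s = 0.5, 2.0, 1.0, 3.0, 0.25, 1.7
    print(series_rate(s, 0.0, e0, k1, k2, k3, k4), k3 * e0 * s / ((k2 + k3) / k1 + s))
    print(series_rate(s, k1 * k3 / (k2 * k4) * s, e0, k1, k2, k3, k4))
```

The last line prints zero (up to rounding): at that product-to-substrate ratio the forward and reverse fluxes through ES cancel.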
Appendix B
The figures presented here are evaluated with all parameters equal to 1. In any of the following models, however, the reader can examine how the model behaves by changing the appropriate $\theta_i$'s and observing the corresponding change in the related figure. In the MS Word document of this paper, double-click on the $\theta_i$'s to change them, then right-click on the related figure and choose “update link” to update it. For this, the MS Word document of this paper and the additional Graph.xls file (provided by the authors upon request) must be in the same folder.
• $f_G(u,\theta) = \theta_0 \exp\left(\theta_1 e^{\theta_2 u}\right)$. Figure B.1.
Gompertz model.
• $f_J(u,\theta) = \theta_0 + \theta_1 \exp\left(\theta_2 u^{\theta_3}\right)$. Figure B.2.
Janoschek model.
• $f_L(u,\theta) = \theta_0 \left[1 + \theta_1 \exp(\theta_2 u)\right]^{-1}$. Figure B.3.
Logistic model.
• $f_B(u,\theta) = \left[\theta_0 + \theta_1 \exp(\theta_2 u)\right]^3$. Figure B.4.
Bertalanffy model.
• $f_J(u,\theta) = \theta_0 + \theta_1 \exp\left(\theta_2 u^{\theta_3}\right)$, $f_B(u,\theta) = \left[\theta_0 + \theta_1 \exp(\theta_2 u)\right]^3$. Figure B.5.
Comparison of Janoschek and Bertalanffy models.
• $f_{\tanh}(u,\theta) = \theta_0 + \theta_1 \tanh\left[\theta_2 (u - \theta_3)\right]$. Figure B.6.
tanh-model.
• $f(u,\theta) = \dfrac{\theta_0}{2}\left[1 + \dfrac{2}{\pi}\arctan\big(\theta_1 (u - \theta_2)\big)\right]$. Figure B.7.
• $f(u,\theta) = \theta_0 + \dfrac{2\theta_1}{\pi}\arctan\big(\theta_2 (u - \theta_3)\big)$. Figure B.8.
• $f_{\tanh}(u,\theta) = \theta_0 + \theta_1 \tanh\left[\theta_2 (u - \theta_3)\right]$, $f(u,\theta) = \dfrac{\theta_0}{2}\left[1 + \dfrac{2}{\pi}\arctan\big(\theta_1 (u - \theta_2)\big)\right]$, $f(u,\theta) = \theta_0 + \dfrac{2\theta_1}{\pi}\arctan\big(\theta_2 (u - \theta_3)\big)$. Figure B.9.
Comparison of the tanh-, 3-tanh- and 4-tanh-models.
• $f_{\exp}(u,\theta) = \theta_0 u^{\theta_1} = \theta_0 e^{\theta_1 \ln u}$, $f_{\exp}(u,\theta) = \theta_0 e^{-\theta_1 \ln^2 u}$. Figure B.10.
Comparison of exponential and reparametrized exponential time-power models.
• $f_W(u,\theta) = \theta_0 - \theta_1 \exp\left[-\theta_2 (u - \theta_3)^{\theta_4}\right]$. Figure B.11.
Reconstructed Weibull model.
• $f_{GL1}(u,\theta) = \dfrac{\theta_0}{1 + e^{\theta_1 + \theta_2 u + \theta_3 u^2 + \theta_4 u^3}}$, $f_{GL2}(u,\theta) = \dfrac{\theta_0}{\left[1 + \theta_1 e^{-\theta_2 u}\right]^{1/\theta_3}}$. Figure B.12.
Generalized logistic models (cases i and ii).
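Readers without the MS Word and Graph.xls files can tabulate the curves directly. The Python sketch below evaluates several of the Appendix B models on a grid of $u$ values with all parameters equal to 1, as in the figures. The coded forms are assumptions (reconstructions of the appendix formulas): Gompertz $\theta_0 \exp(\theta_1 e^{\theta_2 u})$, Janoschek $\theta_0 + \theta_1 \exp(\theta_2 u^{\theta_3})$, logistic $\theta_0/[1 + \theta_1 \exp(\theta_2 u)]$, Bertalanffy $[\theta_0 + \theta_1 \exp(\theta_2 u)]^3$, tanh $\theta_0 + \theta_1 \tanh[\theta_2(u - \theta_3)]$ and the four-parameter arctan analogue $\theta_0 + (2\theta_1/\pi)\arctan[\theta_2(u - \theta_3)]$. The names `MODELS` and `tabulate` are hypothetical; change the parameter tuple to explore other shapes.

```python
"""Sketch: tabulating some of the Appendix B growth curves.

The functional forms are assumptions (reconstructions of the appendix
formulas); every model takes a parameter tuple t = (t0, t1, t2, t3) and
ignores the entries it does not use.
"""
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[float, Sequence[float]], float]

MODELS: Dict[str, Model] = {
    # Gompertz: t0 * exp(t1 * exp(t2 * u))
    "gompertz": lambda u, t: t[0] * math.exp(t[1] * math.exp(t[2] * u)),
    # Janoschek: t0 + t1 * exp(t2 * u**t3); use u > 0 unless t3 is a positive integer
    "janoschek": lambda u, t: t[0] + t[1] * math.exp(t[2] * u ** t[3]),
    # Logistic: t0 / (1 + t1 * exp(t2 * u))
    "logistic": lambda u, t: t[0] / (1.0 + t[1] * math.exp(t[2] * u)),
    # Bertalanffy: (t0 + t1 * exp(t2 * u))**3
    "bertalanffy": lambda u, t: (t[0] + t[1] * math.exp(t[2] * u)) ** 3,
    # tanh model: t0 + t1 * tanh(t2 * (u - t3))
    "tanh": lambda u, t: t[0] + t[1] * math.tanh(t[2] * (u - t[3])),
    # four-parameter arctan analogue: t0 + (2*t1/pi) * atan(t2 * (u - t3))
    "arctan4": lambda u, t: t[0] + 2.0 * t[1] / math.pi * math.atan(t[2] * (u - t[3])),
}


def tabulate(name: str, grid: Sequence[float],
             theta: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> List[Tuple[float, float]]:
    """Return (u, f(u, theta)) pairs for the named model."""
    f = MODELS[name]
    return [(u, f(u, theta)) for u in grid]


if __name__ == "__main__":
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    for name in MODELS:
        row = ", ".join(f"{value:.4f}" for _, value in tabulate(name, grid))
        print(f"{name:>12}: {row}")
```

With all parameters equal to 1, for example, the logistic curve passes through 0.5 at $u = 0$ and the tanh and arctan curves pass through 1 at $u = 1$.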