arXiv [stat.ME]

SMOOTHING SPLINE GROWTH CURVES WITH COVARIATES

Kurt S. Riedel and Kaya Imre*
Courant Institute of Mathematical Sciences
New York University
251 Mercer St.
New York, NY 10012

KEYWORDS: Growth curves, Smoothing splines, Multivariate analysis, Generalized cross-validation, Adaptive splines, Fusion physics.

ABSTRACT

We adapt the interactive spline model of Wahba to growth curves with covariates. The smoothing spline formulation permits a nonparametric representation of the growth curves. In the limit when the discretization error is small relative to the estimation error, the resulting growth curve estimates often depend only weakly on the number and locations of the knots. The smoothness parameter is determined from the data by minimizing an empirical estimate of the expected error. We show that the risk estimate of Craven and Wahba is a weighted goodness of fit estimate. A modified loss estimate is given, where σ² is replaced by its unbiased estimate.

* Permanent address: College of Staten Island, C.U.N.Y., Staten Island

I. INTRODUCTION
Growth curve analysis is used to parameterize a family of temporal curves whose shapes depend on a vector of covariates, u (Potthoff and Roy (1964), Grizzle and Allen (1969), Geisser (1980)). As an example, we consider the heights of children as a function of time or age and of a number of covariates: both discrete covariates, such as sex, and continuous covariates, such as drug dosages. We are primarily interested in the case where each individual has been measured a large number of times, so that the growth as a function of time may be treated nonparametrically. We assume that the number of individuals is small enough, or the covariate dependencies are simple enough, that the covariate dependencies may be treated parametrically for each fixed time.

The general theory of growth curves allows arbitrary basis functions and Bayesian priors (Lee and Geisser (1972), Rao (1975), Strenio et al. (1983)). In practice, however, the basis functions are usually assumed to be polynomials in time. Similarly, the covariance matrix of the Bayesian prior is usually a diagonal matrix, or is determined empirically from repeated measurements, or is given a low order autoregressive structure.

Smoothing splines are a powerful technique for reconstructing nonparametric curves and surfaces (see Silverman (1985), Eubank (1988), Muller (1988), and Wahba (1990) for reviews). The smoothness penalty function corresponds to a Bayesian prior. Recently, Wahba has developed a general theory of interactive smoothing spline models (Ch. 10, Wahba (1990)). In this brief article, we adapt interactive spline models to growth curve analysis. Our nonparametric growth curve models may also be viewed as a variant of the Π model of Breiman (1991), appropriate when only one of the covariates is "timelike" and therefore needs to be treated nonparametrically.

Smoothing techniques are widely used for the nonparametric determination of a single growth curve, including the case of repeated measurements (Gasser et al. (1984)).
For families of covariate growth curves, however, there has been surprisingly little nonparametric work. Partial linear models (Rice (1986), Heckman (1986), Speckman (1988)) consider growth curves where the parametric part of the model depends on covariates but the nonparametric part depends only on time: μ(t, ξ_i, i) = ξ_i·β + f(t). Raz (1989) considers families of grouped growth curves and uses smoothing splines to determine both the individual growth curves and the main effect growth curves. Due to the repeated measurement structure of his model, he is able to determine each nonparametric function separately. A multidimensional nonparametric ANOVA decomposition is given in Gu and Wahba (1991).

Of the two main smoothing techniques, kernel smoothing and smoothing splines, we concentrate on smoothing splines because they generalize more naturally to growth curves with covariates. Despite its generality, we are unaware of any growth curve analysis where the nonparametric part of the model depends on covariates through the standard sum of products structure. Similarly, except for our own research (Kardaun et al. (1988), McCarthy et al. (1991), Riedel (1992)), we are unaware of any growth curve analysis with the sum of products covariate structure where the basis representations are regression splines.

We briefly review smoothing splines for a single growth curve with n independent measurements: y_i = f_o(t_i) + ε_i, where ε_i ~ N(0, σ²). We represent the curve as a sum of cubic spline basis functions: f_o(t) = Σ_{k=1}^K α_k B_k(t). In the cubic spline model, the fit is a piecewise cubic polynomial with a discontinuous third derivative at each knot location. When the knots are in the interior of the domain, the number of free parameters is equal to the number of knots plus four.
The free parameters, α_k, are determined by minimizing the functional

  min_α  Σ_{i=1}^n (y_i − f_o(t_i, α))² / σ²  +  λ_o ∫ (f_o^{(γ)}(t, α))² dt ,    (1)

where γ is a positive integer with 2 ≤ γ ≤ 3. When the smoothing parameter, λ_o, is zero or too small, the spline coefficients become ill-conditioned, and spurious oscillations develop with a wavelength proportional to twice the distance between knots (Wegman and Wright (1983)). Two remedies for this ill-conditioning are to decrease the number of basis functions and to increase the smoothness parameter λ_o. Adjusting the spline representation is usually a very complicated optimization. Therefore researchers have concentrated on finding methods which determine the optimal value of λ_o from the data. The smoothness penalty function then shrinks the a priori probability of oscillatory behavior to a small and acceptable level.

For a single function, the minimization of Eq. 1 can be done over an infinite dimensional Sobolev space, and the resulting minimum is a cubic spline function with knots at all the measurement locations. In practice, many fewer knots may be sufficient to represent the unknown function, f_o(t). This class of representations is called hybrid spline models. In the covariate growth curve models of Sec. II, the measurement times are usually nonuniform, and we normally consider hybrid spline models with significantly fewer knots than the total number of distinct measurement times.

When hybrid splines are used, two types of error arise: model error from discretization effects and estimation error. Agarwal and Studden (1980) give bounds on the discretization error. We present generalizations of the standard estimation error formulas in Sec. III. Detailed asymptotic analysis can be found in Cox (1984), Eubank (1988), and Wahba (1990). When the discretization error is small relative to the estimation error, hybrid models are generally appropriate, since they reduce the ill-conditioning of the estimate at relatively little cost in terms of the total error. For the profile data of Sec. V, we find that the fitted function depends only weakly on the locations of the knots.

II. SEMIPARAMETRIC GROWTH CURVE MODELS
For covariate growth curves, we consider a family of N individuals, with the ith individual having measured responses at n_i timepoints, t_{1,i}, ..., t_{n_i,i}. We assume the covariates are fixed in time for the ith individual and denote their value by the M-vector, u_i. We denote the n_i vector of the ith individual's measurement times, (t_{1,i}, ..., t_{n_i,i})^t, by t_i, the n_i vector of measured responses, (y_{1,i}, ..., y_{n_i,i})^t, by y_i, and the vector of true values, (μ(t_{1,i}, u_i, i), ..., μ(t_{n_i,i}, u_i, i))^t, by μ_i(t_i, u_i). Although we are primarily interested in data which is grouped by individual, multiple time measurements of each individual are not necessary. Uncorrelated measurements may be treated with n_i = 1.

We divide the model for μ(t, u, i) into a parametric part, h(t, u, i), and a nonparametric part, f(t, u, i). The parametric part of the model is assumed to have a specific known functional form: h(t, u, β, i) = Σ_{j=1}^J h_j(t, u, i) β_j, where the h_j(t, u, i) are known functions and the β_j are undetermined free parameters. We allow both the parametric and nonparametric terms to depend on the specific individual to allow for fixed effects models.

We therefore consider growth curve models of the form

  μ(t, u, i) = h(t, u, β, i) + f_0(t) + Σ_{ℓ=1}^L f_ℓ(t) g_ℓ(u, i) .    (2a)

We assume that the temporal structure of the f_ℓ(t) is sufficiently complex that it requires a nonparametric representation. The model of Eq. 2a is a special case of the Π model of Breiman (1991). We assume that the covariate dependencies, g_ℓ(u, i), are either given or depend on a small number of free parameters, β: g_ℓ(u, i, β). In contrast, Breiman estimates both f_ℓ(t) and g_ℓ(u, i) nonparametrically.

When the last term, Σ_{ℓ=1}^L f_ℓ(t) g_ℓ(u, i), is omitted, our models reduce to the standard partial linear spline models.
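A minimal sketch of the sum of products structure of Eq. 2a, with a single parametric intercept and L = 1 (the particular functions f0, f1, and g1 below are illustrative assumptions, not estimates):

```python
import numpy as np

# Illustrative (assumed) temporal effects and covariate function:
f0 = lambda t: 1.0 + 0.5 * t            # main-effect growth curve
f1 = lambda t: 0.3 * np.sin(np.pi * t)  # covariate-modulated shape change
g1 = lambda u: u - 0.5                  # centered linear covariate term

def mu(t, u, beta0=2.0):
    """mu(t, u) = beta0 + f0(t) + f1(t) g1(u): Eq. 2a with one parametric
    intercept and L = 1 covariate term."""
    return beta0 + f0(t) + f1(t) * g1(u)

t = np.linspace(0.0, 1.0, 21)
curves = np.array([mu(t, u) for u in (0.2, 0.5, 0.8)])
```

All individuals share the temporal shapes f0 and f1; only the scalar weight g1(u) varies across the family of curves.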
Our sum of products covariate spline models resemble generalized additive models, and we expect many of the same techniques and theorems to apply (Friedman and Stuetzle (1981), Hastie and Tibshirani (1990), Friedman (1991)). As in Wahba's interactive spline model, we use a separate smoothing spline for each of the f_ℓ(t).

In most applications, the covariate basis functions, g_ℓ(u), are linear, (u_m − ū_m), or quadratic, centered about the database mean, or are discrete variables. If the nonparametric part of each individual curve is considered noise, a nonparametric function may be included for each separate individual: g_ℓ(u, i) = δ_{ℓ,i}.

In our initial study of profile variation (Kardaun et al. (1988), McCarthy et al. (1991)), the data consisted of N sets of temperature measurements at 15 different radial locations in a fusion plasma, with 10 ≤ N. The radius, r, is the timelike variable, and the main covariate is q, which corresponds to the ratio of the average magnetic field to the total plasma current. The shape of the profiles is much less variable than the overall magnitude; therefore each individual curve was given its own intercept: h_j(t, u, i) = δ_{j,i}. Thus the log linear nonparametric model of McCarthy et al. (1991) is

  ln[T(r, q, i)] = β_i + f_0(r) + f_1(r) ln(q) .    (2b)

In our earlier study, we used regression splines (λ ≡ 0) with only three knots. In Section V, we consider data with more than 60 measurements per profile. This data requires many more knots, and the resulting estimation problem is ill-conditioned without a smoothness penalty function.

We assume that both the number and the choice of the parametric basis functions, h_j(t, u, i), and the covariate basis functions, g_ℓ(u, i), are given a priori. In practice, both sets of basis functions are often determined iteratively, using a combination of statistical common sense and the minimization of weighted sums of residual errors and smoothness/degrees of freedom/predictive error penalty functions.

Each of the functions, f_ℓ(t), is given a spline representation: f_ℓ(t) = Σ_{k=1}^K α_{kℓ} B_k(t), where the B_k(t) are basis functions. We recommend the choice of B-splines for the B_k (deBoor (1978)). B-splines are a particular reparameterization of the standard piecewise polynomial splines with the advantage that each basis function is spatially localized. The resulting design matrix has a band structure. In practice, the innermost and outermost basis functions are often restricted to be linear in time.

When the temporal design is uniform, t_i = t_j, the knot positions may be chosen at all the distinct measurement times. If each individual is measured at slightly different times, t_{p,i} = t_p + t̃_{p,i}, we can choose the knot locations at the typical measurement times, t_p. Alternatively, we can place a knot at every distinct measurement time, t_{p,i}. The value of the additional knots in reducing the model error depends on the ratio of the typical time between measurements, t_{p+1} − t_p, to the spread of the measurement times, σ_{t_p}, where σ_{t_p}² ≡ N^{-1} Σ_{i=1}^N t̃_{p,i}². In many cases, the spread of times is small relative to the typical time between measurements. In these cases, we typically choose the first alternative, knot locations only at t_p.
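The localized B-spline basis and the resulting banded design matrix can be sketched with the Cox–de Boor recursion (a pure-numpy sketch; scipy's BSpline would serve equally well):

```python
import numpy as np

def bspline_design(x, t, k=3):
    """Design matrix X[p, j] = B_j(x_p) of degree-k B-splines on knot
    vector t (Cox-de Boor recursion); len(t) - k - 1 basis functions."""
    x = np.asarray(x, float)
    t = np.asarray(t, float)
    m = len(t) - 1
    # Degree 0: indicators of the non-degenerate knot spans,
    # closed on the right only at the final knot.
    B = np.zeros((x.size, m))
    for j in range(m):
        if t[j] < t[j + 1]:
            hi = (x <= t[j + 1]) if t[j + 1] == t[-1] else (x < t[j + 1])
            B[:, j] = ((x >= t[j]) & hi).astype(float)
    # Raise the degree one step at a time.
    for d in range(1, k + 1):
        Bn = np.zeros((x.size, m - d))
        for j in range(m - d):
            term = np.zeros(x.size)
            if t[j + d] > t[j]:
                term += (x - t[j]) / (t[j + d] - t[j]) * B[:, j]
            if t[j + d + 1] > t[j + 1]:
                term += (t[j + d + 1] - x) / (t[j + d + 1] - t[j + 1]) * B[:, j + 1]
            Bn[:, j] = term
        B = Bn
    return B

# Clamped cubic knot vector with one interior knot at 0.5.
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], float)
X = bspline_design(np.linspace(0, 1, 11), knots)
```

Each row has at most k + 1 nonzero entries, which is the source of the band structure of X^t X.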
Alternatively, the number and location of the knots can be determined by convergence tests or by the data-based parameter selection criteria of Sec. III.

We denote the K × (L+1) matrix with elements α_{k,ℓ}, ℓ = 0, ..., L, k = 1, ..., K, by α, and its ℓth column vector by α_ℓ: (α_ℓ)_k ≡ α_{k,ℓ}. The n_i × K temporal design matrix for the nonparametric part of the ith individual is denoted by X_i, with elements X^i_{p,k} = B_k(t_{p,i}). Similarly, the parametric part of the design matrix is defined by H^i_{p,j} = h_j(t_{p,i}, u_i, i). Thus the ith individual is parameterized by

  μ_i(t_i, u_i) = H_i β + X_i α g(u_i, i) ,    (3)

where g(u_i, i) ≡ (1, g_1(u_i, i), ..., g_L(u_i, i))^t, and α is the K × (L+1) matrix of unknown parameters. Our construction of the nonparametric design matrix specifically assumes that the covariates, u_i, are time independent. If the covariates are time dependent, the Kronecker product design structure, (g(u_i, i)^t ⊗ X_i), should be replaced by X^i_G, with (X^i_G)_{p, ℓK+k} = B_k(t_{p,i}) g_ℓ(u_i(t_{p,i}), i), and α is replaced by α_G, the concatenation of the columns of α.

The error structure is assumed to be independent between individuals but arbitrary within each individual. The n_i × n_i covariance matrix, Σ_i, of the measurement errors may be either given or estimated. Thus our models include random effects models for the nonparametric part.

To determine the free parameter matrix in our smoothing spline growth curve model, we minimize over α and β the functional

  Σ_{i=1}^N (y_i − μ_i(t_i, u_i, α, β))^t Σ_i^{-1} (y_i − μ_i(t_i, u_i, α, β))  +  Σ_{ℓ=0}^L λ_ℓ ∫ |f_ℓ^{(γ)}(t, α_ℓ)|² dt ,    (4)

where γ is an integer with 2 ≤ γ ≤ 3. The second term consists of a separate smoothness penalty function for each of the f_ℓ(t). The smoothing parameters, λ_ℓ, control the tradeoff between the goodness of fit, Σ_{i=1}^N (y_i − μ_i(t_i, u_i, α, β))^t Σ_i^{-1} (y_i − μ_i(t_i, u_i, α, β)), and the smoothness of the solution. If necessary, the utility functional may be robustified in the standard ways. Our analysis generalizes to smoothing surfaces with covariates. In other words, if time, t, is replaced by two nonparametric variables, such as time and age, (t, s), our formulas remain valid if ∫ |f_ℓ''(t)|² dt is replaced by ∫∫ ( |∂_t² f_ℓ|² + 2 |∂_t ∂_s f_ℓ|² + |∂_s² f_ℓ|² ) dt ds.

Each penalty function may be represented parametrically as α_ℓ^t S α_ℓ = ∫ f_ℓ^{(γ)}(t)² dt, where S_{k,k'} ≡ ∫ B_k^{(γ)}(t) B_{k'}^{(γ)}(t) dt. Differentiating Eq. 4 with respect to α_ℓ yields

  Σ_{i=1}^N g_ℓ(u_i, i) ( X_i^t Σ_i^{-1} X_i α̂ g(u_i, i) − X_i^t Σ_i^{-1} (y_i − H_i β̂) )  +  λ_ℓ S α̂_ℓ = 0 .    (5a)

The corresponding variation of Eq. 4 with respect to β, the parametric part of the model, yields

  Σ_{i=1}^N H_i^t Σ_i^{-1} H_i β̂ = Σ_{i=1}^N H_i^t Σ_i^{-1} (y_i − X_i α̂ g(u_i, i)) .    (5b)

This is a linear system in (L+1)K + J unknowns. The optimal values of the (L+1) smoothing parameters, λ_ℓ, are unknown and may also be determined from the data.

We now reformulate Eq. 5 as a generalized ridge regression with K(L+1) + J unknowns and N_T measurements, where N_T ≡ Σ_{i=1}^N n_i. We concatenate the N individual measurement vectors into an N_T vector, Y_c, and the columns of α together with β into a K(L+1) + J vector, α_c: α_c^t ≡ (α_0^t, α_1^t, ..., α_L^t, β^t). The N_T × (K(L+1) + J) design matrix consists of the concatenation of the N matrices (g(u_i, i)^t ⊗ X_i, H_i), or, for time dependent covariates, (X_{G,i}, H_i).

The total covariance matrix, Σ_c, is block diagonal with diagonal blocks Σ_i.
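The Kronecker product structure in Eq. 3 is the standard vec identity, vec(X α g) = (g^t ⊗ X) vec(α), with column-major vec; a quick numerical check on arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, K, L = 7, 5, 2
X_i = rng.standard_normal((n_i, K))       # temporal design block for individual i
alpha = rng.standard_normal((K, L + 1))   # column l holds the coefficients of f_l
g_i = rng.standard_normal(L + 1)          # covariate basis vector g(u_i, i)

# Nonparametric part of Eq. 3, written two equivalent ways:
direct = X_i @ alpha @ g_i                           # X_i alpha g(u_i, i)
kron = np.kron(g_i, X_i) @ alpha.flatten(order="F")  # (g^t (x) X_i) vec(alpha)
```

The `order="F"` flatten stacks the columns of α, matching the concatenated coefficient vector α_c of the ridge regression formulation.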
The total penalty matrix, S_c(λ), is block diagonal with diagonal blocks λ_ℓ S: S_c(λ)_{ℓK+k, ℓ'K+k'} = λ_ℓ S_{k,k'} δ_{ℓ,ℓ'} for 0 ≤ ℓ ≤ L, and zero otherwise. In this formulation, Eq. 5 is transformed to

  ( X_c^t Σ_c^{-1} X_c + S_c(λ) ) α̂_c(λ) = X_c^t Σ_c^{-1} Y_c .    (6)

Our interactive spline models are a special class of mixed models. As such, the covariance structure may be parameterized with a free parameter vector, θ: Σ_c = Σ_c(θ), and θ may be estimated using a restricted maximum likelihood estimator (Harville (1977)). The Bayesian posterior covariance of the discretized system is

  Cov(α_c) ≡ ( X_c^t Σ_c^{-1} X_c + S_c(λ) )^{-1} ,    (7)

where the prior covariance of α_c is S_c(λ)^−, the Moore–Penrose generalized inverse of S_c(λ).

III. DATA-BASED PARAMETER DETERMINATION
The optimal values of the smoothing parameters, λ, are unknown, and we endeavor to estimate them from the data. Most data-based estimates determine the free parameters by minimizing a functional of the data. Risk-based methods minimize a functional related to the predictive error, appropriately weighted (Hall and Titterington (1987)). Goodness of fit methods minimize the residual squared error weighted by a measure of the number of effective degrees of freedom.

We now present several common goodness of fit functionals and then a class of risk-based functionals. These functionals do not require the growth curve structure which we discussed previously. Instead, we require only that the covariance structure, Σ_c, is known up to an arbitrary scalar: Σ_c = σ² Σ̂_c. We also assume that the penalty matrix, S_c(λ), satisfies S_c(λ) = Σ_ℓ λ_ℓ S_ℓ. We redefine λ: λ_new ≡ λ_old σ². We define C ≡ X_c^t Σ̂_c^{-1} X_c, the matrix G(λ), and the influence matrix, A_c(λ):

  G(λ) ≡ ( X_c^t Σ̂_c^{-1} X_c + S_c(λ) )^{-1} ,   A_c(λ) ≡ X_c G(λ) X_c^t Σ̂_c^{-1} .    (8)

The influence matrix is not a projection, due to the penalty matrix S_c(λ): A_c(λ) > A_c(λ) A_c(λ), and therefore the concept of effective degrees of freedom is tenuous.

Most of the goodness of fit data-based functionals (when σ² is unknown) are of the form

  V(λ) ≡ ( Y_c − A_c(λ) Y_c )^t Σ̂^{-1} ( Y_c − A_c(λ) Y_c ) / ( N_T M(A_c(λ)) ) ,    (9)

where M is a real valued function on N_T × N_T matrices. Furthermore, these functionals usually satisfy M(A_c(λ)) ∼ 1 − 2 Trace(A_c(λ))/N_T when N_T ≫ Trace(A_c(λ)) (Hardle, Hall, and Marron (1988)). Generalized cross-validation (G.C.V.) (Craven and Wahba (1979)) is the most widespread data-based functional of the form given by Eq. 9, with M(A_c(λ)) ≡ |1 − Trace(A_c(λ))/N_T|².
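A minimal numpy sketch of the G.C.V. score of Eq. 9, taking Σ̂_c = I and a placeholder identity penalty (both simplifying assumptions, used only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N_T, p = 60, 8
X = rng.standard_normal((N_T, p))         # stand-in design matrix X_c
y = X @ rng.standard_normal(p) + 0.3 * rng.standard_normal(N_T)
S = np.eye(p)                             # placeholder penalty S_c (assumed)
Sig_inv = np.eye(N_T)                     # Sigma_hat_c^{-1} = I (assumed)

def gcv(lam):
    """V(lambda) of Eq. 9 with M(A) = (1 - tr(A)/N_T)^2."""
    G = np.linalg.inv(X.T @ Sig_inv @ X + lam * S)
    A = X @ G @ X.T @ Sig_inv             # influence matrix A_c(lambda)
    r = y - A @ y
    return (r @ Sig_inv @ r / N_T) / (1.0 - np.trace(A) / N_T) ** 2

scores = {lam: gcv(lam) for lam in (1e-3, 1e-1, 1e1, 1e3)}
```

At λ = 0 the influence matrix reduces to the hat matrix of ordinary least squares, with trace equal to the number of basis functions; positive λ pulls the trace below that value.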
Another goodness of fit functional is the corrected Akaike information criterion (Hurvich and Tsai (1989)), which is based on the information content. G.C.V. automatically includes the effects of model error, including discretization error, in its estimate of the optimal λ. In contrast, estimates of the expected error often neglect model error, including discretization error.

The total expected error in α̂_c(λ) from the sampling perspective, assuming the discrete model is correct, is

  E[ (α̂_c(λ) − α_c)(α̂_c(λ) − α_c)^t ] = σ² G(λ) C G(λ) + G(λ) S_c(λ) α_c α_c^t S_c(λ) G(λ) .    (10a)

Here σ² G(λ) C G(λ) is the variance, and G(λ) S_c(λ) α_c α_c^t S_c(λ) G(λ) is the bias error. For any positive semidefinite matrix, Q, we can select λ by minimizing

  R(λ, Q) = Trace( Q E[ (α̂_c(λ) − α_c)(α̂_c(λ) − α_c)^t ] ) = Trace( Q [ σ² G(λ) C G(λ) + G(λ) S_c(λ) α_c α_c^t S_c(λ) G(λ) ] )    (10b)

with respect to λ. When Q ≡ X^t Σ^{-1} X, the risk estimate corresponds to minimizing the predictive error. Choosing Q = S_c corresponds to minimizing the average expected square error in estimating the γth radial derivative. From the asymptotic results of Cox (1984), we expect that derivative estimation will require larger values of λ than estimating the unknown profiles. Equations 9 and 10 are generalizations of previously known functionals (Craven and Wahba (1979)) to an arbitrary covariance matrix, Σ_c, and similar equations are given in Diggle and Hutchinson (1989).
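The bias term of Eq. 10a follows from E[α̂_c] = G C α_c together with the identity G (C + S_c(λ)) = I, which gives E[α̂_c] − α_c = −G S_c(λ) α_c; a quick numerical check of that identity (with an assumed diagonal penalty and Σ̂ = I):

```python
import numpy as np

rng = np.random.default_rng(3)
N_T, p = 40, 6
X = rng.standard_normal((N_T, p))
S_c = np.diag(np.arange(1.0, p + 1.0))   # illustrative penalty matrix (assumed)
alpha = rng.standard_normal(p)           # "true" coefficient vector
lam = 0.7

C = X.T @ X                              # Sigma_hat = I for simplicity
G = np.linalg.inv(C + lam * S_c)

# E[alpha_hat] = G C alpha, so the bias is (G C - I) alpha; the identity
# G(C + lam S_c) = I collapses it to -lam G S_c alpha, whose outer product
# is the second (bias) term of Eq. 10a.
bias = G @ C @ alpha - alpha
```

The outer product of this bias vector with itself is exactly the second term of Eq. 10a, while σ² G C G is the covariance of G X^t ε under the assumed noise model.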
When λ is selected to minimize the risk, we have

  ∂R(λ, Q)/∂λ_ℓ = 0 = Trace[ Q G(λ) S_ℓ G(λ) ( C α_c α_c^t S_c(λ) − σ² C ) G(λ) ] .    (11)

When σ² is known but α_c is unknown, we have the following estimate for the minimum of the expected loss:

  ∂R̂(λ, C)/∂λ_ℓ = 0 = Y_c^t Σ̂^{-1} X_c G(λ) S_c(λ) G(λ) S_ℓ G(λ) X_c^t Σ̂^{-1} Y_c − σ² Tr[ G(λ) S_ℓ G(λ) C ] ,    (12)

where we have restricted to Q = C. Equations 11 and 12 each constitute a set of L + 1 equations to determine the optimal values of {λ_ℓ}. When Σ is known and the errors are uncorrelated, the risk-based estimate, R̂(λ, C), was proposed in Craven and Wahba (1979). We prefer the differential formulation of Eq. 12, ∂R̂(λ, C)/∂λ_ℓ = 0, because Eq. 12 shows explicitly that minimizing R̂ is a weighted goodness of fit estimator.

In practice, σ² is often unknown and can be estimated from the data using

  σ̂² = ( Y_c − Â_c(0) Y_c )^t Σ̂^{-1} ( Y_c − Â_c(0) Y_c ) / [ N_T − tr( Â_c(0) ) ] ,    (13)

where Â_c(0) = X_c C^{-1} X_c^t Σ̂^{-1}. The variance of the estimate of σ² in Eq. 13 is inversely proportional to [N_T − tr(Â_c(0))] and therefore increases as the number of spline basis functions grows. We can choose the number of basis functions to balance the variance of σ̂² against the bias from the discretization error.

The number and choice of the basis functions, h_j(t, u, i) and g_ℓ(u), may also be determined by minimizing the data-based functional. However, as the dimension of the multivariate minimization increases, these data-based functionals may have a number of relative minima. Furthermore, the minimum of the functional is often shallow, and the estimated value of λ may converge slowly to its optimal value as N tends to infinity. The convergence to the optimal value is slow when the utility functional, V(λ), is an insensitive function of the smoothing parameter.
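The unbiased variance estimate of Eq. 13 can be sketched with Σ̂ = I and a simple polynomial basis standing in for the spline basis (the basis choice, test function, and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N_T, p = 300, 6
t = np.linspace(0.0, 1.0, N_T)
X = np.vander(t, p)                      # simple polynomial basis (an assumption)
sigma = 0.5
y = np.sin(2 * np.pi * t) + sigma * rng.standard_normal(N_T)

# Eq. 13 with Sigma_hat = I: residuals of the unpenalized fit, normalized
# by N_T minus the trace of the unpenalized influence matrix A_c(0).
A0 = X @ np.linalg.solve(X.T @ X, X.T)
r = y - A0 @ y
sigma2_hat = (r @ r) / (N_T - np.trace(A0))
```

With the penalty off, tr(Â_c(0)) is just the number of basis functions, here 6, so the denominator is the familiar residual degrees of freedom.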
On the other hand, in these cases, the risk/goodness of fit is not dramatically worsened by the use of a suboptimal value of λ.

IV. NUMERICAL IMPLEMENTATION

In our numerical implementation, we concatenate the rows of the K × (L+1) matrix α instead of the columns. The corresponding n_i × K(L+1) design matrix for the ith individual satisfies X^i_{j, (L+1)(k−1)+ℓ+1} = B_k(r_{ij}) g_ℓ(u_i), where j = 1, ..., n_i. The advantage of this reordering of the unknown coefficients is that the resulting X^t X matrix has a band structure.

The empirical estimate of the minimizing risk, Eq. 12, can be rewritten as

  λ_ℓ = ( σ² Tr[ G(λ) S_ℓ G(λ) C ] − Σ_{ℓ'≠ℓ} λ_{ℓ'} α̂^t S_{ℓ'} G(λ) S_ℓ α̂ ) / ( α̂^t S_ℓ G(λ) S_ℓ α̂ ) ,    (14a)

where α̂ is given by Eq. 6. We iteratively evaluate Eq. 14a for each λ_ℓ to minimize Eq. 12.

Two simplifications of the λ estimate are possible. First, when G(λ) is approximately block diagonal, we can neglect the second term in the numerator of Eq. 14a, and the simplified equation is

  λ_ℓ = σ² Tr[ G(λ) S_ℓ G(λ) C ] / ( α̂^t S_ℓ G(λ) S_ℓ α̂ ) .    (14b)

In practice, the variance in the estimated mean profile, f_0(t), is usually much smaller than the variance in the estimates of f_1(t), ..., f_L(t). Thus if all the λ_ℓ are equal, the smoothing tends to be too little for f_0(t) or too much for f_1(t), ..., f_L(t). A second simplification is therefore to reduce the dimensionality of the minimization by imposing the model restriction λ_1 ≡ λ_2 = ⋯ = λ_L, with λ_o determined separately.

V. EXAMPLE
We consider a 40 profile dataset from the Tokamak Fusion Test Reactor (T.F.T.R.) at the Princeton Plasma Physics Laboratory (Hiroe et al. (1988)). Each profile consists of approximately 61 temperature measurements at different spatial locations. In contrast, the data from our earlier study of the A.S.D.E.X. tokamak had only 15 spatial locations. Thus the A.S.D.E.X. data was well modeled with a spline with only three knots, while the T.F.T.R. data is better suited to a smoothing spline with ten or more knots.

In the middle of the plasma, the error bars are proportional to the temperature, while the errors at the boundary are roughly constant. Thus we use the logarithm of the temperature as the dependent variable and increase the diagonal entries, Σ_{k,k}, towards the boundary. We estimate σ² by fitting each profile separately to a one dimensional model. We use γ = 3 in the penalty function. To determine λ_o and λ_1, we iterate Eqs. 6 & 14b.

Figure 1 plots the raw data and the fitted profile for the two profiles with the largest and smallest values of q for the fourteen knot fit. Figure 2 gives the corresponding fit with 46 knots. The similarity of the two fits supports our assertion that smoothing spline fits are often only weakly dependent on the choice of knots. Similarly, the fits are also insensitive to the values of λ_o and λ_1. The error bars in Figs. 1 & 2 are given by the "plug-in" approximation, i.e. we substitute α̂ into Eq. 10a to estimate the local value of the expected square error of the profile fits. The expected error increases near the boundaries, partly because the measurement variance increases near the boundary. To reduce the boundary effect, we have increased the spacing between knots near the boundary. The vertical lines in Figs. 1 & 2 give the knot locations.

We have fit the profiles with up to four covariates: plasma q, average particle density, average magnetic field, and electrical voltage.
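The iteration of Eqs. 6 and 14b can be sketched for a single smoothing parameter; a polynomial basis and a known σ² stand in for the spline basis and the estimated variance (both assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 80, 6
t = np.linspace(0.0, 1.0, n)
X = np.vander(t, p)                      # polynomial stand-in for the spline basis
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)
sigma2 = 0.01                            # noise variance, assumed known here

# Penalty S_{jk} = int f_j'' f_k'' dt by simple quadrature; column j of
# np.vander is t^(p-1-j), with second derivative e(e-1) t^(e-2), e = p-1-j.
tt = np.linspace(0.0, 1.0, 2001)
d2 = np.zeros((tt.size, p))
for j in range(p):
    e = p - 1 - j
    if e >= 2:
        d2[:, j] = e * (e - 1) * tt ** (e - 2)
S = (d2.T @ d2) * (tt[1] - tt[0])

# Fixed-point iteration of Eq. 14b for a single smoothing parameter.
C = X.T @ X
lam = 1.0
for _ in range(50):
    G = np.linalg.inv(C + lam * S)
    a = G @ X.T @ y                      # alpha_hat from Eq. 6
    lam = sigma2 * np.trace(G @ S @ G @ C) / (a @ S @ G @ S @ a)
fit = X @ np.linalg.inv(C + lam * S) @ X.T @ y
```

The constant and linear columns are unpenalized (their second derivatives vanish), so S is singular, but C + λS remains invertible.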
The additional covariates do not produce any noticeable change in the predicted profiles. The slight misfits near r = 2 for the low q and high q profiles appear to be due to random variation in the profile shape rather than systematic errors in the additive spline model. For a given profile, the residual errors tend to be uniformly positive or negative, indicating that the shape of the temperature profile is better determined than the magnitude of the average temperature. We are currently implementing the random intercept covariance model: Σ_{j,k} = σ² δ_{j,k} + σ_o². The model of Eq. 2b is the corresponding fixed effects model for the intercept. For a more detailed discussion of the experimental findings, we refer the reader to Imre and Riedel (1993).

VI. SUMMARY
Growth curves characteristically have rich, well-resolved temporal structure and relatively poorly resolved covariate structure. Thus we apply nonparametric smoothing splines in the temporal representation and simple parametric representations in the covariate directions. This class of models does not require multiple time measurements, but it is well suited to this structure. In other situations, there may be only enough data in the temporal direction for parametric representations, or sufficient data in the covariate directions for a nonparametric representation.

We close by noting that Eq. 12 is the general matrix weighting of the residual error that estimates the value of λ which minimizes the expected loss. Under the assumption that C and S are equal up to a scalar multiple, Hall & Titterington (1987) have shown that estimating the minimizer of the expected loss is equivalent to a specific goodness of fit estimator. Equation 12 generalizes the Hall & Titterington result to the case when C ≠ cS.

APPENDIX: UNIFORM TEMPORAL DESIGN
When the temporal design is uniform, i.e. X_1 = X_2 = ... = X_N, Eq. 5a may be recast in multivariate form. For simplicity, we assume β ≡ 0. The N observed profiles, each consisting of the same n time points, can be represented by an n × N data matrix, Y. The (L+1) × N covariate data matrix, U, has columns g(u_1) through g(u_N). In multivariate notation, the growth curve model is

  Y = X α U + E ,    (15)

where E is the n × N matrix of random errors. The first variation of the utility functional (Eq. 5a) can be rewritten as

  X^t Σ^{-1} X α U U^t + S α Λ = X^t Σ^{-1} Y U^t ,    (16)

where Λ is the (L+1) × (L+1) diagonal matrix whose elements are the λ_ℓ. Unfortunately, the tensor product formulation does not allow a solution based on separation of variables. Instead, the full (L+1)K × (L+1)K system must be solved.

A separable smoothing spline model may be constructed as follows. We decompose UU^t into O D O^t, where O is orthonormal and D is diagonal. We then replace the growth curve model of Eqs. 2 & 3 with the equivalent model with α_new = α_old O and g(u_i, i)_new = O^t g(u_i, i)_old. This is equivalent to replacing the old penalty function for the functions f_ℓ(t) with a matrix valued penalty function. The separable system splits the growth curve problem into L + 1 independent one dimensional problems. For the separable problem, the one dimensional convergence results of Cox (1984) may be extended trivially.

ACKNOWLEDGMENTS

KSR's understanding of growth curves has benefited from years of collaboration with O. Kardaun and P. McCarthy. We thank C. Hurvich for discussions on data-based selection criteria. D. Stevens's help in the initial implementation of the spline fit is gratefully acknowledged. The T.F.T.R. data was collected by the T.F.T.R. experimental team at Princeton University. We received the data from the U.S. D.O.E.-I.T.E.R. magnetic fusion energy database. We thank both the T.F.T.R. and M.F.E. database groups, in particular Dr. W.H. Miner Jr. The valuable comments of the referee are appreciated. This work was supported by the U.S. Department of Energy.

BIBLIOGRAPHY

1. Agarwal, G. and Studden, W. (1980). Asymptotic mean square error using least squares and bias minimizing splines. Annals of Statistics, 1307-1325.
2. Breiman, L. (1991). The Π method for estimating multivariate functions from noisy data. Technometrics, 125-160.
3. Cox, D.D. (1984). Multivariate smoothing spline functions. SIAM J. Numer. Anal., 789-813.
4. Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 377-403.
5. deBoor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York.
6. Diggle, P.J. and Hutchinson, M.F. (1989). Spline smoothing with autocorrelated errors.
Australian J. Stat., 166-182.
7. Eubank, R. (1988). Smoothing Splines and Nonparametric Regression. Marcel Dekker, New York and Basel.
8. Friedman, J.H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 1-143.
9. Friedman, J.H. and Stuetzle, W. (1981). Projection pursuit regression. J.A.S.A., 817-823.
10. Gasser, T., Muller, H.-G., Kohler, W. et al. (1984). Nonparametric regression analysis of growth curves. Annals of Statistics, 210-229.
11. Geisser, S. (1980). Growth curve analysis. Handbook of Statistics, P.R. Krishnaiah, ed., Vol. 1, 89-115, North Holland Publishing Co., Amsterdam.
12. Grizzle, J.E. and Allen, D.M. (1969). Analysis of growth and dose response curves. Biometrics, 357-381.
13. Gu, C. and Wahba, G. (1991). Smoothing spline ANOVA with componentwise Bayesian "confidence intervals". Technical Report 881, Department of Statistics, University of Wisconsin, Madison.
14. Hall, P. and Titterington, D. (1987). Common structure of techniques for choosing smoothing parameters in regression problems. J. Roy. Stat. Soc. Ser. B, 184-198.
15. Hardle, W., Hall, P., and Marron, S. (1988). How far are automatically chosen smoothing parameters from their optimum? J.A.S.A., 86-95.
16. Harville, D.A. (1977). Maximum likelihood approaches to variance component estimation and related problems. J.A.S.A., 320-340.
17. Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
18. Heckman, N. (1986). Spline smoothing in partly linear models. J. Roy. Stat. Soc. Ser. B, 244-248.
19. Hiroe, A. et al. (1988). Scale length study in T.F.T.R. Princeton Plasma Physics Laboratory Report.
20. Hurvich, C.M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 297-307.
21. Imre, K. and Riedel, K.S. (1993). In progress.
22. Kardaun, O., McCarthy, P.J., Lackner, K., Riedel, K.S., and Gruber, O. (1988). A statistical approach to profile invariance. Theory of Fusion Plasmas (Varenna 1987 Proc.), Società Italiana di Fisica, Bologna, 435-444.
23. Lee, J.C. and Geisser, S. (1972). Growth curve prediction. Sankhya Ser. A, 393-412.
24. McCarthy, P.J., Riedel, K.S., Kardaun, O., et al. (1991). Scalings and plasma profile parameterisation of ASDEX high density Ohmic discharges. Nuclear Fusion, 1595-1633.
25. Muller, H. (1988). Nonparametric Regression Analysis of Longitudinal Data. Lecture Notes in Statistics Vol. 46, Springer, Heidelberg.
26. Potthoff, R.F. and Roy, S.N. (1964). A generalised multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 313-326.
27. Rao, C.R. (1975). Simultaneous estimation of parameters in different linear models and applications to biometric problems. Biometrics, 545-554.
28. Raz, J. (1989). Analysis of repeated measurements using nonparametric smoothers and randomisation tests. Biometrics, 851-871.
29. Rice, J.B. (1986). Convergence rates for partially splined models. Statist. Prob. Lett., 203-208.
30. Riedel, K.S. (1992). Smoothing spline growth curves with covariates. Courant Institute of Mathematical Sciences Report MF-123, New York University.
31. Silverman, B. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting. J. Roy. Stat. Soc. Ser. B, 1-52.
32. Speckman, P.E. (1988). Regression analysis for partially linear models. J. Roy. Stat. Soc. Ser. B, 413-436.
33. Strenio, J.F., Weisberg, H.I., and Bryk, A.S. (1983). Empirical Bayesian estimation of individual growth curve parameters and their relationship to covariates. Biometrics, 71.
34. Wahba, G. (1990). Spline Models for Observational Data. S.I.A.M., Philadelphia.
35. Wegman, E. and Wright, I. (1983). Splines in statistics. J.A.S.A., 78.