Selecting the Derivative of a Functional Covariate in Scalar-on-Function Regression
Giles Hooker*
Department of Statistical Sciences, Cornell University
Research School of Finance, Actuarial Studies and Statistics, Australian National University

Han Lin Shang
Department of Actuarial Studies and Business Analytics, Macquarie University

August 19, 2020
Abstract
This paper presents tests to formally choose between regression models using different derivatives of a functional covariate in scalar-on-function regression. We demonstrate that, for linear regression, models using different derivatives can be nested within a model that includes point-impact effects at the end-points of the observed functions. Contrasts can then be employed to test the specification of different derivatives. When nonlinear regression models are defined, we apply a J test to determine the statistical significance of the nonlinear structure between a functional covariate and a scalar response. The finite-sample performance of these methods is verified in simulation, and their practical application is demonstrated using a chemometric data set.

Keywords: model selection; variable selection; likelihood ratio test; J test

1 Introduction

Recent advances in computer recording and storage technology have facilitated the collection of functional data sets, motivating many researchers to consider functional regression models for estimating the relationship between predictor and response variables where at least one variable is function-valued. The functional formulation of a linear model dates back to discussions by Hastie and Mallows (1993) and Dalzell and Ramsay (1993); see Ramsay and Silverman (2005) for a detailed overview and Ramsay et al. (2009) for software implementation.

* Postal address: Department of Statistical Sciences, 1186 Comstock Hall, Cornell University, Ithaca, NY 14853, USA; Telephone: +1 607 255 1638; Fax: +1 607 255 4698; Email: [email protected]

Since then, models incorporating functional variables have been extended to generalized linear models (Aguilera et al., 2008; Müller and Stadtmüller, 2005), additive regression (Febrero-Bande and González-Manteiga, 2013; McLean et al., 2014), polynomial models (Yao and Müller, 2010), nonparametric functional regression (Ferraty and Vieu, 2006), semi-functional partial linear models (Aneiros-Pérez and Vieu, 2006, 2008) and many more. Because of this rapid development, functional regression has become increasingly popular in fields of application such as age-specific mortality and fertility forecasting in demography (Hyndman and Shang, 2009), analysis of spectroscopy data in chemometrics (Ferraty and Vieu, 2002), earthquake modeling (Quintela-del-Río et al., 2011) and ozone-level prediction (Quintela-del-Río and Francisco-Fernández, 2011).

Despite the relatively mature literature on functional models, little attention has been given to selecting which derivative of observed functional data X(t) to use as a covariate. One distinguishing feature of functional data is access to multiple derivatives X^(k)(t) of X(t), and it is therefore natural to consider using one or more of these as covariates. Indeed, Ferraty and Vieu (2002) discuss the use of semi-metrics based on derivatives for nonparametric regression, and Ferraty and Vieu (2009) find empirically that the use of second derivatives provides a significant performance improvement in the example data set we use below. We examine formal methods of comparing models that use different derivatives of X(t).

Below we distinguish between two forms of the model. When X(t) or its derivatives enter the model linearly, integration by parts provides a means of embedding smooth linear functionals of both X(t) and X^(1)(t) within a larger space:

∫_0^1 α(t) X_i^(1)(t) dt = α(1) X_i(1) − α(0) X_i(0) − ∫_0^1 α^(1)(t) X_i(t) dt.   (1)

Starting from using X_i, we can assess whether X_i^(1) is more appropriate by first testing the expansion of a model using X_i(t) to include separate point-impact effects for the endpoints. We then formulate a contrast to test whether the reduction of the expanded model corresponds to the left-hand side of (1). Both steps can be carried out via F-tests formulated for a penalized linear regression. The same formulation allows us to reverse the inference – to start with X^(1)(t) and test for X(t) – and to consider changes of more than one derivative as well.

When X^(k)(t) does not enter the model through a smooth linear operator, the formulation in (1) cannot be applied generically. Instead, we propose a form of the J test of Davidson and MacKinnon (1981) that allows us to form nested models after estimating each separately.

Our paper proceeds as follows. In Section 2, we develop linear contrasts to test the adequacy of different derivatives within a linear model specification; although not investigated here, these methods can be readily extended to generalized linear models. Section 3 examines a J test for explicitly nonlinear models that can also be used in conjunction with functional principal components regression. Section 4 provides simulation results for our methods, and Section 5 illustrates them on the Tecator data set (see also Ferraty and Vieu, 2006).

Throughout, we distinguish between a data generating process and a fitted model. We use the notation y_i = f(X_i) + ε_i to indicate a data generating process or a hypothesized model (where f may be given by a linear model in terms of functional parameters), and y_i ∼ g(X_i) to indicate that we fit the model g(X_i) (including any parameters) to the data. In this paper, we consider fitting only by penalized least squares, i.e., minimizing ∑_i [y_i − g(X_i)]² + P(g), where P(g) is quadratic in any parameters used in fitting.
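As a quick numerical sanity check (not part of the paper's implementation), the integration-by-parts identity (1) can be verified on a grid for any smooth functions; the choices α(t) = t² and X(t) = eᵗ below are arbitrary illustrations:

```python
import numpy as np

# Check identity (1):
#   ∫ a(t) X'(t) dt = a(1)X(1) - a(0)X(0) - ∫ a'(t) X(t) dt
t = np.linspace(0.0, 1.0, 20001)
a, da = t**2, 2.0 * t   # a(t) and its derivative a'(t)
X = np.exp(t)           # X(t) = e^t
dX = X                  # X'(t) = e^t as well

lhs = np.trapz(a * dX, t)
rhs = a[-1] * X[-1] - a[0] * X[0] - np.trapz(da * X, t)
assert abs(lhs - rhs) < 1e-6
```

Any other smooth pair (α, X) would work equally well; the identity is exact, and the residual here reflects only the trapezoidal quadrature error.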
However, extensions to other likelihoods are fairly immediate.

2 Tests Within Linear Models

This section develops formal tests between functional derivatives used as covariates. Without loss of generality, we assume a collection of data [y_i, X_i(·)] for i = 1, …, n, with each X_i a function on the interval [0, 1]. We consider a model of the form

y_i = β_0 + ∫_0^1 β_k(t) X_i^(k)(t) dt + ε_i,   ε_i ∼ N(0, σ²),

where k indicates the order of the derivative to use, typically k ∈ {0, 1, 2}, and a central challenge is the choice of k.

A nested test can always be constructed by estimating a model that includes multiple derivatives:

y_i ∼ β_0 + ∑_k ∫_0^1 β_k(t) X_i^(k)(t) dt.

We explore this framework below, but we note here that such a test is complicated by the association between the derivatives of X_i(t): the derivatives do not cover the same linear space, but their spaces overlap considerably. In this paper, we apply integration by parts to show that, to compare models with different derivatives, we can embed both models into a common space by adding a finite number of point impacts, and we use this to construct a set of contrasts to distinguish different derivatives.

We begin by setting up contrasts to produce tests between derivatives defined through integration by parts, and then discuss the numerical implementation of tests of these contrasts within common functional data packages. We do this for three specific tests: taking X_i as a baseline and testing whether X_i^(1) is more appropriate, the reverse procedure starting from X_i^(1) and testing whether X_i is better, and testing a change of two derivatives from X_i to X_i^(2). While these tests can be given as special cases of a more general procedure, we expect that they cover the cases that are likely to be practically relevant.

2.1 Testing a Higher-Order Derivative

We start by considering the first derivative as a generating model:

y_i = α_0 + ∫_0^1 β(t) X_i^(1)(t) dt + ε_i,   (2)

and suppose we have hypothesized a model using the 0th derivative X(t),

y_i = α_0 + ∫_0^1 α(t) X_i(t) dt + ε_i,   (3)

and wish to test whether (2) is a more appropriate model. To carry out such a test, we need to formulate a contrast, which we obtain via integration by parts as in (1). Here we can conduct a nested test for the adequacy of the model at k = 0 by fitting

y_i ∼ α_0 + γ_0 X_i(0) + γ_1 X_i(1) + ∫_0^1 γ(t) X_i(t) dt.   (4)
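To see the embedding concretely, the following sketch (with illustrative functions, not the paper's data) reproduces the smooth functional of X^(1) in (2) inside the point-impact model (4) by setting γ_0 = −β(0), γ_1 = β(1) and γ(t) = −β^(1)(t), choices that automatically satisfy the constraint (5) discussed below:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20001)
beta = 0.3 + np.sin(2 * np.pi * t)           # illustrative beta(t) in (2)
dbeta = 2 * np.pi * np.cos(2 * np.pi * t)    # beta'(t)
X = np.cos(3 * t) + t**2                     # illustrative covariate X(t)
dX = -3 * np.sin(3 * t) + 2 * t              # X'(t)

target = np.trapz(beta * dX, t)              # ∫ beta(t) X'(t) dt, as in (2)
g0, g1, g = -beta[0], beta[-1], -dbeta       # gamma_0, gamma_1, gamma(t)
embedded = g0 * X[0] + g1 * X[-1] + np.trapz(g * X, t)
assert abs(target - embedded) < 1e-5         # same functional, embedded in (4)
assert abs(g0 + g1 + np.trapz(g, t)) < 1e-6  # the linear constraint holds
```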
This model adds two degrees of freedom to the original functional linear model using just X_i(t) and can thus be represented as a nested test with an appropriate contrast matrix. However, this model is over-specified and corresponds to (2) only under the constraint

γ_0 + γ_1 + ∫_0^1 γ(t) dt = 0.   (5)

The left-hand side of this equation represents a linear contrast that can be tested via a Wald-type procedure, or equivalently by solving for γ_0 and fitting the model

y_i ∼ α_0 + γ_1 [X_i(1) − X_i(0)] + ∫_0^1 γ(t) [X_i(t) − X_i(0)] dt,   (6)

from which it should be clear that (3) cannot be expressed as being nested within (6). Because these are equivalent tests, we take the first approach and test agreement with model (2) through the contrast.

We can thus define a two-stage procedure:

1) Test the significance of γ_0 and γ_1 in (4) to assess the adequacy of using X(t) as a covariate relative to the alternative X^(1)(t).

2) Test the significance of the contrast (5) as a goodness-of-fit assessment of the functional linear model using X^(1)(t) as a covariate.

As we discuss below, the test of the contrast should, in theory, be equivalent to a comparison of (2) with (4). However, the numerical implementation of these tests in commonly-used functional data analysis software may render this correspondence inexact in practice, and we recommend assessing both the estimated (2) and (4) under constraint (5) when choosing a model.

2.2 Testing a Lower-Order Derivative

Using similar arguments, we can also consider testing a lower-order derivative as an alternative. To illustrate this, we swap the roles of (2) and (3), so that we start with a model for X^(1)(t) and consider X(t) as an alternative. Here again integration by parts yields

∫_0^1 β(t) X_i(t) dt = β^(−1)(1) X_i(1) − β^(−1)(0) X_i(0) − ∫_0^1 β^(−1)(t) X_i^(1)(t) dt,

where we have used the anti-derivative

β^(−1)(t) = ∫_0^t β(s) ds,

and we set β^(−1)(0) = 0. This leads to the expanded model

y_i ∼ α_0 + δ_1 X_i(1) + ∫_0^1 δ(t) X_i^(1)(t) dt,

and testing H_0: δ_1 = 0. As above, the additional degree of freedom over-specifies the model, and exact agreement with (3) requires the constraint

δ_1 + δ(1) = 0.

2.3 Moving More Than One Derivative

The same arguments can be extended to test moves of more than one derivative. For example, to compare 0th and 2nd derivatives, we can iterate integration by parts:

∫_0^1 β(t) X_i^(2)(t) dt = β(1) X_i^(1)(1) − β(0) X_i^(1)(0) − β^(1)(1) X_i(1) + β^(1)(0) X_i(0) + ∫_0^1 β^(2)(t) X_i(t) dt,

which can be assessed by including end-point impacts for both X and X^(1):

y_i ∼ ζ_0 X_i(0) + ζ_1 X_i(1) + ζ_2 X_i^(1)(0) + ζ_3 X_i^(1)(1) + ∫_0^1 ζ(t) X_i(t) dt,

with the alternative model based on X_i^(2)(t) corresponding to the contrasts

ζ_0 + ζ_1 + ∫_0^1 ζ(t) dt = 0,
ζ_2 + ζ_3 − ∫_0^1 ( ζ_0 + ∫_0^t ζ(s) ds ) dt = 0.

These tests have no power when β(0) = β(1) = 0. If the X_i(t) are periodic with X_i(0) = X_i(1) – if represented by Fourier components, for example – (4) is not estimable, but we also have no power if β(0) = β(1). Testing in the converse direction will similarly have no power if ∫_0^1 β(t) dt = 0.

2.4 Numerical Implementation

We provide a numerical implementation of the tests above through the penalized basis expansion framework described in Ramsay and Silverman (2005) and taken up in several software packages; see the fda package of Ramsay et al. (2020), the fda.usc package of Febrero-Bande and Oviedo de la Fuente (2012) and the refund package of Goldsmith et al. (2019). Before describing the calculations that we undertake, we first note a number of numerical issues that may make the correspondence between fitting model (4) under constraint (5) and fitting (2) inexact, and some consequences of this.

The first observation is that in many popular FDA software packages, derivatives are not necessarily represented exactly. For example, the fda package represents the functions X_i via a basis expansion. The derivatives of such functions need not themselves be within the span of this basis, but deriv.fd will create X_i^(k) as functional data by projecting the derivative onto the basis expansion for X_i. This introduces a numerical error into the integration by parts formula (1) whose severity depends on the basis used and the smoothness of the X_i.

Additionally, any estimate of β_k(t) is subject to bias associated with the basis expansion or the smoothing penalty. Thus in (4) it may be easier, with finite data, to estimate γ(t) with target −β^(1)(t) than to estimate β(t) directly in (2), or vice versa. To account for this, we have introduced a noncentrality parameter in the tests we describe in Section 2.5.
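The representation issue can be illustrated directly: project a smooth function onto a truncated basis and differentiate the expansion termwise; the error in the implied derivative shrinks as the basis grows. This is an illustrative sketch with a Fourier basis, not the fda package's deriv.fd computation:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
x = np.exp(np.sin(2 * np.pi * t))              # smooth periodic test function
dx = 2 * np.pi * np.cos(2 * np.pi * t) * x     # its exact derivative

def fourier_basis(t, K):
    """Fourier basis up to frequency K, with its termwise derivative."""
    cols, dcols = [np.ones_like(t)], [np.zeros_like(t)]
    for k in range(1, K + 1):
        w = 2 * np.pi * k
        cols += [np.sin(w * t), np.cos(w * t)]
        dcols += [w * np.cos(w * t), -w * np.sin(w * t)]
    return np.column_stack(cols), np.column_stack(dcols)

errs = {}
for K in (3, 10):
    Phi, dPhi = fourier_basis(t, K)
    c = np.linalg.lstsq(Phi, x, rcond=None)[0]  # project x onto the basis
    errs[K] = np.max(np.abs(dPhi @ c - dx))     # error in implied derivative
print(errs)
```

With a richer basis (K = 10) the implied derivative is essentially exact here; with K = 3 a visible error remains, mirroring the projection error described above.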
However, these biases can still affect the level or power of the test, particularly when one representation of the relationship is significantly smoother than another.

Both of these observations mean that the observed squared error from fitting (4) under constraint (5) may differ from that obtained by fitting (2), despite these being theoretically equivalent. If, as we find in the Tecator data, our contrasts conclude both that (3) is inadequate and that (2) is not, the choice between using (4) and re-fitting (2) depends on their predictive performance and the purpose of the modeling exercise.

In this paper, we have not examined the use of functional principal components regression (Yao et al., 2005). If we use the eigenfunctions for some derivative X^(k) as a basis expansion (held fixed even when examining a different derivative), the calculations below remain unchanged. However, we would expect a strong bias towards the derivative that produced the eigenfunctions. It may be more natural to project onto a different eigenbasis for each derivative, in which case the J test detailed in Section 3 can be employed, but the change of representation between bases is much harder to account for mathematically.

2.5 Computation and Test Statistics

All the models described above can be fit through functions in one of several software packages for FDA. Our discussion here centers on the use of a single functional covariate, but the extension to additional scalar or functional covariates is straightforward.

We use a basis expansion Φ(t) = [φ_1(t), …, φ_K(t)] to represent β_k(t) and define the design matrix

[Z_k]_ij = ∫_0^1 X_i^(k)(t) φ_j(t) dt,

so that ∫_0^1 X_i^(k)(t) γ(t) dt = Z_{k,i·} g for a vector of coefficients g. We write X_0 and X_1 for the vectors containing the X_i(0) and X_i(1) respectively, and we assume that a quadratic smoothing penalty is applied to γ(t) that can be represented as gᵀ P g for some matrix P (e.g., see Ramsay et al., 2009).

We then estimate the parameters in (4) by minimizing

‖ Y − α_0 − γ_0 X_0 − γ_1 X_1 − Z_0 g ‖² + λ gᵀ P g,

which gives

g̃ = ( Z̃ᵀ Z̃ + λ P̃ )⁻¹ Z̃ᵀ Y,

using the augmented objects g̃ = (α_0, γ_0, γ_1, g), Z̃ = [1, X_0, X_1, Z_0], and P̃, which contains P preceded by three rows and columns of 0's.

We can estimate σ² from

σ̂² = ‖ Y − Z̃ g̃ ‖² / (n − df),   df = tr( Z̃ ( Z̃ᵀ Z̃ + λ P̃ )⁻¹ Z̃ᵀ ).

Using the sandwich matrix

V = ( Z̃ᵀ Z̃ + λ P̃ )⁻¹ Z̃ᵀ Z̃ ( Z̃ᵀ Z̃ + λ P̃ )⁻¹,

we can obtain an F statistic for a contrast C g̃,

F = g̃ᵀ Cᵀ ( C V Cᵀ )⁻¹ C g̃ / ( p_C σ̂² ),

where we are interested in the contrast matrices C_1 = [ 0_{2×1}  I_{2×2}  0_{2×K} ], to assess the significance of γ_0 and γ_1, and C_2 = [ 0  1  1  m ], to test (5), where m_j = ∫_0^1 φ_j(t) dt and p_C is the dimension of the column space of C.

Under the null hypothesis, and ignoring smoothing and numerical biases, F should be distributed as an F statistic with degrees of freedom p_C and n − df. However, smoothing can generate a significant bias in favor of the alternative, compromising the level of the test.
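The estimator, effective degrees of freedom, sandwich matrix and contrast F statistic above can be sketched with a toy design; every size and value here is illustrative, with Z standing in for the matrix of integrals Z_0 and X0, X1 for the end-point values:

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 100, 7
Z = rng.normal(size=(n, K))                    # stand-in for Z_0
X0, X1 = rng.normal(size=n), rng.normal(size=n)
Zt = np.column_stack([np.ones(n), X0, X1, Z])  # augmented design [1, X0, X1, Z]
Y = Zt @ rng.normal(size=K + 3) + rng.normal(scale=0.5, size=n)

Pt = np.zeros((K + 3, K + 3))
Pt[3:, 3:] = np.eye(K)                         # toy penalty acting on g only
lam = 1e-2

A = np.linalg.inv(Zt.T @ Zt + lam * Pt)
g_tilde = A @ Zt.T @ Y                         # penalized estimate
df = np.trace(Zt @ A @ Zt.T)                   # effective degrees of freedom
sigma2 = np.sum((Y - Zt @ g_tilde) ** 2) / (n - df)
V = A @ (Zt.T @ Zt) @ A                        # sandwich covariance factor

C = np.zeros((2, K + 3))
C[0, 1] = C[1, 2] = 1.0                        # contrast on gamma_0, gamma_1
F = g_tilde @ C.T @ np.linalg.solve(C @ V @ C.T, C @ g_tilde) / (2 * sigma2)
print(F)
```

Since the simulated coefficients on X0 and X1 are nonzero here, F is expected to be large; in practice F would be referred to an F(p_C, n − df) distribution as described above.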
To account for this, we introduce a non-centrality parameter as follows:

1) Obtain fitted values Ŷ_a and an estimate of the residual variance σ̂²_a using the alternative model (4), using the smallest smoothing parameters that allow for model identifiability; in our implementation we used a very small fixed value of λ.

2) Project Ŷ_a onto the null hypothesis space of our test:

Ŷ_0 = Z̃ ( I − Cᵀ ( C Cᵀ )⁻¹ C ) ( Z̃ᵀ Z̃ )⁻¹ Z̃ᵀ Ŷ_a.

This first represents Ŷ_a in terms of the coefficients in model (4), and then projects into the null space of C, which in this case is equivalent to setting γ_0 = γ_1 = 0.

3) Re-fit the penalized regression to Ŷ_0,

g̃_0 = ( Z̃ᵀ Z̃ + λ P̃ )⁻¹ Z̃ᵀ Ŷ_0,

and form the non-centrality parameter

η = g̃_0ᵀ Cᵀ ( C V Cᵀ )⁻¹ C g̃_0 / σ̂²_a.

We now test F against the relevant quantile of an F-distribution with p_C and n − df degrees of freedom and non-centrality parameter η. The non-centrality parameter corrects the level of the test for smoothing bias: from (1), if β(0) = β(1) = 0, using X(t) or X^(1)(t) is equivalent, and a distinction between them depends on whether β(t) or β^(1)(t) incurs more bias. In the context of our test, the point impacts at 0 and 1 can be correlated with an over-smoothed linear functional, thereby affecting the level of the test. We base the non-centrality parameter on the orthogonal projection of an under-smoothed model onto the null hypothesis space to estimate this bias as well as possible.

Notice that Ŷ_a and σ̂²_a are calculated as part of a search over smoothing parameter values. Thus the calculation of the non-centrality parameter only requires a second penalized regression at each value of λ.

In a similar fashion, we may test (3) as an alternative to the null hypothesis (2) by applying the same structure as above, forming Z̃ = [1, X_1, Z_1] with the analogous contrast C_1 to reject (2), and then assessing C_2 = [ 0  1  Φ(1) ] using the analogous statistic and non-centrality parameter.

Similarly, a comparison of the 0th with the 2nd derivative as an alternative can be made by letting X′_0 = X^(1)(0) and X′_1 = X^(1)(1), setting Z̃ = [1, X_0, X_1, X′_0, X′_1, Z_0], and assessing C_1 = [ 0_{4×1}  I_{4×4}  0_{4×K} ] together with the two contrasts of Section 2.3,

C_2 = [ 0  1  1  0  0  m ;  0  −1  0  1  1  −m̄ ],   with m̄_j = ∫_0^1 ∫_0^t φ_j(s) ds dt.

When not available analytically, we can obtain this vector by observing that

∫_0^1 ∫_0^t φ_j(s) ds dt = ∫_0^1 φ_j(t) dt − ∫_0^1 t φ_j(t) dt

by a further integration by parts argument. The expressions above can be evaluated by, for example, the function inprod in the fda package. See the code in the supplementary materials for details.

3 J Test for More General Models
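The identity for m̄_j can be checked numerically for any smooth φ; the choice φ(t) = cos(3t) + t below is purely illustrative:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20001)
phi = np.cos(3 * t) + t
h = t[1] - t[0]

# Cumulative trapezoid: inner[i] approximates the running integral of phi to t_i.
inner = np.concatenate([[0.0], np.cumsum((phi[1:] + phi[:-1]) * h / 2)])
lhs = np.trapz(inner, t)                       # double integral of phi
rhs = np.trapz(phi, t) - np.trapz(t * phi, t)  # the integration-by-parts form
assert abs(lhs - rhs) < 1e-6
```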
An analysis using integration by parts as described above requires X^(k)(t) to enter the model via a smooth linear operator ∫ β_k(t) X^(k)(t) dt. When this is not the case – with the nonparametric regression methods of Ferraty and Vieu (2002), or the additive models of McLean et al. (2014) – we cannot generically embed models based on X^(k)(t) and X^(j)(t) for j ≠ k within a common space and thereby allow a nested hypothesis test. Instead, we employ the J test to assess non-nested models.

Specifically, we consider

H_0: y = m(X) + δ,   (7)
H_1: y = s(X′) + δ,   (8)

where δ denotes an independent normally distributed error term with mean 0 and variance σ²_δ. To test the null hypothesis, we express

y = m̂(X) + θ ŝ(X′),   (9)

where m̂(·) and ŝ(·) are the fitted values under the null and alternative hypotheses. Effectively, the null hypothesis is H_0: θ = 0. However, a t test of θ within a linear model will not account for the degrees of freedom used in fitting m̂ and ŝ. Instead, we employ subsample-splitting (see, e.g., Jarque, 1987) to estimate m̂ and ŝ on subsample S_1 and conduct a test for θ on S_2.

Our procedure is summarized below:

1) We fit a nonparametric scalar-on-function regression under the null hypothesis and obtain fitted values m̂ using S_1.
2) We fit a nonparametric scalar-on-function regression under the alternative hypothesis and obtain fitted values ŝ using S_1.
3) Via a t test using S_2, we examine the statistical significance of the regression coefficient associated with ŝ.

4 Simulation Examples
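A minimal sketch of the sample-splitting J test follows, using simple polynomial fits on a scalar covariate as stand-ins for the two nonparametric scalar-on-function fits; every model and value here is illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: the true regression curve sin(2x) is nonlinear.
n = 400
x = rng.uniform(-1, 1, size=n)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=n)

S1, S2 = np.arange(0, 200), np.arange(200, 400)  # the two subsamples

# Null model m: linear fit; alternative model s: cubic fit (both fit on S1).
m_coef = np.polyfit(x[S1], y[S1], 1)
s_coef = np.polyfit(x[S1], y[S1], 3)
m_hat = np.polyval(m_coef, x[S2])
s_hat = np.polyval(s_coef, x[S2])

# On S2, regress y on [1, m_hat, s_hat] and t-test the coefficient on s_hat.
D = np.column_stack([np.ones(len(S2)), m_hat, s_hat])
beta, *_ = np.linalg.lstsq(D, y[S2], rcond=None)
resid = y[S2] - D @ beta
s2 = resid @ resid / (len(S2) - D.shape[1])
cov = s2 * np.linalg.inv(D.T @ D)
t_stat = beta[2] / np.sqrt(cov[2, 2])
print(t_stat)   # a large |t| rejects the null model m
```

Because m̂ and ŝ are estimated on S_1 and held fixed, the t test on S_2 does not need to account for the degrees of freedom consumed in fitting them.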
We explore the properties of the tests described above in a simulated framework. For this, we set a generative model

y_i = ∫_0^1 β(t) X_i^(1)(t) dt + ε_i,   ε_i ∼ N(0, 0.01),   (10)

where we set

β(t) = β_0 + sin(2πt) + sin(4πt) + sin(6πt),

so that when β_0 = 0 we have β(0) = β(1) = ∫_0^1 β(t) dt = 0, and the generating model can be written equivalently as

y_i = −∫_0^1 β^(1)(t) X_i(t) dt + ε_i   and   y_i = −∫_0^1 β^(−1)(t) X_i^(2)(t) dt + ε_i,

both of which we will use as null hypotheses below. In this framework, varying β_0 allows us to assess power.

We generated functional covariates X_i by drawing random coefficients for a Fourier basis with 25 basis functions plus linear and exponential terms. Specifically,

X_i(t) = d_i0 + d_i1 (t − 1/2) + d_i2 e^(t − 1/2) + ∑_{k=1}^{25} [ f_ik e^min(−(k−4), 0) sin(2πkt) + g_ik e^(−(k−4)) cos(2πkt) ],

where all coefficients d_ij, f_ik, g_ik are independently normally distributed, and the exponentially decaying scaling factors are included as part of the basis. These functions are then projected onto an order 6 B-spline basis with 21 knots (see Ramsay et al., 2009), and the projected functions are used as the covariates when generating the y_i in (10).

Throughout the following, we represent coefficient functions via a basis comprising the functions 1, t, {sin(2πkt), cos(2πkt)}: a Fourier basis augmented with a linear term. The linear term is necessary to represent β^(−1)(t) = ∫_0^t β(s) ds when β_0 ≠ 0. This allows the penalty matrix P to be derived from a second-derivative penalty ∫ [β̂^(2)(t)]² dt.

Figure 1 demonstrates the power of the three tests detailed in Section 2 as a function of β_0. We generated a sample of 250 covariates as above and reproduced the responses y_i both for a small but fixed value of λ – enough to ensure that matrix inverses are well defined – and for λ chosen by ordinary cross-validation (OCV).

Figure 1: Power analysis of tests between different derivatives. At β_0 = 0 all models are equivalent. Selecting λ via OCV increases power, but the correlation between X^(1) and the point impacts (X(0), X(1)) can result in values of λ that compromise the level of the test in the X versus X^(1) case.

We find that the power of the test of X^(1) versus X^(2) increases much more rapidly than that of the test of X versus X^(1) (note the two orders of magnitude difference in the ranges of the x-axes in Figure 1). Using λ chosen by cross-validation improves power relative to the fixed small λ when testing X versus X^(2), but the opposite is true when testing X versus X^(1). This is likely due to smoothing effects: the relationship expressed in terms of X^(1) is considerably smoother than in terms of X, introducing significant smoothing bias and, therefore, a large non-centrality parameter. Without the non-centrality parameter, Z_0 can have a very high correlation with the point impacts (X(0), X(1), X^(1)(0), X^(1)(1)), which then compensate for the bias induced by smoothing penalties, resulting in rejection over 50% of the time under the null hypothesis. By contrast, the point impacts in the test of X^(1) versus X^(2) are nearly uncorrelated with the functional component of the model, partly accounting for its higher power. This result suggests that including point impacts may be useful to alleviate smoothing bias in functional linear regression, whether or not they imply the use of a different derivative.

5 Application to the Tecator Data

We illustrate the proposed tests using an example that focuses on estimating the fat content of meat samples from near-infrared (NIR) absorbance spectra. These data were obtained from http://lib.stat.cmu.edu/datasets/tecator, and have been studied by Ferraty and Vieu (2006) and Aneiros-Pérez and Vieu (2006), among many others. Each sample contains finely chopped pure meat with different percentages of fat, protein, and moisture content. For each unit i (among 215 pieces of finely chopped meat), we observe one spectrometric curve, denoted by T_i, which corresponds to the absorbance measured at a grid of 100 wavelengths (i.e., T_i = [T_i(t_1), …, T_i(t_100)]).
We also observe each sample's fat, protein, and moisture contents X ∈ ℝ³, obtained by chemical processing. Graphical displays of the original spectrometric curves and their first and second derivatives are shown in Figure 2.
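When curves are observed only on a discrete wavelength grid, derivative curves must themselves be estimated. One common choice is local polynomial (Savitzky–Golay-type) least squares (Savitzky and Golay, 1964), sketched below on a synthetic spectrum; the paper's analysis instead uses basis-expansion representations, and nothing here is Tecator data:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "spectrum" on a 100-point wavelength grid, purely illustrative.
wl = np.linspace(850.0, 1050.0, 100)
spec = 3.0 + 0.5 * np.sin(wl / 40.0) + rng.normal(scale=5e-4, size=wl.size)

def local_poly_deriv(x, y, d, window=11, order=3):
    """Estimate the d-th derivative of y(x) by cubic fits in a sliding window."""
    half, n = window // 2, len(y)
    out = np.empty(n)
    for i in range(n):
        lo = min(max(i - half, 0), n - window)
        sl = slice(lo, lo + window)
        c = np.polyfit(x[sl] - x[i], y[sl], order)  # expand around x[i]
        out[i] = c[-(d + 1)] * math.factorial(d)    # d-th derivative at x[i]
    return out

d1 = local_poly_deriv(wl, spec, 1)   # first derivative curve
d2 = local_poly_deriv(wl, spec, 2)   # second derivative curve
```

The window width and polynomial order control a bias–variance trade-off directly analogous to the smoothing-penalty choices discussed in Section 2.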
[Figure 2 comprises three panels, "Spectrometric curves", "First derivative of spectrometric curves" and "Second derivative of spectrometric curves", each plotting absorbance against wavelength over 850–1050, with curves distinguished by fat content (< 20% versus ≥ 20%).]

Figure 2:
Graphical displays of spectrometric curves and their 1st and 2nd derivatives. Curves with fat content < 20% are shown as solid red lines, while curves with fat content ≥ 20% are shown as blue dashed lines. The bottom-right panels give p-values for each test between derivatives: the top panel shows tests of the adequacy of the null hypothesis; the bottom panel gives p-values for the adequacy of the alternative derivative. Vertical lines indicate the cross-validated value of λ for the test of the same line type.

To incorporate the scalar covariates into the analysis, we consider the functional partial linear model

y_i = β_0 + Z_i β_z + ∫_0^1 β_k(t) X_i^(k)(t) dt + ε_i,   (11)

where Z_i represents the linear effect of the scalar covariates, which can be incorporated naturally into the tests above. Applying our linear-specification tests, we find that the tests of X versus X^(1), X^(1) versus X^(2) and X versus X^(2) all reject at the cross-validated value of λ; X versus X^(2) also rejects the second test, indicating possible further model elaborations, but no others do. We also tested the converse, X^(2) versus X^(1), without rejecting. These results are displayed graphically in the bottom panel of Figure 2, where we have plotted the p-values of each test as a function of λ and indicated the values minimizing cross-validation with vertical lines.

The second test for X versus X^(2) also rejects, suggesting that simply using the second derivative may not be adequate. We can assess the robustness of this conclusion by examining the same tests between derivatives using the nonparametric functional regression techniques of Ferraty and Vieu (2002). With the nonparametric model, we implement the J test. Between the 0th and 1st derivatives, we obtain a very small p-value, indicating a strong preference for the 1st derivative. Comparing the 1st and 2nd derivatives, we again obtain a very small p-value, indicating a strong preference for the 2nd derivative. Having selected k = 2, we consider the semiparametric model

y_i = β_0 + Z_i β_z + ∫_0^1 β(t) X_i^(2)(t) dt + m[ X_i^(2)(t) ] + ε_i.   (12)

Using the J test, we compare the functional partial linear model in (11) with the semiparametric regression model in (12). Based on the resulting small p-value, we conclude that there may be a nonlinear effect between the 2nd-order derivative of the spectroscopy curve and fat content.

While these p-values were computed based on in-sample goodness-of-fit using all the data, we also suggest using sample splitting to examine out-of-sample predictive accuracy. Among the 215 curves, we randomly select 160 curves as the training sample, with the remaining 55 curves as the testing sample. We implement all the tests again and report the corresponding p-values in Table 1.

Table 1: p-values for various tests for selecting the optimal derivative in a functional linear model, and for selecting a preferable model, based on either in-sample goodness-of-fit or out-of-sample predictive accuracy. Rows cover the derivative comparisons within the functional linear model and the comparison of the functional partial linear model with the semiparametric model.

6 Discussion

The use of derivatives is a feature that distinguishes functional from multivariate data, and the selection of which derivative to use can make a substantial difference to the performance of functional regression. Despite this observation, relatively little formal attention has been given to the problem.

Within a linear model, derivatives of functional covariates cover non-nested function spaces. However, we have shown that a simple integration by parts analysis allows us to embed models based on two different derivatives within a common space by adding a finite number of point impacts. This allows us to construct finite-dimensional contrasts to assess the fit of each derivative, which can be tested using standard procedures. In contrast to linear models, more general models cannot be as readily embedded in a common space. Instead, we have suggested adapting the J test of Davidson and MacKinnon (1981) with subsample splitting to distinguish between two models that have already been fit.

While we have shown that these methods perform well in simulation, there is clear scope for further development. An important component of our tests is a correction for bias: since β(t) or β^(1)(t) may be easier to estimate, this can affect the conclusions that we draw, and while our non-centrality parameter appears to work well, a better theoretical grounding for it would give useful guidance. Similarly, the finite-dimensional representation of X^(k)(t) can make the integration by parts formula inexact. For nonparametric models, we have proposed a generic framework, but this comes at the cost of sample splitting and will likely be inefficient for any given nonparametric model; a more detailed analysis will need to focus on the particular model at hand.

Acknowledgments
The first author was partially supported by NSF grants DMS-1053252 and DEB-1353039. The second author acknowledges a sabbatical opportunity from the Research School of Finance, Actuarial Studies and Statistics at the Australian National University, and the hospitality of the Department of Statistical Sciences at Cornell University.

References
Aguilera, A., Escabias, M. and Valderrama, M. (2008), 'Discussion of different logistic models with functional data. Application to systemic lupus erythematosus', Computational Statistics and Data Analysis (1), 151–163.

Aneiros-Pérez, G. and Vieu, P. (2006), 'Semi-functional partial linear regression', Statistics & Probability Letters (11), 1102–1110.

Aneiros-Pérez, G. and Vieu, P. (2008), 'Nonparametric time series prediction: A semi-functional partial linear modeling', Journal of Multivariate Analysis, 834–857.

Dalzell, C. J. and Ramsay, J. O. (1993), 'Computing reproducing kernels with arbitrary boundary constraints', SIAM Journal on Scientific Computing (3), 511–518.

Davidson, R. and MacKinnon, J. G. (1981), 'Several tests for model specification in the presence of alternative hypotheses', Econometrica (3), 781–793.

Febrero-Bande, M. and González-Manteiga, W. (2013), 'Generalized additive models for functional data', Test (2), 278–292.

Febrero-Bande, M. and Oviedo de la Fuente, M. (2012), 'Statistical computing in functional data analysis: The R package fda.usc', Journal of Statistical Software (4), 1–28.

Ferraty, F. and Vieu, P. (2002), 'The functional nonparametric model and application to spectrometric data', Computational Statistics (4), 545–564.

Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis, Springer, New York.

Ferraty, F. and Vieu, P. (2009), 'Additive prediction and boosting for functional data', Computational Statistics & Data Analysis (4), 1400–1413.

Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., McLean, M. W., Swihart, B., Xiao, L., Crainiceanu, C. and Reiss, P. T. (2019), refund: Regression with Functional Data. R package version 0.1-21. URL: https://CRAN.R-project.org/package=refund

Hastie, T. and Mallows, C. (1993), 'A statistical view of some chemometrics regression tools (discussion)', Technometrics (2), 140–143.

Hyndman, R. J. and Shang, H. L. (2009), 'Forecasting functional time series (with discussions)', Journal of the Korean Statistical Society (3), 199–211.

Jarque, C. M. (1987), 'Sample splitting and applied econometric modeling', Journal of Business & Economic Statistics (2), 267–274.

McLean, M. W., Hooker, G., Staicu, A. M., Scheipl, F. and Ruppert, D. (2014), 'Functional generalized additive models', Journal of Computational and Graphical Statistics (1), 249–269.

Müller, H.-G. and Stadtmüller, U. (2005), 'Generalized functional linear models', The Annals of Statistics (2), 774–805.

Quintela-del-Río, A., Ferraty, F. and Vieu, P. (2011), 'Analysis of time of occurrence of earthquakes: A functional data approach', Mathematical Geosciences (6), 695–719.

Quintela-del-Río, A. and Francisco-Fernández, M. (2011), 'Nonparametric functional data estimation applied to ozone data: Prediction and extreme value analysis', Chemosphere (6), 800–808.

Ramsay, J. and Hooker, G. (2017), Dynamic Data Analysis: Modeling Data with Differential Equations, Springer, New York.

Ramsay, J., Hooker, G. and Graves, S. (2009), Functional Data Analysis with R and MATLAB, Springer, Dordrecht.

Ramsay, J. O., Graves, S. and Hooker, G. (2020), fda: Functional Data Analysis. R package version 5.1.5.1. URL: https://CRAN.R-project.org/package=fda

Ramsay, J. and Silverman, B. (2005), Functional Data Analysis, 2nd edn, Springer, New York.

Savitzky, A. and Golay, M. J. E. (1964), 'Smoothing and differentiation of data by simplified least squares procedures', Analytical Chemistry (8), 1627–1639.

Shang, H. L. (2019), 'Visualizing rate of change: An application to age-specific fertility rates', Journal of the Royal Statistical Society: Series A (1), 249–262.

Yao, F., Müller, H.-G. and Wang, J.-L. (2005), 'Functional data analysis for sparse longitudinal data', Journal of the American Statistical Association (470), 577–590.

Yao, F. and Müller, H.-G. (2010), 'Functional quadratic regression', Biometrika 97.