Functional Registration and Local Variations: Identifiability, Rank, and Tuning
Anirvan Chakraborty and Victor M. Panaretos
Institut de Mathématiques, École Polytechnique Fédérale de Lausanne
e-mail: [email protected]; [email protected]
Abstract:
We develop theory and methodology for nonparametric registration of functional data that have been subjected to random deformation of their time scale. The separation of this phase (“horizontal”) variation from the amplitude (“vertical”) variation is crucial for properly conducting further analyses, which otherwise can be severely distorted. We determine precise nonparametric conditions under which the two forms of variation are identifiable, and show that identifiability depends delicately on the underlying rank. Using several counterexamples, we show that our conditions are sharp if one wishes to retain a truly nonparametric setup. We show that, contrary to popular belief, the problem can be severely unidentifiable even under structural assumptions (such as assuming the synchronised data are cubic splines) or roughness penalties (smoothness of the registration maps). We then propose a nonparametric registration method based on a “local variation measure”, the main element in elucidating identifiability. A key advantage of the method is that it is free of tuning or penalisation parameters regulating the amount of alignment, thus circumventing the problem of over/under-registration often encountered in practice. We carry out a detailed theoretical investigation of the asymptotic properties of the resulting functional estimators, establishing consistency and rates of convergence when identifiability holds. When deviating from identifiability, we give a complementary asymptotic analysis quantifying the unavoidable bias in terms of the spectral gap of the amplitude variation, establishing stability to mild departures from identifiability. Our methods and theory cover both continuous and discrete observations, with and without measurement error. Simulations demonstrate the good finite sample performance of our method compared to other methods in the literature, and this is further illustrated by means of a data analysis.
Keywords: Identifiability, Phase Variation, Synchronisation, Warping
arXiv [stat.ME], Oct.
A. Chakraborty and V. M. Panaretos/Functional Registration and Local Variations
1. Background and Contributions
Background
Functional observations can fluctuate around their mean structure in broadly two ways: (a) amplitude variation, and (b) phase variation. The first type of variation is analysed using functional principal component analysis, which stratifies the variation in amplitude (or variation in the “vertical axis”) across the different eigenfunctions of the covariance operator of the underlying distribution. The second kind of variation, if present, is more subtle and can drastically distort the analysis of a functional dataset. It typically manifests itself in functional data representing physiological processes or physical motion, and consists in deformations of the time scale of the functional data (or variation in the “horizontal axis”), associating to each observation its own unobservable time scale resulting from a transformation of the original time scale by a time warp. Specifically, instead of observing curves $\{X_i(t) : [0,1] \to \mathbb{R}\}_{i=1}^n$, one actually observes warped versions $\tilde X_i = X_i \circ T_i^{-1}$, where the $T_i$'s are unobservable (random) homeomorphisms termed warp maps. In the presence of phase variation, the mean of the warped data conditional on the warping, $\mathbb{E}(\tilde X_i \mid T_i) = \mu \circ T_i^{-1}$, is a distortion of the true mean $\mu$ by the warp map. Failing to account for the time transformation will yield deformed mean estimates, converging to $\mathbb{E}[\mu \circ T_i^{-1}]$ rather than $\mu$. More dramatic still will be the effect on the estimation of the covariance of the latent process, inflating its essential rank and yielding uninterpretable principal components. We refer to Section 2 in Panaretos and Zemel (2016) for a detailed discussion of these effects.
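The mean distortion can be seen in a few lines. The sketch below is a hypothetical illustration, not taken from the paper: it assumes a true mean $\mu(t) = \sin(\pi t)$ and warp maps $T_a(t) = t + a\,t(1-t)$ with $a = \pm 0.8$ equally likely, so that $\mathbb{E}(T) = \mathrm{Id}$. It then compares $\mathbb{E}[\mu \circ T^{-1}]$ with $\mu$. The inverse warps are computed by linear interpolation, which is valid because each $T_a$ is strictly increasing.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
mu = np.sin(np.pi * t)  # hypothetical true mean


def warp(a):
    """Hypothetical warp T_a(t) = t + a t (1 - t); strictly increasing on [0, 1] when |a| < 1."""
    return t + a * t * (1.0 - t)


def mean_after_warp(a):
    """mu o T_a^{-1} on the grid, inverting the increasing map T_a by interpolation."""
    T_inv = np.interp(t, warp(a), t)
    return np.sin(np.pi * T_inv)


# T = T_{+0.8} or T_{-0.8} with probability 1/2 each, so E(T) = Id ...
assert np.allclose((warp(0.8) + warp(-0.8)) / 2, t)
# ... yet E[mu o T^{-1}] is a flattened, distorted version of mu.
naive_mean = (mean_after_warp(0.8) + mean_after_warp(-0.8)) / 2
print(np.max(np.abs(naive_mean - mu)))
```

Averaging the warped curves pulls the peak at $t = 1/2$ well below $\mu(1/2) = 1$, even though the warps are centred at the identity.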
Consequently, in the presence of phase variation in the data, the natural first step in the analysis should be to register the data, i.e., to simultaneously transform/synchronise the curves back to the objective time scale.
Owing to the rather complex nature of the registration problem, a variety of different assumptions on the latent process $X_i$ and the warp maps $T_i$ have been considered, and correspondingly a multitude of methods have been investigated: landmark based registration (Kneip and Gasser, 1992); template/target based registration (Ramsay and Li, 1998); registration using dynamic time warping (Wang and Gasser, 1997, 1999); registration based on local regression (Kneip et al., 2000); a “self-modelling” approach by Gervini and Gasser (2004) for warp maps expressible as linear combinations of B-splines; related registration procedures under assumptions on functional forms of the warp maps that result in a finite dimensional family of deformations (Rønn, 2001; Gervini and Gasser, 2005); a functional convex synchronization approach to registration (Liu and Müller, 2004); registration using “moments” of the data curves (James, 2007); registration based on a parsimonious representation of the registered observations by the principal components (Kneip and Ramsay, 2008); pairwise registration of the warped functional data under monotone piecewise-linear warp maps (Tang and Müller, 2008); a joint amplitude-phase analysis with this pairwise registration procedure, but considering step-function (thus finite dimensional) approximations of the warp maps using finite differences of their log-derivatives (centered log-ratio transform) (Hadjipantelis et al., 2015); registration when the warp maps are generated as compositions of elementary “warplets” (Claeskens, Silverman and Slaets, 2010); and registration using a warp-invariant metric between curves when the warp functions are diffeomorphisms on an interval (Srivastava et al., 2011).
The above list is not exhaustive, and we refer to Marron et al. (2015) for an overview and comparison of some of the registration procedures mentioned above. More recently, Pigoli et al. (2017) applied the pairwise registration procedure of Tang and Müller (2008) to two-dimensional curves, where the warping is in only one of the dimensions, while Lila and Aston (2017) generalized the pairwise registration method to manifold valued data.
Several of the above contributions consider the case when the warp maps are themselves random, and in such cases, a canonical set of assumptions is usually required:
(a) $T$ is a strictly increasing homeomorphism with probability one, and
(b) $\mathbb{E}(T) = \mathrm{Id}$, where $\mathrm{Id}$ is the identity map, $\mathrm{Id}(x) = x$.
The first assumption rules out “time-reversal” or “time-jumps”, while the second disallows an overall speed-up or slow-down of time. Further to these natural assumptions, most of the above cited papers impose additional smoothness and structural assumptions on the warp maps, which require tuning parameters to be selected. However, it is unclear whether these additional assumptions are either necessary or indeed sufficient for identifiability to hold. It is an open problem to determine what assumptions one must minimally impose on the latent functional data generating process so that the registration problem be identifiable under conditions (a) and (b) on the warp maps. This is important to understand since, in practice, one rarely has more detailed insights regarding the underlying warping phenomenon.
Consider the model
$$X_i(t) = \xi_i \phi(t) + \delta \varepsilon_i(t), \quad i = 1, 2, \ldots, n, \qquad (1)$$
for the latent process, with $\phi$ a unit norm deterministic function, $\xi_i$ random scalars, and $\varepsilon_i(t)$ zero-mean random functions of unit variance (i.e. $\mathbb{E}\|\varepsilon_i\|^2 = 1$). If $\delta$ is unrestricted, the model (1) spans any possible functional datum.
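The role of $\delta$ in model (1) can be made concrete by simulation. The sketch below uses hypothetical choices that are not from the paper: $\phi(t) = \sqrt{2}\sin(\pi t)$, and $\varepsilon_i$ a Gaussian combination of the next $J$ orthonormal sine functions, scaled so that $\mathbb{E}\|\varepsilon_i\|^2 = 1$. It compares the ratio of the second to the first eigenvalue of the discretised empirical covariance for $\delta = 0$ and $\delta > 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, J = 500, 200, 5
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

# Hypothetical choices: phi has unit L2 norm; eps_i = sum_j z_ij e_j with
# E z_ij^2 = 1/J and orthonormal e_j, so that E||eps_i||^2 = 1.
phi = np.sqrt(2.0) * np.sin(np.pi * t)
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(2, J + 2), t))  # shape (J, m)


def sample_standard_model(delta):
    """Draw n curves X_i = xi_i phi + delta eps_i from model (1) on the grid t."""
    xi = rng.standard_normal(n)
    z = rng.standard_normal((n, J)) / np.sqrt(J)
    return xi[:, None] * phi + delta * (z @ basis)


def eigenvalue_ratio(X):
    """Second-to-first eigenvalue ratio of the discretised empirical covariance operator."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1) * dt
    ev = np.sort(np.linalg.eigvalsh(C))[::-1]
    return ev[1] / ev[0]


ratios = {delta: eigenvalue_ratio(sample_standard_model(delta)) for delta in (0.0, 0.3)}
print(ratios)  # numerically zero for delta = 0 (exactly rank one), clearly positive for delta > 0
```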
The value of $\delta$ then regulates the balance between an (effectively) low rank model ($\delta \ll \mathrm{var}\{\xi_i\}$) and a higher rank model (larger $\delta \sim \mathrm{var}\{\xi_i\}$). When one has exactly $\delta = 0$, the model is of rank one. Much of the registration literature implicitly postulates model (1) with $\delta$ small relative to $\mathrm{var}\{\xi_i\}$ (for this reason, and for ease of reference, we henceforth refer to Model (1) as the “standard model”). In other words, it is postulated that, if it were not for phase variation, important landmark features such as peaks and valleys of the latent process would not drastically change from realisation to realisation. In effect, there seems to be a certain concordance that identifiability (and hence consistency in the usual sense) rests crucially on an implicit assumption that the amplitude variation of the synchronised functions is of low rank; in other words, that phase variation is dominant over amplitude variation.
Observe that the dominating component $\xi_i \phi(T_i^{-1}(t))$ in the warped process $X_i(T_i^{-1}(t))$, obtained by warping model (1), forms a sub-class of the so-called general non-linear shift models (NLSM). These models find extensive use in the comparison of semi-parametric regression models (see, e.g., Härdle and Marron (1990)), and have been studied in the context of landmark and dynamic registration techniques by Kneip and Gasser (1992) and Wang and Gasser (1997, 1999). Also note that the landmark principle of registration essentially stipulates that the true curves have similar shape (thus having the same landmarks) but possibly differ in their amplitude component. Although some of the earlier papers, e.g., Ramsay and Li (1998), Kneip et al. (2000), Kneip and Ramsay (2008), and Claeskens, Silverman and Slaets (2010), consider higher rank models for the latent process corresponding to nontrivial $\delta$ (with additional structural assumptions on warp maps), it is not known whether these procedures are truly identifiable/consistent. Indeed, Kneip and Ramsay (2008) (see p. 1160) acknowledged that, for such higher rank models, one can have different valid registrations depending on the degree of complexity of the warp maps that one allows (cf. Counterexample 5). Further, as hinted in Tang and Müller (2008), who consider model (1), identifiable (consistent) registration appears not to be guaranteed unless one lets $\delta \to 0$ as $n \to \infty$.
Our Contributions
We contribute to the nonparametric synchronisation problem with theory, methodology, and asymptotics, and corroborate our findings with simulations and a data analysis:
1. Firstly, we provide a comprehensive study of the issue of identifiability, which is notorious in functional registration but to date has remained largely open. In particular, we provide sharp conditions for the standard model (1) to be identifiable, elucidating the role of the parameter $\delta$ that controls the effective rank of the synchronised process (Section 2). Specifically, we prove that the registration problem is identifiable when the amplitude variation is exactly of rank 1, i.e. $\delta = 0$, and that this condition is sharp. It cannot be relaxed while rescuing nonparametric identifiability, even under circumstances that were informally expected to suffice: spline models for the synchronised process, smoothness restrictions on the warp maps, rank restrictions on the warp maps, or a combination thereof. Indeed, so reliant is identifiability on the rank 1 assumption that even rank 2 models fail to be identifiable. Our findings serve as a word of caution to practitioners, and a tentative conclusion appears to be that a low rank (or at least approximately low rank) assumption is effectively necessary.
2. Secondly, we develop methodology to address the problem of nonparametric and consistent recovery of the warp maps from discretely observed warped curves, without structural assumptions on the warp maps further to (a) and (b), and without any penalisation or tuning parameters related to the warp maps themselves. Minimal structural assumptions are particularly desirable since, in practice, one rarely has more detailed insights regarding the underlying warping phenomenon. And circumventing penalisation/tuning has two crucial practical advantages: there is no danger of “over-registering” (overfitting) the data on account of the tuning of a penalty on the registration maps (cf.
the discussion in the paragraph before this subsection); and there is no arbitrary pre-processing choice made in the registration analysis, so that any further statistical analyses/conclusions are not contingent on tuning choices. Our methodology is adapted to cover all three standard observation settings: complete observation, discrete observation, and discrete observation with measurement error.
3. We carry out a complete asymptotic analysis in all three observation settings. In all cases, and under the identifiable regime, we prove that the nonparametric estimators obtained are consistent as the number of observations grows and the measurement grid becomes dense, and we additionally derive rates of convergence and weak convergence for all the quantities involved (Section 4, Theorems 2, 3, 4, 5). We also investigate in detail the setting when the model is unidentifiable. Consistency makes no sense in this setting, of course, but in Section 4.2 we derive theoretical results quantifying the amount of asymptotic bias incurred in the registration procedure in terms of the spectral gap of the amplitude variation (Theorem 6).
4. We probe the finite sample performance of our methodology (Section 5) for all possible observation regimes, and compare it to other popular registration techniques. In particular, we numerically probe the impact of departing from the identifiable regime, and observe a noteworthy stability of our method to mild such departures. The method is further illustrated by the analysis of a functional dataset of Tribolium beetle larvae growth curves (Section 6), yielding biologically interpretable results. Here, too, we compare to other registration procedures.
The key to our results is the novel use of a criterion that measures the local amount of deformation of the time scale (Section 3).
Specifically, we introduce the local variation measure of $X$, with associated cumulative distribution $J_X(t) = \int_0^t |X'(u)|\,du$, which reflects how the total amount of variation of the curve is distributed on the real axis. The simple but consequential insight is that, by a change-of-variable argument, the total variation remains invariant under any strictly increasing deformation $T$ of the time scale of $X$, namely, $J_X(1) = J_{\tilde X}(1)$, where $\tilde X = X \circ T^{-1}$. However, it is the local amount of deformation that provides the information about the warping mechanism. This allows us to track the effect of the time deformation on the local variation distribution, and has a transparent interpretation in terms of transportation of measure. Our approach exploits this connection in order to deduce identifiability, to estimate the unobservable warp maps, and to register the functional data. Indeed, it is precisely the structure of optimal transportation that exempts us from the need for additional smoothness/structural conditions on the warp maps $T$, and consequently from the need to introduce registration tuning parameters, even when the curves are observed over a discrete grid. This connection also guides us in the construction of counterexamples, illustrating where caution should be taken. Although our procedure involves derivatives, we actually do not need to estimate any derivatives from discretely observed data if there is no measurement error, as we can exploit an equivalent definition of total variation using finite differences over partitions of the domain. If there is measurement error, a pre-processing smoothing step is required, but no additional penalisation of the registration maps is necessary (a smoothing step would eventually be required anyway when observing discrete data under measurement error).
Of course, once the warp maps are estimated, one would have to smooth the warped discrete data in order to register them, since the warped data are not observed at all points of their domain. And, if there is measurement error in the observations, then some pre-smoothing will be needed. But in either case, this smoothing will be on the data itself (either as a pre-processing or post-processing step), and no smoothing penalties or structural assumptions will be required on the registration maps themselves.
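The invariance of total variation, and the redistribution of local variation, can be checked with the finite-difference (partition) form of $J$. The sketch below is a hypothetical example, not from the paper: it uses the curve $X(s) = s\sin(3\pi s)$ and the strictly increasing warp $T(s) = s^2$, for which $T^{-1}(t) = \sqrt{t}$. It computes $J$ for $X$ and for $\tilde X = X \circ T^{-1}$ on a grid.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 4001)


def X_fun(s):
    """Hypothetical latent curve X(s) = s sin(3 pi s)."""
    return s * np.sin(3 * np.pi * s)


def cumulative_variation(values):
    """Grid version of J(t): partition sum of |increments| up to each grid point."""
    return np.concatenate([[0.0], np.cumsum(np.abs(np.diff(values)))])


# Hypothetical strictly increasing warp T(s) = s^2, so that T^{-1}(t) = sqrt(t).
X = X_fun(t)
X_warped = X_fun(np.sqrt(t))       # X o T^{-1}, evaluated exactly on the grid

J_X, J_Xw = cumulative_variation(X), cumulative_variation(X_warped)
F_X, F_Xw = J_X / J_X[-1], J_Xw / J_Xw[-1]   # normalised local variation distributions

print(J_X[-1], J_Xw[-1])       # total variation: equal up to discretisation error
print(J_X[2000], J_Xw[2000])   # variation accrued on [0, 1/2]: redistributed by the warp
# The local variation distribution is transported by the warp: F_warped = F o T^{-1}.
print(np.max(np.abs(np.interp(np.sqrt(t), t, F_X) - F_Xw)))
```

Only increments are used, so no derivative of $X$ is estimated. This mirrors the remark above on using partitions under discrete observation.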
2. Identifiability and Counterexamples
Recall that the standard model for the latent/synchronised process prior to warping (Equation 1) takes the general form
$$X(t) = \xi \phi(t) + \delta \varepsilon(t).$$
Depending on the constraints imposed on the random variable $\xi$ and the scalar $\delta$, this model can be of arbitrarily large rank, and indeed can span any functional datum. Usually $\mathrm{var}\{\xi\}$ is expected to be the dominant effect relative to $\delta$ (i.e. $\delta \ll \mathrm{var}\{\xi\}$), corresponding to an effectively low rank model. We now give sufficient conditions on the standard model for identifiability to hold in a genuinely nonparametric sense. In simple terms, the process must be exactly of rank 1 (i.e. $\delta = 0$, or $\varepsilon(t) \in \mathrm{span}\{\phi(t)\}$).
Theorem 1 (Identifiability). Let $\{X_1, X_2\}$ be random elements in $C[0,1]$ of rank one, i.e., $X_i(t) = \xi_i \phi_i(t)$ for deterministic functions $\phi_i \in C^1[0,1]$ with $\|\phi_i\| = 1$, and with $\phi_i'$ vanishing on at most a countable set. Assume that $\{T_1, T_2\}$ are strictly increasing homeomorphisms in $C[0,1]$ such that $\mathbb{E}(T_i) = \mathrm{Id}$. Write $\tilde X_i = X_i(T_i^{-1}(t))$. Then,
$$\tilde X_1 \overset{d}{=} \tilde X_2 \iff \left\{ T_1 \overset{d}{=} T_2,\ \phi_1 = \pm\phi_2,\ \xi_1 = \pm\xi_2 \right\}.$$
The assumption that $\phi'$ does not vanish except perhaps on a countable set excludes the possibility of constant functions, in which case the problem is vacuous and identifiability trivially fails. Note that the identifiability result in Theorem 1 does not require that $\xi$ and $T$ be independent.
Remark 1.
Further to being evidently natural, the assumption $\mathbb{E}(T) = \mathrm{Id}$ in the above theorem cannot be dropped, as is shown by the following counterexample. Suppose that $\mathbb{E}(T) = f$ with $f \neq \mathrm{Id}$ and $f$ a strictly increasing homeomorphism on $[0,1]$. Define $S = T \circ f^{-1}$. It follows that $\mathbb{E}(S) = \mathrm{Id}$. Now
$$\tilde X = \xi \phi \circ T^{-1} = \xi \phi \circ f^{-1} \circ S^{-1} = \xi \phi_1 \circ S^{-1},$$
where $\phi_1 = \phi \circ f^{-1}$. Let $c = \|\phi_1\|$. Define $\xi_2 = c\,\xi$ and $\phi_2 = \phi_1 / c$. Then $\|\phi_2\| = 1$ and $\tilde X = \xi_2 \phi_2 \circ S^{-1}$. So the resulting processes are equal but have been generated using different warp maps $S$ and $T$, which do not have the same distribution since they have different means. In this case, one can estimate $\phi_2$ (using the algorithm given in Section 3), and thus register the warped observations to the new time scale given by $f$, i.e., get an estimate of $X_i \circ f^{-1}$ instead of the true $X_i$. Of course, if $f$ is known, then these registered observations can be re-registered to the original time scale. So the essence of the assumption $\mathbb{E}(T) = \mathrm{Id}$ is that the objective time scale be known, and not so much that it be the identity.
One might understandably argue that the rank 1 assumption in the previous theorem is restrictive. Perhaps surprisingly, though, the condition can be seen to be sharp. We construct a series of counterexamples below, demonstrating how badly identifiability can fail with higher ranks (even rank 2). These illustrate that the situation cannot be rectified at a genuinely nonparametric level, not even by assuming specific classes of models for the synchronised processes (such as splines or trigonometric functions) or by imposing qualitative nonparametric constraints, e.g., roughness penalties, Sobolev norm bounds or rank restrictions on the warp maps (or combinations of these). It appears that, if one wishes to maintain identifiability at a genuinely nonparametric level, a rank 1 assumption is essentially necessary.
Counterexample 1.
Our first counterexample shows that the same rank 2 process can arise either as a warped rank 1 process or as a synchronised rank 2 process. Both the process itself and the warp maps can be taken to be of rank at most 2 (notice that a rank 1 warp map would need to be the identity almost surely). Define $f(t) = (t + t^2)/2$ and $g(t) = (3t - t^2)/2$, $t \in [0,1]$. Take $\xi$ to be a standard Gaussian random variable and $\phi(t) = t/\sqrt{3}$, $t \in [0,1]$. Now define a random warp map $T$ such that $P[T = f] = P[T = g] = 1/2$. Then $T$ satisfies (a) and (b). Now define
$$\tilde X = \xi \phi \circ T^{-1} = \xi_1 T^{-1} = \xi_1 \left( f^{-1} U + g^{-1} (1 - U) \right),$$
where $U$ is a Bernoulli random variable with success probability $1/2$, independent of $\xi$, and $\xi_1 = \xi/\sqrt{3}$. Let $V = \xi_1 U$ and $W = \xi_1 (1 - U)$, so that $\tilde X = V \tilde f + W \tilde g$, where $\tilde f(t) = f^{-1}(t) = (\sqrt{1 + 8t} - 1)/2$ and $\tilde g(t) = g^{-1}(t) = (3 - \sqrt{9 - 8t})/2$, $t \in [0,1]$. Since $f$ and $g$ are $C^\infty$ with $f'$ and $g'$ bounded away from zero on $[0,1]$, their inverses are also $C^\infty$ with derivatives bounded away from zero. It is easy to check that $\mathrm{Cov}(V, W) = 0$. Further, it is easy to show that $\tilde f$ and $\tilde g$ are linearly independent. Consequently, we may define a new process $Y = V \tilde f + W \tilde g$, which is a rank two process. Define $\tilde Y = Y \circ \mathrm{Id}^{-1} = Y$. Then $\tilde X \overset{d}{=} \tilde Y$ (in fact $\tilde X = \tilde Y$), but they have been generated using two different $C^\infty$ latent processes, namely $X$ and $Y$, and $C^\infty$ warp maps, namely $T$ and $\mathrm{Id}$, which of course do not have the same distribution.
[Figure 1: Plots of some sample paths of the rank two latent processes $Y_1$ and $Y_2$ in part (A) of Counterexample 2, along with the warp maps $T_1$ and $T_2$ mentioned there, which warp them into the same rank one process. Panels: Y_1 (rank two), T_1, Y_2 (rank two), T_2; horizontal axis t.]
Counterexample 2.
We give two constructions demonstrating that the same rank one process can arise in one of infinitely many ways: (i) as a rank one analytic process with no warping, and (ii) as one of an infinite collection of rank two analytic processes subjected to warping by one of an infinite collection of non-trivial analytic warp maps $T$ satisfying (a) and (b).
(A) First take the latent model class to consist of linear combinations of trigonometric functions and polynomials. Define $\mu(t) = t - 1/2$ and $\phi_k(t) = \sin((2k-1)\pi t)/[(2k-1)\pi]$, $t \in [0,1]$, for some $k \geq 1$. Let $T_k(t) = t - (U_k - 1/2)\phi_k(t)$, where $U_k \sim \mathrm{Unif}(a, b)$. Here $a = (1/2)(1 - M^{-1})$ and $b = (1/2)(1 + M^{-1})$, with $M$ satisfying $M > 1$. It can be checked that $T_k$ satisfies (a) and (b) for all $k \geq 1$. Let $\xi$ be a random variable independent of $U_k$. Define $X(t) = \xi\mu(t)$ and $Y_k(t) = \xi\mu(t) + \xi(1/2 - U_k)\phi_k(t)$. It can be checked that $X = \tilde Y_k := Y_k \circ T_k^{-1}$ for all $k \geq 1$. Since $\xi$ and $U_k$ are independent, it follows that $\mathrm{Cov}(\xi, \xi(1/2 - U_k)) = 0$. Also, since $\langle \mu, \phi_k \rangle = 0$, the representation of $Y_k$ given above is in fact its Karhunen-Loève (KL) expansion, which is of rank 2, and this holds for all $k \geq 1$. The plots of sample paths of $Y_1$ and $Y_2$, along with the warp maps $T_1$ and $T_2$, are shown in Figure 1.
(B) For the second construction, we take the latent model class to consist of linear combinations of polynomials only. Define $\mu(t) = t$. Fix $R \in \mathbb{N}$ and any finite subset $\{k_1, k_2, \ldots, k_R\}$ of $\mathbb{N}$. Also, fix reals $a_1, a_2, \ldots, a_R$ satisfying $\sum_{l=1}^R a_l = 0$. Consider the Legendre polynomials $P_{2k_l+1}$ on $[-1,1]$. Since these satisfy $P_{2k_l+1}(-t) = -P_{2k_l+1}(t)$ for $t \in [0,1]$, it follows that $\int_0^1 t P_{2k_l+1}(t)\,dt = (1/2)\int_{-1}^1 t P_{2k_l+1}(t)\,dt = 0$, the last equality by orthogonality to $P_1(t) = t$. Define $\phi(t) = \sum_{l=1}^R a_l P_{2k_l+1}(t)$ and $T(t) = t - (U - 1/2)\phi(t)$, where $U \sim \mathrm{Unif}(a, b)$ with $a$ and $b$ as in (A), and where $M > \|\phi'\|_\infty := \sup_{t \in [0,1]} |\phi'(t)|$. The above construction ensures that $T(0) = 0$ and $T(1) = 1$ (each $P_{2k_l+1}$ is odd and satisfies $P_{2k_l+1}(1) = 1$), and that $T$ satisfies (a). It is clear that $T$ satisfies (b). Let $X(t) = \xi t$ and $Y(t) = \xi t - \xi(U - 1/2)\phi(t)$, where $\xi$ is as in the first construction. Then it can be shown that $X = \tilde Y := Y \circ T^{-1}$. Also, $Y$ is of rank 2, and the above form is in fact its KL expansion because $\mathrm{Cov}(\xi, \xi(U - 1/2)) = 0$ and $\langle \mu, \phi \rangle = 0$, which follows as earlier.
By taking $\xi$ to be a constant random variable, this counterexample also shows that one cannot extend the identifiable regime from $\xi\phi(t)$ to $\mu(t) + \xi\phi(t)$, where $\mu \notin \mathrm{span}\{\phi\}$.
Counterexample 3.
We now show that even if one penalises the warp maps, e.g., by one or both of $\int \mathbb{E}([T(t) - t]^2)\,dt$ and $\int \mathbb{E}[(T''(t))^2]\,dt$, one can still get infinitely many possible solutions to the registration problem. Under the setup of (A) in Counterexample 2, $\int \mathbb{E}([T_k(t) - t]^2)\,dt = [\sqrt{24}\,M\pi(2k-1)]^{-2}$ and $\int \mathbb{E}[(T_k''(t))^2]\,dt = (2k-1)^2\pi^2/(24M^2)$. For (B) in the previous counterexample, it can be shown using the orthogonality of the Legendre polynomials that $\int \mathbb{E}([T(t) - t]^2)\,dt = \{\sum_{l=1}^R a_l^2/(4k_l + 3)\}/(12M^2)$ and $\int \mathbb{E}[(T''(t))^2]\,dt = \|\sum_{l=1}^R a_l P_{2k_l+1}''\|^2/(12M^2)$, where $\|\cdot\|$ denotes the $L^2[0,1]$ norm. Thus, in both cases, for any $\epsilon > 0$, the sum of the two penalty terms can be made smaller than $\epsilon$ by choosing $M$ large enough (depending on the choices of the other parameters: $k$, $R$, the $k_l$'s and the $a_l$'s).
The above facts imply that, if one wants to carry out the registration using the penalisation procedure
$$\min_{h \in \mathcal{T}} \int \mathbb{E}\left\{ [W_h(t) - \tilde X(h(t))]^2 + \lambda_1 [h(t) - t]^2 + \lambda_2 (h''(t))^2 \right\} dt,$$
where $\mathcal{T}$ is a class of $C^2$ warp maps and $W_h$ takes values in an appropriate synchronised space $\mathcal{S}$ of linear combinations of $C^2$ functions, then we have infinitely many valid registrations, as follows:
(i) under setup (A), if we allow $\mathcal{T}$ to include monotone homeomorphisms on $[0,1]$ whose deviation from the identity is a trigonometric function, and even if $\mathcal{S}$ is restricted to linear combinations of linear and trigonometric functions (both $X$ and $Y_k$ belong to this class);
(ii) under setup (B), even if we allow $\mathcal{T}$ and $\mathcal{S}$ to only include polynomials.
Note that, for both (i) and (ii), the “fit” term $\mathbb{E}[W_h(t) - \tilde X(h(t))]^2$ becomes zero.
Counterexample 4.
Our next counterexample shows that structural restrictions on the latent synchronised process, such as spline models, will also fail if the rank is higher than 1. We consider cubic splines, but one can similarly construct more elaborate counterexamples involving higher order splines and more knots. Let $\phi$ be a cubic spline with a single knot at $a_0 \in (0,1)$, i.e., $\phi(t) = \sum_{i=0}^3 \theta_i t^i + \delta(t - a_0)_+^3$, and define
$$s(t) = c(a_1 - a_0)^{-1}(t - a_0)\, I\{a_0 \leq t \leq a_1\} + c(1 - a_1)^{-1}(1 - t)\, I\{a_1 < t \leq 1\}, \quad t \in [0,1],$$
where $c \in \mathbb{R}$ and $a_1 \in (a_0, 1)$ are fixed. Let $X(t) = \xi\phi(t)$ and $T(t) = t - (U - 1/2)s(t)$ with $U$ and $\xi$ as before, and choose $M > |c|/\min\{(a_1 - a_0), (1 - a_1)\}$. This ensures that $T$ satisfies (a) and (b). Define
$$Y(t) = \xi\phi(t) + V_1 s(t)\{\theta_1 + 2\theta_2 t + 3\theta_3 t^2 + 3\delta(t - a_0)_+^2\} + V_2 s^2(t)\{\theta_2 + 3\theta_3 t + 3\delta(t - a_0)_+\} + V_3 s^3(t),$$
where $V_1 = \xi(1/2 - U)$, $V_2 = \xi(U - 1/2)^2$ and $V_3 = \xi(1/2 - U)^3(\theta_3 + \delta)$. Note that $s$ is a linear spline with knots at $a_0$ and $a_1$. Also, $p_1(t) := \theta_1 + 2\theta_2 t + 3\theta_3 t^2 + 3\delta(t - a_0)_+^2$ and $p_2(t) := \theta_2 + 3\theta_3 t + 3\delta(t - a_0)_+$ are splines (quadratic and linear, respectively) with a knot at $a_0$. Hence, these can be considered as elements of the cubic spline space $\mathcal{S}_0$ with knots at $a_0$ and $a_1$. So, by repeated application of Theorem 3.1 in Mørken (1991), the functions $\phi$, $s p_1$, $s^2 p_2$ and $s^3$ are elements of the space $\mathcal{S}$ of cubic splines with a finite set of knots (including $a_0$ and $a_1$). So both $X$ and $Y$ lie in $\mathcal{S} \supset \mathcal{S}_0$. If we assume that $\phi(0) \neq 0$, then $\phi$ is linearly independent of $s p_1$, $s^2 p_2$ and $s^3$ (since these three functions equal zero at $t = 0$), and hence $Y$ is of rank at least two. Now, it can be checked that $\tilde Y(t) := Y(T^{-1}(t)) = X(t)$. Thus, two distinct processes $X$ and $Y$ can be warped (by the maps $\mathrm{Id}$ and $T$, respectively) to produce the same process.
If we choose $a_0 = 0$, i.e., take $\phi$ to be a cubic polynomial (which also lies in $\mathcal{S}$ trivially), then we can choose $s$ to be a spline on $[0,1]$ of degree $\geq 1$ with $s(0) = s(1) = 0$, and $M > \|s'\|_\infty$. Then, for the same $Y$, the conclusion of the above counterexample holds.
Counterexample 5.
Our last counterexample illustrates that even a priori knowledge of landmarks does not help rectify identifiability if the rank 1 condition is violated. Let $X(t) = \xi t(1 - t)$, $t \in [0,1]$, with $\xi > 0$, so that the latent process has a unique maximum at $t = 1/2$. A priori knowledge of the existence of a unique maximum in synchronised space can be utilised to carry out a landmark/peak alignment of the warped curves. Let us denote the class of functions with a unique maximum at $t = 1/2$ by $\mathcal{U}$, and the vector space of functions proportional to the bell-shaped curve $f(t) = t(1 - t)$ by $\mathcal{S}_f$. Obviously, $X \in \mathcal{S}_f \cap \mathcal{U}$. Let $T$ be any warp map independent of $\xi$ and satisfying (a) and (b). Define a new warp map $S$ as follows:
$$S(t) = 2t\,T(1/2)\, I\{0 \leq t \leq 1/2\} + \left[ T(1/2) + (2t - 1)\{1 - T(1/2)\} \right] I\{1/2 < t \leq 1\}.$$
Note that $S$ satisfies (a) and (b). Define $Y(t) = \xi T^{-1}(S(t))[1 - T^{-1}(S(t))]$, $t \in [0,1]$. It can be checked that the process $Y$ has a unique maximum at $t_0$, where $t_0$ satisfies $T^{-1}(S(t_0)) = 1/2$, equivalently, $t_0 = S^{-1}(T(1/2))$. However, from the construction of $S$, it is easy to check that $S^{-1}(T(1/2)) = 1/2$. So $Y \in \mathcal{U}$. Defining $\tilde X = X \circ T^{-1}$ and $\tilde Y = Y \circ S^{-1}$, it follows that $\tilde X = \tilde Y$, although $X$ and $Y$ are different processes. Further, although $X \in \mathcal{S}_f$, it holds that $Y \notin \mathcal{S}_f$ provided $S \neq T$, and $Y$ has rank at least two. This counterexample (without explicit constructions of the latent processes or of the warp maps) is mentioned in Kneip and Ramsay (2008).
What we learn from these counterexamples is that identifiability crucially rests upon constructing a synchronised space of processes $\mathcal{S}$ (contained within continuous processes on $[0,1]$) and a warp map space of processes $\mathcal{T}$ (contained within strictly monotone homeomorphisms onto $[0,1]$ with identity expectation) such that:
(I) Warping causes the latent process to exit the synchronised space, i.e. $X \in \mathcal{S}$ but $\tilde X \notin \mathcal{S}$.
(II) There exists a unique process $X \in \mathcal{S}$ such that $\tilde X = X \circ T^{-1}$ for some random $T \in \mathcal{T}$.
Theorem 1 informs us that such a construction is possible by taking $\mathcal{S}$ to essentially be $C^1$ rank 1 non-constant processes, and otherwise not restricting $\mathcal{T}$ except for requiring $\mathcal{T} \subset C[0,1]$. The counterexamples demonstrate that allowing higher ranks can have a severe effect on identifiability, even if $\mathcal{S}$ is modelled more concretely, or indeed if $\mathcal{T}$ is restricted to be smoother. In light of this, we introduce the terminology “identifiable regime” to mean the pair $(\mathcal{S}, \mathcal{T})$ implied by the context of Theorem 1. Deviations from this regime will generally be termed an “unidentifiable regime”:
Definition 1 (Identifiable Regime).
We define the identifiable regime to involve latent synchronised processes $X \in \mathcal{S}$, warp maps $T \in \mathcal{T}$, and warped processes $\tilde X(t) = X(T^{-1}(t))$, where:
(I1) The synchronised process space is $\mathcal{S} = \{X \in C[0,1] : X(t) = \xi\varphi(t)\}$, for $\xi$ a real-valued random variable of finite variance and $\varphi \in C^1([0,1])$ a deterministic function of unit $L^2$-norm, whose derivative vanishes at most on a countable subset of $[0,1]$.
(I2) The warp map space is $\mathcal{T} = \{T \in C[0,1] : \mathbb{E}[T] = \mathrm{Id}\ \&\ T \text{ strictly increasing homeomorphism}\}$.
With identifiability clarified, we now turn to nonparametric methods of estimation. Our goal will be to construct methods that perform well in the identifiable regime, remain stable under small departures (e.g. effectively rank 1 rather than precisely rank 1 models), and do not rely on tuning (which adds a layer of arbitrariness and in any case was seen to be unavailing). For these, we will require the notion of a local variation measure, introduced in the next section.
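The algebra behind Counterexamples 1, 2(A), 3 and 4 is easy to mistype, so a numerical sanity check may be useful. The following sketch uses Python with NumPy. The seed, the parameter values and the helper names (`ce2a_ok`, `ce3_penalties`, `ce4_ok`, etc.) are hypothetical choices for illustration and do not come from the paper. On a grid, it checks that the stated inverses and warp maps are valid, that $\tilde X = Y$ in Counterexample 1, that $Y_k = X \circ T_k$ (equivalently $Y_k \circ T_k^{-1} = X$) in Counterexample 2(A), that the penalty values of Counterexample 3 match their closed forms, and that the spline expansion of Counterexample 4 satisfies $Y = X \circ T$.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2001)


def trapezoid(y):
    """Trapezoidal rule on the fixed grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2)


# Counterexample 1: T = f or g with probability 1/2 each; phi(t) = t / sqrt(3).
f = lambda s: (s + s**2) / 2
g = lambda s: (3 * s - s**2) / 2
f_inv = (np.sqrt(1 + 8 * t) - 1) / 2          # tilde f = f^{-1}
g_inv = (3 - np.sqrt(9 - 8 * t)) / 2          # tilde g = g^{-1}
xi, U = rng.standard_normal(), int(rng.integers(0, 2))
X_warped = xi * (f_inv if U == 1 else g_inv) / np.sqrt(3)   # xi phi o T^{-1}
Y = (xi / np.sqrt(3)) * (U * f_inv + (1 - U) * g_inv)       # V tilde f + W tilde g, unwarped
ce1_ok = (np.allclose(f(f_inv), t) and np.allclose(g(g_inv), t)
          and np.allclose((f(t) + g(t)) / 2, t)             # E(T) = Id
          and np.allclose(X_warped, Y)
          and np.linalg.matrix_rank(np.vstack([f_inv, g_inv])) == 2)


def ce2a_ok(k, M):
    """Counterexample 2(A): T_k increasing, Y_k = X o T_k, and <mu, phi_k> = 0."""
    m = 2 * k - 1
    phi_k = np.sin(m * np.pi * t) / (m * np.pi)
    U_k = rng.uniform(0.5 * (1 - 1 / M), 0.5 * (1 + 1 / M))
    xi = rng.standard_normal()
    T_k = t - (U_k - 0.5) * phi_k
    Y_k = xi * (t - 0.5) + xi * (0.5 - U_k) * phi_k
    orthogonal = abs(trapezoid((t - 0.5) * phi_k)) < 1e-10
    return bool(np.all(np.diff(T_k) > 0) and np.allclose(Y_k, xi * (T_k - 0.5)) and orthogonal)


def ce3_penalties(k, M):
    """Quadrature values of int E[(T_k - t)^2] and int E[(T_k'')^2], using var(U_k) = 1/(12 M^2)."""
    m = 2 * k - 1
    var_U = 1.0 / (12 * M**2)
    phi_k = np.sin(m * np.pi * t) / (m * np.pi)
    phi_k_dd = -m * np.pi * np.sin(m * np.pi * t)   # second derivative of phi_k
    return var_U * trapezoid(phi_k**2), var_U * trapezoid(phi_k_dd**2)


def ce3_closed_forms(k, M):
    m = 2 * k - 1
    return (np.sqrt(24) * M * np.pi * m) ** -2, m**2 * np.pi**2 / (24 * M**2)


def ce4_ok(theta, d, a0, a1, c):
    """Counterexample 4: T increasing and Y = X o T for the cubic-spline construction (c != 0)."""
    th0, th1, th2, th3 = theta
    pos = lambda x: np.maximum(x, 0.0)
    phi = lambda x: th0 + th1 * x + th2 * x**2 + th3 * x**3 + d * pos(x - a0) ** 3
    s = (np.where((a0 <= t) & (t <= a1), c * (t - a0) / (a1 - a0), 0.0)
         + np.where(a1 < t, c * (1 - t) / (1 - a1), 0.0))
    M = 1.01 * abs(c) / min(a1 - a0, 1 - a1)
    U = rng.uniform(0.5 * (1 - 1 / M), 0.5 * (1 + 1 / M))
    xi = rng.standard_normal()
    T = t - (U - 0.5) * s
    p1 = th1 + 2 * th2 * t + 3 * th3 * t**2 + 3 * d * pos(t - a0) ** 2
    p2 = th2 + 3 * th3 * t + 3 * d * pos(t - a0)
    V1, V2, V3 = xi * (0.5 - U), xi * (U - 0.5) ** 2, xi * (0.5 - U) ** 3 * (th3 + d)
    Y = xi * phi(t) + V1 * s * p1 + V2 * s**2 * p2 + V3 * s**3
    return bool(np.all(np.diff(T) > 0) and np.allclose(Y, xi * phi(T)))


print(ce1_ok, ce2a_ok(1, 1.5), ce3_penalties(2, 3.0), ce3_closed_forms(2, 3.0))
```

The identity in Counterexample 4 is simply the exact Taylor expansion of the cubic piece of $\phi$ around $t$, with increment $(1/2 - U)s(t)$.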
3. Tuning-Free Methodology
Recall that the total variation of a continuous function $h(x) : [0,1] \to \mathbb{R}$ measures the total distance swept by the ordinate $y = h(x)$ of its graph as the abscissa $x$ moves from 0 to 1. When functions are distorted “in the $x$-domain” through an increasing homeomorphism, phase variation does not affect the total amount of variation accrued over the interval $[0,1]$. However, it redistributes this total variation over the subintervals of $[0,1]$. This redistribution can be measured by focusing on local variation:
Definition 2 (Local Variation Distribution). Given any real function $h \in C([0,1])$, we define
$$J_h(t) = \sup_{K \in \mathcal{K}_t} \sum_{k=0}^{|K|-1} |h(\tau_{k+1}) - h(\tau_k)|, \qquad (2)$$
where $K = \{\tau_0, \tau_1, \ldots, \tau_{|K|}\}$ is a partition of $[0,t]$ and $\mathcal{K}_t$ is the collection of all finite partitions of $[0,t]$. Noting that $J_h(1)$ is the total variation of $h$, define the local variation distribution as
$$F_h(t) = \frac{J_h(t)}{J_h(1)}.$$
Remark 2.
Recall that when $h \in C^1([0,1])$, it holds that $J_h(t) = \int_0^t |h'(u)|\,du$. The general definition comes in handy under discrete observation, and this one under continuous observation.

We now show that, in the identifiable regime, warping affects the local variation of the underlying process in a rather predictable manner, one that can be used to motivate estimators. For simplicity, we write $\tilde F = F_{\tilde X}$ and $F = F_X$.

Lemma 1 (Local Variations and Warp Maps). When $\tilde X = X \circ T^{-1}$ falls under the identifiable regime (Definition 1), $F$ and $\tilde F$ are strictly monotone almost surely, $E\{\tilde F^{-1}\} = F^{-1} = F_\varphi^{-1}$, and $T = \tilde F^{-1} \circ F = \tilde F^{-1} \circ F_\varphi$.

Remark 3.
Even under the unidentifiable regime, we have $T = \tilde F^{-1} \circ F$. However, unlike in the identifiable regime, $F$ is then not deterministic, and we have $E\{\tilde F^{-1} \mid X\} = F^{-1}$ almost surely, so that $E\{\tilde F^{-1}\} = E\{F^{-1}\}$.

Remark 4.
In the language of transportation of measure, Lemma 1 makes two statements. First, the warp map pushes forward the original local variation distribution to the warped local variation distribution, and does so optimally in terms of quadratic transportation cost. Second, the synchronised local variation measure is the Fréchet mean of the (random) warped local variation measure in Wasserstein distance.
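Lemma 1 and Remark 4 can be checked numerically. The sketch below makes the following assumptions, none of which come from the text:
- local variation distributions are approximated on a fine grid, using the partition sums of Definition 2 on that single grid;
- inverses are computed by swapping the roles of abscissae and ordinates in linear interpolation;
- the shape is $\varphi(t) = \sqrt 2\sin(\pi t)$;
- warps come from the hypothetical family $T_a(t) = t + a\,t(1-t)$, with $|a| < 1$.

The sketch recovers $T$ as $\tilde F^{-1} \circ F_\varphi$. For a symmetric two-point warp law ($a$ or $-a$ with probability 1/2 each) it also checks that averaging the warped quantile functions returns $F_\varphi^{-1}$, which is the Fréchet-mean statement of Remark 4.

```python
import bisect
import math


def phi(t):
    # Assumed shape: unit L2-norm, with phi' vanishing only at t = 1/2.
    return math.sqrt(2.0) * math.sin(math.pi * t)


def warp(t, a):
    # Hypothetical warp T_a(t) = t + a t (1 - t); strictly increasing for |a| < 1.
    return t + a * t * (1.0 - t)


def warp_inv(s, a):
    # Root in [0, 1] of a t^2 - (1 + a) t + s = 0, in a form that is stable at a = 0.
    return 2.0 * s / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * s))


def local_variation_cdf(values):
    # Discrete version of Definition 2: cumulative |increments|, normalised by the total.
    incr = [abs(b - a) for a, b in zip(values, values[1:])]
    total = sum(incr)
    cdf = [0.0]
    for d in incr:
        cdf.append(cdf[-1] + d / total)
    return cdf


def interp(x, xs, ys):
    # Linear interpolation of the points (xs, ys) at x, clamped at the ends; xs non-decreasing.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x) - 1
    w = (x - xs[j]) / (xs[j + 1] - xs[j])
    return (1.0 - w) * ys[j] + w * ys[j + 1]


m = 4001
grid = [j / (m - 1) for j in range(m)]
F_phi = local_variation_cdf([phi(t) for t in grid])


def warped_cdf(a, xi):
    # Local variation distribution of X_tilde = xi * phi o T_a^{-1}; the factor xi cancels.
    return local_variation_cdf([xi * phi(warp_inv(t, a)) for t in grid])


a = 0.6
F_tilde = warped_cdf(a, xi=2.5)
# Lemma 1: T = F_tilde^{-1} o F_phi, evaluated at the grid points.
lemma_err = max(abs(interp(F_phi[j], F_tilde, grid) - warp(t, a)) for j, t in enumerate(grid))

# Remark 4 with a symmetric two-point warp law: averaging quantile functions recovers F_phi^{-1}.
F_minus = warped_cdf(-a, xi=-1.0)
us = [k / 200 for k in range(201)]
frechet_err = max(
    abs(0.5 * (interp(u, F_tilde, grid) + interp(u, F_minus, grid)) - interp(u, F_phi, grid))
    for u in us
)
```

Both errors shrink with the grid spacing. They reflect discretisation only, since no tuning parameter enters the construction.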
Remark 5.
The local variation measure can also be seen through the prism of the area-under-the-curve criteria discussed by Liu and Müller (2004). These authors use such criteria to construct time synchronization maps from the observed warped data, and derive a registration procedure based on data-driven parametric modelling of the warp maps. We, on the other hand, aim to extract the time synchronization maps from the observed warped data by using the local variation measure. Thus, no modelling of the warp maps is necessary: our goal is a method that is fully data-driven and completely nonparametric.
Now suppose we have an i.i.d. sample $\{\tilde X_i : i = 1, 2, \dots, n\}$ of randomly warped functional data that we wish to register. That is, we wish to construct nonparametric estimators of the $\{X_i\}_{i=1}^n$ and the $\{T_i\}_{i=1}^n$ on the basis of $\{\tilde X_i\}_{i=1}^n$. If we expect the data to conform, at least approximately, to the identifiable regime (Definition 1), we can rely on Lemma 1 as inspiration for tuning-free methodology. We emphasise that this methodology is applicable whatever the "true model". The point is for it to be accurate under the identifiable regime, and stable when departing mildly from identifiability.

We construct such methodology under three different observation regimes on $\{\tilde X_i\}_{i=1}^n$:
- complete observations (Section 3.1);
- discrete noiseless observations (Section 3.2);
- discrete observations with measurement error (Section 3.3).

We then study the performance under identifiability and unidentifiability theoretically in Section 4, and numerically in Section 5.

Assuming the functions $\{\tilde X_i\}$ are fully observed, we may proceed as follows:
Step 1: Set $\hat F = \big(n^{-1}\sum_{i=1}^n \tilde F_i^{-1}\big)^{-1}$, noting that the $\{\tilde F_i\}$ are immediately available from complete observation of the $\{\tilde X_i\}$. Under the identifiable regime (Definition 1), $\hat F$ estimates $F_\varphi$.
Step 2: Estimate the warp map $T_i$ by $\hat T_i = \tilde F_i^{-1} \circ \hat F$, and the registration map $T_i^{-1}$ by $\hat T_i^{-1}$.
Step 3: Register the observed warped functional data by means of $\hat X_i = \tilde X_i \circ \hat T_i$.
If we suspect that we are in the identifiable regime (Definition 1), we may also want to estimate the pairs $\{\varphi, \xi_i\}$. In this case, the obvious additional steps are:
Step 4: Compute the empirical covariance operator, say $\hat{\mathcal{K}}_r$, of the registered data $\{\hat X_i\}$, and estimate $\varphi$ by the leading eigenfunction $\hat\varphi$ of $\hat{\mathcal{K}}_r$. As a convention, assume that this estimator is aligned with the true $\varphi$, i.e., $\langle \hat\varphi, \varphi\rangle \ge 0$.
Step 5: Estimate $\xi_i$ by $\hat\xi_i = \langle \hat X_i, \hat\varphi\rangle$.

Remark 6.
The above algorithm can be viewed as a nonparametric version of the pairwise registration procedure of Tang and Müller (2008), albeit at the level of local variation measures rather than the original curves. Consider the data to be $\tilde F_1, \dots, \tilde F_n$. Since $\tilde F_i = F_i \circ T_i^{-1}$, we have a standard warping problem at the level of variation measures. Now suppose that we apply the pairwise registration procedure to this new data set as follows:
$$\hat g_{ji} = \arg\min_{h \in \mathcal{C}} \int_0^1 \big[\tilde F_j(h(t)) - \tilde F_i(t)\big]^2\,dt.$$
Here the minimisation is conditional on $\tilde F_i$ and $\tilde F_j$, and $\mathcal{C}$ is the set of strictly monotone homeomorphisms on $[0,1]$. This corresponds to two choices:
- setting the shape penalty parameter to $\lambda = 0$ (see p. 878 in Tang and Müller (2008));
- placing no structural assumption on the pairwise warping function $g_{ji}$, i.e., making the above minimisation nonparametric.

It is now easy to see that $\hat g_{ji} = \tilde F_j^{-1} \circ \tilde F_i$. So, by equation (7) in Tang and Müller (2008), the pairwise registration estimator of $T_i$ is
$$\hat T_{i,p} = \Big(n^{-1}\sum_{j=1}^n \hat g_{ji}\Big)^{-1} = \Big(n^{-1}\sum_{j=1}^n \tilde F_j^{-1} \circ \tilde F_i\Big)^{-1} = \tilde F_i^{-1} \circ \hat F,$$
which is precisely the estimator in the previous algorithm.
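Steps 1–3 of the fully observed algorithm can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation:
- "full observation" is approximated by a fine common grid;
- $\hat F$ is obtained by inverting the averaged quantile functions through linear interpolation;
- the shape $\varphi(t) = \sqrt 2 \sin(\pi t)$ and the warp family $T_a(t) = t + a\,t(1-t)$ are hypothetical choices.

The usage lines use a sample whose warp parameters are symmetric, so the sample mean of the $T_i$ is the identity and the warps should be recovered up to discretisation. The helper names (`register`, `interp`, and so on) are ours.

```python
import bisect
import math


def phi(t):
    # Assumed shape: unit L2-norm, phi' vanishes only at t = 1/2.
    return math.sqrt(2.0) * math.sin(math.pi * t)


def warp(t, a):
    # Hypothetical warp family T_a(t) = t + a t (1 - t), increasing for |a| < 1.
    return t + a * t * (1.0 - t)


def warp_inv(s, a):
    # Root in [0, 1] of a t^2 - (1 + a) t + s = 0, written to be stable at a = 0.
    return 2.0 * s / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * s))


def local_variation_cdf(values):
    # Cumulative |increments| on the grid, normalised by the total variation.
    incr = [abs(b - a) for a, b in zip(values, values[1:])]
    total = sum(incr)
    cdf = [0.0]
    for d in incr:
        cdf.append(cdf[-1] + d / total)
    return cdf


def interp(x, xs, ys):
    # Linear interpolation of (xs, ys) at x; xs non-decreasing, clamped at the ends.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x) - 1
    w = (x - xs[j]) / (xs[j + 1] - xs[j])
    return (1.0 - w) * ys[j] + w * ys[j + 1]


def register(curves, grid):
    """Steps 1-3 for curves evaluated on a common fine grid (a proxy for full observation)."""
    cdfs = [local_variation_cdf(c) for c in curves]
    n = len(cdfs)
    # Step 1: F_hat is the inverse of the averaged quantile function n^{-1} sum_i F_i^{-1}.
    mean_q = [sum(interp(u, F, grid) for F in cdfs) / n for u in grid]
    f_hat = [interp(t, mean_q, grid) for t in grid]
    # Step 2: T_hat_i = F_i^{-1} o F_hat.
    warps = [[interp(u, F, grid) for u in f_hat] for F in cdfs]
    # Step 3: X_hat_i = X_tilde_i o T_hat_i.
    registered = [[interp(s, grid, c) for s in T] for T, c in zip(warps, curves)]
    return warps, registered


m = 2001
grid = [j / (m - 1) for j in range(m)]
a_vals = [0.6, -0.6, 0.3, -0.3]  # symmetric, so the sample mean of the T_i is the identity
xis = [1.0, 2.0, -1.5, 0.5]
curves = [[xi * phi(warp_inv(t, a)) for t in grid] for a, xi in zip(a_vals, xis)]
warps, registered = register(curves, grid)
```

With an asymmetric sample the same code returns $T_i \circ \bar T^{-1}$, where $\bar T = n^{-1}\sum_i T_i$. This matches the Fisher-consistency calculation in item 3 of Remark 7. Note also that no bandwidth or penalty appears anywhere in `register`.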
In the discretely observed setting, the $\tilde X_i$'s are not fully observed. Instead, we observe point evaluations $\tilde X_{i,d} = (\tilde X_i(t_1), \tilde X_i(t_2), \dots, \tilde X_i(t_r))$, $i = 1, \dots, n$. Here, $0 \le t_1 < t_2 < \dots < t_r \le 1$ is a grid on $[0,1]$. The grid is assumed asymptotically homogeneous, in that $\max_{1 \le j \le r-1}(t_{j+1} - t_j) = O(r^{-1})$ as $r \to \infty$. The latent discrete process is denoted by $X_{i,d} = (X_i(t_1), X_i(t_2), \dots, X_i(t_r))$.

Our strategy is to mimic Steps 1–5 from the fully observed setup. Since the $X_i$'s are no longer fully observed, however, we need versions of the $F_i$ and $\tilde F_i$ that can be computed from grid values. For these, we draw inspiration from the general definition of the local variation distribution (Equation (2) in Definition 2). First, define
$$F_{i,d}(t) = \frac{\sum_{j \in I_t} |X_i(t_{j+1}) - X_i(t_j)|}{\sum_{j=1}^{r-1} |X_i(t_{j+1}) - X_i(t_j)|}$$
for $t \in [0,1]$ and each $i = 1, 2, \dots, n$, where $I_t$ is the set of all $j$'s satisfying $t_{j+1} \le t$. Because we only observe each curve over the grid $0 \le t_1 < t_2 < \dots < t_r \le 1$, we have replaced the supremum over all partitions in Equation (2) by this single partition, the finest grid we get to observe. Clearly, $F_{i,d}$ has jump discontinuities at the grid points $t_j$ and is càdlàg. It also satisfies $F_{i,d}(t) = 0$ for $t \in [0, t_2)$ and $F_{i,d}(t) = 1$ for $t \in [t_r, 1]$.

For the (discretely) observable warped process, we define
$$\tilde F_{i,d}(t) = \frac{\sum_{j \in I_t} |\tilde X_i(t_{j+1}) - \tilde X_i(t_j)|}{\sum_{j=1}^{r-1} |\tilde X_i(t_{j+1}) - \tilde X_i(t_j)|}. \tag{3}$$
The $\tilde F_{i,d}$'s also have jump discontinuities at the grid points and are càdlàg.

Under the identifiable regime, in particular, we would have $F_{i,d}(t) = F_d(t)$ for all $i = 1, 2, \dots, n$, where
$$F_d(t) = \frac{\sum_{j \in I_t} |\varphi(t_{j+1}) - \varphi(t_j)|}{\sum_{j=1}^{r-1} |\varphi(t_{j+1}) - \varphi(t_j)|}.$$
Its jumps are at most of size $a_r = \max_{1 \le j \le r-1} |\varphi(t_{j+1}) - \varphi(t_j)| \big/ \sum_{j=1}^{r-1} |\varphi(t_{j+1}) - \varphi(t_j)|$. Moreover, in the identifiable regime,
$$\tilde F_{i,d}(t) = \frac{\sum_{j \in I_t} |\varphi(s_{i,j+1}) - \varphi(s_{i,j})|}{\sum_{j=1}^{r-1} |\varphi(s_{i,j+1}) - \varphi(s_{i,j})|},$$
where $s_{i,j} = T_i^{-1}(t_j)$, for each $i$ and $j$, are unobserved random variables. The maximum jump size of $\tilde F_{i,d}$ is $A_{i,r} = \max_{1 \le j \le r-1} |\varphi(s_{i,j+1}) - \varphi(s_{i,j})| \big/ \sum_{j=1}^{r-1} |\varphi(s_{i,j+1}) - \varphi(s_{i,j})|$.

With the general definitions of $F_{i,d}$ and $\tilde F_{i,d}$ in place, we can now adapt Steps 1–5 to the discrete case. In what follows, the generalised inverse of a function $G$ is denoted by $G^{-}$, i.e., $G^{-}(t) = \inf\{u : G(u) \ge t\}$. The first two steps remain unchanged, except that they now employ the discrete local variation measures. This means that we do not require any tuning parameters or smoothness assumptions to estimate the warp and registration maps.
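The step functions just defined, the generalised inverse $G^{-}$, and Steps 1* and 2* stated next can be sketched with Python's `bisect` module. Some conventions here are ours and are not fixed by the text:
- the grid is 0-indexed in code;
- the generalised inverse is computed exactly for step functions;
- the outer inverse in Step 1* is found by scanning the breakpoints of the averaged (left-continuous, piecewise constant) quantile function.

The function names are hypothetical.

```python
import bisect


def lv_cum(values):
    # cum[k] = sum_{j < k} |X(t_{j+1}) - X(t_j)| / total, k = 0, ..., r - 1 (0-indexed grid).
    incr = [abs(b - a) for a, b in zip(values, values[1:])]
    total = sum(incr)  # assumed > 0, i.e. the observed curve is not constant
    cum = [0.0]
    for d in incr:
        cum.append(cum[-1] + d / total)
    cum[-1] = 1.0  # remove rounding error in the final partial sum
    return cum


def lv_step(cum, grid, t):
    # F_d(t) sums over j with t_{j+1} <= t, which gives a cadlag step function.
    k = bisect.bisect_right(grid, t) - 1
    return cum[max(k, 0)]


def gen_inverse(cum, grid, u):
    # G^-(u) = inf{t : G(t) >= u}; G jumps to the value cum[k] at grid[k].
    if u <= 0.0:
        return 0.0
    k = bisect.bisect_left(cum, u)
    return grid[min(k, len(grid) - 1)]


def steps_1_2(grid, curves):
    # Step 1*: F_hat_d = (n^{-1} sum_i F_i^-)^-.  Step 2*: T_hat_i = F_i^- o F_hat_d.
    cums = [lv_cum(v) for v in curves]
    n = len(cums)
    breaks = sorted(set(c for cum in cums for c in cum))

    def mean_q(u):
        return sum(gen_inverse(cum, grid, u) for cum in cums) / n

    def f_hat(t):
        # mean_q is constant on each interval (breaks[m-1], breaks[m]], so the
        # infimum of {u : mean_q(u) >= t} is the left endpoint of such an interval.
        if mean_q(0.0) >= t:
            return 0.0
        for b_prev, b in zip(breaks, breaks[1:]):
            if mean_q(b) >= t:
                return b_prev
        return float("inf")  # t > t_r: not defined, as discussed in the text

    fh = [f_hat(t) for t in grid]
    warps = [[gen_inverse(cum, grid, u) for u in fh] for cum in cums]
    return f_hat, warps
```

On data of the kind described in the text, the following can be checked:
- `f_hat(t_r)` equals $\max_i \tilde F_{i,d}(t_r-)$, the value derived further below for $\hat F_d(t_r)$;
- every estimated warp is a non-decreasing step function on the grid;
- no smoothing parameter is involved.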
The registration itself (the last three steps) will, of course, require some smoothing if it is to make sense:
Step 1*: Set $\hat F_d = \{n^{-1}\sum_{i=1}^n \tilde F_{i,d}^{-}\}^{-}$ and $\hat F_d^* = n^{-1}\sum_{i=1}^n \tilde F_{i,d}^{-}$. Under the identifiable regime (Definition 1), $\hat F_d$ mimics $F_d$.
Step 2*: Predict the random warp map $T_i$ by $\hat T_{i,d} = \tilde F_{i,d}^{-} \circ \hat F_d$, and the registration map $T_i^{-1}$ by $\hat T^*_{i,d} = \hat F_d^* \circ \tilde F_{i,d} = \{n^{-1}\sum_{j=1}^n \tilde F_{j,d}^{-}\} \circ \tilde F_{i,d}$.
Step 3*: Since the $\tilde X_i$'s are observed discretely, we have no information about their values between grid points. Thus, we first smooth each of the $\tilde X_{i,d}$ using the Nadaraya–Watson kernel regression estimator, for an appropriately chosen kernel $k$ and bandwidth $h$. We denote the resulting smoothed functions by $X_i^\dagger$:
$$X_i^\dagger(t) = \frac{\sum_{j=1}^r k\big(\frac{t - t_j}{h}\big)\tilde X_i(t_j)}{\sum_{j=1}^r k\big(\frac{t - t_j}{h}\big)}.$$
Define $\hat X_i^*(t) = X_i^\dagger(\hat T_{i,d}(t))$, $i = 1, 2, \dots, n$, to be the registered functional observations, and write $\bar X_r^* = n^{-1}\sum_{i=1}^n \hat X_i^*$ for their mean.
As in the fully observed situation, if we suspect that we are in the identifiable regime (Definition 1), we estimate the pairs $\{\varphi, \xi_i\}$ as follows:
Step 4*: Compute the empirical covariance operator $\hat{\mathcal{K}}_r^*$ of the registered curves $\hat X_i^*$, and use its leading eigenfunction $\hat\varphi^*$ as the estimator of $\varphi$. Again, assume the convention that the sign is correctly identified, i.e., $\langle \hat\varphi^*, \varphi\rangle \ge 0$.
Step 5*: Finally, estimate $\xi_i$ by $\hat\xi_i^* = \langle \hat X_i^*, \hat\varphi^*\rangle$ for each $i \ge 1$.

Our procedure can also be applied when the grid over which the $\tilde X_i$'s are observed differs with $i$, say $0 \le t_{i,1} < t_{i,2} < \dots < t_{i,r_i} \le 1$. This compatibility holds because our approach considers only one curve at a time. We formulate the procedure for the notationally simpler case of a common grid, in order to lighten the notation in the statement of our asymptotic results in Section 4.

As mentioned earlier, $\tilde F_{i,d}$ is a step function with jump discontinuities at the grid points. In particular, $\tilde F_{i,d}(t) = 0$ for $t \in [0, t_2)$ and $\tilde F_{i,d}(t) = 1$ for $t \in [t_r, 1]$. Thus, $\tilde F_{i,d}^{-}(1) = t_r$, which is less than 1 if $t_r < 1$, i.e., if the grid does not include the right end-point. In this case, $\hat F_d(t)$, and thus $\hat T_{i,d}(t)$, is properly defined only for $t \in [0, t_r]$. Also, $\tilde F_{i,d}^{-}(u) \le t_r$, with equality iff $u \in (\tilde F_{i,d}(t_r-), 1]$. Thus,
$$\hat F_d(t_r) = \inf\Big\{u : n^{-1}\sum_{i=1}^n \tilde F_{i,d}^{-}(u) \ge t_r\Big\} = \inf\{u : \tilde F_{i,d}^{-}(u) = t_r \ \forall\, i = 1, 2, \dots, n\} = \inf\Big\{u : u \in \bigcap_{i=1}^n (\tilde F_{i,d}(t_r-), 1]\Big\} = \max_{1 \le i \le n} \tilde F_{i,d}(t_r-).$$
Then, $\hat T_{i,d}(t_r) = \tilde F_{i,d}^{-}(\hat F_d(t_r)) = \tilde F_{i,d}^{-}\big(\max_{1 \le j \le n} \tilde F_{j,d}(t_r-)\big) = t_r$.

One can then extend $\hat T_{i,d}(t)$ to the whole of $[0,1]$, e.g. by linearly interpolating between $(t_r, \hat T_{i,d}(t_r)) = (t_r, t_r)$ and $(1, 1)$. In case $t_r < 1$, this practical modification enjoys the same asymptotic properties as the originally defined estimator (Section 4). The reason is that its effect is asymptotically negligible, owing to the homogeneity assumptions on the grid.

Similarly, $\hat F_d^*(u) = n^{-1}\sum_{i=1}^n \tilde F_{i,d}^{-}(u) = t_r$ iff $u \in \bigcap_{i=1}^n (\tilde F_{i,d}(t_r-), 1] = (\max_{1 \le i \le n} \tilde F_{i,d}(t_r-), 1]$. So, in case $t_r < 1$, we have $\hat T^*_{i,d}(1) = \hat F_d^*(\tilde F_{i,d}(1)) = \hat F_d^*(1) = t_r < 1$. This is not a problem, since this estimator is not used in the registration procedure. Moreover, the problem disappears asymptotically, just as described above.

We conclude this section with a remark on smoothness. The estimates $\hat T_{i,d}$ of the warp maps involve no smoothing and are obtained from compositions of step functions. Consequently, the resulting registered curves will not be very smooth, particularly if the number of grid points is small. Even in that case, however, the estimated mean function will be smoother if the sample size is moderately large.

If a smooth registration of the sample curves is desired, the following procedure may be adopted. First, produce smooth versions of the $\hat T_{i,d}$ by some nonparametric smoothing procedure, e.g., polynomial splines of a fixed degree $m$, and call these new estimates $\hat T_{i,s}$, say. Then, plug in these smoothed warp estimates and define the new registered observations as $\hat X_i^*(t) := X_i^\dagger(\hat T_{i,s}(t))$. It is well known that a spline-smoothed estimate of a function converges to that function in the $L^2[0,1]$ sense, provided the oscillations of the function go to zero as the number of knots grows to infinity (see Theorem 6.27 in Schumaker (2007)). This condition holds for the $\hat T_{i,d}$'s, since they lie in $L^2[0,1]$ (see equation (2.121) in Theorem 2.59 in Schumaker (2007)). Thus, this modified estimator also provides consistent registration.

The discretely observed functional data may additionally be contaminated by measurement error, in which case the registration procedure has to be suitably adapted. In the presence of measurement error, we observe $Y_{i,d} = \tilde X_{i,d} + e_i$, where $\tilde X_{i,d}$ was defined in Section 3.2 and $e_i = (\epsilon_{i,1}, \epsilon_{i,2}, \dots, \epsilon_{i,r})$. Here $\{\epsilon_{i,j} : j = 1, 2, \dots, r,\ i = 1, 2, \dots, n\}$ is a collection of i.i.d.
error variables with zero mean and variance $\sigma^2$, independent of the processes and warp maps.

We modify the registration procedure as follows. First, for each $i$, use the observation $Y_{i,d}$ to construct a nonparametric function estimator of $\tilde X_i'$, the derivative of the warped process $\tilde X_i$, and call this estimator $\hat X^{(1)}_{i,w}(\cdot)$. Define analogues of the $\tilde F_i$'s as
$$\tilde F_{i,w}(t) = \frac{\int_0^t |\hat X^{(1)}_{i,w}(u)|\,du}{\int_0^1 |\hat X^{(1)}_{i,w}(u)|\,du}, \quad t \in [0,1].$$
Unlike in the discrete observation case described in the previous section, we now have fully functional versions of $\tilde X_i'$ for each $i$. This allows us to mimic the algorithm for the fully observed scenario in Section 3.1.
Step 1**: Set $\hat F_e = \big(n^{-1}\sum_{i=1}^n \tilde F_{i,w}^{-1}\big)^{-1}$. Under the identifiable regime (Definition 1), in particular, $\hat F_e$ estimates $F_\varphi$.
Step 2**: Predict the warp map $T_i$ by $\hat T_{i,e} = \tilde F_{i,w}^{-1} \circ \hat F_e$, and the registration map by $\hat T_{i,e}^{-1}$.
Step 3**: Construct nonparametric function estimators of the $\tilde X_i$'s using the $Y_{i,d}$'s, and call them $\hat X_{i,w}(\cdot)$. Define $\hat X^*_{i,e}(t) = \hat X_{i,w}(\hat T_{i,e}(t))$, $i = 1, 2, \dots, n$, to be the registered functional observations.
If we suspect that we are in the identifiable regime (Definition 1), we estimate the pairs $\{\varphi, \xi_i\}$ as follows:
Step 4**: Write $\bar X_e^* = n^{-1}\sum_{i=1}^n \hat X^*_{i,e}$ for the mean of the registered observations, and let $\hat{\mathcal{K}}_e^*$ denote their empirical covariance operator. Take its leading eigenfunction, denoted by $\hat\varphi_e^*$, as the estimator of $\varphi$ (with the same sign convention as earlier).
Step 5**: Finally, estimate $\xi_i$ by $\hat\xi^*_{i,e} = \langle \hat X^*_{i,e}, \hat\varphi_e^*\rangle$ for each $i \ge 1$.
We will use a local polynomial estimator with kernel $k_1(\cdot)$ and bandwidth $h_1$ for finding $\hat X^{(1)}_{i,w}$. We will then use a local linear estimator with kernel $k_2(\cdot)$ and bandwidth $h_2$ for estimating $\hat X_{i,w}$.
These choices are motivated by the advantages of local polynomial estimators in dealing with boundary effects (see, e.g., Fan and Gijbels (1996) and Wand and Jones (1995) for further details on various smoothing techniques). More details on the choice of smoothing parameters are given in Remark 9 after Theorem 5.
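The smoothing ingredients of Sections 3.2–3.3 can be sketched in pure Python, under several assumptions:
- the biweight kernel is used throughout; it is symmetric, continuously differentiable and supported on $[-1,1]$, as Theorem 5 requires of the kernels;
- the derivative $\hat X^{(1)}_{i,w}$ is estimated by a local quadratic fit, our choice of the local polynomial degree;
- $\tilde F_{i,w}$ is integrated by the trapezoidal rule.

The sketch covers three pieces: a Nadaraya–Watson smoother as in Step 3*, the local quadratic derivative estimator, and the resulting $\tilde F_{i,w}$. Bandwidths are left to the user.

```python
def biweight(u):
    # Symmetric, C^1 kernel supported on [-1, 1].
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def nadaraya_watson(t, grid, y, h):
    # Step 3*: kernel-weighted average of the observations around t.
    w = [biweight((t - tj) / h) for tj in grid]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def local_quadratic_derivative(t, grid, y, h):
    # Weighted least squares fit y ~ c0 + c1 u + c2 u^2 with u = (t_j - t) / h;
    # the derivative estimate at t is c1 / h.
    S = [0.0] * 5
    T = [0.0] * 3
    for tj, yj in zip(grid, y):
        u = (tj - t) / h
        w = biweight(u)
        if w == 0.0:
            continue
        for k in range(5):
            S[k] += w * u ** k
        for k in range(3):
            T[k] += w * u ** k * yj
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    return solve3(A, T)[1] / h


def lv_cdf_from_derivative(ts, deriv):
    # F_w(t) = int_0^t |X'| / int_0^1 |X'|, trapezoidal rule on the evaluation grid ts.
    a = [abs(d) for d in deriv]
    cum = [0.0]
    for k in range(1, len(ts)):
        cum.append(cum[-1] + 0.5 * (a[k] + a[k - 1]) * (ts[k] - ts[k - 1]))
    return [c / cum[-1] for c in cum]
```

A local quadratic fit reproduces quadratic signals exactly, including at the boundary. This reflects the boundary behaviour that motivates local polynomial estimators in the text. On noisy $Y_{i,d}$, the bandwidth trade-offs of Theorem 5 and Remark 9 apply.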
4. Asymptotic Theory
We next study the asymptotic properties of the estimators obtained above. We develop separate results for each of the three observation regimes considered: full observation, discrete observation, and discrete observation with measurement error. In what follows, the space $C^1[0,1]$ is equipped with the norm $|||f||| = \|f\|_\infty + \|f'\|_\infty$, where $\|\cdot\|_\infty$ is the usual sup-norm. The 2-Wasserstein distance between distributions $G_1$ and $G_2$ is denoted by $d_W(G_1, G_2) = \big(\int_0^1 (G_1^{-1}(u) - G_2^{-1}(u))^2\,du\big)^{1/2}$.

We first focus on the identifiable regime as given in Definition 1. Our first two results concern the fully observed case described in Section 3.1. Write $\mu = E(X) = E(\xi)\varphi$ and $\mathcal{K} = \mathrm{COV}(X) = E(X \otimes X) - \mu \otimes \mu$, where $(f \otimes g)h = \langle g, h\rangle f$ for any triple $f, g, h \in L^2[0,1]$. Let $|||\cdot|||_1$ denote the trace norm for operators on $L^2[0,1]$. The covariance kernel of $X$ is denoted by $K(\cdot,\cdot)$, and the empirical covariance kernel of the $\hat X_i$'s by $\hat K_r(\cdot,\cdot)$.

Theorem 2 (Strong Consistency – Fully Observed Case). Further to the assumptions in Definition 1, assume also that $\varphi'$ is Hölder continuous with exponent $\alpha \in (0,1]$. Then, the estimators in Section 3.1 satisfy the following asymptotic results, where convergence is always with probability one:
(a) $d_W(\hat F, F_\varphi) \to 0$ as $n \to \infty$.
(b) $\|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ and $\|\hat T_i - T_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c) $\|\hat X_i - X_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(d) $d_W(\hat F_i, F_\varphi) \to 0$ as $n \to \infty$ for each $i \ge 1$, where $\hat F_i$ is the local variation measure associated with $\hat X_i$.
(e) $\|\bar X_r - \mu\|_\infty \to 0$ as $n \to \infty$, where $\bar X_r = n^{-1}\sum_{i=1}^n \hat X_i$.
(f) $|||\hat{\mathcal{K}}_r - \mathcal{K}|||_1 \to 0$ and $\|\hat K_r - K\|_\infty = \sup_{s,t \in [0,1]} |\hat K_r(s,t) - K(s,t)| \to 0$ as $n \to \infty$.
Moreover, $\|\hat\varphi - \varphi\|_\infty \to 0$ and $|\hat\xi_i - \xi_i| \to 0$ as $n \to \infty$ for each $i \ge 1$.
Suppose we additionally assume that $E(\|T'\|_\infty^2) < \infty$ and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$ (call this "Condition 1"). Then the following stronger results hold with probability one, in lieu of (b), (c) and (e):
(b') $|||\hat T_i^{-1} - T_i^{-1}||| \to 0$ and $|||\hat T_i - T_i||| \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c') $|||\hat X_i - X_i||| \to 0$ as $n \to \infty$ for each $i \ge 1$.
(e') $|||\bar X_r - \mu||| \to 0$ as $n \to \infty$, where $\bar X_r = n^{-1}\sum_{i=1}^n \hat X_i$.
Some remarks are in order:
Remark 7.
1. The strong consistency results in Theorem 2 do not require that $\xi_i$ and $T_i$ are independent.
2. Uniformity:
The proof of the uniform convergence of $\hat T_i^{-1}$ in part (b) of the above theorem shows that $\max_{1 \le i \le n} \|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ as $n \to \infty$ almost surely. Under Condition 1, the same conclusion holds with the finer norm $|||\cdot|||$. The convergence in part (d) also holds uniformly over $i = 1, 2, \dots, n$.
3. Fisher Consistency:
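It can be directly verified that $\hat F^{-1} = \bar T \circ F_\varphi^{-1}$, where $\bar T = n^{-1}\sum_{i=1}^n T_i$, so that $\hat F = F_\varphi \circ \bar T^{-1}$. Also, $\hat T_i = T_i \circ \bar T^{-1}$, $\hat T_i^{-1} = \bar T \circ T_i^{-1}$, and $\hat X_i = \xi_i\,\varphi \circ \bar T^{-1}$ for each $i$. Further,
$$\hat{\mathcal{K}}_r = n^{-1}\sum_{i=1}^n (\hat X_i - \bar X_r) \otimes (\hat X_i - \bar X_r) = \Big\{n^{-1}\sum_{i=1}^n \xi_i^2 - \bar\xi^2\Big\}(\varphi \circ \bar T^{-1}) \otimes (\varphi \circ \bar T^{-1}),$$
where $\bar\xi = n^{-1}\sum_{i=1}^n \xi_i$. Thus, $\hat\varphi = (\varphi \circ \bar T^{-1})/\|\varphi \circ \bar T^{-1}\|$ and $\hat\xi_i = \langle \hat X_i, \hat\varphi\rangle = \xi_i\|\varphi \circ \bar T^{-1}\|$. All of the above estimators are measurable functions of the sample averages of the $T_i$'s, the $\xi_i$'s and the $\xi_i^2$'s. It follows that all of them are Fisher consistent for their population counterparts.
4. An Example: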
The condition $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely, for a deterministic constant $\delta$, can be relaxed. It suffices that $\inf_{t \in [0,1]} T_i'(t) \ge \delta_i$ almost surely for i.i.d. positive random variables $\delta_i$ with $E(\delta_1^{-1}) < \infty$. Random warp functions satisfying $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ can be constructed as follows. Let $\zeta_0(t) = t$ and, for $k \neq 0$, let $\zeta_k(t) = t - \sin(\pi k t)/(|k|\pi\beta)$ for some $\beta \ge 1$; this ensures that each $\zeta_k$ is strictly increasing. If $K$ is an integer-valued, symmetric random variable, then $E(\zeta_K) = \mathrm{Id}$. For a fixed $J \ge 2$, let $\{K_j\}_{j=1}^J$ be i.i.d. integer-valued, symmetric random variables. Let $\{U_j\}_{j=1}^{J-1}$ be i.i.d. $\mathrm{Unif}[0,1]$ random variables independent of the $K_j$'s, with order statistics $U_{(1)} \le \dots \le U_{(J-1)}$. Define
$$T(t) = U_{(1)}\zeta_{K_1}(t) + \sum_{j=2}^{J-1}(U_{(j)} - U_{(j-1)})\zeta_{K_j}(t) + (1 - U_{(J-1)})\zeta_{K_J}(t).$$
Then, $T$ is a strictly increasing homeomorphism on $[0,1]$, $T \in C^1[0,1]$ surely, and $E(T) = \mathrm{Id}$. Further, it can easily be shown that $\inf_{t \in [0,1]} T'(t) \ge 1 - \beta^{-1}$. Thus, the condition $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ is satisfied by choosing $\beta = (1 - \delta)^{-1}$.

Further to strong consistency, we also derive weak convergence of the estimators:

Theorem 3 (Weak Convergence – Fully Observed Case). Further to the assumptions in Definition 1, assume the following:
- $\varphi'$ is Hölder continuous with exponent $\alpha \in (0,1]$;
- $\xi_i$ and $T_i$ are independent for each $i$;
- $E(\|T'\|_\infty^2) < \infty$.

Then, the estimators in Section 3.1 satisfy the following asymptotic results:
(a) $n\,d_W^2(\hat F, F_\varphi)$ converges weakly as $n \to \infty$.
(b) $\sqrt n(\hat T_i^{-1} - T_i^{-1})$ and $\sqrt n(\hat T_i - T_i)$ converge weakly in the $C[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(c) $\sqrt n(\hat X_i - X_i)$ converges weakly in the $C[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(d) $n\,d_W^2(\hat F_i, F_\varphi)$ converges weakly as $n \to \infty$ for each $i \ge 1$.
(e) $\sqrt n(\bar X_r - \mu)$ converges weakly to a zero-mean Gaussian distribution in the $C[0,1]$ topology as $n \to \infty$.
(f) $\sqrt n(\hat{\mathcal{K}}_r - \mathcal{K})$ converges weakly in the topology of Hilbert–Schmidt operators, and $\sqrt n(\hat K_r - K)$ converges weakly in the $C([0,1]^2)$ topology as $n \to \infty$. In both cases, the limits are zero-mean Gaussian distributions. Moreover, $\sqrt n(\hat\varphi - \varphi)$ converges weakly to a zero-mean Gaussian distribution in the $C[0,1]$ topology, and $\sqrt n(\hat\xi_i - \xi_i)$ converges weakly as $n \to \infty$ for each $i \ge 1$.

For any finite $k = 1, 2, \dots$, the $C([0,1]^k)$ topology is stronger than the $L^2([0,1]^k)$ topology. Hence, by the continuous mapping theorem, the weak convergence results above that hold in the $C([0,1]^k)$ topology also hold in the $L^2([0,1]^k)$ topology.

We now study some asymptotic properties of the estimators in the discrete observation setup (without measurement error).

Theorem 4 (Limit Theory – Discretely Observed Case Without Measurement Error). Further to the conditions of Theorem 3, assume the following:
- $\varphi \in C^2[0,1]$;
- $\int_0^1 |\varphi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$;
- $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$.

Define $\alpha = \epsilon/(1+\epsilon)$. Assume that $\xi_i$ and $T_i$ are independent for each $i$; this is needed only for the weak convergence statements. The kernel $k(\cdot)$ is assumed to be supported on $[-1,1]$. Let $h = h(n) = o(n^{-1/2})$, and let $r = r(n)$ satisfy $r \gg n^{1/(2\alpha)}$ as $n \to \infty$. Then the estimators introduced in Section 3.2 satisfy:
(a) $d_W(\hat F_d^*, F_\varphi) \to 0$ as $n \to \infty$ almost surely, and $d_W^2(\hat F_d^*, F_\varphi) = O_P(n^{-1})$ as $n \to \infty$.
(b) $\|\hat T^*_{i,d} - T_i^{-1}\|_\infty \to 0$ and $\|\hat T_{i,d} - T_i\|_\infty \to 0$ as $n \to \infty$ almost surely. Further, $\sqrt n(\hat T^*_{i,d} - T_i^{-1})$ and $\sqrt n(\hat T_{i,d} - T_i)$ converge weakly in the $L^2[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(c) $\|\hat X_i^* - X_i\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat X_i^* - X_i)$ converges weakly in the $L^2[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(d) $d_W(\hat F_i^*, F_\varphi) \to 0$ as $n \to \infty$ almost surely, and $d_W^2(\hat F_i^*, F_\varphi) = O_P(n^{-1})$ as $n \to \infty$ for each $i \ge 1$.
(e) $\|\bar X_r^* - \mu\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\bar X_r^* - \mu)$ converges weakly in the $L^2[0,1]$ topology as $n \to \infty$.
(f) $|||\hat{\mathcal{K}}_r^* - \mathcal{K}|||_1 \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat{\mathcal{K}}_r^* - \mathcal{K})$ converges weakly in the topology of Hilbert–Schmidt operators. Further, $\|\hat K_r^* - K\|_\infty \to 0$ as $n \to \infty$, and $\sqrt n(\hat K_r^* - K)$ converges weakly in the $L^2([0,1]^2)$ topology as $n \to \infty$. Moreover, $\|\hat\varphi^* - \varphi\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat\varphi^* - \varphi)$ converges weakly in the $L^2[0,1]$ topology. Also, $|\hat\xi_i^* - \xi_i| \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat\xi_i^* - \xi_i)$ converges weakly as $n \to \infty$ for each $i \ge 1$.
In all the weak convergence results stated above, the limits are identical to the corresponding limits obtained in the fully observed scenario in Theorem 3.

Remark 8.
1. As in the fully observed setting of Theorem 2, the strong consistency results in the discrete, noiseless observation setting of Theorem 4 do not require $\xi_i$ and $T_i$ to be independent.
2. The asymptotic results remain valid when the grid over which the $\tilde X_i$'s are observed differs with $i$, say $0 \le t_{i,1} < t_{i,2} < \dots < t_{i,r_i} \le 1$. The proof, however, becomes notationally quite cumbersome. In this case, the grid must satisfy $\max_{1 \le j \le r_i - 1}(t_{i,j+1} - t_{i,j}) = O(r_i^{-1})$ as $r_i \to \infty$ for each $i$. In addition, $\tilde r_n := \min_{1 \le i \le n} r_i$ must satisfy $\tilde r_n \gg n^{1/(2\alpha)}$ as $n \to \infty$.
3. The choice of $h$ in Theorem 4 is an under-smoothing choice. It is possible because the observations carry no measurement error, so the data can be under-smoothed without damaging $\sqrt n$-consistency. This is unlike classical nonparametric regression, where the errors preclude such under-smoothing. The boundary points inflate the bias of the Nadaraya–Watson estimator to order $h$, the same order as that obtained in Theorem 4 for all points. However, these issues are of no consequence in this scenario. Under-smoothing is also natural here: appropriate under-smoothing retains the features of the curves better, and allows estimation at a parametric rate even under nonparametric smoothing. If one uses a local linear estimator with bandwidth $h$ instead of the Nadaraya–Watson estimator, the bias is of order $h^2$, even at the boundaries. In this case, $h$ has to be $o(n^{-1/4})$ to achieve parametric rates of convergence, which is again an under-smoothing choice. Thus, the choice of smoothing method does not play a crucial role in this setup.
4. Unlike in Theorem 3, the weak convergence results are all in the $L^2$ topology. This is because, unlike in the fully observed case, the estimators involved are not continuous functions on $[0,1]$. We could not use the weaker $D[0,1]$ topology either, since not all estimators are càdlàg functions. However, we still retain the strong consistency results in parts (b), (c) and (e) in the sup-norm, as in Theorem 2. This is because those estimators are uniformly bounded almost surely, and thus have finite sup-norm. Further, in all cases, there is no issue with the measurability of the supremum.
5. The condition $\varphi \in C^2[0,1]$ can be relaxed to requiring that $\varphi'$ is Lipschitz continuous. Moreover, the requirement $\int_0^1 |\varphi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$ is trivially satisfied if $\varphi'$ is bounded away from zero on $[0,1]$, in which case one can choose $\alpha = 1$.

Now consider the case $\varphi \in C^2[0,1]$, and let $t_0 \in (0,1)$ be such that $\varphi'(t_0) = 0$. If $|\varphi''(t_0)| > 0$, we can choose an interval $A_\delta = (t_0 - \delta, t_0 + \delta) \subset (0,1)$ such that $\inf_{u \in A_\delta} |\varphi''(u)| \ge \beta > 0$. A first-order Taylor expansion then yields
$$\int_{A_\delta} |\varphi'(t)|^{-\epsilon}\,dt \le \beta^{-\epsilon}\int_{A_\delta} |t - t_0|^{-\epsilon}\,dt < \infty \quad \text{for any } \epsilon < 1.$$
Here, we have used the fact that $\int_0^\delta t^{-\epsilon}\,dt < \infty$ for any $\delta > 0$ and $\epsilon < 1$. Thus, if none of the zeros of $\varphi'$ and $\varphi''$ coincide, the condition $\int_0^1 |\varphi'(u)|^{-\epsilon}\,du < \infty$ holds for any $\epsilon < 1$.

More generally, suppose $\varphi \in C^m[0,1]$ for some $m \ge 2$. Let $m_0$ be the least integer between 2 and $m$ such that none of the zeros of $\varphi'$ and $\varphi^{(m_0)}$ coincide. Then $\int_0^1 |\varphi'(u)|^{-\epsilon}\,du < \infty$ holds for any $\epsilon < 1/(m_0 - 1)$.

We finally study the asymptotic properties of the estimators in the modified registration procedure, which is employed when the data are contaminated by measurement error (described in Section 3.3).

Theorem 5 (Limit Theory – Measurement Error Case). In addition to the assumptions of Theorem 3, assume the following:
- $\varphi \in C^3[0,1]$ and $\int_0^1 |\varphi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$; define $\alpha = \epsilon/(1+\epsilon)$;
- $\xi_i$ and $T_i$ are independent for each $i$;
- $T \in C^3[0,1]$ a.s., and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$;
- the kernels $k_1(\cdot)$ and $k_2(\cdot)$ are supported on $[-1,1]$, symmetric and continuously differentiable;
- the errors $\{\epsilon_{ij}\}$ are a.s. bounded;
- $E\{|\xi|^{-\alpha/(1-\alpha)}\} < \infty$, and $E(\|T^{(l)}\|_\infty^2) < \infty$ for $l = 1, 2, 3$;
- the bandwidths satisfy $h_1, h_2 \to 0$ and $r h_1^3, r h_2 \to \infty$.

Then, the estimators in Section 3.3 satisfy the following properties:
(a) $d_W^2(\hat F_e, F_\varphi) = O_P\big(h_1^{4\alpha} + (r h_1^3)^{-\alpha} + n^{-1}\big)$ as $n \to \infty$.
(b) Both $\|\hat T_{i,e}^{-1} - T_i^{-1}\|_\infty$ and $\|\hat T_{i,e} - T_i\|_\infty$ are $O_P\big(h_1^{2\alpha} + (r h_1^3)^{-\alpha/2} + n^{-1/2}\big)$ as $n \to \infty$.
(c) $\|\hat X^*_{i,e} - X_i\|_\infty = O_P\big(h_1^{2\alpha} + (r h_1^3)^{-\alpha/2} + h_2^2 + (r h_2)^{-1/2} + n^{-1/2}\big)$ as $n \to \infty$.
(d) $\|\bar X_e^* - \mu\|_\infty = O_P\big(h_1^{2\alpha} + (r h_1^3)^{-\alpha/2} + h_2^2 + (r h_2)^{-1/2} + n^{-1/2}\big)$ as $n \to \infty$.
(e) $|||\hat{\mathcal{K}}_e^* - \mathcal{K}|||_1 = O_P\big(h_1^{2\alpha} + (r h_1^3)^{-\alpha/2} + h_2^2 + (r h_2)^{-1/2} + n^{-1/2}\big)$ as $n \to \infty$. Consequently, $\|\hat\varphi_e^* - \varphi\|_\infty$ and $|\hat\xi^*_{i,e} - \xi_i|$ have the same rates of convergence for each fixed $i$.

Remark 9.
1. Analogous rates of convergence can also be obtained with nonparametric smoothing techniques other than the ones used in the theorem. One may, e.g., use a Nadaraya–Watson estimator in Step 3** with boundary kernels, to alleviate the well-known boundary bias of this estimator (see, e.g., Wand and Jones (1995)). Also, to estimate $\tilde X_i'$, one may use higher-order local polynomials of even order. However, these are computationally more intensive, and they need additional smoothness assumptions on the latent process and the warp maps.
2. The above theorem shows that the rates of convergence are slower than the parametric rates achieved in the earlier settings. This is due to the nonparametric smoothing steps involved, especially the estimation of derivatives, which is known to have quite slow rates of convergence. The separate contributions of the two smoothing steps to the convergence rates are also clear. It is well known in local polynomial regression that the optimal rate for $h_1$ is $r^{-1/7}$ and that for $h_2$ is $r^{-1/5}$. With these rates, $d_W^2(\hat F_e, F_\varphi) = O_P(r^{-4\alpha/7} + n^{-1})$, and the remaining quantities are $O_P(r^{-2\alpha/7} + n^{-1/2})$. Thus, parametric rates of convergence are achieved if $r > n^{7/(4\alpha)}$.
3. Let $\beta = \alpha/(1-\alpha)$, which is finite since $\alpha < 1$. The condition $E\{|\xi|^{-\beta}\} < \infty$ in Theorem 5 is obviously satisfied if $|\xi|$ is bounded away from zero. Suppose instead that $\xi$ has a continuous density $f_\xi$, either on $[0,\infty)$ or on $(-\infty,\infty)$; in the latter case it is assumed to be symmetric about zero. If $\sup_{y \in [0,a)} f_\xi(y) < \infty$ for some $a > 0$, it is easy to show that $E\{|\xi|^{-\beta}\} < \infty$ if $\beta < 1$, which is equivalent to $\epsilon < 1$. If $\beta \in [1,2)$, this expectation is finite if $\sup_{y \in [0,a)} y^{-1} f_\xi(y) < \infty$.

As emphasised before (Section 3.1), our procedure can be used whether or not the latent process falls in the identifiable regime of Definition 1. In this section, we analyse theoretically the stability of our registration procedure when the distribution of the latent process deviates from the identifiable regime. Since identifiability is lost, consistency is clearly no longer achievable. However, we can quantify how much the estimators deviate from their population counterparts, at least asymptotically. Since the model is in general unidentifiable, strictly speaking there is no unique setting corresponding to the law of the data. As a convention, we therefore assume that a "true" underlying distribution is known and fixed. For simplicity of exposition, we focus on the rank-two case. This case will be seen to carry the essence of the underlying effects, as we discuss in the third point of Remark 10. To obtain more transparent results, we also focus on the case where the underlying functions are completely observable as continuous objects.

Let $X_i = \xi_{i1}\varphi_1 + \xi_{i2}\varphi_2$ for $i = 1, 2, \dots, n$, where $\xi_{i1}$ and $\xi_{i2}$ are uncorrelated. Let $\mu = E(X) = E(\xi_1)\varphi_1 + E(\xi_2)\varphi_2$. Denote $\gamma_l = \mathrm{Var}(\xi_l)$ and $Y_{il} = [\xi_{il} - E(\xi_{il})]/\gamma_l^{1/2}$ for $l = 1, 2$. Then,
$$X_i = \mu + \gamma_1^{1/2} Y_{i1}\varphi_1 + \gamma_2^{1/2} Y_{i2}\varphi_2 \tag{4}$$
gives the Karhunen–Loève expansion of $X_i$. The (random) local variation distribution induced by $X_i$ is $F_i(t) = \int_0^t |X_i'(u)|\,du \big/ \int_0^1 |X_i'(u)|\,du$ for $t \in [0,1]$. In the rank-one case, $\mu$ played no role in $F_i$, because the factor $\xi$ cancels between the numerator and the denominator. Here, by contrast, $\mu$ cannot be neglected, and we will later see that it affects the performance of the estimators. Define $\eta = (\gamma_2/\gamma_1)^{1/2}$, the square root of the inverse of the condition number. It follows that
$$F_i(t) = \frac{\int_0^t |\gamma_1^{-1/2}\mu'(u) + Y_{i1}\varphi_1'(u) + \eta Y_{i2}\varphi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1/2}\mu'(u) + Y_{i1}\varphi_1'(u) + \eta Y_{i2}\varphi_2'(u)|\,du}.$$
The local variation distribution induced by the observed warped data $\tilde X_i = X_i \circ T_i^{-1}$ is given by
$$\tilde F_i(t) = \frac{\int_0^t |\tilde X_i'(u)|\,du}{\int_0^1 |\tilde X_i'(u)|\,du} = \frac{\int_0^{T_i^{-1}(t)} |\gamma_1^{-1/2}\mu'(u) + Y_{i1}\varphi_1'(u) + \eta Y_{i2}\varphi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1/2}\mu'(u) + Y_{i1}\varphi_1'(u) + \eta Y_{i2}\varphi_2'(u)|\,du} = F_i(T_i^{-1}(t)).$$
The idea is that if, under suitable conditions, the $F_i$'s exhibit little variability, then the registration procedure works quite well. We illustrate two different situations where this is the case. The estimators of the population parameters are the same as those considered earlier. The next theorem gives bounds on the estimation errors.

Theorem 6.
In the setting of Model (4), define
$$Z_i = \begin{cases} \dfrac{\int_0^1 |X_i'(u) - \mu'(u)|\,du}{\int_0^1 |X_i'(u)|\,du} & \text{if } \mu' \neq 0,\\[2ex] \dfrac{\eta \int_0^1 |Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} & \text{if } \mu' = 0, \end{cases}$$
for $i = 1, 2, \ldots, n$. If $\mu' \neq 0$, assume that $\int_0^1 |\mu'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$, and if $\mu' = 0$, assume that $\int_0^1 |\phi_1'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$. Set $\alpha = \epsilon/(1 + \epsilon)$. Suppose that assumption (I2) from Definition 1 holds and that, for each $i = 1, 2$, $\phi_i$ lies in $C^1[0,1]$ with its derivative being $\alpha_i$-Hölder continuous for some $\alpha_i \in [0,1]$. Assume that $X_i$ and $T_i$ are independent for each $i$. Also assume that $E(Z_1^\alpha) < \infty$. Then:
(a) $\limsup_{n \to \infty} \|\hat T_i^{-1} - T_i^{-1}\|_\infty \le \text{const.}\{E(Z_1^\alpha) + Z_i\}$ and $\limsup_{n \to \infty} \|\hat T_i - T_i\|_\infty \le \text{const.}\,\|T_i'\|_\infty\{Z_i^\alpha + E^\alpha(Z_1^\alpha)\}$ almost surely, where the constant term is uniform in $i$.
(b) $\limsup_{n \to \infty} \|\hat X_i - X_i\|_\infty \le O_P(1)\{E(Z_1^\alpha) + Z_i\}$ almost surely.

Remark 10.
1. Theorem 6 reveals that if the $Z_i$ are small, the effect of misspecification is also small. Here are two such cases:
(a) When $\mu' \neq 0$, $Z_i = \int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du \big/ \int_0^1 |\gamma_1^{-1/2}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du$. So, in this case, if $|\gamma_1^{-1/2}\mu'|$ has a large enough contribution compared to $|Y_{i1}\phi_1' + \eta Y_{i2}\phi_2'|$ for all $i$, then the $Z_i$'s are small.
(b) On the other hand, if $\mu' = 0$ and $\eta$ is small, i.e., the condition number of the process is large (which essentially implies that the process is "close" to a rank one process, provided $E(\xi_2) = 0$), then the $Z_i$'s are small. This can be compared to the minimum eigenvalue registration principle of Ramsay and Silverman (2005), where one tries to find the warp function that minimises the second eigenvalue of the cross-product matrix between the target function and the registered function. Assume that $E(\xi_{i1}) = E(\xi_{i2}) = 0$ and, without loss of generality, that $\gamma_1 = 1$. If in reality the true unobserved curves are rank one, i.e., consist of the $\xi_{i1}\phi_1$ component, and we observe warped versions of the rank two curves $X_i$, then (in the population case) correct registration is achieved by $T_i$ if the minimum eigenvalue of the expected cross-product matrix, namely $\gamma_2 = \eta^2$, equals zero. Thus, in the empirical case, if $\eta$ is close to zero, we may expect $\hat T_i$ to be close to $T_i$, and consequently expect the registration procedure to perform well.
2. Bounds similar to those in (a) and (b) of Theorem 6 can also be obtained for the mean, the covariance, the $\gamma_l$'s and the $\phi_l$'s, as well as for the principal components $Y_{il}$. We do not include them in the statement of the theorem because they require more complicated conditions involving the parameters.
3. General (possibly infinite) rank situation: Let $X_i = \mu + \sum_{j=1}^{M} \gamma_j^{1/2} Y_{ij}\phi_j$ for some $2 \le M \le \infty$, where the $\{Y_{ij} : j = 1, 2, \ldots, M\}$ are uncorrelated with zero mean and unit variance. Without loss of generality, we assume that $\gamma_1 > \gamma_2 > \ldots \ge 0$. The errors in estimation when $\mu' \neq 0$ remain the same as in Theorem 6. When $\mu' = 0$, we define
$$Z_i = \frac{\eta \int_0^1 |Y_{i2}\phi_2'(u) + \sum_{k \ge 3} \delta_k Y_{ik}\phi_k'(u)|\,du}{\int_0^1 |Y_{i1}\phi_1'(u) + \eta[Y_{i2}\phi_2'(u) + \sum_{k \ge 3} \delta_k Y_{ik}\phi_k'(u)]|\,du}$$
for $i = 1, 2, \ldots, n$, where $\delta_k = \gamma_k^{1/2}/\gamma_2^{1/2}$ for $k \ge 3$. In this case, under the conditions of Theorem 6, the bounds in that theorem still hold. Note that $\delta_k \le 1$ for all $k \ge 3$. So, in the general case, the performance of the registration procedure studied in the paper depends only on how small $\eta$ is, and does not in general depend on the values of the $\delta_k$'s (or of the $\gamma_j$'s for $j \ge 3$). In other words, only the behaviour of the second frequency component relative to the first one matters (which elucidates the role of $\delta$ in the standard model, i.e. Equation 1, whose role is precisely to tune this behaviour). Of course, for the same value of $\eta$, the magnitude of the estimation error will now differ from the rank two case because of the additional terms. We have investigated these issues in a simulation study in Section 5.3 (see, in particular, Figure 6).
4. In the setup of the infinite rank latent model considered in (3), we now compare the bounds obtained in Theorem 6 to those obtained by Tang and Müller (2008). Denoting $\sum_{j=1}^{M} \gamma_j^{1/2} Y_{ij}\phi_j = \kappa W_i$, the latent model is exactly the one considered in that paper (see p. 877, with $\delta$ there replaced by $\kappa$). So, if $\mu' \neq 0$, it follows that $Z_i = \kappa\int_0^1 |W_i'(u)|\,du \big/ \int_0^1 |\mu'(u) + \kappa W_i'(u)|\,du = O_P(\kappa)$, which is similar to the bound obtained in Tang and Müller (2008).
Our analysis nevertheless refines the results of Tang and Müller (2008) in the sense that it reveals the impact of $\mu'$ on the asymptotic bias: larger magnitudes of $\mu'$ yield smaller asymptotic bias. Further refinements can be offered by differentiating between the cases $\mu' \neq 0$ and $\mu' = 0$. Specifically, when $\mu' = 0$, it can be shown that $Z_i = \int_0^1 |W_i'(u) - Y_{i1}\phi_1'(u)|\,du \big/ \int_0^1 |W_i'(u)|\,du$. Thus, in this case, the error bounds on the warp maps in Theorem 6 do not depend on $\kappa$. This is to be expected for the following reason. Note that $\mu' = 0$ means that the latent process is $X(t) = c + \kappa W(t)$ for a constant $c$, and hence the warped process is $\tilde X(t) = c + \kappa W(T^{-1}(t))$. Thus, the warped version of the process $X$ differs from the warped version of the process $W$ only by a constant shift and a scale factor. Ideally, any proper registration procedure should be invariant with respect to these transformations, since they do not affect the time scale. This is clearly true for our procedure. We should thus get the same estimates of the warp maps if we work with the warped process $W(T^{-1}(t))$ (which does not involve $\kappa$) instead of $\tilde X$.
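To make the quantities above concrete, the following is a minimal Python/NumPy sketch, not the authors' code. It discretises the local variation distribution $F(t) = \int_0^t |X'(u)|\,du / \int_0^1 |X'(u)|\,du$ on a grid, checks numerically that warping gives $\tilde F = F \circ T^{-1}$, checks the invariance to shifts and rescalings from point 4 of Remark 10, and evaluates $Z_i$ of Theorem 6 in the case $\mu' = 0$ for decreasing $\eta$. The grid, the orthonormal pair $\phi_1, \phi_2$, the warp map and the helper names (`cumtrapz0`, `local_variation_cdf`, `z_centred`) are hypothetical choices, and derivatives are approximated by finite differences.

```python
import numpy as np


def cumtrapz0(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))


def local_variation_cdf(x, t):
    """Discretised F(t) = int_0^t |x'(u)| du / int_0^1 |x'(u)| du."""
    c = cumtrapz0(np.abs(np.gradient(x, t, edge_order=2)), t)
    return c / c[-1]


def z_centred(y1, y2, eta, dphi1, dphi2, t):
    """Z_i of Theorem 6 in the case mu' = 0 (rank two)."""
    num = cumtrapz0(np.abs(y2 * dphi2), t)[-1]
    den = cumtrapz0(np.abs(y1 * dphi1 + eta * y2 * dphi2), t)[-1]
    return eta * num / den


def phi1(s):
    # Hypothetical orthonormal pair on [0, 1], chosen for illustration only.
    return np.sqrt(2.0) * np.sin(np.pi * s)


def phi2(s):
    return np.sqrt(2.0) * np.sin(2.0 * np.pi * s)


def x_fun(s):
    # A hypothetical rank-two curve (the constant 1.0 does not affect F).
    return 1.0 + 0.8 * phi1(s) + 0.3 * phi2(s)


t = np.linspace(0.0, 1.0, 2001)

# Hypothetical warp map T(t) = t + 0.3 t (1 - t): increasing, T(0) = 0, T(1) = 1.
T = t + 0.3 * t * (1.0 - t)
T_inv = np.interp(t, T, t)  # invert the increasing map on the grid

F = local_variation_cdf(x_fun(t), t)
F_warped = local_variation_cdf(x_fun(T_inv), t)  # F of the warped curve X o T^{-1}
F_composed = np.interp(T_inv, t, F)              # F o T^{-1}
print("max |F_warped - F o T^-1|:", np.max(np.abs(F_warped - F_composed)))

# Shift and scale invariance (point 4 of Remark 10): c + kappa X has the same F.
print("invariant:", np.allclose(local_variation_cdf(7.0 - 2.5 * x_fun(t), t), F))

# Z_i in the mu' = 0 case for fixed scores Y_i1 = Y_i2 = 1 and decreasing eta.
dphi1 = np.gradient(phi1(t), t, edge_order=2)
dphi2 = np.gradient(phi2(t), t, edge_order=2)
for eta in (0.5, 0.1, 0.01):
    print(eta, z_centred(1.0, 1.0, eta, dphi1, dphi2, t))
```

For fixed scores, $Z_i$ shrinks roughly in proportion to $\eta$, which is the mechanism behind point 1(b) of Remark 10.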
5. Numerical Experiments
We now carry out simulation experiments to probe the finite-sample performance of our registration procedure. First we treat the case of a well-specified identifiable regime without error, and then, separately, the case where the observations contain measurement error. Finally, we consider the setup where the rank of the latent process is greater than one (departure from identifiability). In all cases, we compare the performance of the proposed registration method to the continuous monotone registration (CMR) method of Ramsay and Li (1998), the pairwise registration (PW) technique of Tang and Müller (2008) and registration using the Fisher–Rao metric (FRM) studied in Srivastava et al. (2011). The CMR procedure is implemented using the "register.fd" function in the R package fda. The PW procedure is implemented using the Matlab codes in the PACE package. The FRM method is implemented using the "time_warping" function in the R package fdasrvf. The tuning parameters of the PW method are always set to their defaults, since other choices were found to be computationally very intensive. For the CMR procedure, we compared its performance using different numbers of B-spline basis functions in the structure of the warp maps (see Ramsay and Li (1998)), which varies their complexity; the best performance was obtained when the warp maps are simple. As will be seen in the simulations, the registration procedures involving structural assumptions on the warp maps, and consequently more tuning parameters (CMR and PW), encounter difficulties in several of the models considered, probably due to misspecification of the true warping mechanism.

Let $X(t) = \xi\phi(t)$, $t \in [0,1]$, and consider two models:
Model 1: $\xi \sim N(\ .\ ,\ )$, $\phi(t) = \exp\{\cos(\ \pi t - \pi)\}$;
Model 2: $\xi \sim\ + \mathrm{Beta}(\ ,\ )$, $\phi(t) = \{\ - (t - .\ )\}\cos(\ \pi t)$.
In either case, the sample size is $n = 50$ and the curves are observed at $r = 101$ equally spaced points in $[0,1]$. The warp maps are chosen according to point (3) of Remark 7 with the parameters $J = K = V_1V_2$, where $V_1 \sim \mathrm{Poisson}(\ )$, $P(V_2 = \pm\ ) = \ /\ $, $V_2$ is independent of $V_1$, and $\beta = .\,$[…]. The kernel used to obtain the $\hat T_{i,d}$'s is the Epanechnikov kernel on $[-1,1]$. For both models, the bandwidths used in the registration procedure were chosen to undersmooth the data so that features (maxima, minima, etc.) are not smeared out. To obtain smooth registered curves, we smoothed the $\hat T_{i,d}$'s using cubic splines with 11 equispaced knots on $[0,1]$ prior to synchronising the data.

Figure 2 shows the true, warped and registered data curves; the true, warped and registered means; and the true, warped and registered leading eigenfunctions under Model 1 and Model 2. It suggests that the procedure studied in this paper adequately registers the discretely observed, warped sample curves. Moreover, the cross-sectional mean and the leading eigenfunction of the warped curves differ from the true mean and leading eigenfunction in amplitude or phase (under either model), while the registration procedure corrects the problem, and the resulting estimates (whether smoothed or raw) are very close to the true functions.

Under both models, the estimates of the mean and the leading eigenfunction obtained using the proposed registration procedure are closer to the true functions than those of all the other methods considered. This is more prominent under Model 2 (see the bottom two rows in Fig. 2), where the estimates of the leading eigenfunction obtained by all of the competing procedures are far from the true eigenfunction. Also, the registered functions obtained using the CMR and the PW methods do not resemble the true functions (see Figures 8 and 9).
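Lemma 1 and the proof of Theorem 2 indicate how the warp maps are recovered in the idealised case of fully observed curves: averaging the quantile functions $\tilde F_j^{-1}$ estimates $F_\phi^{-1}$, and $\hat T_i^{-1} = \hat F^{-1} \circ \tilde F_i$, which equals $\bar T \circ T_i^{-1}$. The minimal NumPy sketch below implements only this step on densely sampled, noise-free curves; it omits the kernel smoothing, bandwidth choices and spline post-smoothing described above, so it should not be read as the authors' implementation. The template $\phi$, the warp family $T_i(s) = s + a_i s(1 - s)$ with $a_i$ uniform on $[-0.5, 0.5]$ (so that $E(T_i) = \mathrm{Id}$), and all function names are hypothetical; this is not the warp construction of Remark 7.

```python
import numpy as np


def cumtrapz0(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))


def local_variation_cdf(x, t):
    """Discretised local variation distribution of a curve sampled on t."""
    c = cumtrapz0(np.abs(np.gradient(x, t, edge_order=2)), t)
    return c / c[-1]


def register(curves, t):
    """Estimate warps by averaging local-variation quantile functions.

    Returns (T_inv_hat, registered): row i holds hat T_i^{-1} and the
    registered curve tilde X_i o hat T_i, both evaluated on the grid t.
    """
    F = np.array([local_variation_cdf(x, t) for x in curves])
    Q = np.array([np.interp(t, Fi, t) for Fi in F])  # quantiles on p = t
    Q_bar = Q.mean(axis=0)                            # estimate of F_phi^{-1}
    T_inv_hat = np.array([np.interp(Fi, t, Q_bar) for Fi in F])  # Q_bar o F_i
    T_hat = np.array([np.interp(t, Ti, t) for Ti in T_inv_hat])  # invert
    registered = np.array([np.interp(Th, t, x) for Th, x in zip(T_hat, curves)])
    return T_inv_hat, registered


def phi(s):
    # Hypothetical template; its derivative vanishes only at isolated points.
    return np.sin(2.0 * np.pi * s)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 801)
n = 30
xi = rng.normal(1.0, 0.2, size=n)   # rank-one amplitudes
a = rng.uniform(-0.5, 0.5, size=n)  # hypothetical warp parameters, E(a) = 0
T_true = t[None, :] + a[:, None] * t * (1.0 - t)
T_inv_true = np.array([np.interp(t, Ti, t) for Ti in T_true])
warped = xi[:, None] * phi(T_inv_true)  # tilde X_i = xi_i phi(T_i^{-1}(t))

T_inv_hat, registered = register(warped, t)

# Identity used in the proof of Theorem 2: hat T_i^{-1} = bar T o T_i^{-1}.
a_bar = a.mean()
target = T_inv_true + a_bar * T_inv_true * (1.0 - T_inv_true)
print("max |hat T_i^-1 - bar T o T_i^-1|:", np.max(np.abs(T_inv_hat - target)))
```

Because $\bar T$ is close to the identity only for large $n$, the warps are recovered up to the common factor $\bar T$, and the registered curves are (up to discretisation error) the rank one family $\xi_i\,\phi \circ \bar T^{-1}$.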
The above facts show that for small sample sizes, even without measurement error, some of the well-known registration procedures may yield unsatisfactory results, while the proposed procedure works well in these cases.

We now consider the situation where the warped curves from an identifiable rank one model are observed with measurement error. As shown in our theoretical study in Section 4.1, the rate of convergence is then much slower than in the case without measurement error. For our simulations, we thus keep the same two models as in Section 5.1 but increase the sample size to $n =$ […] $(-.\ ,\ .\ )$, while those under Model 2 are i.i.d. $\mathrm{Unif}(-.\ ,\ .\ )$. The bandwidths for the smoothing steps involved in the registration procedure are chosen using the built-in cross-validation bandwidth selector "regCVBwSelC" in the R package locpol. Figures 3 and 4 show the unobserved true rank one curves, the warped curves observed with error, and the registered curves. They also show the mean function and the leading eigenfunction of the true, warped and registered data under the two models. Even under measurement error contamination, the proposed registration procedure adequately registers the curves. In particular, under Model 2, the means as well as the leading eigenfunctions of the true and the registered curves are quite close. We also ran the registration procedure with a Nadaraya–Watson estimator (without boundary kernels) to estimate the $\tilde X_i$'s (see Step 3**); its performance was not much different from that obtained with a local linear estimator. Only the FRM procedure performs similarly to the proposed one when estimating the leading eigenfunction under both models. The PW method, however, yields estimates of the mean quite similar to those of the proposed and the FRM methods under each of the two models.
Both the CMR and the PW methods fail to produce adequately registered curves, as seen from Figures 10 and 11. The improvement in the performance of the FRM technique under Model 2 with error, compared to the case without error considered in the previous subsection, is perhaps due to the increased sample size, which compensates for the measurement error.

[Figure 2 panels: "Original curves", "Warped curves", "Registered curves", "Mean" and leading eigenfunction, plotted against t ∈ [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]

Fig 2. Plots of the true, warped and registered data curves (using our procedure), along with the estimated mean and leading eigenfunction under Model 1 (top two rows) and Model 2 (bottom two rows) without measurement error, obtained using our procedure as well as some other methods.

[Figure 3 panels: "Original curves", "Warped curves with error", "Registered curves", "Mean" and leading eigenfunction, plotted against t ∈ [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]

Fig 3. Plots of the true, warped and registered data curves (using our procedure), along with the estimated mean and leading eigenfunction under Model 1 with measurement error, obtained using our procedure as well as some other methods.
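With measurement error, each curve is smoothed before its local variation distribution is computed. The simulations above use local linear estimation with bandwidths selected by regCVBwSelC from the R package locpol; as a rough, hypothetical stand-in, the NumPy sketch below implements a local linear estimator with the Epanechnikov kernel and a fixed, hand-picked bandwidth (no cross-validation is performed). The test curve, the uniform noise level and the bandwidth `h = 0.08` are illustrative assumptions only.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_linear(t_obs, y_obs, t_eval, h):
    """Local linear estimate of E[Y | t] at each point of t_eval with bandwidth h."""
    est = np.empty(len(t_eval))
    for k, t0 in enumerate(t_eval):
        d = t_obs - t0
        w = epanechnikov(d / h)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d ** 2).sum()
        # Intercept of the kernel-weighted least-squares fit of y on (1, d).
        est[k] = np.sum(w * (s2 - s1 * d) * y_obs) / (s0 * s2 - s1 ** 2)
    return est


rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, 1.0, 101)
y_true = np.sin(2.0 * np.pi * t_obs)
y_obs = y_true + rng.uniform(-0.1, 0.1, size=t_obs.size)  # hypothetical noise

y_smooth = local_linear(t_obs, y_obs, t_obs, h=0.08)
print("raw error:", np.mean(np.abs(y_obs - y_true)))
print("smoothed error:", np.mean(np.abs(y_smooth - y_true)))
```

A local linear fit reproduces straight lines exactly, including at the boundary, which is one reason it is often preferred over the Nadaraya–Watson estimator without boundary kernels.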
We next carry out experiments to probe the performance of the registration procedure in a rank 2 and a rank 3 setting; these correspond to an unidentifiable regime. The model considered in the rank 2 case is $X = \xi_1\phi_1 + \xi_2\phi_2$ with $\xi_1 \sim N(\ .\ ,\ )$, $\xi_2 \sim N(-\ .\ ,\ .\ )$, $\phi_1(t) = \sqrt{2}\,(\ \pi t)$ and $\phi_2(t) = \sqrt{2}\,(\ \pi t)$, $t \in [0,1]$. In the rank 3 case, we consider $X = \xi_1\phi_1 + \xi_2\phi_2 + \xi_3\phi_3$ with the same choices of $\xi_j$ and $\phi_j$ as above for $j = 1, 2$, and with $\xi_3 \sim N(\ .\ , (\ .\ )^2)$ and $\phi_3(t) = \sqrt{2}\,(\ \pi t)$. The warp maps are the same as those considered in the simulation study in Section 5. The true curves, the warped curves and the registered curves are plotted in Figure 5 for the rank 2 and the rank 3 models. The unidentifiable setting has to be interpreted as follows: in light of Theorem 1 and the ensuing counterexamples, there may be other models that could have generated the (statistically) same data. Consequently, strictly speaking, we cannot really talk about good or bad performance, as there may be several equally valid "ground truths" to compare to. However, we constructed the unidentifiable simulation setting as a mild departure from an identifiable model. We can therefore treat the latter identifiable model as the truth and investigate whether the registration procedure is stable under this mild departure. A more detailed investigation of stability is pursued later in this subsection.

The registration procedure performs quite well and adequately aligns the peak present in the true curves under both models (see Figure 5). Further, the two smaller troughs near the endpoints in the rank 3 model are also reasonably aligned (see the third row of Figure 5). However, except for the FRM procedure, the competing methods completely fail to register the data curves (see Figures 12 and 13 in the Supplementary material).
Also, unlike with our procedure, the registered curves obtained with the FRM procedure seem to lack the two troughs present in the original curves near the boundary points for the rank 3 model. For each of the two models, the mean seems to be estimated very well from the registered curves obtained with our procedure, and the other procedures follow suit. A similar statement holds for the first eigenfunction under these two models. However, all of the registration procedures show more bias in the estimate of the second eigenfunction under the rank 2 model. Under the rank 3 model, the CMR and the PW methods are not fully able to capture the shape of the second eigenfunction, while our procedure and the FRM method are. The third eigenfunction under this model is reasonably estimated only by our procedure.

[Figure 4 panels: "Original curves", "Warped curves with error", "Registered curves", "Mean" and leading eigenfunction, plotted against t ∈ [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]

Fig 4. Plots of the true, warped and registered data curves (using our procedure), along with the estimated mean and leading eigenfunction under Model 2 with measurement error, obtained using our procedure as well as some other methods.

In order to probe the breakdown point of the proposed registration procedure in the rank $> 1$ […] $L_2$-error in estimation of the data curves, i.e., the median of $\|\hat X_i - X_i\|_2/\|X_i\|_2$, $i = 1, 2, \ldots, n$, and consider a threshold of 10% error as a criterion for good performance. The models are generated similarly to the earlier simulation. For the rank 2 case, let $X = \xi_{1,c}\phi_1 + \xi_{2,c,r}\phi_2$, where $\xi_1 \sim N(c,\ )$ and $\xi_2 \sim N(-c, r)$, with $c \in [\ .\ ,\ ]$ and $r \in [\ .\ ,\ .\ ]$. These choices of $c$ and $r$ ensure that we include both approximately rank 1 models ($c$ and $r$ close to zero) and proper rank 2 models (large values of $r$).
Similarly, for the rank 3 case, let $X = \xi_{1,c}\phi_1 + \xi_{2,c,r}\phi_2 + \xi_{3,c,r}\phi_3$, where $\xi_3 \sim N(c, r)$. Figure 6 shows the relative $L_2$-errors under these classes of models for various combinations of the parameters $c$ and $r$. When $c$ is large, the registration procedure performs well, which conforms with our theoretical arguments in Theorem 6. In fact, for this class of rank 2 models, the maximum $L_2$ error does not exceed 12.[…]%. When $c$ is small, the allowable range of $r$ values for good performance is much greater in the rank 2 setup than in the rank 3 setup (cf. the third point of Remark 10). In fact, in the rank 3 setup, the error exceeds 10% for all $r$ in the range considered when $c \le 0.2$. Further, the maximum $L_2$ error is now 29.[…]%.

[Figure 5 panels: "Original curves", "Warped curves", "Registered curves", "Mean" and eigenfunctions, plotted against t ∈ [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]

Fig 5. Plots of the true, warped and registered data curves, along with the means and eigenfunctions of the true, warped and registered data, using our method and some other procedures under the rank 2 (top two rows) and the rank 3 models (bottom three rows).

[Figure 6 panels: "rank 2 model" and "rank 3 model", level plots over the parameters c and r.]

Fig 6. Level plots of the relative $L_2$ errors under the rank 2 and the rank 3 classes of models.
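The breakdown criterion used in this study can be computed as follows. This short NumPy sketch assumes that the relative error uses the $L_2[0,1]$ norm, approximated by the trapezoidal rule on the observation grid; the function names and the toy curves are hypothetical.

```python
import numpy as np


def l2_norm(f, t):
    """Trapezoidal approximation of the L2[0, 1] norm of samples f on the grid t."""
    return np.sqrt(np.sum(0.5 * (f[1:] ** 2 + f[:-1] ** 2) * np.diff(t)))


def median_relative_error(X_hat, X, t):
    """Median over i of ||X_hat_i - X_i|| / ||X_i||."""
    errs = [l2_norm(xh - x, t) / l2_norm(x, t) for xh, x in zip(X_hat, X)]
    return float(np.median(errs))


def performs_well(X_hat, X, t, threshold=0.10):
    """10% rule used in the breakdown study."""
    return median_relative_error(X_hat, X, t) <= threshold


t = np.linspace(0.0, 1.0, 101)
X = np.array([c * np.sin(np.pi * t) for c in (0.5, 1.0, 2.0)])
print(median_relative_error(1.05 * X, X, t), performs_well(1.05 * X, X, t))
print(median_relative_error(1.20 * X, X, t), performs_well(1.20 * X, X, t))
```

Using the median rather than the mean keeps a few badly registered curves from dominating the criterion.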
6. Data Analysis
In this section, we illustrate the performance of our registration procedure on a data set of growth curves of Tribolium beetle larvae, collected and analysed by Irwin and Carter (2013). Each curve represents the mass measurement (in milligrams) as a function of the age of the larvae since hatching (in days). Their analysis of Tribolium growth suggests that these beetles' growth patterns differ from those of other animals with determinate growth (that is, growth that is confined to certain life stages). Usually, the longer the growth period, the larger the maximal mass attained (see Irwin and Carter (2014), and references therein). In Tribolium, however, it seems that beetles that tend to grow faster, and thus have a shorter growth period, also tend to attain larger size (e.g. Figure 7, top left). See Irwin and Carter (2013) for more details and background. This observation suggests that the Tribolium data could be well suited for a phase-amplitude analysis under a latent rank 1 model that has been warped: one expects that correcting for different "growth clocks" (phase variation) should yield curves with roughly a single mode of amplitude variation, due to final mass. Conversely, it suggests a potential latent model that produces rank 1 vertical variation related only to final mass, and horizontal variation due to growth timing (i.e. how this total final mass is accumulated in time).

For our analysis, we only considered the part of the dataset with at least 10 discrete measurements per individual curve, which results in a sample size of 159. Also, not all larvae were recorded on the same days, so the number of observations differs across individuals. Since there are relatively few measurements (at most 12) per individual larva, we smoothed each observation vector as a pre-processing step. This was done using the built-in R function splinefun with the method monoH.FC, which uses the monotone Hermite spline interpolation proposed by Fritsch and Carlson (1980) (since the curves are expected to be approximately increasing).

As is typically the case with growth curves, one expects that, if unaccounted for, the lurking phase variation would give the impression of several modes of amplitude variation. The aim of our analysis is thus to register the curves, estimate the warp maps, estimate the mean of the registered curves, and carry out an eigenanalysis of the registered data.

Indeed, prior to any registration, the data present at least two substantial modes of amplitude variation, with the first three principal components explaining 78.[…]%, […] and […].85% of the total variation, respectively. However, after registration using our method, the empirical covariance operator is almost precisely of rank 1, with the leading principal component explaining 99.72% of the total variation. Interestingly, the mean of the registered data has the same shape as the leading eigenfunction and is in fact roughly equal to 776 times the leading eigenfunction. This can be seen as a model diagnostic corroborating the model: if the rank 1 model were correct, then after registration one would expect a single mode of amplitude variation and a mean in the span of the corresponding eigenfunction (see the discussion after Counterexample 1).

[Figure 7 panels: "Warped discrete data", "Warped smoothed curves", "Registered curves", "Warp maps" (legend: Estm. warp, Mean estm. warp), "Mean" (legend: Estm. mean, Warp. mean, PW, CMR, FRM) and leading eigenfunction (legend: Estm. eigfun, Warp. eigfun, PW, CMR, FRM); horizontal axis: Age (days 1 to 22).]

Fig 7. Plots in the first row are those of the Tribolium data, the smoothed curves and the registered curves using our procedure. The first plot in the second row shows the estimated warp maps, where the dotted line is the identity map. The other two plots in the second row show the means and the leading eigenfunctions of the warped and the registered data using our procedure and some other registration methods.

Figure 7 shows the actual data, the monotone-spline-smoothed data and the registered data, as well as the estimated warp maps and the average warp map, which is very close to the identity. It also shows the mean and the leading eigenfunction of the warped and the registered data. Although the means of the warped and the registered data are very close, there are substantial qualitative differences between the corresponding eigenfunctions. The eigenfunction of the registered data shows that the variation in growth pattern essentially starts at about 8 days after hatching. Between ages 10–16 days post hatching, there is a notable increase in the growth variation, which somewhat recedes after that age. These periods are in fact compatible with biologically interpretable phases of growth: the larvae enter an "instar" (a distinct growth period between exoskeleton moults) characterised by exponential growth at around day 7–8; then, around day 17, they enter the "wandering phase" and begin losing weight in preparation for pupation.

The performance of the FRM technique is very similar to that of the proposed procedure and results in an almost rank one registration. However, the CMR and the PW procedures do not yield a rank one registration, although their estimated means are very similar to that obtained by our procedure, as can be seen by comparing Figure 7 with Figure 14. The difference lies in the registered curves and in the estimate of the leading eigenfunction. The latter shows some artifacts that do not conform to the biological explanation provided earlier, e.g., the presence of flat regions in the estimated eigenfunction during the "instar" phase of exponential growth, as well as the growth spurt towards the end, when the larvae would actually enter the "wandering phase".

Appendix – Proofs of Formal Statements
Proof of Lemma 1.
Since $X(t) = \xi\phi(t)$, $t \in [0,1]$, we have
$$F(t) = \int_0^t |X'(u)|\,du \Big/ \int_0^1 |X'(u)|\,du = \int_0^t |\phi'(u)|\,du \Big/ \int_0^1 |\phi'(u)|\,du = F_\phi(t)$$
by Definition 2. Next, $\tilde X(t) = \xi\phi(T^{-1}(t))$, so that $\tilde X'(t) = \xi\phi'(T^{-1}(t))/T'(T^{-1}(t))$. Thus, using the strict monotonicity of $T$, we have
$$\tilde F(t) = \int_0^t |\tilde X'(u)|\,du \Big/ \int_0^1 |\tilde X'(u)|\,du = \left\{\int_0^t \frac{|\phi'(T^{-1}(u))|}{T'(T^{-1}(u))}\,du\right\} \Big/ \left\{\int_0^1 \frac{|\phi'(T^{-1}(u))|}{T'(T^{-1}(u))}\,du\right\}.$$
A standard change-of-variable argument and the fact that $T$ is a bijection with $T(0) = 0$ and $T(1) = 1$ imply that $\tilde F(t) = \int_0^{T^{-1}(t)} |\phi'(u)|\,du \big/ \int_0^1 |\phi'(u)|\,du = F_\phi(T^{-1}(t))$. So, $\tilde F = F_\phi \circ T^{-1}$, equivalently, $T = \tilde F^{-1} \circ F_\phi \Leftrightarrow T \circ F_\phi^{-1} = \tilde F^{-1}$. Using the assumption that $E(T) = \mathrm{Id}$, we now have $E(\tilde F^{-1}) = F_\phi^{-1}$.

Proof of Theorem 1.
Note that the map $f \mapsto f'$ from $C^1[0,1]$ to $(C[0,1], \|\cdot\|_\infty)$ is Lipschitz. Thus, $\tilde X_1 \overset{d}{=} \tilde X_2$ implies that $\tilde X_1' \overset{d}{=} \tilde X_2'$. Consider the random probability measure given by
$$\Psi_1(A) = \int_A |\tilde X_1'(u)|\,du \Big/ \int_{[0,1]} |\tilde X_1'(u)|\,du$$
for $A$ in the Borel σ-field of $[0,1]$. Similarly, $\Psi_2(A) = \int_A |\tilde X_2'(u)|\,du \big/ \int_{[0,1]} |\tilde X_2'(u)|\,du$. We equip the space $\mathcal P$ of diffuse probability measures on $[0,1]$ with the $L_2$-Wasserstein metric (see, e.g., Villani (2003)) given by $d_W(\mu, \nu) = \|F_\nu^{-1} - F_\mu^{-1}\|_2$, where $F_\mu$ and $F_\nu$ are the distribution functions associated with the probability measures $\mu$ and $\nu$. Now, for any $f_1, f_2 \in C^1[0,1]$ satisfying $\int_0^1 |f_i'(u)|\,du > 0$ for $i = 1, 2$, consider the measure $\mu_i$ with density $|f_i'(s)| / \int_0^1 |f_i'(u)|\,du$, $i = 1, 2$. The condition $\int_0^1 |f'(u)|\,du > 0$ is equivalent to $f \neq$ const. Since $\mu_1$ and $\mu_2$ are supported on the bounded set $[0,1]$, it follows from Proposition 7.10 in Villani (2003) that $d_W^2(\mu_1, \mu_2) \le c\, d_{TV}(\mu_1, \mu_2)$ for a constant $c > 0$, where $d_{TV}(\cdot, \cdot)$ is the total variation distance. It now follows that
$$\begin{aligned} d_W^2(\mu_1, \mu_2) &\le c \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds \\ &\le c \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_1'(u)|\,du} \right| ds + c \int_0^1 \left| \frac{|f_2'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds \\ &\le \frac{2c \int_0^1 |f_1'(s) - f_2'(s)|\,ds}{\int_0^1 |f_1'(s)|\,ds} \le \frac{2c\, \|f_1' - f_2'\|_\infty}{\int_0^1 |f_1'(s)|\,ds} \le \frac{2c\, |||f_1 - f_2|||}{\int_0^1 |f_1'(s)|\,ds}. \end{aligned}$$
Thus, the embedding $H: f \mapsto \mu_f$ is continuous when its domain, say $\mathcal A$, is restricted to the set of all non-constant functions in $C^1[0,1]$. The complement $\mathcal A^c$ is the one-dimensional linear subspace spanned by the constant function $f \equiv 1$, and this implies that $\mathcal A^c$ is a Borel measurable subset of $C^1[0,1]$. So, $\mathcal A$ is a Borel measurable subset of $C^1[0,1]$. Equip $\mathcal A$ with the Borel σ-field induced from $C^1[0,1]$. Since $P(\tilde X_1 \in \mathcal A^c) = 0$, $H(\tilde X_1)$ is a valid random probability measure on $[0,1]$. Note that for any Borel subset $A$ of $[0,1]$, we have $H(\tilde X_1)(A) = \Psi_1(A)$. Thus, for any Borel subset $B$ of $\mathcal P$, we have
$$P(H(\tilde X_1) \in B) = P(\tilde X_1 \in H^{-1}(B)) = P(\tilde X_2 \in H^{-1}(B)) = P(H(\tilde X_2) \in B).$$
The first equality follows from the continuity of $H$ on $\mathcal A$ and the fact that $P(\tilde X_1 \in \mathcal A^c) = 0$, and the second holds because $\tilde X_1$ and $\tilde X_2$ have the same distribution by assumption. So, $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$ as random probability measures.

Next, note that the random measures $H(\tilde X_i)$, $i = 1, 2$, have strictly increasing cdfs almost surely. Proposition 2 in Panaretos and Zemel (2016) states that, for each $i = 1, 2$, the map $\gamma \mapsto E\{d_W^2(H(\tilde X_i), \gamma)\}$ admits a unique minimizer, namely the measure with quantile function $E\{\tilde F_{\Psi_i}^{-1}\}$, where $\tilde F_{\Psi_i}$ is the random distribution function of the random measure $H(\tilde X_i)$. Since $\tilde X_i = \xi_i\phi_i(T_i^{-1})$ with $T_i$ a strictly increasing homeomorphism of $[0,1]$, it follows from the change-of-variable formula that $H(\tilde X_i)(A) = \Psi_i(A) = \int_{T_i^{-1}(A)} |\phi_i'(u)|\,du \big/ \int_{[0,1]} |\phi_i'(u)|\,du$. Thus, $\tilde F_{\Psi_i} = F_{\phi_i} \circ T_i^{-1}$, equivalently, $\tilde F_{\Psi_i}^{-1} = T_i \circ F_{\phi_i}^{-1}$, where $F_{\phi_i}$ is the cdf associated with the (deterministic) probability measure $\Phi_i(A) = \int_A |\phi_i'(u)|\,du \big/ \int_{[0,1]} |\phi_i'(u)|\,du$.

Note that $F_{\phi_i}$ is continuous and strictly increasing, since $\phi_i'$ vanishes only on a countable set, for $i = 1, 2$. Since $E(T_i) = \mathrm{Id}$, it follows that the minimizer satisfies $E\{\tilde F_{\Psi_i}^{-1}\} = F_{\phi_i}^{-1}$ for $i = 1, 2$. But since $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$, it now follows that $F_{\phi_1} = F_{\phi_2}$. Also, $T_i = \tilde F_{\Psi_i}^{-1} \circ F_{\phi_i}$, equivalently, $T_i^{-1} = F_{\phi_i}^{-1} \circ \tilde F_{\Psi_i}$. Using the above facts and the result obtained in the previous paragraph, it now follows that $T_1 \overset{d}{=} T_2$.

We next claim that the joint distributions of $(\tilde X_i, T_i^{-1})$, $i = 1, 2$, are identical. To see this, consider the map $\mathcal H: f \mapsto (f, H(f))$ defined from $\mathcal A$ to $\mathcal A \otimes \mathcal P$, with the latter equipped with the induced product topology and the induced product σ-field. The same arguments used to prove the continuity of $H$ show that $\mathcal H$ is continuous. Thus, for Borel subsets $G_1$ and $G_2$ of $C^1[0,1]$ and $C[0,1]$, respectively, we have
$$\begin{aligned} P(\tilde X_1 \in G_1, T_1^{-1} \in G_2) &= P(\tilde X_1 \in G_1, F_{\phi_1}^{-1} \circ \tilde F_{\Psi_1} \in G_2) = P(\tilde X_1 \in G_1, \tilde F_{\Psi_1} \in F_{\phi_1}(G_2)) \\ &= P(\mathcal H(\tilde X_1) \in G_1 \times F_{\phi_1}(G_2)) = P(\tilde X_1 \in \mathcal H^{-1}(G_1 \times F_{\phi_1}(G_2))) \\ &= P(\tilde X_2 \in \mathcal H^{-1}(G_1 \times F_{\phi_2}(G_2))) \quad [\text{since } F_{\phi_1} = F_{\phi_2}] \\ &= P(\mathcal H(\tilde X_2) \in G_1 \times F_{\phi_2}(G_2)) = P(\tilde X_2 \in G_1, \tilde F_{\Psi_2} \in F_{\phi_2}(G_2)) \\ &= P(\tilde X_2 \in G_1, F_{\phi_2}^{-1} \circ \tilde F_{\Psi_2} \in G_2) = P(\tilde X_2 \in G_1, T_2^{-1} \in G_2). \end{aligned}$$
Next, note that $X_i = \tilde X_i \circ T_i$ is the true unobserved process. It is easy to show that the map $(f, g) \mapsto f \circ g$ from $C^1[0,1] \otimes C[0,1]$ into $C[0,1]$ is continuous. Thus, using the observation in the previous paragraph, we have $X_1 \overset{d}{=} X_2$ as random elements of $C[0,1]$. It follows from the equality of distributions that their covariance operators are equal, and thus so are the corresponding eigenfunctions. Now, the covariance operator of $X_i$ is given by $\mathrm{Var}(\xi_i)\,\phi_i \otimes \phi_i$. Since $X_i = \xi_i\phi_i$ is a rank one process, the equality of the covariance operators implies that $\phi_1 = \pm\phi_2$ (since $\|\phi_1\| = \|\phi_2\| = 1$). Also, $X_1 \overset{d}{=} X_2$ implies that $\xi_1 = \langle X_1, \phi_1 \rangle \overset{d}{=} \langle X_2, \phi_1 \rangle = \langle X_2, \pm\phi_2 \rangle = \pm\xi_2$.

Proof of Theorem 2.
First observe that the $T_i$'s are also i.i.d. random elements of $C[0,1]$. Moreover, since $T_1$ is strictly increasing and positive, $E(\|T_1\|_\infty)=E\{T_1(1)\}=1<\infty$. Write $\overline T=n^{-1}\sum_{i=1}^nT_i$. By the strong law for Banach space valued random elements (see, e.g., Theorem 2.4 in Bosq (2000)), $\overline T\to E(T_1)=\mathrm{Id}$ in $C[0,1]$ as $n\to\infty$ almost surely. In addition, suppose $E(\|T_1'\|_\infty)<\infty$, which implies $E(|||T_1|||)<\infty$. Then the almost sure convergence $\overline T\to E(T_1)=\mathrm{Id}$ holds in $C^1[0,1]$.

(a) Since $\widehat F^{-1}=\overline T\circ F_\varphi^{-1}$, Theorem 2.18 in Villani (2003) gives
$$d_W^2(\widehat F,F_\varphi)=\|\widehat F^{-1}-F_\varphi^{-1}\|_{L^2}^2=\int_0^1\big|\widehat F^{-1}(F_\varphi(t))-t\big|^2F_\varphi(dt)=\int_0^1|\overline T(t)-t|^2F_\varphi(dt)\le\|\overline T-\mathrm{Id}\|_\infty^2\to0\quad\text{as }n\to\infty.$$

(b) Since each $T_i$ is a strictly increasing bijection of $[0,1]$,
$$\|\widehat T_i^{-1}-T_i^{-1}\|_\infty=\sup_{t\in[0,1]}\big|\overline T(T_i^{-1}(t))-T_i^{-1}(t)\big|=\|\overline T-\mathrm{Id}\|_\infty\to0\quad\text{as }n\to\infty.$$
Both $\widehat T_i^{-1}$ and $T_i^{-1}$ are strictly increasing homeomorphisms, so the uniform convergence of $\widehat T_i$ to $T_i$ follows from the above uniform convergence.

Suppose now that Condition 1 holds. We showed at the beginning of the proof that, in this case, $|||\overline T-\mathrm{Id}|||\to0$ as $n\to\infty$ almost surely. In view of the first half of part (b) and the definition of the $|||\cdot|||$ norm, it is enough to show uniform convergence of the derivatives. Since each $T_i$ is a strictly increasing bijection of $[0,1]$, so is $\overline T$ for every $n\ge1$. First note that
$$\|(\widehat T_i^{-1})'-(T_i^{-1})'\|_\infty=\sup_{t\in[0,1]}\big|(\overline T\circ T_i^{-1})'(t)-(T_i^{-1})'(t)\big|=\sup_{t\in[0,1]}\left|\frac{\overline T'(T_i^{-1}(t))}{T_i'(T_i^{-1}(t))}-\frac{1}{T_i'(T_i^{-1}(t))}\right|=\sup_{t\in[0,1]}\left|\frac{\overline T'(t)-1}{T_i'(t)}\right|\le\delta^{-1}\|\overline T'-\mathbf 1\|_\infty,$$
where $\mathbf 1$ is the constant function taking the value 1. It thus follows from the earlier bound that
$$|||\widehat T_i^{-1}-T_i^{-1}|||\le\|\overline T-\mathrm{Id}\|_\infty+\delta^{-1}\|\overline T'-\mathbf 1\|_\infty\le\max(1,\delta^{-1})\,|||\overline T-\mathrm{Id}|||\to0\quad\text{as }n\to\infty.$$
Next, $\overline T'(t)=n^{-1}\sum_{i=1}^nT_i'(t)\ge n^{-1}\sum_{i=1}^n\inf_{s\in[0,1]}T_i'(s)\ge\delta$, so that $\inf_{t\in[0,1]}\overline T'(t)\ge\delta>0$. Then
$$\begin{aligned}\|\widehat T_i'-T_i'\|_\infty&=\sup_{t}|(T_i\circ\overline T^{-1})'(t)-T_i'(t)|=\sup_{t}\left|\frac{T_i'(\overline T^{-1}(t))}{\overline T'(\overline T^{-1}(t))}-T_i'(t)\right|=\sup_{t}\left|\frac{T_i'(t)}{\overline T'(t)}-T_i'(\overline T(t))\right|\\&\le\sup_{t}\left|\frac{T_i'(t)}{\overline T'(t)}-\frac{T_i'(\overline T(t))}{\overline T'(t)}\right|+\sup_{t}\left|\frac{T_i'(\overline T(t))}{\overline T'(t)}-T_i'(\overline T(t))\right|\\&\le\delta^{-1}\sup_{t}|T_i'(t)-T_i'(\overline T(t))|+\delta^{-1}\|T_i'\|_\infty\|\overline T'-\mathbf 1\|_\infty.\end{aligned}$$
Since $T_i'$ is continuous on $[0,1]$, it is uniformly continuous. This, together with $\|\overline T-\mathrm{Id}\|_\infty\to0$ almost surely, implies that $\sup_{t}|T_i'(t)-T_i'(\overline T(t))|\to0$ as $n\to\infty$ almost surely. Combining this with the uniform convergence of $\overline T'$ to $\mathbf 1$, we get $|||\widehat T_i-T_i|||\to0$ as $n\to\infty$ almost surely.

(c) Note that
$$\|\widehat X_i-X_i\|_\infty=|\xi_i|\sup_{t\in[0,1]}|\varphi(\overline T^{-1}(t))-\varphi(t)|=|\xi_i|\sup_{t\in[0,1]}|\varphi(\overline T(t))-\varphi(t)|\to0\quad\text{as }n\to\infty.$$
This holds because $\|\overline T-\mathrm{Id}\|_\infty\to0$ almost surely and $\varphi$ is continuous, hence uniformly continuous, on $[0,1]$. Suppose now that Condition 1 holds. Then, as before,
$$\begin{aligned}\|\widehat X_i'-X_i'\|_\infty&=|\xi_i|\sup_{t}\left|\frac{\varphi'(\overline T^{-1}(t))}{\overline T'(\overline T^{-1}(t))}-\varphi'(t)\right|=|\xi_i|\sup_{t}\left|\frac{\varphi'(t)}{\overline T'(t)}-\varphi'(\overline T(t))\right|\\&\le|\xi_i|\sup_{t}\left|\frac{\varphi'(t)}{\overline T'(t)}-\frac{\varphi'(\overline T(t))}{\overline T'(t)}\right|+|\xi_i|\sup_{t}\left|\frac{\varphi'(\overline T(t))}{\overline T'(t)}-\varphi'(\overline T(t))\right|\\&\le|\xi_i|\,\delta^{-1}\sup_{t}|\varphi'(t)-\varphi'(\overline T(t))|+|\xi_i|\,\|\varphi'\|_\infty\,\delta^{-1}\|\overline T'-\mathbf 1\|_\infty.\end{aligned}$$
Using similar arguments as before, we conclude that $\|\widehat X_i'-X_i'\|_\infty\to0$, and hence $|||\widehat X_i-X_i|||\to0$, as $n\to\infty$ almost surely.

(d) Since $\widehat X_i=\xi_i\,\varphi\circ\overline T^{-1}=X_i\circ\overline T^{-1}$, the change-of-variables formula gives $\widehat F_i=F_\varphi\circ\overline T^{-1}$. Thus
$$d_W^2(\widehat F_i,F_\varphi)=\|\widehat F_i^{-1}-F_\varphi^{-1}\|_{L^2}^2=\|\overline T\circ F_\varphi^{-1}-F_\varphi^{-1}\|_{L^2}^2=\int_0^1|\overline T(t)-t|^2F_\varphi(dt)\le\|\overline T-\mathrm{Id}\|_\infty^2\to0\quad\text{as }n\to\infty.$$

(e) Observe that
$$\|\overline X_r-\mu\|_\infty=\Big\|n^{-1}\sum_{i=1}^n(\widehat X_i-X_i)+n^{-1}\sum_{i=1}^nX_i-\mu\Big\|_\infty\le n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|_\infty+\Big\|n^{-1}\sum_{i=1}^nX_i-\mu\Big\|_\infty.$$
The $X_i$'s are i.i.d. random elements of $C[0,1]$ with $E(\|X_1\|_\infty)=E(|\xi_1|)\|\varphi\|_\infty<\infty$. So the strong law for Banach space valued random elements gives $\|n^{-1}\sum_{i=1}^nX_i-\mu\|_\infty\to0$ as $n\to\infty$ almost surely. Also, from the proof of part (c),
$$n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|_\infty=\sup_{t}|\varphi(\overline T(t))-\varphi(t)|\times n^{-1}\sum_{i=1}^n|\xi_i|=\sup_{t}|\varphi(\overline T(t))-\varphi(t)|\times\{E(|\xi_1|)+o(1)\}$$
as $n\to\infty$ almost surely. Thus, by the arguments of part (c), $n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|_\infty\to0$ as $n\to\infty$ almost surely. Combining these facts, $\|\overline X_r-\mu\|_\infty\to0$ as $n\to\infty$ almost surely. Since $X_i=\xi_i\varphi$, it also follows that $\|n^{-1}\sum_{i=1}^nX_i'-\mu'\|_\infty\to0$ as $n\to\infty$ almost surely. Now suppose that Condition 1 holds. A similar decomposition yields
$$\|\overline X_r'-\mu'\|_\infty\le n^{-1}\sum_{i=1}^n\|\widehat X_i'-X_i'\|_\infty+\Big\|n^{-1}\sum_{i=1}^nX_i'-\mu'\Big\|_\infty.$$
The proof of part (c) implies that
$$n^{-1}\sum_{i=1}^n\|\widehat X_i'-X_i'\|_\infty\le\delta^{-1}\Big(n^{-1}\sum_{i=1}^n|\xi_i|\Big)\Big\{\sup_{t}|\varphi'(t)-\varphi'(\overline T(t))|+\|\varphi'\|_\infty\|\overline T'-\mathbf 1\|_\infty\Big\}.$$
The right-hand side above converges to zero as $n\to\infty$ almost surely.
The result now follows upon combining the above facts.

(f) Straightforward algebraic manipulations yield
$$\begin{aligned}\widehat{\mathcal K}_r&=n^{-1}\sum_{i=1}^n(\widehat X_i-\overline X_r)\otimes(\widehat X_i-\overline X_r)\\&=n^{-1}\sum_{i=1}^n(X_i-\overline X)\otimes(X_i-\overline X)+n^{-1}\sum_{i=1}^n(\widehat X_i-X_i)\otimes(\widehat X_i-X_i)-(\overline X-\overline X_r)\otimes(\overline X-\overline X_r)\\&\quad+n^{-1}\sum_{i=1}^n\{(\widehat X_i-X_i)\otimes(X_i-\overline X)+(X_i-\overline X)\otimes(\widehat X_i-X_i)\}.\end{aligned}$$
Denote $\widehat{\mathcal K}=n^{-1}\sum_{i=1}^n(X_i-\overline X)\otimes(X_i-\overline X)$. Then, in the Hilbert–Schmidt norm,
$$|||\widehat{\mathcal K}_r-\widehat{\mathcal K}|||\le\frac2n\sum_{i=1}^n\|\widehat X_i-X_i\|\,\|X_i-\overline X\|+\frac1n\sum_{i=1}^n\|\widehat X_i-X_i\|^2+\|\overline X-\overline X_r\|^2.$$
By the Cauchy–Schwarz inequality, $n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|\,\|X_i-\overline X\|\le\{n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|^2\}^{1/2}\{n^{-1}\sum_{i=1}^n\|X_i-\overline X\|^2\}^{1/2}$. Moreover, $n^{-1}\sum_{i=1}^n\|X_i-\overline X\|^2=O(1)$ as $n\to\infty$ almost surely. The arguments in the proof of part (c) give
$$n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|^2\le n^{-1}\sum_{i=1}^n\|\widehat X_i-X_i\|_\infty^2\le\sup_{t}|\varphi(\overline T(t))-\varphi(t)|^2\Big(n^{-1}\sum_{i=1}^n\xi_i^2\Big).$$
The right-hand side is $o(1)$ as $n\to\infty$ almost surely, since $E(\xi_1^2)<\infty$. Further, $\|\overline X-\overline X_r\|=o(1)$ as $n\to\infty$ almost surely. Thus $|||\widehat{\mathcal K}_r-\widehat{\mathcal K}|||=o(1)$ as $n\to\infty$ almost surely.

The uniform convergence of $\widehat K_r(s,t)$ to $K(s,t)$ follows from a decomposition of $\widehat K_r(s,t)$ similar to the one above. It uses two facts: $\widehat K(s,t)$ converges uniformly to $K(s,t)$ (by the strong law of large numbers in $C([0,1]^2)$), and all the other bounds hold in the supremum norm. Next, note that $\widehat\varphi(t)=\widehat\lambda^{-1}\int_0^1\widehat K_r(s,t)\widehat\varphi(s)\,ds$ and $\varphi(t)=\lambda^{-1}\int_0^1K(s,t)\varphi(s)\,ds$ for all $t\in[0,1]$. Here $|\widehat\lambda-\lambda|\le|||\widehat{\mathcal K}_r-\mathcal K|||\to0$ as $n\to\infty$ almost surely. Also, $\|\widehat\varphi-\varphi\|\le2\sqrt2\,\lambda^{-1}|||\widehat{\mathcal K}_r-\mathcal K|||\to0$ as $n\to\infty$ almost surely.
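So,
$$\begin{aligned}|\widehat\varphi(t)-\varphi(t)|&\le\Big|\widehat\lambda^{-1}\!\int\widehat K_r(s,t)\widehat\varphi(s)\,ds-\widehat\lambda^{-1}\!\int K(s,t)\widehat\varphi(s)\,ds\Big|+\Big|\widehat\lambda^{-1}\!\int K(s,t)\widehat\varphi(s)\,ds-\widehat\lambda^{-1}\!\int K(s,t)\varphi(s)\,ds\Big|\\&\quad+\Big|\widehat\lambda^{-1}\!\int K(s,t)\varphi(s)\,ds-\lambda^{-1}\!\int K(s,t)\varphi(s)\,ds\Big|\\&\le\widehat\lambda^{-1}\|\widehat K_r-K\|_\infty+\widehat\lambda^{-1}\|K\|_\infty\|\widehat\varphi-\varphi\|+\big|(\widehat\lambda^{-1}-\lambda^{-1})\lambda\varphi(t)\big|\\&\le(\lambda^{-1}+o(1))\{\|\widehat K_r-K\|_\infty+\|K\|_\infty\|\widehat\varphi-\varphi\|\}+|\lambda-\widehat\lambda|(\lambda^{-1}+o(1))\|\varphi\|_\infty\end{aligned}$$
as $n\to\infty$ almost surely. Thus $\|\widehat\varphi-\varphi\|_\infty\to0$ as $n\to\infty$ almost surely. Finally,
$$|\widehat\xi_i-\xi_i|=|\langle\widehat X_i,\widehat\varphi\rangle-\langle X_i,\varphi\rangle|\le|\langle\widehat X_i-X_i,\widehat\varphi\rangle|+|\langle X_i,\widehat\varphi-\varphi\rangle|\le\|\widehat X_i-X_i\|+\|X_i\|\,\|\widehat\varphi-\varphi\|\to0$$
as $n\to\infty$ almost surely.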
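Two identities drive parts (a) and (b): $\|\widehat T_i^{-1}-T_i^{-1}\|_\infty=\|\overline T-\mathrm{Id}\|_\infty$ and $d_W(\widehat F,F_\varphi)\le\|\overline T-\mathrm{Id}\|_\infty$. Both can be checked numerically. The Python sketch below is only an illustration under stated assumptions, none of which come from the paper:

- the warp family is $T_c(t)=t+ct(1-t)$ with $|c|<1$;
- $\varphi$ is linear, so $F_\varphi$ is the uniform cdf;
- sup norms are taken over a finite grid;
- the helper names `warp`, `invert` and `theorem2_check` are hypothetical.

```python
# Illustrative check of the identities used in the proof of Theorem 2(a)-(b).
# Assumptions (not from the paper): warps T_c(t) = t + c*t*(1 - t) with |c| < 1,
# phi linear so that F_phi is the uniform cdf, and sup norms taken over a grid.


def warp(c, t):
    """Hypothetical warp family; strictly increasing on [0, 1] whenever |c| < 1."""
    return t + c * t * (1.0 - t)


def invert(f, y, iters=60):
    """Bisection inverse of a strictly increasing bijection f of [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem2_check(cs, m=2001):
    """Return ||bar T - Id||_inf, the list of ||hat T_i^{-1} - T_i^{-1}||_inf, and d_W."""
    c_bar = sum(cs) / len(cs)  # bar T = T_{c_bar}, since T_c is linear in c
    grid = [k / (m - 1) for k in range(m)]
    sup_tbar = max(abs(warp(c_bar, t) - t) for t in grid)

    # Part (b): hat T_i^{-1} = bar T o T_i^{-1}, compared with T_i^{-1} on the grid.
    sup_inv = []
    for c in cs:
        s_vals = [invert(lambda u, c=c: warp(c, u), t) for t in grid]
        sup_inv.append(max(abs(warp(c_bar, s) - s) for s in s_vals))

    # Part (a): d_W(hat F, F_phi) = (int |bar T(t) - t|^2 dt)^{1/2}; midpoint rule.
    mids = [(k + 0.5) / m for k in range(m)]
    d_w = (sum((warp(c_bar, t) - t) ** 2 for t in mids) / m) ** 0.5
    return sup_tbar, sup_inv, d_w
```

For example, `theorem2_check([-0.4, 0.1, 0.6])` has $\bar c=0.1$, so $\|\overline T-\mathrm{Id}\|_\infty=\bar c/4=0.025$. Each entry of `sup_inv` agrees with this value up to grid error, and `d_w` is close to $0.1/\sqrt{30}\approx0.0183$. This warp family was chosen because it is linear in $c$, so the average warp is again a member of the family.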
Proof of Theorem 3.
We have $|T_1(t)-T_1(s)|\le\|T_1'\|_\infty|s-t|$ and, by assumption, $E(\|T_1'\|_\infty^2)<\infty$. So, by the CLT for i.i.d. $C[0,1]$-valued random elements (see, e.g., Theorem 2.4 in Bosq (2000)), $\sqrt n(\overline T-\mathrm{Id})\overset{d}{\to}Y$ for a zero-mean Gaussian random element $Y$ of $C[0,1]$.

(a) From the proof of part (a) of Theorem 2, $d_W^2(\widehat F,F_\varphi)=\int_0^1|\overline T(t)-t|^2F_\varphi(dt)$. It is easy to check that the map $C[0,1]\ni f\mapsto\int_0^1|f(t)|^2F_\varphi(dt)$ is continuous. The result follows from the continuous mapping theorem.

(b) For each fixed $i\ge1$, $\sqrt n(\widehat T_i^{-1}-T_i^{-1})=U_n\circ V_n$, where $U_n=\sqrt n(\overline T-\mathrm{Id})$ and $V_n=T_i^{-1}$. We first derive the weak limit conditional on $T_i=t_i$. By the previous paragraph, conditional on $T_i=t_i$, $U_n=\sqrt n(n^{-1}t_i+n^{-1}\sum_{j\ne i}T_j-\mathrm{Id})\overset{d}{\to}Y$. Moreover, $V_n$ is a constant sequence, so it converges conditionally in probability to $t_i^{-1}$ as $n\to\infty$. So, by Theorem 4.4 in Billingsley (1968), conditional on $T_i=t_i$, $(U_n,V_n)\overset{d}{\to}(Y,t_i^{-1})$ in the product $C[0,1]$ topology. The map $(f,g)\mapsto f\circ g$ is continuous on $C[0,1]$ (see, e.g., p. 155 in Billingsley (1968)). Hence the continuous mapping theorem gives, conditional on $T_i=t_i$, $\sqrt n(\widehat T_i^{-1}-T_i^{-1})\overset{d}{\to}Y\circ t_i^{-1}$ as $n\to\infty$ for each fixed $i\ge1$. Thus, by the dominated convergence theorem, the unconditional distribution of $\sqrt n(\widehat T_i^{-1}-T_i^{-1})$ converges weakly as $n\to\infty$ for each fixed $i\ge1$.

Now consider $\sqrt n(\widehat T_i-T_i)=\sqrt n(T_i\circ\overline T^{-1}-T_i)$. As before, we first derive its weak limit conditional on $T_i=t_i$. Since $T_i\in C^1[0,1]$ almost surely,
$$\widehat T_i(s)-t_i(s)=t_i(\overline T^{-1}(s))-t_i(s)=t_i\big(s+\overline T^{-1}(s)-s\big)-t_i(s)=(\overline T^{-1}(s)-s)\,t_i'\big(s+\beta(\overline T^{-1}(s)-s)\big)$$
for some $\beta\in[0,1]$ (possibly depending on $s$ and $i$). Thus
$$\sqrt n(\widehat T_i-t_i)=\{\sqrt n(\overline T^{-1}-\mathrm{Id})\}\times t_i'(\cdot+o_P(1))=\{\sqrt n(\mathrm{Id}-\overline T)\circ\overline T^{-1}\}\times t_i'(\cdot+o_P(1)),$$
where the $o_P(1)$ term is uniform in $s$ since $\|\overline T^{-1}-\mathrm{Id}\|_\infty\to0$ as $n\to\infty$ almost surely. Similar arguments as above, together with $\|\overline T-\mathrm{Id}\|_\infty\to0$ almost surely, give $\sqrt n(\widehat T_i-t_i)\overset{d}{\to}Y\times t_i'$ as $n\to\infty$ (recall that $-Y\overset{d}{=}Y$). Thus, by the dominated convergence theorem, the unconditional distribution of $\sqrt n(\widehat T_i-T_i)$ converges weakly as $n\to\infty$ for each fixed $i\ge1$.

(c) For each fixed $i\ge1$,
$$\widehat X_i(s)-X_i(s)=\xi_i\{\varphi(\overline T^{-1}(s))-\varphi(s)\}=\xi_i(\overline T^{-1}(s)-s)\,\varphi'\big(s+\beta(\overline T^{-1}(s)-s)\big),$$
and hence
$$\sqrt n(\widehat X_i-X_i)=\xi_i\{\sqrt n(\mathrm{Id}-\overline T)\circ\overline T^{-1}\}\times\varphi'(\cdot+o_P(1)),$$
where $\beta\in[0,1]$ and the $o_P(1)$ term is uniform in $s$ as before. Similar arguments as in part (b) yield $\sqrt n(\widehat X_i-X_i)\overset{d}{\to}\xi_iY\times\varphi'$ as $n\to\infty$ for each fixed $i\ge1$.

(e) Note that
$$\sqrt n(\overline X_r-\mu)=\sqrt n\Big\{n^{-1}\sum_{i=1}^n\xi_i\,\varphi\circ\overline T^{-1}-E(\xi)\varphi\Big\}=\sqrt n\Big\{n^{-1}\sum_{i=1}^n(\xi_i-E(\xi))\Big\}\varphi\circ\overline T^{-1}+E(\xi)\sqrt n\{\varphi\circ\overline T^{-1}-\varphi\}\overset{d}{\to}N(0,\mathrm{Var}(\xi))\,\varphi+E(\xi)\,Y\times\varphi'.$$
This follows from similar arguments as in part (c) and the independence of the $\xi_i$'s and the $T_i$'s.

(f) For the first part, note that
$$\widehat{\mathcal K}_r=n^{-1}\sum_{i=1}^n(\widehat X_i-\overline X_r)\otimes(\widehat X_i-\overline X_r)=n^{-1}\sum_{i=1}^n(\widehat X_i-\mu)\otimes(\widehat X_i-\mu)-(\overline X_r-\mu)\otimes(\overline X_r-\mu)=S_1+S_2,\ \text{say}.$$
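The split $\widehat{\mathcal K}_r=S_1+S_2$ rests on a purely algebraic identity, valid for any fixed centring $\mu$:
$$n^{-1}\sum_i(x_i-\bar x)\otimes(x_i-\bar x)=n^{-1}\sum_i(x_i-\mu)\otimes(x_i-\mu)-(\bar x-\mu)\otimes(\bar x-\mu).$$
A minimal Python check follows. It assumes curves are represented by their values on a common grid, so that $\otimes$ becomes an outer product of vectors. The data and function names are arbitrary illustrations.

```python
def outer(u, v):
    """Outer product u (x) v of two grid vectors, as a list of rows."""
    return [[a * b for b in v] for a in u]


def mean_centered_outer(vectors, center):
    """n^{-1} sum_i (x_i - center) (x) (x_i - center)."""
    n, m = len(vectors), len(center)
    acc = [[0.0] * m for _ in range(m)]
    for x in vectors:
        d = [a - c for a, c in zip(x, center)]
        for r, row in enumerate(outer(d, d)):
            for s, val in enumerate(row):
                acc[r][s] += val / n
    return acc


def decomposition_gap(vectors, mu):
    """Largest entrywise gap between hat K_r and S_1 + S_2 (should be ~0)."""
    n, m = len(vectors), len(mu)
    xbar = [sum(col) / n for col in zip(*vectors)]
    lhs = mean_centered_outer(vectors, xbar)  # plays the role of hat K_r
    s1 = mean_centered_outer(vectors, mu)     # plays the role of S_1
    shift = [a - b for a, b in zip(xbar, mu)]
    s2 = [[-v for v in row] for row in outer(shift, shift)]  # S_2 = -(xbar - mu)(x)(xbar - mu)
    return max(abs(lhs[r][s] - (s1[r][s] + s2[r][s])) for r in range(m) for s in range(m))
```

Because the cross terms cancel for any centring, the gap is zero up to rounding for every choice of `mu`. This is why only $S_1$ carries non-trivial asymptotics in what follows.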
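Some straightforward manipulations yield
$$\begin{aligned}S_1&=n^{-1}\sum_{i=1}^n\{\xi_i\,\varphi\circ\overline T^{-1}-E(\xi)\varphi\}\otimes\{\xi_i\,\varphi\circ\overline T^{-1}-E(\xi)\varphi\}\\&=n^{-1}\sum_{i=1}^n\{\xi_i-E(\xi)\}^2(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1})+\{E(\xi)\}^2(\varphi\circ\overline T^{-1}-\varphi)\otimes(\varphi\circ\overline T^{-1}-\varphi)\\&\quad+n^{-1}E(\xi)\sum_{i=1}^n\{\xi_i-E(\xi)\}\big[(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1}-\varphi)+(\varphi\circ\overline T^{-1}-\varphi)\otimes(\varphi\circ\overline T^{-1})\big].\end{aligned}$$
So, since $\mathcal K=\mathrm{Var}(\xi)\,\varphi\otimes\varphi$,
$$\begin{aligned}\sqrt n(S_1-\mathcal K)&=\sqrt n\,n^{-1}\sum_{i=1}^n\big[\{\xi_i-E(\xi)\}^2-\mathrm{Var}(\xi)\big](\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1})+\mathrm{Var}(\xi)\sqrt n\big[(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1})-\varphi\otimes\varphi\big]\\&\quad+\{E(\xi)\}^2\sqrt n\,(\varphi\circ\overline T^{-1}-\varphi)\otimes(\varphi\circ\overline T^{-1}-\varphi)\\&\quad+\sqrt n\,n^{-1}E(\xi)\sum_{i=1}^n\{\xi_i-E(\xi)\}\big[(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1}-\varphi)+(\varphi\circ\overline T^{-1}-\varphi)\otimes(\varphi\circ\overline T^{-1})\big].\end{aligned}$$
The first term on the right-hand side converges in distribution to $N(0,\mathrm{Var}[\{\xi-E(\xi)\}^2])\,\varphi\otimes\varphi$, since $\overline T\to\mathrm{Id}$ as $n\to\infty$ almost surely. For the same reason, the third and fourth terms converge to zero in probability as $n\to\infty$. For the second term, note that
$$(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1})-\varphi\otimes\varphi=(\varphi\circ\overline T^{-1}-\varphi)\otimes\varphi+(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1}-\varphi).$$
The map $(f,g)\mapsto f\otimes g$ from $L^2[0,1]\times L^2[0,1]$ to the space of Hilbert–Schmidt operators is continuous. Using this and similar arguments as in part (c), the second term converges in distribution to $\mathrm{Var}(\xi)\{(Y\times\varphi')\otimes\varphi+\varphi\otimes(Y\times\varphi')\}$. Combining these observations with the fact that $\sqrt n\,S_2\to0$ in probability, we obtain
$$\sqrt n(\widehat{\mathcal K}_r-\mathcal K)\overset{d}{\to}N(0,\mathrm{Var}[\{\xi-E(\xi)\}^2])\,\varphi\otimes\varphi+\mathrm{Var}(\xi)\{(Y\times\varphi')\otimes\varphi+\varphi\otimes(Y\times\varphi')\}\quad\text{as }n\to\infty.$$

Next, consider the weak convergence of the empirical process $\{\sqrt n(\widehat K_r(s,t)-K(s,t)):s,t\in[0,1]\}$ in $C([0,1]^2)$. We follow the same decomposition as for the operators in the Hilbert–Schmidt topology. The proof of part (c) implies that the process $\{\sqrt n(\varphi(\overline T^{-1}(t))-\varphi(t)):t\in[0,1]\}$ converges in distribution in $C[0,1]$ to $\{Y(t)\varphi'(t):t\in[0,1]\}$. This fact and the same arguments as above yield
$$\{\sqrt n(\widehat K_r(s,t)-K(s,t)):s,t\in[0,1]\}\overset{d}{\to}\{Z\varphi(s)\varphi(t)+\mathrm{Var}(\xi)[Y(s)\varphi'(s)\varphi(t)+Y(t)\varphi'(t)\varphi(s)]:s,t\in[0,1]\}$$
as $n\to\infty$, where $Z\sim N(0,\mathrm{Var}[\{\xi-E(\xi)\}^2])$ does not depend on $s,t$.

For the weak convergence of $\widehat\varphi$, first note that $\widehat{\mathcal K}_r=n^{-1}\sum_{i=1}^n(\xi_i-\overline\xi)^2(\varphi\circ\overline T^{-1})\otimes(\varphi\circ\overline T^{-1})$. Thus $\widehat\varphi=(\varphi\circ\overline T^{-1})/\|\varphi\circ\overline T^{-1}\|$. Now,
$$\begin{aligned}\widehat\varphi-\varphi&=\frac{\varphi\circ\overline T^{-1}}{\|\varphi\circ\overline T^{-1}\|}-\varphi=\frac{\varphi\circ\overline T^{-1}-\varphi}{\|\varphi\circ\overline T^{-1}\|}-\frac{\varphi(\|\varphi\circ\overline T^{-1}\|-1)}{\|\varphi\circ\overline T^{-1}\|}=\frac{\varphi\circ\overline T^{-1}-\varphi}{\|\varphi\circ\overline T^{-1}\|}-\frac{\varphi(\|\varphi\circ\overline T^{-1}\|^2-1)}{\|\varphi\circ\overline T^{-1}\|(\|\varphi\circ\overline T^{-1}\|+1)}\\&=\frac{\varphi\circ\overline T^{-1}-\varphi}{\|\varphi\circ\overline T^{-1}\|}-\frac{\varphi\big(\|\varphi\circ\overline T^{-1}-\varphi\|^2+2\langle\varphi\circ\overline T^{-1}-\varphi,\varphi\rangle\big)}{\|\varphi\circ\overline T^{-1}\|(\|\varphi\circ\overline T^{-1}\|+1)}.\end{aligned}$$
Using the weak convergence of $\sqrt n(\varphi\circ\overline T^{-1}-\varphi)$ to $Y\times\varphi'$ in the $C[0,1]$ topology, we obtain
$$\sqrt n(\widehat\varphi-\varphi)\overset{d}{\to}Y\times\varphi'-\tfrac12\times2\langle Y\times\varphi',\varphi\rangle\varphi=Y\times\varphi'-\langle Y\times\varphi',\varphi\rangle\varphi\quad\text{as }n\to\infty$$
in the $C[0,1]$ topology.

Finally, for the weak convergence of the $\widehat\xi_i$'s, observe that
$$\sqrt n(\widehat\xi_i-\xi_i)=\sqrt n\{\langle\widehat X_i-X_i,\widehat\varphi-\varphi\rangle+\langle\widehat X_i-X_i,\varphi\rangle+\langle X_i,\widehat\varphi-\varphi\rangle\}=\sqrt n\{\xi_i\langle\varphi\circ\overline T^{-1}-\varphi,\widehat\varphi-\varphi\rangle+\xi_i\langle\varphi\circ\overline T^{-1}-\varphi,\varphi\rangle+\xi_i\langle\varphi,\widehat\varphi-\varphi\rangle\}.$$
Using the independence of $\xi_i$ and the $T_j$'s, together with the asymptotic distributions obtained above and in part (c), it follows that
$$\sqrt n(\widehat\xi_i-\xi_i)\overset{d}{\to}\xi_i\{\langle Y\times\varphi',\varphi\rangle+\langle\varphi,\,Y\times\varphi'-\langle Y\times\varphi',\varphi\rangle\varphi\rangle\}=\xi_i\langle Y\times\varphi',\varphi\rangle\quad\text{as }n\to\infty,$$
where the last equality uses $\|\varphi\|=1$.

In order to prove Theorem 4, we will first prove a few crucial results.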
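The results below control discretized objects. The first is the step cdf $\widetilde F_{i,d}$ built from the local variation sums of Proposition 1. The second is the averaged-quantile estimator $\widehat T^*_{i,d}=n^{-1}\sum_l\widetilde F_{l,d}^{-1}\circ\widetilde F_{i,d}$ used in the proof of Theorem 4(b). Both can be previewed with a short Python sketch. It is not the authors' implementation, and it rests on these assumptions:

- a regular grid $t_j=j/r$;
- the convention $I_t=\{j:t_{j+1}\le t\}$;
- a monotone $\varphi(u)=u+u^2$;
- warps $T_c(t)=t+ct(1-t)$;
- hypothetical helper names.

```python
# Illustrative sketch of the discretized local variation cdf and of
# hat T*_{i,d} = n^{-1} sum_l tilde F_{l,d}^{-1} o tilde F_{i,d}.
# Assumptions (not from the paper): grid t_j = j / r, I_t = {j : t_{j+1} <= t},
# phi(u) = u + u^2, warps T_c(t) = t + c*t*(1 - t); helper names are hypothetical.
import bisect


def warp(c, t):
    return t + c * t * (1.0 - t)


def invert(f, y, iters=60):
    """Bisection inverse of a strictly increasing bijection f of [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phi(u):
    return u + u * u


def discrete_cdf(values):
    """tilde F_{i,d}(t_k): normalized cumulative sums of |X(t_{j+1}) - X(t_j)| over j < k."""
    cum = [0.0]
    for a, b in zip(values, values[1:]):
        cum.append(cum[-1] + abs(b - a))
    total = cum[-1]
    return [x / total for x in cum]


def gen_inverse(cum, grid, u):
    """Generalized inverse inf{t : F(t) >= u} of the step cdf with values cum on grid."""
    k = min(bisect.bisect_left(cum, u), len(grid) - 1)
    return grid[k]


def max_warp_error(cs, xis, r=400):
    """Max over i and grid points of |hat T*_{i,d}(t_k) - bar T(T_i^{-1}(t_k))|."""
    grid = [j / r for j in range(r + 1)]
    cdfs = []
    for c, xi in zip(cs, xis):
        # Warped discrete observations tilde X_i(t_j) = xi_i * phi(T_i^{-1}(t_j)).
        obs = [xi * phi(invert(lambda u, c=c: warp(c, u), t)) for t in grid]
        cdfs.append(discrete_cdf(obs))
    n, c_bar = len(cs), sum(cs) / len(cs)
    worst = 0.0
    for i, c in enumerate(cs):
        for k, t in enumerate(grid):
            est = sum(gen_inverse(cdfs[l], grid, cdfs[i][k]) for l in range(n)) / n
            target = warp(c_bar, invert(lambda u, c=c: warp(c, u), t))
            worst = max(worst, abs(est - target))
    return worst
```

Because this $\varphi$ is monotone, the local variation sums telescope. So $\widetilde F_{i,d}$ coincides with $\widetilde F_i$ at the grid points, and the only discretization error comes from the generalized inverses. The estimate therefore overshoots $\widehat T_i^{-1}(t_k)=\overline T(T_i^{-1}(t_k))$ by less than one grid spacing $1/r$, in line with (5). The sign of $\xi_i$ is irrelevant because it cancels in the normalization.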
Proposition 1.
Assume that $\varphi\in C^2[0,1]$ and that $\inf_{u\in[0,1]}T_1'(u)\ge\delta>0$ almost surely for a deterministic constant $\delta$. Then, for each $i\ge1$, we have the following.

- $\sum_{j=1}^{r-1}|\varphi(s_{i,j+1})-\varphi(s_{i,j})|=\int_0^1|\varphi'(u)|\,du+B_{1,r}$ almost surely, where $B_{1,r}=O(r^{-1})$ almost surely, with the $O(1)$ term uniform in $i$.
- $\sum_{j\in I_t}|\varphi(s_{i,j+1})-\varphi(s_{i,j})|=\int_0^{T_i^{-1}(t)}|\varphi'(u)|\,du+B_{2,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|B_{2,r}\|_\infty=O(r^{-1})$ almost surely, with the $O(1)$ term uniform in $i$.
- Consequently, $\sum_{j=1}^{r-1}|\varphi(t_{j+1})-\varphi(t_j)|=\int_0^1|\varphi'(u)|\,du+B_{3,r}$ and $\sum_{j\in I_t}|\varphi(t_{j+1})-\varphi(t_j)|=\int_0^t|\varphi'(u)|\,du+B_{4,r}(t)$ for all $t\in[0,1]$ almost surely, where $B_{3,r}=O(r^{-1})$ and $\|B_{4,r}\|_\infty=O(r^{-1})$ almost surely.

Proof of Proposition 1. First, define $t_0=0$ if $t_1>0$ and $t_{r+1}=1$ if $t_r<1$. Then $\{t_j:0\le j\le r+1\}$ is a partition of $[0,1]$. Consider the sum $S_i=\sum_{j=0}^r|\varphi(s_{i,j+1})-\varphi(s_{i,j})|$. By a Taylor expansion, $S_i=\sum_{j=0}^r(s_{i,j+1}-s_{i,j})|\varphi'(\tilde s_{i,j})|$, where $\tilde s_{i,j}\in[s_{i,j},s_{i,j+1}]$. Since $T_i$ is a strictly increasing bijection, the right-hand side is a Riemann sum approximation of $\int_0^1|\varphi'(u)|\,du$ over the partition $\{s_{i,j}=T_i^{-1}(t_j):0\le j\le r+1\}$ of $[0,1]$. Thus, writing $\Delta=\max_{0\le j\le r}(s_{i,j+1}-s_{i,j})$,
$$\Big|S_i-\int_0^1|\varphi'(u)|\,du\Big|\le\sup\{||\varphi'(t)|-|\varphi'(s)||:s,t\in[0,1],\ |t-s|\le\Delta\}\le\sup\{|\varphi'(t)-\varphi'(s)|:s,t\in[0,1],\ |t-s|\le\Delta\}\le\|\varphi''\|_\infty\Delta.$$
Now, for any $0\le j\le r$, $s_{i,j+1}-s_{i,j}=T_i^{-1}(t_{j+1})-T_i^{-1}(t_j)=(t_{j+1}-t_j)/T_i'(T_i^{-1}(\tilde t_j))$ for some $\tilde t_j\in[t_j,t_{j+1}]$. The assumption of the proposition and the fact that $\max_j(t_{j+1}-t_j)=O(r^{-1})$ on the grid give $\Delta\le\delta^{-1}O(r^{-1})$ uniformly in $i$. Thus $|S_i-\int_0^1|\varphi'(u)|\,du|\le\|\varphi''\|_\infty\delta^{-1}O(r^{-1})$. To complete the first part of the proof, note that $\sum_{j=1}^{r-1}|\varphi(s_{i,j+1})-\varphi(s_{i,j})|$ differs from $S_i$ by at most two terms. Both terms are $O(r^{-1})$ uniformly over $i$ by the same arguments as for $S_i$.

For the second part, fix $t\in[0,1]$. Defining $B_{2,r}(0)=0$, there is nothing to prove when $t=0$. For $t>0$, define $t_0=0$. Let $j^*$ (which depends on $t$) be the largest $j$ for which $t_j\le t$, and set $t_{j^*+1}=t$ if $t_{j^*}<t$. Then $\{t_j:0\le j\le j^*+1\}$ is a partition of $[0,t]$, and hence $\{s_{i,j}=T_i^{-1}(t_j):0\le j\le j^*+1\}$ is a partition of $[0,T_i^{-1}(t)]$. Define $R_i(t)=\sum_{j=0}^{j^*}|\varphi(s_{i,j+1})-\varphi(s_{i,j})|$. Then, by similar arguments as before,
$$\Big|R_i(t)-\int_0^{T_i^{-1}(t)}|\varphi'(u)|\,du\Big|\le\|\varphi''\|_\infty\,\delta^{-1}\max_{0\le j\le j^*}(s_{i,j+1}-s_{i,j})=B_{2,r}(t),\ \text{say}.$$
Thus $\|B_{2,r}\|_\infty\le O(r^{-1})$ uniformly over $i$. Finally, $R_i(t)$ differs from $\sum_{j\in I_t}|\varphi(s_{i,j+1})-\varphi(s_{i,j})|$ by at most two terms, both $O(r^{-1})$ uniformly over $i$ by the same argument, which completes the proof. The last statement of the proposition is an immediate corollary for the case $T_1=\mathrm{Id}$ almost surely.

Note that the $B_{l,r}$'s are not continuous functions. Their $\|\cdot\|_\infty$ norms are nevertheless well defined, since they are all uniformly bounded functions on $[0,1]$. The following corollary is a consequence of Proposition 1 and the fact that $\int_0^1|\varphi'(u)|\,du\in(0,\infty)$.

Corollary 1.
Under the assumptions of Proposition 1, for each $i\ge1$, $\widetilde F_{i,d}(t)=\widetilde F_i(t)+C_{1,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|C_{1,r}\|_\infty=O(r^{-1})$ almost surely, uniformly over $i$. Further, $F_d(t)=F_\varphi(t)+C_{2,r}(t)$ for all $t\in[0,1]$, where $\|C_{2,r}\|_\infty=O(r^{-1})$.

Lemma 2.
Assume that $\int_0^1|\varphi'(u)|^{-\epsilon}\,du<\infty$ for some $\epsilon>0$. Then
$$|F_\varphi^{-1}(s)-F_\varphi^{-1}(t)|\le C_\varphi|t-s|^{\epsilon/(1+\epsilon)},\qquad C_\varphi^{1+\epsilon}=\|\varphi'\|_{L^1}^{\epsilon}\int_0^1|\varphi'(u)|^{-\epsilon}\,du.$$
In other words, $F_\varphi^{-1}$ is $\alpha$-Hölder continuous with $\alpha=\epsilon/(1+\epsilon)$.

Proof of Lemma 2.
The assumption of the lemma implies that $|\varphi'|>0$ almost everywhere on $[0,1]$. Apply Zarecki's theorem on the inverse of an absolutely continuous function (see, e.g., p. 271 in Natanson (1955)) to $F_\varphi$. Together with the previous fact, it shows that $F_\varphi^{-1}$ is absolutely continuous on $[0,1]$. Thus $F_\varphi^{-1}(t)=\int_0^t[F_\varphi'(F_\varphi^{-1}(u))]^{-1}\,du$. Now use Hölder's inequality with conjugate exponents $p,q>1$ and the change of variables $u=F_\varphi(x)$. We obtain
$$|F_\varphi^{-1}(s)-F_\varphi^{-1}(t)|\le\|\varphi'\|_{L^1}^{1/p}\,|t-s|^{1/p}\Big(\int_0^1|\varphi'(u)|^{1-q}\,du\Big)^{1/q}.$$
To complete the proof, choose $q=1+\epsilon$, which implies that $p=(1+\epsilon)/\epsilon$.

Proposition 2.
Assume that the conditions of Proposition 1 and Lemma 2 hold, and let $\alpha=\epsilon/(1+\epsilon)$ as in Lemma 2. Then, for each $i\ge1$:

(a) $\widetilde F_i^{-1}$ is $\alpha$-Hölder continuous almost surely.

(b) $\widetilde F_{i,d}^{-1}(t)=\widetilde F_i^{-1}(t)+\|T_i'\|_\infty D_{1,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|D_{1,r}\|_\infty=O(r^{-\alpha})$ almost surely, uniformly over $i$.

Proof of Proposition 2. (a) By the definition of $\widetilde F_i$,
$$|\widetilde F_i^{-1}(s)-\widetilde F_i^{-1}(t)|=|T_i(F_\varphi^{-1}(s))-T_i(F_\varphi^{-1}(t))|\le\|T_i'\|_\infty|F_\varphi^{-1}(s)-F_\varphi^{-1}(t)|\le\|T_i'\|_\infty\,C_\varphi|s-t|^\alpha,$$
where the last inequality follows from Lemma 2. This completes the proof of part (a).

(b) As mentioned earlier, $\widetilde F_{i,d}$ is a càdlàg step function whose largest jump is $A_{i,r}$. Suppose $t\in(\widetilde F_{i,d}(t_j),\widetilde F_{i,d}(t_{j+1})]$ for some $1\le j\le r-1$. Then $\widetilde F_{i,d}(\widetilde F_{i,d}^{-1}(t))=\widetilde F_{i,d}(t_{j+1})=t+q_{i,j,r}(t)$, where $q_{i,j,r}(t)=\widetilde F_{i,d}(t_{j+1})-t$. So $|q_{i,j,r}(t)|\le\widetilde F_{i,d}(t_{j+1})-\widetilde F_{i,d}(t_j)\le A_{i,r}$, where $A_{i,r}$ is the maximum step size of $\widetilde F_{i,d}$ defined earlier. Arguments similar to those used in Proposition 1 show that $A_{i,r}=O(r^{-1})$ uniformly in $i$. Thus $\widetilde F_{i,d}(\widetilde F_{i,d}^{-1}(t))=t+Q_{i,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|Q_{i,r}\|_\infty=O(r^{-1})$ almost surely, uniformly over $i$.

From Proposition 1 (see Corollary 1), $\widetilde F_{i,d}(s)=\widetilde F_i(s)+C_{1,r}(s)$ for all $s\in[0,1]$ almost surely, where $\|C_{1,r}\|_\infty=O(r^{-1})$ almost surely, uniformly over $i$. Letting $s=\widetilde F_{i,d}^{-1}(t)$, we obtain $t+Q_{i,r}(t)=\widetilde F_i(\widetilde F_{i,d}^{-1}(t))+C_{1,r}(\widetilde F_{i,d}^{-1}(t))$ for all $t$ almost surely. Rearranging terms, $\widetilde F_{i,d}^{-1}(t)=\widetilde F_i^{-1}(t+\widetilde Q_{i,r}(t))$ for all $t\in[0,1]$ almost surely, where $\widetilde Q_{i,r}(t)=Q_{i,r}(t)-C_{1,r}(\widetilde F_{i,d}^{-1}(t))$. Thus $\|\widetilde Q_{i,r}\|_\infty=O(r^{-1})$ almost surely, uniformly over $i$. Using part (a), we conclude that $\widetilde F_{i,d}^{-1}(t)=\widetilde F_i^{-1}(t)+\|T_i'\|_\infty D_{1,r}(t)$ for all $t\in[0,1]$ almost surely. Here $|D_{1,r}(t)|\le C_\varphi|\widetilde Q_{i,r}(t)|^\alpha$, so that $\|D_{1,r}\|_\infty=O(r^{-\alpha})$ almost surely, uniformly over $i$.

Proof of Theorem 4. (a) Note that
$$\widehat F_d^*(t)=n^{-1}\sum_{i=1}^n\widetilde F_{i,d}^{-1}(t)=n^{-1}\sum_{i=1}^n\{\widetilde F_i^{-1}(t)+\|T_i'\|_\infty D_{1,r}(t)\}=\widehat F^{-1}(t)+\Big(n^{-1}\sum_{i=1}^n\|T_i'\|_\infty D_{1,r}(t)\Big)=\widehat F^{-1}(t)+D_{2,r}(t)$$
for all $t\in[0,1]$ almost surely. Here $\|D_{2,r}\|_\infty=O(r^{-\alpha})$ almost surely, because $\|D_{1,r}\|_\infty=O(r^{-\alpha})$ almost surely and $n^{-1}\sum_{i=1}^n\|T_i'\|_\infty=E(\|T_1'\|_\infty)+o(1)$ almost surely. Thus, by Theorem 2.18 in Villani (2003),
$$d_W(\widehat F_d,F_\varphi)=\|\widehat F_d^*-F_\varphi^{-1}\|_{L^2}\le\|\widehat F^{-1}-F_\varphi^{-1}\|_{L^2}+\|D_{2,r}\|_\infty\le d_W(\widehat F,F_\varphi)+O(r^{-\alpha})$$
almost surely.
Combining the above statement with part (a) of Theorems 2 and 3 completes the proof of part (a) of Theorem 4.

(b) Next, note that
$$\begin{aligned}\widehat T^*_{i,d}(t)&=n^{-1}\sum_{l=1}^n\widetilde F_{l,d}^{-1}(\widetilde F_{i,d}(t))=n^{-1}\sum_{l=1}^n\big\{\widetilde F_l^{-1}(\widetilde F_{i,d}(t))+\|T_l'\|_\infty D_{1,r}(\widetilde F_{i,d}(t))\big\}\\&=n^{-1}\sum_{l=1}^n\widetilde F_l^{-1}(\widetilde F_i(t)+C_{1,r}(t))+n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\widetilde F_{i,d}(t))\\&=n^{-1}\sum_{l=1}^n\Big[\widetilde F_l^{-1}(\widetilde F_i(t))+\big\{\widetilde F_l^{-1}(\widetilde F_i(t)+C_{1,r}(t))-\widetilde F_l^{-1}(\widetilde F_i(t))\big\}\Big]+n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\widetilde F_{i,d}(t))\\&=\widehat T_i^{-1}(t)+n^{-1}\sum_{l=1}^n\big\{\widetilde F_l^{-1}(\widetilde F_i(t)+C_{1,r}(t))-\widetilde F_l^{-1}(\widetilde F_i(t))\big\}+n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\widetilde F_{i,d}(t))\end{aligned}$$
for all $t\in[0,1]$ almost surely. By part (a) of Proposition 2, $|\widetilde F_l^{-1}(\widetilde F_i(t)+C_{1,r}(t))-\widetilde F_l^{-1}(\widetilde F_i(t))|\le\|T_l'\|_\infty D_{3,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|D_{3,r}\|_\infty=O(r^{-\alpha})$ almost surely, uniformly over $i$. Thus $\sup_{t}n^{-1}\sum_{l=1}^n|\widetilde F_l^{-1}(\widetilde F_i(t)+C_{1,r}(t))-\widetilde F_l^{-1}(\widetilde F_i(t))|\le\{E(\|T_1'\|_\infty)+o(1)\}O(r^{-\alpha})$ almost surely. Similar arguments yield $\sup_{t}n^{-1}\sum_{l=1}^n\|T_l'\|_\infty|D_{1,r}(\widetilde F_{i,d}(t))|\le\{E(\|T_1'\|_\infty)+o(1)\}O(r^{-\alpha})$ almost surely. Thus
$$\widehat T^*_{i,d}(t)=\widehat T_i^{-1}(t)+D_{4,r}(t)\qquad(5)$$
for all $t\in[0,1]$ almost surely, where $\|D_{4,r}\|_\infty=O(r^{-\alpha})$ almost surely, uniformly over $i$. Consequently, $\|\widehat T^*_{i,d}-T_i^{-1}\|_\infty\le\|\widehat T_i^{-1}-T_i^{-1}\|_\infty+O(r^{-\alpha})$ almost surely, where the $O(1)$ term is uniform over $i$. This, along with part (b) of Theorem 2, shows that $\|\widehat T^*_{i,d}-T_i^{-1}\|_\infty\to0$ as $n\to\infty$ almost surely for all $i\ge1$. Equation (5) implies that $\sqrt n(\widehat T^*_{i,d}-T_i^{-1})=\sqrt n(\widehat T_i^{-1}-T_i^{-1})+O(\sqrt n\,r^{-\alpha})$ in $L^2[0,1]$. Together with part (b) of Theorem 3, this proves that $\sqrt n(\widehat T^*_{i,d}-T_i^{-1})$ has the same asymptotic distribution as $\sqrt n(\widehat T_i^{-1}-T_i^{-1})$ in the $L^2[0,1]$ topology.

Next we consider $\widehat T_{i,d}(t)=\widetilde F_{i,d}^{-1}(\widehat F_d(t))=\widetilde F_i^{-1}(\widehat F_d(t))+\|T_i'\|_\infty D_{1,r}(\widehat F_d(t))$, valid for all $t\in[0,1]$ almost surely by part (b) of Proposition 2. Note that $\widehat F_d(t)=\{n^{-1}\sum_{l=1}^n\widetilde F_{l,d}^{-1}\}^{-1}(t)=\{G_n+D_{5,r}\}^{-1}(t)$, where $G_n(s)=n^{-1}\sum_{l=1}^n\widetilde F_l^{-1}(s)$ and $D_{5,r}(s)=n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(s)$. Thus $\|D_{5,r}\|_\infty=O(r^{-\alpha})$. Also note that $G_n$ is a strictly increasing homeomorphism of $[0,1]$. Define $\widetilde G_{n,r}=G_n+D_{5,r}=n^{-1}\sum_{l=1}^n\widetilde F_{l,d}^{-1}$, so that $\widetilde G_{n,r}$ is an increasing (not necessarily strictly increasing) function from $[0,1]$ onto $[0,1]$. In fact, each $\widetilde F_{l,d}^{-1}$ is left continuous with right limits, being the generalized inverse of the càdlàg function $\widetilde F_{l,d}$. Hence $\widetilde G_{n,r}$ is also left continuous with right limits.

Suppose $t\in(\widetilde G_{n,r}(v),\widetilde G_{n,r}(v+)]$ for some $v\in[0,1]$ with $\widetilde G_{n,r}(v+)>\widetilde G_{n,r}(v)$. Then $\widetilde G_{n,r}(\widehat F_d(t))=\widetilde G_{n,r}(\widetilde G_{n,r}^{-1}(t))=\widetilde G_{n,r}(v)=t+(\widetilde G_{n,r}(v)-t)$. Now,
$$|\widetilde G_{n,r}(v)-t|\le|\widetilde G_{n,r}(v+)-\widetilde G_{n,r}(v)|=|G_n(v+)-G_n(v)+D_{5,r}(v+)-D_{5,r}(v)|=|D_{5,r}(v+)-D_{5,r}(v)|=O(r^{-\alpha})$$
uniformly in $t$ almost surely, where the penultimate equality follows from the continuity of $G_n$. So, in these cases, $G_n(\widehat F_d(t))=\widetilde G_{n,r}(\widehat F_d(t))-D_{5,r}(\widehat F_d(t))=t+O(r^{-\alpha})$ uniformly in $t$ almost surely. That is, $t=G_n(\widehat F_d(t))+O(r^{-\alpha})$ uniformly in $t$ almost surely.

Next, suppose that for some $v_1<v_2$ we have $\widetilde G_{n,r}(v_1)=\widetilde G_{n,r}(v_2)$, with $\widetilde G_{n,r}(v)<\widetilde G_{n,r}(v_1)$ for $v<v_1$ and $\widetilde G_{n,r}(v)>\widetilde G_{n,r}(v_2)$ for $v>v_2$.
If t “ r G n,r p v q “ r G n,r p v q , then r G n,r p p F d p t qq “ t if v is a continuity pointof r G n,r . If not, then this is already taken care of in the previous paragraph. In the former case, we have t “ G n p p F d p t qq ` O p r ´ α q uniformly over t almost surely.Finally, if t is a point of both continuity and strict increment of r G n,r , then r G n,r p p F d p t qq “ t as well,which implies that t “ G n p p F d p t qq ` O p r ´ α q uniformly over t almost surely. Thus, all possibilities areexhausted. Let us denote the O p r ´ α q term by D ,r p¨q .Now note that G ´ n “ p n ´ ř nl “ r F ´ l q ´ “ p n ´ ř nl “ T l ˝ F ´ φ q ´ “ F φ ˝ T ´ . Thus, it follows from ourwork above that p F d p t q “ F φ t T ´ p t ´ D ,r p t qqu . Recall that r F ´ i “ T i ˝ F ´ φ and that p T i,d p t q “ r F ´ i p p F d p t qq`|| T i || D ,r p p F d p t qq for all t P r , s almost surely as obtained earlier. Since p F d p t q “ F φ t T ´ p t ´ D ,r p t qqu , itfollows from the decomposition of p T i,d p t q that p T i,d p t q “ T i t T ´ p t ´ D ,r p t qqu ` || T i || D ,r p p F d p t qq for all t Pr , s almost surely. Since inf t Pr , s T p t q ě δ ą
0, it follows that inf t Pr , s T p t q ě n ´ ř nl “ inf t Pr , s T l p t q ě . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations δ ą
0. So, by Taylor expansion, we have T i t T ´ p t ´ D ,r p t qqu “ T i p T ´ p t qq ` || T i || D ,r p t q for all t P r , s almost surely, where || D ,r || “ O p r ´ α q almost surely, where the O p q term is uniform over i .Combining the above findings, we arrive at p T i,d p t q “ r F ´ i p p F d p t qq ` || T i || D ,r p p F d p t qq “ r F ´ i p G ´ n p t q ` D ,r p t qq ` || T i || D ,r p p F d p t qq“ T i p T ´ p t qq ` || T i || D ,r p t q ` || T i || D ,r p p F d p t qq , where the last equality follows from the discussion in the previous paragraph. Since || D ,r || “ O p r ´ α q almost surely uniformly over i , we obtain p T i,d p t q “ p T i p t q ` || T i || D ,r p t q for all t P r , s almost surely, where || D r, || “ O p r ´ α q almost surely uniformly over i . Consequently, || p T i,d ´ T i || ď || p T i ´ T i || ` O p q r ´ α , almost surely. Combined with part (b) of Theorem 2, this shows that || p T i,d ´ T i || Ñ n Ñ 8 almostsurely for all i ě
1. Equation (6) implies that $\sqrt{n}(\hat T_{i,d} - T_i) = \sqrt{n}(\hat T_i - T_i) + O(\sqrt{n}\,r^{-\alpha})$ in $L^2[0,1]$. Together with part (b) of Theorem 3, this shows that $\sqrt{n}(\hat T_{i,d} - T_i)$ has the same asymptotic distribution as $\sqrt{n}(\hat T_i - T_i)$ in the $L^2[0,1]$ topology. This completes the proof of part (b) of Theorem 4.

(c) Next we register the warped functional observations. Since the warped observations are recorded only on a discrete grid, the registration algorithm for the fully observed case cannot be applied directly. As a pre-processing step, we therefore first smooth the warped discrete observations with the Nadaraya–Watson kernel regression estimator. Let $k(\cdot)$ be any kernel supported on $[-1,1]$ and choose a bandwidth $h > 0$. The smoothed version of $\tilde X_{i,d}$ is

$$X_i^{\dagger}(t) = \frac{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)\tilde X_i(t_j)}{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)} = \xi_i\,\frac{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)\phi(T_i^{-1}(t_j))}{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)}, \qquad t\in[0,1].$$

Now note that

$$|X_i^{\dagger}(t) - \tilde X_i(t)| = \left|\xi_i\,\frac{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)\{\phi(T_i^{-1}(t_j)) - \phi(T_i^{-1}(t))\}}{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)}\right| \le \|\phi'\|\,\delta^{-1}|\xi_i|\,\frac{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)|t_j - t|}{\sum_{j=1}^{r} k\left(\frac{t-t_j}{h}\right)} \le c\,|\xi_i|\,h$$

for all $t\in[0,1]$ almost surely, where $c$ is a constant not depending on $i$ and $t$. The first inequality follows from arguments similar to those used in the proof of Theorem 1. The second follows from the fact that $k(\cdot)$ is supported on $[-1,1]$, so that only the indices $j$ with $|t_j - t| \le h$ contribute to the sum in the numerator. Thus $\|X_i^{\dagger} - \tilde X_i\| \le c\,|\xi_i|\,h$ almost surely.

We register the warped discrete observation $\tilde X_{i,d}$ by defining $\hat X_i^* = X_i^{\dagger}\circ\hat T_{i,d}$ for each $1 \le i \le n$. Observe that

$$\begin{aligned}
|\hat X_i^*(t) - \hat X_i(t)| &\le |\hat X_i^*(t) - \tilde X_i(\hat T_{i,d}(t))| + |\tilde X_i(\hat T_{i,d}(t)) - \hat X_i(t)| \\
&\le \|X_i^{\dagger} - \tilde X_i\| + |\xi_i|\,|\phi(T_i^{-1}(\hat T_{i,d}(t))) - \phi(T_i^{-1}(\hat T_i(t)))| \\
&\le c\,|\xi_i|\,h + |\xi_i|\,|\phi(T_i^{-1}(\hat T_i(t) + \|T_i'\|D_r(t))) - \phi(T_i^{-1}(\hat T_i(t)))| \\
&\le c\,|\xi_i|\,h + |\xi_i|\,\|T_i'\|\,|D_r(t)|\,\|\phi'\|\,\delta^{-1} \\
&\le O(1)\,|\xi_i|\,(h + \|T_i'\|\,r^{-\alpha})
\end{aligned} \tag{6}$$

for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$. The last two inequalities follow from a first-order Taylor expansion and the fact that $\|D_r\| = O(r^{-\alpha})$ almost surely, uniformly over $i$. Hence $\|\hat X_i^* - \hat X_i\| = O(1)\,|\xi_i|\,(h + \|T_i'\|\,r^{-\alpha})$ almost surely. In conjunction with part (c) of Theorem 2, this shows that $\|\hat X_i^* - X_i\| \to 0$ as $n\to\infty$ almost surely for all $i \ge 1$. Equation (6) implies that $\sqrt{n}(\hat X_i^* - X_i) = \sqrt{n}(\hat X_i - X_i) + O(\sqrt{n}(h + r^{-\alpha}))$ in $L^2[0,1]$. Invoking part (c) of Theorem 3 thus establishes that $\sqrt{n}(\hat X_i^* - X_i)$ has the same asymptotic distribution as $\sqrt{n}(\hat X_i - X_i)$ in the $L^2[0,1]$ topology. This completes the proof of part (c) of Theorem 4.

(d) Next, define the random measure induced by $\hat X_i^*$ as

$$\begin{aligned}
\hat F_i^*(t) &= \frac{\sum_{j\in I_t}|\hat X_i^*(t_{j+1}) - \hat X_i^*(t_j)|}{\sum_{j=1}^{r-1}|\hat X_i^*(t_{j+1}) - \hat X_i^*(t_j)|} = \frac{\sum_{j\in I_t}|X_i^{\dagger}(\hat T_{i,d}(t_{j+1})) - X_i^{\dagger}(\hat T_{i,d}(t_j))|}{\sum_{j=1}^{r-1}|X_i^{\dagger}(\hat T_{i,d}(t_{j+1})) - X_i^{\dagger}(\hat T_{i,d}(t_j))|} \\
&= \frac{\sum_{j\in I_t}|\tilde X_i(\hat T_{i,d}(t_{j+1})) - \tilde X_i(\hat T_{i,d}(t_j))| + O(h)\,|\xi_i|}{\sum_{j=1}^{r-1}|\tilde X_i(\hat T_{i,d}(t_{j+1})) - \tilde X_i(\hat T_{i,d}(t_j))| + O(h)\,|\xi_i|}
\end{aligned}$$

for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$; the last equality follows from $\|X_i^{\dagger} - \tilde X_i\| \le c\,|\xi_i|\,h$ almost surely. Note also that, by the definition of $\tilde X_i$, the factor $|\xi_i|$ cancels from the numerator and the denominator. Using $\hat T_{i,d}(t) = \hat T_i(t) + \|T_i'\|D_r(t)$ with $\|D_r\| = O(r^{-\alpha})$ almost surely, and arguments similar to those in the proof of Proposition 1, one obtains $\hat F_i^*(t) = \hat F(t) + O(1)(h + \|T_i'\|\,r^{-\alpha})$ for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$. Now, using Lemma 2 and arguments similar to those in the proof of part (b) of Proposition 2, we have $(\hat F_i^*)^{-1}(t) = \hat F^{-1}(t) + O(1)\,r^{-\alpha}(h + \|T_i'\|\,r^{-\alpha})$ for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$. Thus

$$d_W(\hat F_i^*, F_\phi) = \|(\hat F_i^*)^{-1} - F_\phi^{-1}\| \le \|\hat F^{-1} - F_\phi^{-1}\| + O(1)\,r^{-\alpha}(h + r^{-\alpha}) = d_W(\hat F, F_\phi) + O(1)\,r^{-\alpha}(h + r^{-\alpha})$$

almost surely.
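The pre-smoothing step of part (c) can be illustrated numerically. The sketch below is only an illustration under assumed choices: a rank-one warped curve $\tilde X_i = \xi_i\,\phi\circ T_i^{-1}$ with the hypothetical $\phi(t) = \sin(2\pi t)$, the hypothetical warp $T_i(t) = t + a\,t(1-t)$ with $0 < |a| < 1$, and an Epanechnikov kernel. It computes the Nadaraya–Watson smoother $X_i^\dagger$ from $r$ equispaced samples and checks the deterministic bound $\|X_i^\dagger - \tilde X_i\| \le c\,|\xi_i|\,h$ with $c = \|\phi'\|\,\delta^{-1}$, where $\delta = \inf_t T_i'(t) = 1 - |a|$ for this warp.

```python
import numpy as np

def epanechnikov(u):
    # Kernel supported on [-1, 1], as required in part (c).
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nadaraya_watson(t_eval, t_grid, y, h):
    # X^dagger(t) = sum_j k((t - t_j)/h) y_j / sum_j k((t - t_j)/h)
    w = epanechnikov((t_eval[:, None] - t_grid[None, :]) / h)
    return (w @ y) / w.sum(axis=1)

def warp_inverse(y, a):
    # Inverse of the hypothetical warp T(s) = s + a s (1 - s), 0 < |a| < 1.
    return ((1 + a) - np.sqrt((1 + a) ** 2 - 4 * a * y)) / (2 * a)

a, xi, r, h = 0.5, 1.7, 200, 0.05                 # illustrative values only
phi = lambda s: np.sin(2 * np.pi * s)
t_grid = np.linspace(0.0, 1.0, r)                 # design points t_1, ..., t_r
x_tilde = lambda t: xi * phi(warp_inverse(t, a))  # warped curve tilde X_i

t_eval = np.linspace(0.0, 1.0, 1001)
x_dagger = nadaraya_watson(t_eval, t_grid, x_tilde(t_grid), h)
sup_err = np.max(np.abs(x_dagger - x_tilde(t_eval)))

c = 2 * np.pi / (1 - abs(a))                      # ||phi'|| / inf T'
print(sup_err, c * abs(xi) * h)
```

The registered curve $\hat X_i^* = X_i^\dagger\circ\hat T_{i,d}$ is then obtained by evaluating the same smoother at the points $\hat T_{i,d}(t)$ instead of at $t$, given some estimate of the warp.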
Combining the bound on $d_W(\hat F_i^*, F_\phi)$ derived in part (d) with part (d) of Theorems 2 and 3 completes the proof of part (d) of Theorem 4.

(e) Next, define $\bar X^* = n^{-1}\sum_{i=1}^n\hat X_i^*$. Since $\|\hat X_i^* - \hat X_i\| = O(1)\,|\xi_i|\,(h + \|T_i'\|\,r^{-\alpha})$ almost surely, it follows that

$$\|(\bar X^* - \mu) - (\bar X - \mu)\| \le n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\| \le O(1)\Big\{h + r^{-\alpha}\,n^{-1}\sum_{i=1}^n\|T_i'\|\Big\} \le O(1)(h + r^{-\alpha}) \tag{7}$$

almost surely, since $E(\|T'\|) < \infty$. Along with part (e) of Theorem 2, this shows that $\|\bar X^* - \mu\| \to 0$ as $n\to\infty$ almost surely. Equation (7) implies that $\sqrt{n}(\bar X^* - \mu) = \sqrt{n}(\bar X - \mu) + O(\sqrt{n}(h + r^{-\alpha}))$ in $L^2[0,1]$. So, by part (e) of Theorem 3, $\sqrt{n}(\bar X^* - \mu)$ has the same asymptotic distribution as $\sqrt{n}(\bar X - \mu)$ in the $L^2[0,1]$ topology, and the proof of part (e) of Theorem 4 is complete.

(f) Next, we consider the empirical covariance operator of the $\hat X_i^*$'s, which we denote by $\widehat{\mathcal K}^* = n^{-1}\sum_{i=1}^n(\hat X_i^* - \bar X^*)\otimes(\hat X_i^* - \bar X^*)$. Recall $S = n^{-1}\sum_{i=1}^n(\hat X_i - \mu)\otimes(\hat X_i - \mu)$ from the proof of part (f) of Theorem 3. Some straightforward manipulations yield

$$\begin{aligned}
\widehat{\mathcal K}^* &= S + n^{-1}\sum_{i=1}^n(\hat X_i^* - \hat X_i)\otimes(\hat X_i^* - \hat X_i) - (\bar X^* - \mu)\otimes(\bar X^* - \mu) \\
&\quad + n^{-1}\sum_{i=1}^n\{(\hat X_i^* - \hat X_i)\otimes(\hat X_i - \mu) + (\hat X_i - \mu)\otimes(\hat X_i^* - \hat X_i)\} \\
&= S + W_1 - W_2 + W_3, \text{ say}.
\end{aligned}$$

Note that $|||W_1||| \le n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\|^2 \le O(1)\{h^2\,n^{-1}\sum_{i=1}^n|\xi_i|^2 + r^{-2\alpha}\,n^{-1}\sum_{i=1}^n\|T_i'\|^2\} = O(1)(h^2 + r^{-2\alpha})$ almost surely. Next, from the previous paragraph, $|||W_2||| \le \|\bar X^* - \mu\|^2 \le O(1)(h^2 + r^{-2\alpha}) + 2\|\bar X - \mu\|^2$. Moreover, $|||W_3||| \le 2n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\|\,\|\hat X_i - \mu\| \le O(1)\,n^{-1}\sum_{i=1}^n\{h|\xi_i| + \|T_i'\|\,r^{-\alpha}\}\|\hat X_i - \mu\|$ almost surely.
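The decomposition $\widehat{\mathcal K}^* = S + W_1 - W_2 + W_3$ is purely algebraic, so it can be checked on any discretised data. In the sketch below, which is only a check of the identity, curves are represented by their values on a common grid (a hypothetical discretisation), $f\otimes g$ by the outer product of the value vectors, and $\hat X_i$, $\hat X_i^*$ and $\mu$ are arbitrary random arrays; no statistical property is being tested.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 50                                   # n curves on an m-point grid (illustrative)
X_hat = rng.normal(size=(n, m))                 # stands in for hat X_i
X_star = X_hat + 0.1 * rng.normal(size=(n, m))  # stands in for hat X_i^*
mu = rng.normal(size=m)                         # stands in for mu

def avg_outer(A, B):
    # n^{-1} sum_i A_i (x) B_i, where f (x) g is represented by np.outer(f, g)
    return A.T @ B / A.shape[0]

X_star_bar = X_star.mean(axis=0)
K_star = avg_outer(X_star - X_star_bar, X_star - X_star_bar)

D = X_star - X_hat                              # hat X_i^* - hat X_i
E = X_hat - mu                                  # hat X_i - mu
S = avg_outer(E, E)
W1 = avg_outer(D, D)
W2 = np.outer(X_star_bar - mu, X_star_bar - mu)
W3 = avg_outer(D, E) + avg_outer(E, D)

print(np.max(np.abs(K_star - (S + W1 - W2 + W3))))  # zero up to rounding error
```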
Observe that

$$n^{-1}\sum_{i=1}^n|\xi_i|\,\|\hat X_i - \mu\| = n^{-1}\sum_{i=1}^n|\xi_i|\,\|\xi_i\,\phi\circ\bar T^{-1} - E(\xi)\phi\| \le n^{-1}\sum_{i=1}^n|\xi_i|\,|\xi_i - E(\xi)|\,\|\phi\circ\bar T^{-1}\| + n^{-1}\sum_{i=1}^n|\xi_i|\,|E(\xi)|\,\|\phi\circ\bar T^{-1} - \phi\|.$$

By the strong law of large numbers the first term is $O(1)$ almost surely, and since $\|\phi\circ\bar T^{-1} - \phi\| \to 0$ almost surely, the second term is $o(1)$ almost surely. Similar arguments show that $n^{-1}\sum_{i=1}^n\|T_i'\|\,\|\hat X_i - \mu\| = O(1)$ almost surely. Thus $|||W_3||| \le O(1)(h + r^{-\alpha})$ almost surely. Also, $S$ in the proof of part (f) of Theorem 3 satisfies $|||S||| = O_P(n^{-1})$. Combining the above facts with the decomposition of $\widehat{\mathcal K}$ in the proof of part (f) of Theorem 3, it follows that

$$\widehat{\mathcal K}^* = S + O(1)(h + r^{-\alpha} + \|\bar X - \mu\|^2) = \widehat{\mathcal K} + O(1)(h + r^{-\alpha} + \|\bar X - \mu\|^2) \tag{8}$$

almost surely. This, along with part (f) of Theorem 2, shows that $|||\widehat{\mathcal K}^* - K||| \to 0$ as $n\to\infty$ almost surely. By part (e) of Theorem 3, $\sqrt{n}\,\|\bar X - \mu\| = O_P(1)$ as $n\to\infty$. So equation (8) implies that $\sqrt{n}(\widehat{\mathcal K}^* - K) = \sqrt{n}(\widehat{\mathcal K} - K) + O(\sqrt{n}(h + r^{-\alpha})) + o_P(1)$ in Hilbert–Schmidt norm. This, in conjunction with part (f) of Theorem 3, proves that $\sqrt{n}(\widehat{\mathcal K}^* - K)$ has the same asymptotic distribution as $\sqrt{n}(\widehat{\mathcal K} - K)$ in the Hilbert–Schmidt topology.

For the convergence of the empirical covariance kernel $\widehat K^*(s,t) = n^{-1}\sum_{i=1}^n[\hat X_i^*(s) - \bar X^*(s)][\hat X_i^*(t) - \bar X^*(t)]$, we follow the same decomposition as above for the operator. Noting that all the bounds used in that proof remain valid in the sup-norm, the same arguments give

$$\widehat K^*(s,t) = \widehat K(s,t) + O(1)(h + r^{-\alpha} + \|\bar X - \mu\|^2) \tag{9}$$

for all $s,t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $s,t$ almost surely. This, along with part (f) of Theorem 2, shows that $\|\widehat K^* - K\| \to 0$ as $n\to\infty$ almost surely. Equation (9) implies that

$$\{\sqrt{n}(\widehat K^*(s,t) - K(s,t)) : s,t\in[0,1]\} = \{\sqrt{n}(\widehat K(s,t) - K(s,t)) : s,t\in[0,1]\} + O(\sqrt{n}(h + r^{-\alpha})) + o_P(1)$$

in $L^2([0,1]^2)$, with the $O(1)$ term uniform in $s,t$.
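The eigenfunction argument that concludes this proof uses a perturbation bound for a rank-one covariance $K = \lambda\,\phi\otimes\phi$: its second eigenvalue is zero, so the eigengap is $\lambda$ and $\|\hat\phi^* - \phi\| \le 2\sqrt{2}\,\lambda^{-1}|||\widehat{\mathcal K}^* - K|||$ once the sign of $\hat\phi^*$ is aligned with $\phi$. The finite-dimensional sketch below checks this inequality numerically, with symmetric matrices standing in for the operators and the Frobenius norm for the Hilbert–Schmidt norm; all sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, lam = 40, 2.0                           # dimension and leading eigenvalue (illustrative)
phi = rng.normal(size=m)
phi /= np.linalg.norm(phi)                 # unit-norm eigenvector
K = lam * np.outer(phi, phi)               # rank-one "covariance" lam * phi (x) phi

E = 0.005 * rng.normal(size=(m, m))
K_hat = K + (E + E.T) / 2                  # symmetric perturbation

evals, evecs = np.linalg.eigh(K_hat)       # eigenvalues in ascending order
v = evecs[:, -1]                           # leading eigenvector of K_hat
phi_hat = v * np.sign(v @ phi)             # resolve the sign ambiguity

err = np.linalg.norm(phi_hat - phi)
bound = 2 * np.sqrt(2) / lam * np.linalg.norm(K_hat - K, "fro")
print(err, bound)
```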
The expansion of $\{\sqrt{n}(\widehat K^*(s,t) - K(s,t)) : s,t\in[0,1]\}$ implied by (9), in conjunction with part (f) of Theorem 3, proves that $\{\sqrt{n}(\widehat K^*(s,t) - K(s,t)) : s,t\in[0,1]\}$ has the same asymptotic distribution as $\{\sqrt{n}(\widehat K(s,t) - K(s,t)) : s,t\in[0,1]\}$ in the $L^2([0,1]^2)$ topology.

To prove the strong consistency and the weak convergence of the estimated eigenfunction, we use perturbation bounds for compact operators (see, e.g., Ch. 5 of Hsing and Eubank (2015)). The leading eigenfunction $\hat\phi^*$ of $\widehat{\mathcal K}^*$, with its sign chosen so that $\langle\hat\phi^*, \phi\rangle \ge 0$, satisfies $\|\hat\phi^* - \phi\| \le 2\sqrt{2}\,\lambda^{-1}|||\widehat{\mathcal K}^* - K||| \to 0$ as $n\to\infty$ almost surely. Further, Theorem 5.1.8 of Hsing and Eubank (2015), specifically equation (5.27), implies that $\sqrt{n}(\hat\phi^* - \phi)$ has the same asymptotic distribution (in $L^2[0,1]$) as $\mathcal S\sqrt{n}(\widehat{\mathcal K}^* - K)\phi$, where, in our setup, $\mathcal S = -\lambda^{-1}(I - \phi\otimes\phi)$, with $\lambda = \mathrm{Var}(\xi)$ the leading eigenvalue of $K$ and $I$ the identity operator. Thus, from the results already established, the asymptotic distribution of $\sqrt{n}(\hat\phi^* - \phi)$ is that of $-\lambda^{-1}(I - \phi\otimes\phi)\sqrt{n}(\widehat{\mathcal K} - K)\phi$. Using the expression for the asymptotic distribution of $\sqrt{n}(\widehat{\mathcal K} - K)$ obtained in part (f) of Theorem 3 and some simple calculations, it follows that the asymptotic distribution of $\sqrt{n}(\hat\phi^* - \phi)$ is that of $Y\times\phi - \langle Y\times\phi, \phi\rangle\phi$, which is the same as in Theorem 3.

The proof of the strong consistency and the weak convergence of $\hat\xi_i^*$ follows in direct analogy to that of $\hat\xi_i$, using part (c) and the above facts. The proof of part (f) of Theorem 4 is now complete.

Proof of Theorem 5.
First observe that

$$\begin{aligned}
|\tilde F_{i,w}(t) - \tilde F_i(t)| &\le \left|\frac{\int_0^t|\hat X^{(1)}_{i,w}(u)|\,du}{\int_0^1|\hat X^{(1)}_{i,w}(u)|\,du} - \frac{\int_0^t|\hat X^{(1)}_{i,w}(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du}\right| + \left|\frac{\int_0^t|\hat X^{(1)}_{i,w}(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du} - \frac{\int_0^t|\tilde X_i'(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du}\right| \\
&\le \frac{2\int_0^1|\hat X^{(1)}_{i,w}(u) - \tilde X_i'(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du} \le \frac{2\,\|\hat X^{(1)}_{i,w} - \tilde X_i'\|_2}{|\xi_i|\int_0^1|\phi'(u)|\,du} = d_\phi\,|\xi_i|^{-1}A_{i,r}, \text{ say},
\end{aligned}$$

$$\Longrightarrow\quad \|\tilde F_{i,w} - \tilde F_i\| \le d_\phi\,|\xi_i|^{-1}A_{i,r}, \tag{10}$$

with $d_\phi = 2\left(\int_0^1|\phi'(u)|\,du\right)^{-1}$ and $A_{i,r} = \|\hat X^{(1)}_{i,w} - \tilde X_i'\|_2$.

Since the term $A_{i,r}$ is key to the proof, we first bound $E\{A_{i,r}^2\}$. To this end, we first bound $E\{A_{i,r}^2 \mid \xi_i, T_i\}$ using standard tools from nonparametric regression: we estimate the MSE for the regression problem $Y_{ij} = \xi_i\,\phi(T_i^{-1}(t_j)) + \epsilon_{ij}$ and integrate it over $u\in[0,1]$, with $\xi_i$ and $T_i$ held fixed. The expression for the MSE in the deterministic design case is the same as the conditional MSE (given the design points) in the random design case with the uniform design distribution on $[0,1]$. Next, observe that $\mathrm{Var}(\hat X^{(1)}_{i,w}(u)\mid\xi_i, T_i)$ does not depend on $\xi_i$ and $T_i$ and is thus uniform over $i$ (since the $\epsilon_{ij}$'s are i.i.d.). For $u\in[h, 1-h]$, the expression for this variance is given on p. 137 of Wand and Jones (1995) and equals $O((rh^3)^{-1})$, where the $O(1)$ term depends on $k$, is bounded, and is uniform over $u\in[h, 1-h]$. Next, we account for the boundary points. Let $u = \alpha h$ for some $\alpha\in[0,1)$. A similar analysis shows that in this case too $\mathrm{Var}(\hat X^{(1)}_{i,w}(u)\mid\xi_i, T_i) = O((rh^3)^{-1})$, where the $O(1)$ term is integrable over $\alpha\in[0,1)$ (see, e.g., pp. 244–247 in Schimek (2000)). Similar estimates hold for $u\in(1-h, 1]$, say $u = 1 - \alpha h$. Hence $\mathrm{Var}(\hat X^{(1)}_{i,w}(u)\mid\xi_i, T_i) = O((rh^3)^{-1})$ for all $u\in[0,1]$, with the $O(1)$ term integrable over $u\in[0,1]$.

Next we consider the bias.
In our case, the degree of the fitted polynomial is one more than the order of the derivative being estimated. Thus, applying Taylor's formula and using the expressions in Thm. 9.1 and pp. 244–247 of Schimek (2000), we have $|\mathrm{Bias}(\hat X^{(1)}_{i,w}(u)\mid\xi_i, T_i)| = \|\tilde X_i^{(3)}\|\,O(h^2) + \|\tilde X_i^{(4)}\|\,o(h^2)$ for all $u\in[0,1]$. Here, the $O(1)$ and $o(1)$ terms are non-random and integrable in $u\in[0,1]$. So, using the moment assumptions on the sup-norm of the derivatives of $T$, the independence of the $\xi_i$'s and the $T_i$'s, and the assumption that $\inf_{t\in[0,1]}T'(t) \ge \delta > 0$, it follows that

$$E\{A_{i,r}^2\} = O(h^4) + O((rh^3)^{-1}), \tag{11}$$

where the $O(1)$ terms are bounded and do not depend on $i$ (the $\tilde X_i$'s are i.i.d.). By Markov's inequality, this also implies that

$$n^{-1}\sum_{i=1}^n A_{i,r}^2 = O_P(h^4 + (rh^3)^{-1}). \tag{12}$$

We now proceed with the rest of the proof. First, let $u_{i,t} = \tilde F_{i,w}^{-1}(t)$. From (10), $\tilde F_i(u_{i,t}) = t - \tilde A_{i,r}(t)$, where $\|\tilde A_{i,r}\| \le d_\phi|\xi_i|^{-1}A_{i,r}$. Thus, using part (a) of Proposition 2,

$$|\tilde F_{i,w}^{-1}(t) - \tilde F_i^{-1}(t)| = |u_{i,t} - \tilde F_i^{-1}(t)| = |\tilde F_i^{-1}(t - \tilde A_{i,r}(t)) - \tilde F_i^{-1}(t)| \le \|T_i'\|\,c_\phi\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}$$

for a constant $c_\phi$. So $\|\tilde F_{i,w}^{-1} - \tilde F_i^{-1}\| \le \|T_i'\|\,c_\phi\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}$. Thus $\hat F_e^{-1} = n^{-1}\sum_{i=1}^n\tilde F_{i,w}^{-1} = n^{-1}\sum_{i=1}^n\tilde F_i^{-1} + \tilde B_r = \hat F^{-1} + \tilde B_r$, where $\|\tilde B_r\| \le c_\phi\,n^{-1}\sum_{i=1}^n\|T_i'\|\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}$. Define $R_r = n^{-1}\sum_{i=1}^n\|T_i'\|\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}$. By Hölder's inequality, the law of large numbers, the independence of the $T_i$'s and $\xi_i$'s, and (12), we get

$$R_r \le \Big[n^{-1}\sum_{i=1}^n\|T_i'\|^{2/(2-\alpha)}|\xi_i|^{-2\alpha/(2-\alpha)}\Big]^{(2-\alpha)/2}\Big[n^{-1}\sum_{i=1}^nA_{i,r}^2\Big]^{\alpha/2} \;\Longrightarrow\; R_r = O_P(h^{2\alpha} + (rh^3)^{-\alpha/2}). \tag{13}$$

(a) Since $d_W(\hat F_e, F_\phi) = \|\hat F_e^{-1} - F_\phi^{-1}\| \le \|\hat F_e^{-1} - \hat F^{-1}\| + \|\hat F^{-1} - F_\phi^{-1}\| \le c_\phi R_r + d_W(\hat F, F_\phi)$, the proof follows from part (a) of Theorem 3 and (13).

(b) Using statements proved earlier, $\hat T_{i,e}^{-1}(t) = \hat F_e^{-1}(\tilde F_{i,w}(t)) = \hat F^{-1}(\tilde F_{i,w}(t)) + \tilde B_r(\tilde F_{i,w}(t))$. Now, arguments in the proof of part (b) of Theorem 3 along with (10) yield $\hat F^{-1}(\tilde F_{i,w}(t)) = \hat T_i^{-1}(t) + \tilde C_r(t)$, where $\|\tilde C_r\| \le \text{const.}\,R_r$. Thus $\hat T_{i,e}^{-1} = \hat T_i^{-1} + \tilde C_{1,r}$, where $\|\tilde C_{1,r}\| \le \text{const.}\,R_r$.
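The derivative estimate $\hat X^{(1)}_{i,w}$ comes from a local polynomial fit of degree one more than the derivative order, i.e. a local quadratic fit whose linear coefficient estimates $\tilde X_i'(u)$, and inequality (10) then controls the resulting local variation measure. The sketch below, an illustration with an assumed Epanechnikov kernel, equispaced design and a hypothetical test curve rather than the authors' code, implements the fit by kernel-weighted least squares, checks that it returns the exact derivative of a quadratic (also at boundary points), and checks a Riemann-sum version of (10) on a noisy curve.

```python
import numpy as np

def epanechnikov(v):
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0)

def local_quadratic_derivative(u_eval, t, y, h):
    """First-derivative estimate from a kernel-weighted local quadratic fit."""
    out = np.empty(len(u_eval))
    for k, u in enumerate(u_eval):
        d = (t - u) / h                        # rescaled for numerical stability
        w = epanechnikov(d)
        X = np.column_stack([np.ones_like(d), d, d**2])
        XtW = X.T * w                          # X^T W without forming diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        out[k] = beta[1] / h                   # slope on the original time scale
    return out

def lv_cdf(f):
    """Riemann-sum version of t -> int_0^t |f| / int_0^1 |f|."""
    c = np.cumsum(np.abs(f))
    return c / c[-1]

r, h = 400, 0.08
t = (np.arange(1, r + 1) - 0.5) / r            # equispaced design points
u = np.linspace(0.0, 1.0, 201)                 # evaluation grid

# Sanity check: a local quadratic fit reproduces quadratics exactly.
quad_err = np.max(np.abs(
    local_quadratic_derivative(u, t, 1.0 + 2.0 * t - 3.0 * t**2, h) - (2.0 - 6.0 * u)))

# Noisy curve Y_j = sin(2 pi t_j) + eps_j (hypothetical signal and noise level).
rng = np.random.default_rng(3)
y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=r)
d_hat = local_quadratic_derivative(u, t, y, h)
d_true = 2 * np.pi * np.cos(2 * np.pi * u)

lhs = np.max(np.abs(lv_cdf(d_hat) - lv_cdf(d_true)))           # sup-distance of the CDFs
rhs = 2 * np.sum(np.abs(d_hat - d_true)) / np.sum(np.abs(d_true))
print(quad_err, lhs, rhs)
```

The discrete inequality holds exactly, by the same two-term argument as in the continuous case; with noisy data, standard local polynomial theory gives a squared bias of order $h^4$ and a variance of order $(rh^3)^{-1}$ for a first-derivative estimate.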
The proof of the first statement in part (b) of this theorem now follows using part (b) of Theorem 3 and (13).

Next consider $\hat T_{i,e}(t) = \tilde F_{i,w}^{-1}(\hat F_e(t)) = \tilde F_i^{-1}(\hat F_e(t)) + \tilde C_{2,r,i}(t)$, where $\|\tilde C_{2,r,i}\| \le \|T_i'\|\,c_\phi\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}$ from statements proved earlier. Note that if $\hat F_e(t) = v$, then $t = \hat F_e^{-1}(v) = \hat F^{-1}(v) + \tilde C_{3,r}(v)$, where $\|\tilde C_{3,r}\| \le c_\phi R_r$. So $\hat F_e(t) = v = \hat F(t - \tilde C_{3,r}(v)) = F_\phi(\bar T^{-1}(t - \tilde C_{3,r}(v)))$. Noting that $\tilde F_i^{-1} = T_i\circ F_\phi^{-1}$, we get

$$\tilde F_i^{-1}(\hat F_e(t)) = T_i(\bar T^{-1}(t - \tilde C_{3,r}(v))) = T_i(\bar T^{-1}(t)) + \|T_i'\|\,\tilde C_{4,r}(v) = \tilde F_i^{-1}(\hat F(t)) + \|T_i'\|\,\tilde C_{4,r}(v) = \hat T_i(t) + \|T_i'\|\,\tilde C_{4,r}(v),$$

where $\|\tilde C_{4,r}\| \le \text{const.}\,R_r$. This follows from arguments similar to those used earlier, using the smoothness of $\bar T$ and the assumption that $\inf_{t\in[0,1]}T'(t) \ge \delta > 0$. Thus, we finally have

$$\|\hat T_{i,e} - \hat T_i\| \le \text{const.}\{\|T_i'\|R_r + \|T_i'\|\,|\xi_i|^{-\alpha}A_{i,r}^{\alpha}\}. \tag{14}$$

The proof of the second statement of part (b) of this theorem is now completed via part (b) of Theorem 3, (11) and (13).

To prove part (c) of the theorem, we first control $E\{\|\hat X_{i,w} - \tilde X_i\|^2 \mid \xi_i, T_i\}$ for each $i$. Recall that

$$\hat X_{i,w}(t) = \frac{1}{r}\sum_{j=1}^{r}\frac{\{\hat s_2(t;h) - \hat s_1(t;h)(t_j - t)\}k_h(t_j - t)\,Y_{ij}}{\hat s_2(t;h)\hat s_0(t;h) - \hat s_1(t;h)^2},$$

where $k_h(u) = h^{-1}k(u/h)$ and $\hat s_l(t;h) = r^{-1}\sum_{j=1}^r(t_j - t)^l k_h(t_j - t)$ for $l = 0, 1, 2, 3$. Call the denominator $\hat f(t)$, which is deterministic. We first analyse the term $\check Y_{i,w}(t)$, which is defined like $\hat X_{i,w}(t)$ but with $\tilde X_i(t_j)$ in place of $Y_{ij}$. Define $\check Z_{i,w}(t) = \hat X_{i,w}(t) - \check Y_{i,w}(t)$.

Using Taylor's formula, $\tilde X_i(t_j) = \tilde X_i(t) + (t_j - t)\tilde X_i'(t) + 2^{-1}(t_j - t)^2\tilde X_i''(t) + 6^{-1}(t_j - t)^3\tilde X_i^{(3)}(\tilde t_{i,j})$, where $\tilde t_{i,j}$ lies between $t$ and $t_j$. Plugging this expansion into the definition of $\check Y_{i,w}(t)$, we have

$$\begin{aligned}
\check Y_{i,w}(t) &= \tilde X_i(t) + \frac{\tilde X_i''(t)}{2}\cdot\frac{\hat s_2(t;h)^2 - \hat s_1(t;h)\hat s_3(t;h)}{\hat s_2(t;h)\hat s_0(t;h) - \hat s_1(t;h)^2} \\
&\quad + \frac{1}{6r}\sum_{j=1}^r\frac{\{\hat s_2(t;h) - \hat s_1(t;h)(t_j - t)\}k_h(t_j - t)(t_j - t)^3\,\tilde X_i^{(3)}(\tilde t_{i,j})}{\hat s_2(t;h)\hat s_0(t;h) - \hat s_1(t;h)^2} \\
&= \tilde X_i(t) + Q_{i,1}(t;h) + Q_{i,2}(t;h), \text{ say},
\end{aligned}$$

for all $t\in[0,1]$. Note that the term involving $\tilde X_i'(t)$ vanishes; this is what gives the local linear estimator its advantage over other standard nonparametric regression estimators near the boundary of the design. Thus $|\hat X_{i,w}(t) - \tilde X_i(t)| \le |\check Y_{i,w}(t) - \tilde X_i(t)| + |\check Z_{i,w}(t)| \le |Q_{i,1}(t;h)| + |Q_{i,2}(t;h)| + |\check Z_{i,w}(t)|$.

By Riemann sum approximations, $\hat s_l(t;h) = h^l\int_{-1}^1u^lk(u)\,du + O((rh)^{-1})$ uniformly for $t\in[h, 1-h]$. Also, for $t\in[0,h)$, say $t = \alpha h$ with $\alpha\in[0,1)$, we have $\hat s_l(t;h) = h^l\int_{-\alpha}^1u^lk(u)\,du + O((rh)^{-1})$ uniformly for $\alpha\in[0,1)$. The same estimate holds for $t\in(1-h, 1]$, say $t = 1 - \alpha h$. Define $\mu_{l,\alpha} = \int_{-\alpha}^1u^lk(u)\,du$ for $l = 0, 1, 2, 3$. These estimates imply that, for $t\in[h, 1-h]$,

$$|Q_{i,1}(t;h)| \le 2^{-1}\|\tilde X_i''\|\Big\{h^2\int_{-1}^1u^2k(u)\,du + O((rh)^{-1})\Big\}.$$

Further, for boundary points, $|Q_{i,1}(t;h)| \le 2^{-1}\|\tilde X_i''\|\{h^2|B_\alpha| + O((rh)^{-1})\}$ for $\alpha\in[0,1)$, where $B_\alpha = [\mu_{2,\alpha}^2 - \mu_{1,\alpha}\mu_{3,\alpha}]/[\mu_{0,\alpha}\mu_{2,\alpha} - \mu_{1,\alpha}^2]$. In both cases, the $O(1)$ terms are non-random (hence do not depend on $i$) and uniform over the choice of $t$. Note that the leading term in the squared bias obtainable from the previous bias expression is an upper bound for the coefficient of the squared bias term in the general result of Thm. 3.3 in Fan and Gijbels (1996). Similar arguments show that $|Q_{i,2}(t;h)| \le \|\tilde X_i^{(3)}\|\,o(h^2)$, where the $o(1)$ term is non-random and uniform over $t\in[0,1]$. Note that for $\alpha = 1$, which corresponds to $t\in[h, 1-h]$, we have $B_\alpha = \int_{-1}^1u^2k(u)\,du$ by the symmetry of the kernel. Further, it can be shown that the denominator in the definition of $B_\alpha$ (which is positive by the Cauchy–Schwarz inequality) is a strictly increasing function of $\alpha\in[0,1]$, so its infimum is achieved at $\alpha = 0$, where it takes the value $\int_0^1u^2k(u)\,du\int_0^1k(u)\,du - \left(\int_0^1uk(u)\,du\right)^2 =: a > 0$, a constant depending only on $k$. Thus $\sup_{\alpha\in[0,1]}|B_\alpha| \le \sup_{\alpha\in[0,1]}|\mu_{2,\alpha}^2 - \mu_{1,\alpha}\mu_{3,\alpha}|/a < \infty$, as the numerator is uniformly bounded in $\alpha$. Hence

$$\|\check Y_{i,w} - \tilde X_i\| \le 2^{-1}\|\tilde X_i''\|\Big\{h^2\sup_{\alpha\in[0,1]}|B_\alpha| + O((rh)^{-1})\Big\} + \|\tilde X_i^{(3)}\|\,o(h^2) \le \|\tilde X_i''\|\{O(h^2) + O((rh)^{-1})\} + \|\tilde X_i^{(3)}\|\,o(h^2),$$

where the $O(1)$ and $o(1)$ terms are non-random (and hence do not depend on $i$).

We next control $E\{\|\check Z_{i,w}\|^2\}$. Observe that this does not depend on $\tilde X_i$ and hence does not depend on $i$ (the errors are i.i.d.). Writing $\hat s_l$ for $\hat s_l(t;h)$ and $\tilde s_l$ for $\tilde s_l(t;h)$ (defined below),

$$\begin{aligned}
&E\left\{\sup_{t\in[0,1]}\left|\frac1r\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}k_h(t_j - t)\,\epsilon_{ij}}{\hat s_2\hat s_0 - \hat s_1^2}\right|^2\right\} \\
&\quad\le E\left\{\sup_{t\in[0,1]}\frac{1}{r^2}\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}^2k_h^2(t_j - t)\,\epsilon_{ij}^2}{[\hat s_2\hat s_0 - \hat s_1^2]^2}\right\} \\
&\qquad + E\left\{\frac{1}{r^2}\sum_{j\neq j'}\epsilon_{ij}\epsilon_{ij'}\sup_{t\in[0,1]}\frac{\{\hat s_2 - \hat s_1(t_j - t)\}\{\hat s_2 - \hat s_1(t_{j'} - t)\}k_h(t_j - t)k_h(t_{j'} - t)}{[\hat s_2\hat s_0 - \hat s_1^2]^2}\right\} \\
&\quad\le M^2r^{-2}\sup_{t\in[0,1]}\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}^2k_h^2(t_j - t)}{[\hat s_2\hat s_0 - \hat s_1^2]^2} = M^2(rh)^{-1}\sup_{t\in[0,1]}\frac{\hat s_2^2\tilde s_0 + \hat s_1^2\tilde s_2 - 2\hat s_1\hat s_2\tilde s_1}{[\hat s_2\hat s_0 - \hat s_1^2]^2}.
\end{aligned} \tag{15}$$

The second term on the right-hand side of the first inequality vanishes due to the uncorrelatedness of the errors and the fact that the $t_j$'s are non-random. The bound for the first term follows from the almost sure boundedness of the errors, say by $M$. Here, $\tilde s_l(t;h) = r^{-1}\sum_{j=1}^r(t_j - t)^l h^{-1}k^2\{(t_j - t)/h\}$, which is defined like $\hat s_l(t;h)$ but with the new "kernel" $k^2$.
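The local linear estimator $\hat X_{i,w}$ can be coded directly from the weights $\{\hat s_2(t;h) - \hat s_1(t;h)(t_j - t)\}k_h(t_j - t)$ and the denominator $\hat s_2\hat s_0 - \hat s_1^2$. The sketch below (Epanechnikov kernel and equispaced design, both illustrative assumptions) checks the property used above: the weights annihilate the linear Taylor term, so every linear function is reproduced exactly, including at points within $h$ of the boundary.

```python
import numpy as np

def epanechnikov(v):
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0)

def local_linear(t_eval, t_grid, y, h):
    r = len(t_grid)
    out = np.empty(len(t_eval))
    for k, t in enumerate(t_eval):
        d = t_grid - t                                   # t_j - t
        kh = epanechnikov(d / h) / h                     # k_h(t_j - t) = h^{-1} k((t_j - t)/h)
        s0, s1, s2 = (np.sum(d**l * kh) / r for l in range(3))   # hat s_l(t; h)
        weights = (s2 - s1 * d) * kh / r
        out[k] = np.sum(weights * y) / (s2 * s0 - s1**2)
    return out

r, h = 300, 0.1
t_grid = (np.arange(1, r + 1) - 0.5) / r
t_eval = np.array([0.0, 0.03, 0.5, 0.97, 1.0])           # boundary and interior points

y_lin = 0.4 - 1.5 * t_grid
fit = local_linear(t_eval, t_grid, y_lin, h)
print(fit - (0.4 - 1.5 * t_eval))                        # zero up to rounding error
```

Exact reproduction holds because $\sum_j\{\hat s_2 - \hat s_1(t_j - t)\}k_h(t_j - t)(t_j - t) = r(\hat s_2\hat s_1 - \hat s_1\hat s_2) = 0$, which is the cancellation of the $\tilde X_i'(t)$ term in the expansion of $\check Y_{i,w}$.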
As earlier, by Riemann sum approximations, we have $\tilde s_l(t;h) = h^l\int_{-\alpha}^1u^lk^2(u)\,du + O((rh)^{-1})$ for $\alpha\in[0,1]$, with the $O(1)$ term uniform in $t\in[0,1]$. Define $\nu_{l,\alpha} = \int_{-\alpha}^1u^lk^2(u)\,du$. Then

$$\frac{\hat s_2^2\tilde s_0 + \hat s_1^2\tilde s_2 - 2\hat s_1\hat s_2\tilde s_1}{[\hat s_2\hat s_0 - \hat s_1^2]^2} = \frac{\mu_{2,\alpha}^2\nu_{0,\alpha} + \mu_{1,\alpha}^2\nu_{2,\alpha} - 2\mu_{1,\alpha}\mu_{2,\alpha}\nu_{1,\alpha}}{[\mu_{0,\alpha}\mu_{2,\alpha} - \mu_{1,\alpha}^2]^2} + O((rh)^{-1}) = C_\alpha + O((rh)^{-1}), \text{ say},$$

for all $\alpha\in[0,1]$, where the $O(1)$ term is uniform over $t\in[0,1]$. Note that the expression for $C_\alpha$ is the same as the coefficient of the variance term in the general result of Thm. 3.3 in Fan and Gijbels (1996) (with the necessary adaptations). Using (15), it now follows that $E\{\|\check Z_{i,w}\|^2\} \le M^2\{\sup_{\alpha\in[0,1]}C_\alpha\}(rh)^{-1} + o((rh)^{-1}) = O((rh)^{-1})$. Hence, using the assumptions in the theorem, the bounds on $\|\check Y_{i,w} - \tilde X_i\|$ obtained earlier and the previous bound, it follows that

$$E\{\|\hat X_{i,w} - \tilde X_i\|^2\} = O(h^4) + O((rh)^{-1}), \tag{16}$$

where the $O(1)$ terms are bounded and do not depend on $i$. Thus, using Markov's inequality, we have

$$n^{-1}\sum_{i=1}^n\|\hat X_{i,w} - \tilde X_i\| = O_P\{h^2 + (rh)^{-1/2}\}. \tag{17}$$

(c) Recall that $\hat X^*_{i,e}(t) = \hat X_{i,w}(\hat T_{i,e}(t))$. Thus, using (14), we have

$$\begin{aligned}
|\hat X^*_{i,e}(t) - \hat X_i(t)| &\le |\hat X_{i,w}(\hat T_{i,e}(t)) - \tilde X_i(\hat T_{i,e}(t))| + |\tilde X_i(\hat T_{i,e}(t)) - \tilde X_i(\hat T_i(t))| \\
&\le \|\hat X_{i,w} - \tilde X_i\| + \|\tilde X_i'\|\,\|\hat T_{i,e} - \hat T_i\|
\end{aligned}$$

$$\Longrightarrow\quad \|\hat X^*_{i,e} - \hat X_i\| \le \|\hat X_{i,w} - \tilde X_i\| + \text{const.}\,|\xi_i|\,\|T_i'\|\{R_r + |\xi_i|^{-\alpha}A_{i,r}^{\alpha}\}. \tag{18}$$

The proof of part (c) of this theorem now follows from (11), (13), (16) and part (c) of Theorem 3.

(d) Observe that, by (18),

$$\Big\|\bar X^*_e - n^{-1}\sum_{i=1}^n\hat X_i\Big\| \le n^{-1}\sum_{i=1}^n\|\hat X_{i,w} - \tilde X_i\| + \text{const.}\Big\{R_r\Big(n^{-1}\sum_{i=1}^n|\xi_i|\,\|T_i'\|\Big) + n^{-1}\sum_{i=1}^n|\xi_i|^{1-\alpha}\|T_i'\|A_{i,r}^{\alpha}\Big\}.$$
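The boundary constants $\mu_{l,\alpha} = \int_{-\alpha}^1u^lk(u)\,du$, $\nu_{l,\alpha} = \int_{-\alpha}^1u^lk^2(u)\,du$, $B_\alpha$ and $C_\alpha$ are easy to tabulate. The sketch below does so for the Epanechnikov kernel $k(u) = \tfrac34(1-u^2)$ on $[-1,1]$ (an illustrative choice) with the trapezoidal rule, and checks facts used in the proof: at $\alpha = 1$ symmetry gives $B_1 = \int u^2k = 1/5$ and $C_1 = \int k^2 = 3/5$, and the denominator $\mu_{0,\alpha}\mu_{2,\alpha} - \mu_{1,\alpha}^2$ increases in $\alpha$, so its minimum $a$ is attained at $\alpha = 0$ (for this kernel $a = 0.5\cdot 0.1 - 0.1875^2 = 0.01484375$).

```python
import numpy as np

def k(u):
    return 0.75 * (1.0 - u**2)                 # Epanechnikov kernel on [-1, 1]

def integrate(f, x):
    # Composite trapezoidal rule.
    return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0

def moment(alpha, l, power=1, npts=20001):
    # int_{-alpha}^{1} u^l k(u)^power du
    u = np.linspace(-alpha, 1.0, npts)
    return integrate(u**l * k(u) ** power, u)

def denom(alpha):
    m0, m1, m2 = (moment(alpha, l) for l in range(3))
    return m0 * m2 - m1**2

def B(alpha):
    m0, m1, m2, m3 = (moment(alpha, l) for l in range(4))
    return (m2**2 - m1 * m3) / (m0 * m2 - m1**2)

def C(alpha):
    m0, m1, m2 = (moment(alpha, l) for l in range(3))
    n0, n1, n2 = (moment(alpha, l, power=2) for l in range(3))
    return (m2**2 * n0 + m1**2 * n2 - 2 * m1 * m2 * n1) / (m0 * m2 - m1**2) ** 2

alphas = np.linspace(0.0, 1.0, 51)
d = np.array([denom(al) for al in alphas])
print(B(1.0), C(1.0), d[0])                    # approximately 0.2, 0.6 and 0.01484375
print(max(abs(B(al)) for al in alphas), max(C(al) for al in alphas))
```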
The third term in the above bound on $\|\bar X^*_e - n^{-1}\sum_{i=1}^n\hat X_i\|$ can be bounded using Hölder's inequality and (12), as earlier. The bounds on the first two terms are given by (17) and (13), respectively. The proof of this part of the theorem is now completed by using these bounds along with part (e) of Theorem 3.

(e) For the proof of this part of the theorem, we use a decomposition of $\widehat{\mathcal K}^*_e$ similar to that of $\widehat{\mathcal K}$ in the proof of part (f) of Theorem 3. In the same notation, we obtain the following bounds on $W_1$, $W_2$ and $W_3$. First, note that

$$|||W_1||| \le n^{-1}\sum_{i=1}^n\|\hat X^*_{i,e} - \hat X_i\|^2 \le 2n^{-1}\sum_{i=1}^n\|\hat X_{i,w} - \tilde X_i\|^2 + \text{const.}\,n^{-1}\sum_{i=1}^n\xi_i^2\|T_i'\|^2\{R_r^2 + |\xi_i|^{-2\alpha}A_{i,r}^{2\alpha}\}.$$

Applying Hölder's inequality and using (12), (13) and (16), we get $|||W_1||| = O_P\{h^4 + (rh)^{-1} + h^{4\alpha} + (rh^3)^{-\alpha}\}$. Next, using part (d) of this theorem and part (e) of Theorem 3, it follows that $|||W_2||| \le \|\bar X^*_e - \mu\|^2 \le 2\|\bar X^*_e - n^{-1}\sum_{i=1}^n\hat X_i\|^2 + 2\|n^{-1}\sum_{i=1}^n\hat X_i - \mu\|^2 = O_P\{h^{4\alpha} + (rh^3)^{-\alpha} + h^4 + (rh)^{-1} + n^{-1}\}$. In a similar manner, $|||W_3||| \le 2n^{-1}\sum_{i=1}^n\|\hat X^*_{i,e} - \hat X_i\|\,\|\hat X_i - \mu\| = O_P\{h^2 + (rh)^{-1/2} + h^{2\alpha} + (rh^3)^{-\alpha/2}\}$ by the Cauchy–Schwarz inequality and the bounds obtained earlier. So, using part (f) of Theorem 3, we have

$$|||\widehat{\mathcal K}^*_e - K||| = O_P\{h^2 + (rh)^{-1/2} + h^{2\alpha} + (rh^3)^{-\alpha/2} + n^{-1/2}\}.$$

The bounds for the leading eigenvalue and eigenfunction follow directly from standard bounds in the perturbation theory of operators.

Proof of Theorem 6.
First assume that $\mu \neq 0$. Then define $G(t) = \int_0^t|\gamma_1^{-1}\mu'(u)|\,du\big/\int_0^1|\gamma_1^{-1}\mu'(u)|\,du = \int_0^t|\mu'(u)|\,du\big/\int_0^1|\mu'(u)|\,du$ and $\tilde G_i(t) = G(T_i^{-1}(t))$ for $t\in[0,1]$ and $i = 1, 2, \ldots, n$. Some algebraic manipulations yield

$$\begin{aligned}
|F_i(t) - G(t)| &\le \frac{\int_0^t|Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1|\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} \\
&\quad + \left|\frac{\int_0^t|\gamma_1^{-1}\mu'(u)|\,du}{\int_0^1|\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} - \frac{\int_0^t|\gamma_1^{-1}\mu'(u)|\,du}{\int_0^1|\gamma_1^{-1}\mu'(u)|\,du}\right| \\
&\le \frac{2\int_0^1|Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1|\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = Z_i.
\end{aligned}$$

Thus $\|F_i - G\| \le Z_i$ almost surely for each $i$. So

$$\|\tilde F_i - \tilde G_i\| = \sup_{t\in[0,1]}|F_i(T_i^{-1}(t)) - G(T_i^{-1}(t))| = \sup_{t\in[0,1]}|F_i(t) - G(t)| \le Z_i,$$

where the second equality holds because $T_i$ is a bijection on $[0,1]$.

Next, let $c_i = F_i^{-1}(t)$ and $c = G^{-1}(t)$, so that $t = F_i(c_i) = G(c)$. Also, $G(c) - G(c_i) = G(c) - F_i(c_i) + F_i(c_i) - G(c_i) = F_i(c_i) - G(c_i)$, so that $|G(c) - G(c_i)| \le \|F_i - G\| \le Z_i$. The conditions of the theorem and arguments as in Lemma 2 show that $G^{-1}$ is $\alpha$-Hölder continuous with $\alpha = \epsilon/(1+\epsilon)$. Thus, for a finite positive constant $C_\mu$,

$$|F_i^{-1}(t) - G^{-1}(t)| = |c_i - c| = |G^{-1}(G(c_i)) - G^{-1}(G(c))| \le C_\mu|G(c_i) - G(c)|^\alpha \le C_\mu Z_i^\alpha.$$

Thus $\|F_i^{-1} - G^{-1}\| \le C_\mu Z_i^\alpha$ almost surely. Consequently,

$$\|\tilde F_i^{-1} - \tilde G_i^{-1}\| = \sup_{t\in[0,1]}|T_i(F_i^{-1}(t)) - T_i(G^{-1}(t))| \le \|T_i'\|\,\|F_i^{-1} - G^{-1}\| \le C_\mu\|T_i'\|Z_i^\alpha$$

almost surely. Further,

$$\|\hat F^{-1} - \hat G^{-1}\| \le \frac1n\sum_{i=1}^n\|\tilde F_i^{-1} - \tilde G_i^{-1}\| \le C_\mu\,\frac1n\sum_{i=1}^n\|T_i'\|Z_i^\alpha \to C_\mu\,E(\|T'\|)\,E(Z^\alpha)$$

as $n\to\infty$ almost surely.
Here, the last step follows from the moment assumptions in the theorem, the Cauchy–Schwarz inequality, the strong law of large numbers and the fact that the $Y_{il}$'s (and hence the $X_i$'s) are independent of the $T_i$'s. Thus,

$$\begin{aligned}
|\hat T_i^{-1}(t) - T_i^{-1}(t)| &= |\hat F^{-1}(F_i(T_i^{-1}(t))) - T_i^{-1}(t)| \\
&\le |\hat F^{-1}(F_i(T_i^{-1}(t))) - \hat G^{-1}(F_i(T_i^{-1}(t)))| + |\hat G^{-1}(F_i(T_i^{-1}(t))) - \hat G^{-1}(G(T_i^{-1}(t)))| \\
&\quad + |\hat G^{-1}(G(T_i^{-1}(t))) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\| + |\bar T(G^{-1}(F_i(T_i^{-1}(t)))) - \bar T(G^{-1}(G(T_i^{-1}(t))))| + |\bar T(G^{-1}(G(T_i^{-1}(t)))) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\| + \|\bar T'\|\,C_\mu\,|F_i(T_i^{-1}(t)) - G(T_i^{-1}(t))|^\alpha + |\bar T(T_i^{-1}(t)) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\| + C_\mu\Big\{n^{-1}\sum_{j=1}^n\|T_j'\|\Big\}\|F_i - G\|^\alpha + \|\bar T - \mathrm{Id}\| \\
&\le \text{const.}\{E(Z^\alpha) + Z_i^\alpha + \|\bar T - \mathrm{Id}\|\},
\end{aligned}$$

$$\Longrightarrow\quad \|\hat T_i^{-1} - T_i^{-1}\| \le \text{const.}\{E(Z^\alpha) + Z_i^\alpha + \|\bar T - \mathrm{Id}\|\}$$

as $n\to\infty$ almost surely, where the constant is uniform in $i$.

Next, let $t = \hat F^{-1}(u)$. Then $n^{-1}\sum_{i=1}^nT_i(F_i^{-1}(u)) = t$. Let $t^* = n^{-1}\sum_{i=1}^nT_i(G^{-1}(u)) = \bar T(G^{-1}(u)) = \hat G^{-1}(u)$, so that $u = \hat G(t^*)$. Note that $\hat F(t) - \hat G(t) = \hat F(t) - \hat G(t^*) + \hat G(t^*) - \hat G(t) = \hat G(t^*) - \hat G(t) = G(\bar T^{-1}(t^*)) - G(\bar T^{-1}(t))$. Thus, using the assumptions in the theorem and arguments similar to those used in the proof of part (b) of Theorem 2, we have

$$\begin{aligned}
|\hat F(t) - \hat G(t)| &= |G(\bar T^{-1}(t^*)) - G(\bar T^{-1}(t))| \le \|G'\|\,|\bar T^{-1}(t^*) - \bar T^{-1}(t)| \le \|G'\|\,\delta^{-1}|t^* - t| \\
&\le \|G'\|\,\delta^{-1}n^{-1}\sum_{i=1}^n|T_i(F_i^{-1}(u)) - T_i(G^{-1}(u))| \le \|G'\|\,\delta^{-1}C_\mu\,n^{-1}\sum_{i=1}^n\|T_i'\|Z_i^\alpha \le \text{const.}\,E(\|T'\|)\,E(Z^\alpha)
\end{aligned}$$

$$\Longrightarrow\quad \|\hat F - \hat G\| \le \text{const.}\,E(Z^\alpha)$$

as $n\to\infty$ almost surely.
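The estimators manipulated in these proofs are built from local variation measures: $\hat F^{-1} = n^{-1}\sum_i\tilde F_i^{-1}$ and $\hat T_i = \tilde F_i^{-1}\circ\hat F$. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses densely sampled, noiseless curves $\tilde X_i = \xi_i\,\phi\circ T_i^{-1}$ with the hypothetical choices $\phi(t) = \sin(2\pi t)$ and $T_i(t) = t + a_i t(1-t)$, and cumulative absolute increments as a discrete local variation measure. This is the case $\mu = 0$, $\eta = 0$ of Theorem 6, where $Z_i = 0$, so $\hat T_i$ should agree with $T_i\circ\bar T^{-1}$ up to discretisation error.

```python
import numpy as np

def lv_cdf(x):
    # Discrete local variation measure of a sampled curve: cumulative
    # absolute increments, normalised to end at 1.
    F = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(x)))])
    return F / F[-1]

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 2001)
n = 20
a = rng.uniform(-0.8, 0.8, size=n)                  # hypothetical warp parameters
xi = rng.normal(2.0, 0.5, size=n)                   # hypothetical amplitudes
T = grid + a[:, None] * grid * (1.0 - grid)          # T_i(t) = t + a_i t(1 - t), increasing
T_inv = np.array([np.interp(grid, Ti, grid) for Ti in T])
X_tilde = xi[:, None] * np.sin(2 * np.pi * T_inv)    # tilde X_i = xi_i phi(T_i^{-1}(t))

p = grid                                             # probability levels
Q = np.array([np.interp(p, lv_cdf(x), grid) for x in X_tilde])   # tilde F_i^{-1}(p)
Q_bar = Q.mean(axis=0)                               # hat F^{-1} = n^{-1} sum_i tilde F_i^{-1}
F_hat = np.interp(grid, Q_bar, p)                    # hat F
T_hat = np.array([np.interp(F_hat, p, Qi) for Qi in Q])          # hat T_i = tilde F_i^{-1} o hat F

# Identifiable rank-one case: hat T_i should equal T_i o bar T^{-1}.
T_bar_inv = np.interp(grid, T.mean(axis=0), grid)
target = np.array([np.interp(T_bar_inv, grid, Ti) for Ti in T])
max_err = np.max(np.abs(T_hat - target))
print(max_err)
```

The estimated warps recover $T_i$ only up to the common composition with $\bar T^{-1}$, which is why the proofs above carry the term $\|\bar T - \mathrm{Id}\|$.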
Combining the bounds on $\|\tilde F_i^{-1} - \tilde G_i^{-1}\|$ and $\|\hat F - \hat G\|$ with the Hölder continuity of $G^{-1}$, we obtain

$$\begin{aligned}
|\hat T_i(t) - T_i(t)| &= |\tilde F_i^{-1}(\hat F(t)) - T_i(t)| \\
&\le |\tilde F_i^{-1}(\hat F(t)) - \tilde G_i^{-1}(\hat F(t))| + |\tilde G_i^{-1}(\hat F(t)) - \tilde G_i^{-1}(\hat G(t))| + |\tilde G_i^{-1}(\hat G(t)) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\| + |T_i(G^{-1}(\hat F(t))) - T_i(G^{-1}(\hat G(t)))| + |T_i(G^{-1}(\hat G(t))) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\| + \|T_i'\|\,C_\mu\,|\hat F(t) - \hat G(t)|^\alpha + |T_i(\bar T^{-1}(t)) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\| + \|T_i'\|\,C_\mu\,\|\hat F - \hat G\|^\alpha + \|T_i'\|\,\|\bar T^{-1} - \mathrm{Id}\| \\
&= \|\tilde F_i^{-1} - \tilde G_i^{-1}\| + \|T_i'\|\,C_\mu\,\|\hat F - \hat G\|^\alpha + \|T_i'\|\,\|\bar T - \mathrm{Id}\| \\
&\le \text{const.}\,\|T_i'\|\{Z_i^\alpha + E^\alpha(Z^\alpha) + \|\bar T - \mathrm{Id}\|\}
\end{aligned}$$

$$\Longrightarrow\quad \|\hat T_i - T_i\| \le \text{const.}\,\|T_i'\|\{Z_i^\alpha + E^\alpha(Z^\alpha) + \|\bar T - \mathrm{Id}\|\}$$

as $n\to\infty$ almost surely, where the constant is uniform in $i$.

Next, note that $\hat X_i = \tilde X_i\circ\hat T_i = X_i\circ T_i^{-1}\circ\hat T_i = \mu\circ T_i^{-1}\circ\hat T_i + \gamma_1Y_{i1}\,\phi_1\circ T_i^{-1}\circ\hat T_i + \gamma_2Y_{i2}\,\phi_2\circ T_i^{-1}\circ\hat T_i$. So

$$\begin{aligned}
|\hat X_i(t) - X_i(t)| &\le |\mu(T_i^{-1}(\hat T_i(t))) - \mu(t)| + \gamma_1|Y_{i1}|\,|\phi_1(T_i^{-1}(\hat T_i(t))) - \phi_1(t)| + \gamma_2|Y_{i2}|\,|\phi_2(T_i^{-1}(\hat T_i(t))) - \phi_2(t)| \\
&\le |T_i^{-1}(\hat T_i(t)) - t|\,\{\|\mu'\| + \gamma_1|Y_{i1}|\,\|\phi_1'\| + \gamma_2|Y_{i2}|\,\|\phi_2'\|\}
\end{aligned}$$

$$\Longrightarrow\quad \|\hat X_i - X_i\| \le \|\hat T_i^{-1} - T_i^{-1}\|\,\{\|\mu'\| + \gamma_1|Y_{i1}|\,\|\phi_1'\| + \gamma_2|Y_{i2}|\,\|\phi_2'\|\} \le O_P(1)\{E(Z^\alpha) + Z_i^\alpha + \|\bar T - \mathrm{Id}\|\}$$

as $n\to\infty$ almost surely, where the $O_P(1)$ term does not depend on $n$.

Next, consider the case $\mu = 0$. Then define $G(t) = \int_0^t|\phi_1'(u)|\,du\big/\int_0^1|\phi_1'(u)|\,du$. Some algebraic manipulations yield

$$|F_i(t) - G(t)| = \left|\frac{\int_0^t|Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1|Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} - \frac{\int_0^t|\phi_1'(u)|\,du}{\int_0^1|\phi_1'(u)|\,du}\right| \le \frac{2\eta\int_0^1|Y_{i2}\phi_2'(u)|\,du}{\int_0^1|Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = Z_i.$$

Similar arguments as in the case $\mu \neq 0$ apply.

Acknowledgements
We are grateful to Dr. Kristen Irwin (EPFL) for kindly sharing and discussing her Tribolium data set.
References
Billingsley, P. (1968). Convergence of probability measures. John Wiley & Sons, Inc., New York-London-Sydney. MR0233396
Bosq, D. (2000). Linear processes in function spaces: Theory and applications. Lecture Notes in Statistics. Springer-Verlag, New York. MR1783138
Claeskens, G., Silverman, B. W. and Slaets, L. (2010). A multiresolution approach to time warping achieved by a Bayesian prior-posterior transfer fitting strategy. J. R. Stat. Soc. Ser. B Stat. Methodol.
Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Monographs on Statistics and Applied Probability. Chapman & Hall, London. MR1383587
Fritsch, F. N. and Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM J. Numer. Anal.
Gervini, D. and Gasser, T. (2004). Self-modelling warping functions. J. R. Stat. Soc. Ser. B Stat. Methodol.
Gervini, D. and Gasser, T. (2005). Nonparametric maximum likelihood estimation of the structural mean of a sample of curves. Biometrika.
Hadjipantelis, P. Z., Aston, J. A. D., Müller, H. G. and Evans, J. P. (2015). Unifying amplitude and phase analysis: A compositional data approach to functional multivariate mixed-effects modeling of Mandarin Chinese. J. Amer. Statist. Assoc.
Härdle, W. and Marron, J. S. (1990). Semiparametric comparison of regression curves. Ann. Statist.
Hsing, T. and Eubank, R. (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester. MR3379106
Irwin, K. and Carter, P. (2013). Constraints on the evolution of function-valued traits: a study of growth in Tribolium castaneum. Journal of Evolutionary Biology.
Irwin, K. and Carter, P. (2014). Artificial selection on larval growth curves in Tribolium: correlated responses and constraints. Journal of Evolutionary Biology.
James, G. M. (2007). Curve alignment by moments. Ann. Appl. Stat.
Kneip, A. and Gasser, T. (1992). Statistical tools to analyze data representing a sample of curves. Ann. Statist.
Kneip, A. and Ramsay, J. O. (2008). Combining registration and fitting for functional models. J. Amer. Statist. Assoc.
Kneip, A., Li, X., MacGibbon, K. B. and Ramsay, J. O. (2000). Curve registration by local regression. Canad. J. Statist.
Lila, E. and Aston, J. A. D. (2017). Functional and geometric statistical analysis of textured surfaces with an application to medical imaging. Tech. Report arXiv:1707.00453v1.
Liu, X. and Müller, H.-G. (2004). Functional convex averaging and synchronization for time-warped random curves. J. Amer. Statist. Assoc.
Marron, J. S., Ramsay, J. O., Sangalli, L. M. and Srivastava, A. (2015). Functional data analysis of amplitude and phase variation. Statist. Sci.
Mørken, K. (1991). Some identities for products and degree raising of splines. Constr. Approx.
Natanson, I. P. (1955). Theory of functions of a real variable. Translated by Leo F. Boron with the collaboration of Edwin Hewitt. Frederick Ungar Publishing Co., New York. MR0067952
Panaretos, V. M. and Zemel, Y. (2016). Amplitude and phase variation of point processes. Ann. Statist.
Pigoli, D., Hadjipantelis, P. Z., Coleman, J. S. and Aston, J. A. D. (2017). The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages. Tech. Report arXiv:1507.07587v2.
Ramsay, J. O. and Li, X. (1998). Curve registration. J. R. Stat. Soc. Ser. B Stat. Methodol.
Ramsay, J. O. and Silverman, B. W. (2005). Functional data analysis, second ed. Springer Series in Statistics. Springer, New York. MR2168993
Rønn, B. B. (2001). Nonparametric maximum likelihood estimation for shifted curves. J. R. Stat. Soc. Ser. B Stat. Methodol.
Schimek, M. G., ed. (2000). Smoothing and regression: Approaches, computation, and application. Wiley Series in Probability and Statistics: Applied Probability and Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York. MR1795148
Schumaker, L. L. (2007). Spline functions: basic theory, third ed. Cambridge Mathematical Library. Cambridge University Press, Cambridge. MR2348176
Srivastava, A., Wu, W., Kurtek, S., Klassen, E. and Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. Tech. Report arXiv:1103.3817v2.
Tang, R. and Müller, H.-G. (2008). Pairwise curve synchronization for functional data. Biometrika.
Villani, C. (2003). Topics in optimal transportation. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI. MR1964483
Wand, M. P. and Jones, M. C. (1995). Kernel smoothing. Monographs on Statistics and Applied Probability. Chapman and Hall, Ltd., London. MR1319818
Wang, K. and Gasser, T. (1997). Alignment of curves by dynamic time warping. Ann. Statist.
Wang, K. and Gasser, T. (1999). Synchronizing sample curves nonparametrically. Ann. Statist.

SUPPLEMENTARY MATERIAL
[Figure 8: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 8. Plots of the registered data curves using some other procedures under Model without measurement error.

[Figure 9: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 9. Plots of the registered data curves using some other procedures under Model without measurement error.

[Figure 10: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 10. Plots of the registered data curves using some other procedures under Model in the presence of measurement error.

[Figure 11: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 11. Plots of the registered data curves using some other procedures under Model in the presence of measurement error.

[Figure 12: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 12. Plots of the registered data curves using some other procedures under the rank model.

[Figure 13: panels CMR, FRM, PW; horizontal axis t ∈ [0, 1].] Fig 13. Plots of the registered data curves using some other procedures under the rank model.

[Figure: panels CMR, FRM, PW; horizontal axis Age (1 to 22).]