Functional Registration and Local Variations: Identifiability, Rank, and Tuning

Abstract

We develop theory and methodology for the problem of nonparametric registration of functional data that have been subjected to random deformation (warping) of their time scale. The separation of this phase variation ("horizontal" variation) from the amplitude variation ("vertical" variation) is crucial in order to properly conduct further analyses, which otherwise can be severely distorted. We determine precise nonparametric conditions under which the two forms of variation are identifiable. These show that the identifiability delicately depends on the underlying rank. By means of several counterexamples, we demonstrate that our conditions are sharp if one wishes a genuinely nonparametric setup; and in doing so we caution that popular remedies such as structural assumptions or roughness penalties can easily fail. We then propose a nonparametric registration method based on a "local variation measure", the main element in elucidating identifiability. A key advantage of the method is that it is free of any tuning or penalisation parameters regulating the amount of alignment, thus circumventing the problem of over/under-registration often encountered in practice. We provide asymptotic theory for the resulting estimators under the identifiable regime, but also under mild departures from identifiability, quantifying the resulting bias in terms of the amplitude variation's spectral gap.



Anirvan Chakraborty and Victor M. Panaretos

Institut de Mathématiques, École Polytechnique Fédérale de Lausanne

Abstract:

We develop theory and methodology for nonparametric registration of functional data that have been subjected to random deformation of their time scale. The separation of this phase ("horizontal") variation from the amplitude ("vertical") variation is crucial for properly conducting further analyses, which otherwise can be severely distorted. We determine precise nonparametric conditions under which the two forms of variation are identifiable, and this delicately depends on the underlying rank. Using several counterexamples, we show that our conditions are sharp if one wishes a truly nonparametric setup. We show that contrary to popular belief, the problem can be severely unidentifiable even under structural assumptions (such as assuming the synchronised data are cubic splines) or roughness penalties (smoothness of the registration maps). We then propose a nonparametric registration method based on a "local variation measure", the main element in elucidating identifiability. A key advantage of the method is that it is free of tuning or penalisation parameters regulating the amount of alignment, thus circumventing the problem of over/under-registration often encountered in practice. We carry out a detailed theoretical investigation of the asymptotic properties of the resulting functional estimators, establishing consistency and rates of convergence, when identifiability holds. When deviating from identifiability, we give a complementary asymptotic analysis quantifying the unavoidable bias in terms of the spectral gap of the amplitude variation, establishing stability to mild departures from identifiability. Our methods and theory cover both continuous and discrete observations with and without measurement error. Simulations demonstrate the good finite sample performance of our method compared to other methods in the literature, and this is further illustrated by means of a data analysis.

Keywords: Identifiability, Phase Variation, Synchronisation, Warping

arXiv [stat.ME], Oct.
A. Chakraborty and V. M. Panaretos / Functional Registration and Local Variations

1. Background and Contributions

Background

Functional observations can fluctuate around their mean structure in broadly two ways: (a) amplitude variation, and (b) phase variation. The first type of variation is analysed using functional principal component analysis, which stratifies the variation in amplitude (or variation in the "vertical axis") across the different eigenfunctions of the covariance operator of the underlying distribution. The second kind of variation, if present, is more subtle and can drastically distort the analysis of a functional dataset. It typically manifests itself in functional data representing physiological processes or physical motion, and consists in deformations of the time scale of the functional data (or variation in the "horizontal axis"), associating to each observation its own unobservable time scale resulting from a transformation of the original time scale by a time warp. Specifically, instead of observing curves $\{X_i(t) : [0,1] \to \mathbb{R}\}_{i=1}^{n}$, one actually observes warped versions $\widetilde{X}_i = X_i \circ T_i^{-1}$, where the $T_i$'s are unobservable (random) homeomorphisms termed warp maps. In the presence of phase variation, the mean of the warped data conditional on the warping, $E(\widetilde{X}_i \mid T_i) = \mu \circ T_i^{-1}$, is a distortion of the true mean $\mu$ by the warp map. Failing to account for the time transformation will yield deformed mean estimates, converging to $E[\mu \circ T_i^{-1}]$ rather than $\mu$. More dramatic still will be the effect on the estimation of the covariance of the latent process, inflating its essential rank, and yielding uninterpretable principal components. We refer to Section 2 in Panaretos and Zemel (2016) for a detailed discussion of these effects.
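The distortion of the mean can be checked on a toy case. The following is a minimal sketch under illustrative assumptions that are not taken from the text: a deterministic bump-shaped mean `mu`, no amplitude variation, and a two-point warp distribution that takes the maps `f` and `g` with probability 1/2 each, chosen so that their average is the identity. It evaluates $E[\mu \circ T^{-1}]$ in closed form and compares it with $\mu$.

```python
import math

# Illustrative assumptions (not from the paper): a two-point warp distribution,
# T = f or T = g with probability 1/2 each, chosen so that (f + g) / 2 = Id.

def f(t):
    return (t + t * t) / 2.0

def g(t):
    return (3.0 * t - t * t) / 2.0

def f_inv(t):
    # Solves (s + s^2) / 2 = t for s in [0, 1].
    return (math.sqrt(1.0 + 8.0 * t) - 1.0) / 2.0

def g_inv(t):
    # Solves (3s - s^2) / 2 = t for s in [0, 1].
    return (3.0 - math.sqrt(9.0 - 8.0 * t)) / 2.0

def mu(t, centre=0.5, width=0.05):
    # Hypothetical true mean: a single narrow peak at t = 0.5.
    return math.exp(-((t - centre) ** 2) / (2.0 * width ** 2))

def warped_mean(t):
    # E[mu(T^{-1}(t))] under the two-point warp distribution.
    return 0.5 * (mu(f_inv(t)) + mu(g_inv(t)))

if __name__ == "__main__":
    for t in (0.375, 0.5, 0.625):
        print(f"t={t:.3f}  mu={mu(t):.4f}  E[mu o T^-1]={warped_mean(t):.4f}")
```

In this sketch the single peak of `mu` at 0.5 is replaced, in the naive mean of the warped curves, by two peaks of roughly half the height near `f(0.5) = 0.375` and `g(0.5) = 0.625`, even though the warp maps average to the identity.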
Consequently, in the presence of phase variation in the data, the natural first step in the analysis should be to register the data, i.e., to simultaneously transform/synchronise the curves back to the objective time scale.

Owing to the rather complex nature of the registration problem, a variety of different assumptions on the latent process $X_i$ and the warp maps $T_i$ have been considered, and correspondingly a multitude of methods have been investigated: landmark based registration (Kneip and Gasser, 1992); template/target based registration (Ramsay and Li, 1998); registration using dynamic time warping (Wang and Gasser, 1997, 1999); registration based on local regression (Kneip et al., 2000); a "self-modelling" approach by Gervini and Gasser (2004) for warp maps expressible as linear combinations of B-splines; related registration procedures under assumptions on functional forms of the warp maps that result in a finite dimensional family of deformations (Rønn, 2001; Gervini and Gasser, 2005); a functional convex synchronization approach to registration (Liu and Müller, 2004); registration using "moments" of the data curves (James, 2007); registration based on a parsimonious representation of the registered observations by the principal components (Kneip and Ramsay, 2008); pairwise registration of the warped functional data under monotone piecewise-linear warp maps (Tang and Müller, 2008); a joint amplitude-phase analysis with this pairwise registration procedure but considering step-function (thus finite dimensional) approximations of the warp maps using finite difference of their log-derivatives (centered log-ratio transform) (Hadjipantelis et al., 2015); registration when the warp maps are generated as compositions of elementary "warplets" (Claeskens, Silverman and Slaets, 2010); and registration using a warp-invariant metric between curves when the warp functions are diffeomorphisms on an interval (Srivastava et al., 2011).
The above list is not exhaustive and we refer to Marron et al. (2015) for an overview and comparison of some of the registration procedures mentioned above. More recently, Pigoli et al. (2017) applied the pairwise registration procedure of Tang and Müller (2008) to two-dimensional curves, where the warping is in only one of the dimensions, while Lila and Aston (2017) generalized the pairwise registration method to manifold valued data.

Several of the above contributions consider the case when the warp maps are themselves random, and in such cases, a canonical set of assumptions is usually required:

(a) $T$ is a strictly increasing homeomorphism with probability one, and
(b) $E(T) = \mathrm{Id}$, where $\mathrm{Id}$ is the identity map, $\mathrm{Id}(x) = x$.

The first assumption rules out "time-reversal" or "time-jumps", while the second disallows an overall speed-up or slow-down of time. Further to these natural assumptions, most of the above cited papers impose additional smoothness and structural assumptions on the warp maps, which require tuning parameters to be selected. However, it is unclear whether these additional assumptions are either necessary or indeed sufficient for identifiability to hold. It is an open problem to determine what assumptions one must minimally impose on the latent functional data generating process so that the registration problem is identifiable under conditions (a) and (b) on the warp maps. This is of importance to understand since, in practice, one rarely has more detailed insights regarding the underlying warping phenomenon.

Consider the model
$$X_i(t) = \xi_i \phi(t) + \delta \epsilon_i(t), \qquad i = 1, 2, \ldots, n \tag{1}$$
for the latent process, with $\phi$ a unit norm deterministic function, $\xi_i$ random scalars, and $\epsilon_i(t)$ zero-mean random functions of unit variance (i.e. $E\|\epsilon_i\|^2 = 1$). When $\delta$ is unrestricted, the model (1) spans any possible functional datum.
The value of $\delta$ then regulates the balance between an (effectively) low rank model ($\delta \ll \operatorname{var}\{\xi_i\}$) or a higher rank model (larger $\delta \sim \operatorname{var}\{\xi_i\}$). When one has exactly $\delta = 0$, the model is of rank one, and the setting usually envisaged is that where $\delta$ is small relative to $\operatorname{var}\{\xi_i\}$ (for this reason, and for ease of reference, we henceforth refer to Model (1) as the "standard model"). In other words, it is postulated that if it were not for phase variation, important landmark features such as peaks and valleys of the latent process would not drastically change from realisation to realisation. In effect, there seems to be a certain concordance that identifiability (and hence consistency in the usual sense) rests crucially on an implicit assumption that the amplitude variation of the synchronised functions is of low rank. In other words, that phase variation is dominant over amplitude variation.

Observe that the dominating component $\xi_i \phi(T_i^{-1}(t))$ in the warped process $X_i(T_i^{-1}(t))$ obtained by warping model (1) forms a sub-class of the so-called general non-linear shift models (NLSM). These models find extensive use in comparison of semi-parametric regression models (see, e.g., Härdle and Marron (1990)), and have been studied in the context of landmark and dynamic registration techniques by Kneip and Gasser (1992) and Wang and Gasser (1997, 1999). Also note that the landmark principle of registration essentially stipulates that the true curves have similar shape (thus having the same landmarks) but possibly differ in their amplitude component. Although some of the earlier papers, e.g., Ramsay and Li (1998), Kneip et al. (2000), Kneip and Ramsay (2008), Claeskens, Silverman and Slaets (2010), consider higher rank models for the latent process corresponding to nontrivial $\delta$ (with additional structural assumptions on warp maps), it is not known whether these procedures are truly identifiable/consistent. Indeed, Kneip and Ramsay (2008) (see p. 1160) acknowledged the fact that for such higher rank models, one can have different valid registrations based on the degree of complexity of the warp maps that one allows (cf. Counterexample 5). Further, as hinted in Tang and Müller (2008), who consider model (1), identifiable (consistent) registration appears not to be guaranteed unless one lets $\delta \to 0$ as $n \to \infty$.

Our Contributions

We contribute to the nonparametric synchronisation problem with theory, methodology, and asymptotics, and corroborate our findings with simulations and a data analysis:

1. Firstly, we provide a comprehensive study of the issue of identifiability, which is notorious in functional registration but to date remained largely open. In particular, we provide sharp conditions for the standard model (1) to be identifiable, elucidating the role of the parameter $\delta$ that controls the effective rank of the synchronised process (Section 2). Specifically, we prove that the registration problem is identifiable when the amplitude variation is exactly of rank 1, i.e. $\delta = 0$, and that this condition is sharp. It cannot be relaxed while rescuing nonparametric identifiability, even under circumstances that were informally expected to suffice: spline models for the synchronised process, smoothness restrictions on the warp maps, rank restrictions on the warp maps, or a combination thereof. Indeed, so reliant is identifiability on the rank 1 assumption, that even rank 2 models fail to be identifiable. Our findings serve as a word of caution to practitioners, and a tentative conclusion is that a low rank (or at least approximately low rank) assumption is effectively necessary.

2. Secondly, we develop methodology to address the problem of nonparametric and consistent recovery of the warp maps from discretely warped curves, without structural assumptions on the warp maps further to (a) and (b), and without any penalisation or tuning parameters related to the warp maps themselves. Minimal structural assumptions are particularly desirable since, in practice, one rarely has more detailed insights regarding the underlying warping phenomenon. And, circumventing penalisation/tuning has two crucial practical advantages: there is no danger of "over-registering" (overfitting) the data, on account of the tuning of a penalty on the registration maps (cf. the discussion in the paragraph before this subsection); and, there is no arbitrary pre-processing choice made in the registration analysis, so that any further statistical analyses/conclusions are not contingent on tuning choices. Our methodology is adapted to cover all three standard observation settings: complete observation, discrete observation, and discrete observation with measurement error.

3. We carry out a complete asymptotic analysis in all three observation settings. In all cases, and under the identifiable regime, we prove that the nonparametric estimators obtained are consistent as the number of observations grows, and the measurement grid becomes dense, and additionally derive rates of convergence and weak convergence for all the quantities involved (Section 4, Theorems 2, 3, 4, 5). We also investigate in detail the setting when the model is unidentifiable. Consistency makes no sense in this setting, of course, but in Section 4.2 we derive theoretical results quantifying the amount of asymptotic bias incurred in the registration procedure in terms of the spectral gap of the amplitude variation (Theorem 6).

4. We probe the finite sample performance of our methodology (Section 5), for all possible observation regimes, and compare to other popular registration techniques. In particular, we numerically probe the impact of departing from the identifiable regime, and observe a noteworthy stability of our method to mild such departures. The method is further illustrated by analysis of a functional dataset of Tribolium beetle larvae growth curves (Section 6), yielding biologically interpretable results. Here, too, we compare to other registration procedures.

The key to our results is the novel use of a criterion that measures the local amount of deformation of the time scale (Section 3).
Specifically, we introduce the local variation measure of $X$, with associated cumulative distribution $J_X(t) = \int_0^t |X'(u)|\,du$, which reflects how the total amount of variation of the curve is distributed on the real axis. The simple but consequential insight is that by a change-of-variable argument, the total variation measure remains invariant under any strictly increasing deformation $T$ of the time scale of $X$, namely, $J_X(1) = J_{\widetilde{X}}(1)$, where $\widetilde{X} = X \circ T^{-1}$. However, it is the local amount of deformation that provides the information about the warping mechanism. This allows us to track the effect of the time deformation on the local variation distribution and has a transparent interpretation in terms of transportation of measure. Our approach exploits this connection in order to deduce identifiability and to estimate the unobservable warp maps and register the functional data. Indeed, it is precisely the structure of optimal transportation that exempts us from the need of additional smoothness/structural conditions on the warp maps $T$, and consequently from the need to introduce registration tuning parameters, even when the curves are observed over a discrete grid. This connection also guides us in the construction of counterexamples, illustrating where caution should be taken. Although our procedure involves derivatives, we actually do not need to estimate any derivatives from discretely observed data if there is no measurement error, as we can exploit an equivalent definition of total variation using finite differences over partitions of the domain. If there is measurement error, a pre-processing smoothing step is required, but no additional penalisation of the registration maps is necessary (a smoothing step would anyway eventually be required when observing discrete data under measurement error).
Of course, once the warp maps are estimated, one would have to smooth the warped discrete data in order to register them, since the warped data are not observed at all points of their domain. And, if there is measurement error in the observations, then some pre-smoothing will be needed. But in either case, this smoothing will be on the data itself (either as a pre-processing or post-processing step), and no smoothing penalties or structural assumptions will be required on the registration maps themselves.
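The invariance of the total variation, and its redistribution under warping, can be illustrated with the finite-difference form of the cumulative variation. The sketch below rests on illustrative assumptions that are not from the paper: the curve `X`, the warp map `T` with closed-form inverse `T_inv`, and the uniform grid are hypothetical choices. It checks that the totals agree and that the normalised cumulative variation of the warped curve, evaluated at `T(t)`, matches that of the original curve at `t`.

```python
import math

def cumulative_variation(values):
    """J on the grid: running sum of absolute increments, starting at 0."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + abs(b - a))
    return out

def interpolate(values, t):
    """Linear interpolation of a function tabulated on a uniform grid over [0, 1]."""
    n = len(values) - 1
    pos = min(max(t, 0.0), 1.0) * n
    k = min(int(pos), n - 1)
    w = pos - k
    return (1.0 - w) * values[k] + w * values[k + 1]

# Illustrative choices (assumptions, not from the paper).
def X(t):
    return math.sin(2.0 * math.pi * t)

def T(t):
    # Strictly increasing map of [0, 1] onto itself.
    return (t + t * t) / 2.0

def T_inv(t):
    return (math.sqrt(1.0 + 8.0 * t) - 1.0) / 2.0

n = 2000
grid = [k / n for k in range(n + 1)]
J_X = cumulative_variation([X(t) for t in grid])
J_W = cumulative_variation([X(T_inv(t)) for t in grid])  # warped curve X o T^{-1}

total_X, total_W = J_X[-1], J_W[-1]
F_X = [v / total_X for v in J_X]  # normalised cumulative variation of X
F_W = [v / total_W for v in J_W]  # same for the warped curve

if __name__ == "__main__":
    print("total variation:", round(total_X, 4), round(total_W, 4))
    for t in (0.25, 0.5, 0.75):
        print(t, round(interpolate(F_X, t), 4),
              round(interpolate(F_W, T(t)), 4),
              round(interpolate(F_W, t), 4))
```

The last printed column shows that, at a fixed time point, the warped curve has generally accrued a different share of its variation than the original curve: the total is preserved but its allocation along the time axis is not.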

2. Identifiability and Counterexamples

Recall that the standard model for the latent/synchronised process prior to warping (Equation 1) takes the general form
$$X(t) = \xi\phi(t) + \delta\epsilon(t).$$
This, depending on the constraints imposed on the random variable $\xi$ and the scalar $\delta$, can be of arbitrarily large rank, and indeed can span any functional datum. Usually $\operatorname{var}\{\xi\}$ is expected to be the dominant effect relative to $\delta$ (i.e. $\delta \ll \operatorname{var}\{\xi\}$), corresponding to an effectively low rank model. We now give sufficient conditions on the standard model under which identifiability holds in a genuine nonparametric sense. In simple terms, the process must be exactly of rank 1 (i.e. $\delta = 0$ or $\epsilon(t) \in \operatorname{span}\{\phi(t)\}$).

Theorem 1 (Identifiability). Let $\{X_1, X_2\}$ be random elements in $C^1[0,1]$ of rank one, i.e., $X_i(t) = \xi_i\phi_i(t)$ for deterministic functions $\phi_i$ with $\|\phi_i\| = 1$, and with $\phi_i'$ vanishing on at most a countable set. Assume that $\{T_1, T_2\}$ are strictly increasing homeomorphisms in $C^1[0,1]$, and such that $E(T_i) = \mathrm{Id}$. Write $\widetilde{X}_i = X_i(T_i^{-1}(t))$. Then,
$$\widetilde{X}_1 \overset{d}{=} \widetilde{X}_2 \iff \left\{ T_1 \overset{d}{=} T_2,\ \phi_1 = \pm\phi_2,\ \xi_1 \overset{d}{=} \pm\xi_2 \right\}.$$

The assumption that $\phi'$ does not vanish except perhaps on a countable set excludes the possibility of constant functions, in which case the problem is vacuous and identifiability trivially fails. Note that the identifiability result in Theorem 1 does not require that $\xi$ and $T$ be independent.

Remark 1.

Further to being evidently natural, the assumption $E(T) = \mathrm{Id}$ in the above theorem cannot be dropped, as is shown by the following counterexample. Suppose that $E(T) = f$ with $f \ne \mathrm{Id}$ and $f$ being a strictly increasing homeomorphism on $[0,1]$. Define $S = T \circ f^{-1}$. It follows that $E(S) = \mathrm{Id}$. Now $\widetilde{X} = \xi\phi \circ T^{-1} = \xi\phi \circ f^{-1} \circ S^{-1} = \xi\phi_1 \circ S^{-1}$, where $\phi_1 = \phi \circ f^{-1}$. Let $c = \|\phi_1\|$. Define $\xi_2 = c\,\xi$ and $\phi_2 = \phi_1/c$. Then, $\|\phi_2\| = 1$. So the resulting processes $\xi\phi \circ T^{-1}$ and $\xi_2\phi_2 \circ S^{-1}$ are equal but have been generated using different warp maps $S$ and $T$, which do not have the same distribution as they have different means. In this case, one can estimate $\phi_2$ (using the algorithm given in Section 3), and thus register the warped observations to the new time scale given by $f$, i.e., get an estimate of $X_i \circ f^{-1}$ instead of the true $X_i$. Of course, if $f$ is known, then these registered observations can be re-registered to the original time scale. So the essence of the assumption $E(T) = \mathrm{Id}$ is that the objective time scale be known, and not so much that it be the identity.

One might understandably argue that the rank 1 assumption in the previous theorem is restrictive. Perhaps surprisingly, though, the condition can be seen to be sharp. We construct a series of counterexamples below, demonstrating how badly identifiability can fail with higher ranks (even rank 2). These illustrate that the situation cannot be rectified at a genuinely nonparametric level, not even by assuming specific classes of models on the synchronised processes (such as splines or trigonometric functions) or imposing qualitative non-parametric constraints, e.g., roughness penalties, Sobolev norm bounds or rank restrictions on the warp maps (or combinations of these). It looks as though, if one wishes to maintain identifiability at a genuinely non-parametric level, a rank 1 assumption is essentially necessary.

Counterexample 1.

Our first counterexample shows that the same rank 2 process can arise either as a warped rank 1 process, or as a synchronised rank 2 process. Both the process itself and the warp maps can be taken to be of rank at most 2 (notice that a rank 1 warp map would need to be the identity almost surely). Define $f(t) = (t + t^2)/2$ and $g(t) = (3t - t^2)/2$, $t \in [0,1]$. Take $\xi$ to be a standard Gaussian random variable and $\phi(t) = \sqrt{3}\,t$, $t \in [0,1]$, so that $\phi$ has unit norm. Now define a random warp map $T$ such that $P[T = f] = P[T = g] = 1/2$. Then $T$ satisfies (a) and (b). Now define $\widetilde{X} = \xi\phi \circ T^{-1} = \xi_1 T^{-1} = \xi_1(f^{-1}U + g^{-1}(1 - U))$, where $U$ is a Bernoulli random variable with success probability $1/2$ and $\xi_1 = \sqrt{3}\,\xi$. Define $V = \xi_1 U$ and $W = \xi_1(1 - U)$ so that $\widetilde{X} = V f_1 + W g_1$, where $f_1(t) = f^{-1}(t) = (\sqrt{1 + 8t} - 1)/2$ and $g_1(t) = g^{-1}(t) = (3 - \sqrt{9 - 8t})/2$, $t \in [0,1]$. Since $f$ and $g$ are $C^\infty$, and $f'$ and $g'$ are bounded away from zero on $[0,1]$, so are the derivatives of their inverses. Also, the inverses are $C^\infty$ as well. It is easy to check that $\operatorname{Cov}(V, W) = 0$. Further, it is easy to show that $f_1$ and $g_1$ are linearly independent. Consequently, we may define a new process $Y = V f_1 + W g_1$, which is a rank two process. Define $\widetilde{Y} = Y \circ \mathrm{Id}^{-1} = Y$. Then, $\widetilde{X} \overset{d}{=} \widetilde{Y}$ (in fact $\widetilde{X} = \widetilde{Y}$) but they have been generated using two different $C^\infty$ latent processes, namely $X$ and $Y$, and $C^\infty$ warp maps, namely $T$ and $\mathrm{Id}$, which of course do not have the same distribution.

Fig 1. Plots of some sample paths of the rank two latent processes $Y_1$ and $Y_2$ in part (A) of Counterexample 2 along with the warp maps $T_1$ and $T_2$ mentioned there, which warp them into the same rank one process. [Panels: $Y_1$ (rank two), $T_1$, $Y_2$ (rank two), $T_2$, each plotted against $t$.]

Counterexample 2.

We will give two constructions demonstrating that the same rank one process can arise in one of infinitely many ways: (i) as a rank one analytic process with no warping, and (ii) as one of an infinite collection of rank two analytic processes subjected to warping by one of an infinite collection of non-trivial analytic warp maps $T$ satisfying (a) and (b).

(A) First take the latent model class to consist of linear combinations of trigonometric functions and polynomials. Define $\mu(t) = t - 1/2$ and $\phi_k(t) = \sin((2k-1)\pi t)/[(2k-1)\pi]$, $t \in [0,1]$, for some $k \ge 1$. Let $T_k(t) = t - (2U_k - 1)\phi_k(t)$, where $U_k \sim \mathrm{Unif}(a, b)$. Here $a = (1/2)(1 - M^{-1})$ and $b = (1/2)(1 + M^{-1})$ with $M$ satisfying $M > 1$. It can be checked that $T_k$ satisfies (a) and (b) for all $k \ge 1$. Let $\xi$ be a random variable independent of $U_k$. Define $X(t) = \xi\mu(t)$ and $Y_k(t) = \xi\mu(t) + \xi(1 - 2U_k)\phi_k(t)$. It can be checked that $X = \widetilde{Y}_k := Y_k \circ T_k^{-1}$ for all $k \ge 1$. Since $\xi$ and $U_k$ are independent, it follows that $\operatorname{Cov}(\xi, \xi(1 - 2U_k)) = 0$. Also, since $\langle \mu, \phi_k \rangle = 0$, the expression for $Y_k$ given above is in fact its Karhunen-Loève (KL) expansion, which is of rank 2, and this holds for all $k \ge 1$. The plots of sample paths of $Y_1$ and $Y_2$ along with the warp maps $T_1$ and $T_2$ are shown in Figure 1.

(B) For the second construction, we take the latent model class to consist of linear combinations of polynomials only. Define $\mu(t) = t$. Fix $R \in \mathbb{N}$ and any finite subset $\{k_1, k_2, \ldots, k_R\}$ of $\mathbb{N}$. Also, fix reals $a_1, a_2, \ldots, a_R$ satisfying $\sum_{l=1}^{R} a_l = 0$. Consider the Legendre polynomials $P_{2k_l+1}$ on $[-1,1]$. Since these satisfy $P_{2k_l+1}(-t) = -P_{2k_l+1}(t)$ for $t \in [0,1]$, it follows that $\int_0^1 t P_{2k_l+1}(t)\,dt = (1/2)\int_{-1}^{1} t P_{2k_l+1}(t)\,dt = 0$. Define $\phi(t) = \sum_{l=1}^{R} a_l P_{2k_l+1}(t)$ and $T(t) = t - (2U - 1)\phi(t)$, where $U \sim \mathrm{Unif}(a, b)$ with $a$ and $b$ as in (A), now with $M > \|\phi'\|_\infty := \sup_{t \in [0,1]} |\phi'(t)|$. The above construction ensures that $T(0) = 0$, $T(1) = 1$, and $T$ satisfies (a). It is clear that $T$ satisfies (b). Let $X(t) = \xi t$ and $Y(t) = \xi t - \xi(2U - 1)\phi(t)$, where $\xi$ is as in the first construction. Then, it can be shown that $X = \widetilde{Y} := Y \circ T^{-1}$. Also, $Y$ is rank 2, and the above form is in fact its KL expansion because $\operatorname{Cov}(\xi, \xi(2U - 1)) = \langle \mu, \phi \rangle = 0$, which follows as earlier.

By taking $\xi$ to be a constant random variable, this counterexample also shows that one cannot extend the identifiable regime from $\xi\phi(t)$ to $\mu(t) + \xi\phi(t)$, where $\mu \notin \operatorname{span}\{\phi\}$.

Counterexample 3.

We will show that even if one penalises the warp maps, e.g., by one or both of $\int_0^1 E([T(t) - t]^2)\,dt$ and $\int_0^1 E[(T''(t))^2]\,dt$, one can still get infinitely many possible solutions for the registration problem. Under the setup of (A) in Counterexample 2, $\int_0^1 E([T(t) - t]^2)\,dt = [\sqrt{6}\,M\pi(2k-1)]^{-2}$ and $\int_0^1 E[(T''(t))^2]\,dt = (2k-1)^2\pi^2/(6M^2)$. For (B) in the previous counterexample, it can be shown using the orthogonality of the Legendre polynomials that $\int_0^1 E([T(t) - t]^2)\,dt = \{\sum_{l=1}^{R} a_l^2/(4k_l + 3)\}/(3M^2)$ and $\int_0^1 E[(T''(t))^2]\,dt = \|\sum_{l=1}^{R} a_l P''_{2k_l+1}\|^2/(3M^2)$, where $\|\cdot\|$ denotes the $L^2[0,1]$ norm. Thus, in both cases, for any $\epsilon > 0$, the sum of the two penalty terms can be made smaller than $\epsilon$ by choosing large enough $M$ (depending on the choices of the other parameters: $k$, $R$, the $k_l$'s and the $a_l$'s).

The above facts imply that if one wants to carry out the registration using the penalization procedure
$$\min_{h \in \mathcal{T}} \int_0^1 E\left\{[W_h(t) - X(h(t))]^2 + \lambda_1[T(t) - t]^2 + \lambda_2(T''(t))^2\right\} dt,$$
where $\mathcal{T}$ is a class of $C^2$ warp maps, and $W_h$ takes values in an appropriate synchronized space $\mathcal{S}$ of linear combinations of $C^2$ functions, then we have infinitely many valid registrations as follows:

(i) under setup (A): if we allow $\mathcal{T}$ to include monotone homeomorphisms on $[0,1]$ whose deviation from the identity is a trigonometric function, and even if $\mathcal{S}$ is restricted to linear combinations of linear and trigonometric functions (both $X$ and $Y_k$ belong to this class).
(ii) under setup (B): even if we allow $\mathcal{T}$ and $\mathcal{S}$ to only include polynomials.

Note that for both (i) and (ii), the "fit" term $E\{[W_h(t) - X(h(t))]^2\}$ becomes zero.

Counterexample 4.

Our next counterexample shows that structural restrictions on the latent synchronised process, such as spline models, will also fail if the rank is higher than 1. We will consider cubic splines but one can similarly construct more elaborate counterexamples involving higher order splines and more knots. Let $\phi$ be a cubic spline with a single knot at $a_1 \in (0,1)$, i.e., $\phi(t) = \sum_{i=0}^{3} \theta_i t^i + \delta(t - a_1)_+^3$, and define
$$s(t) = c(a_2 - a_1)^{-1}(t - a_1)\,I\{a_1 \le t \le a_2\} + c(1 - a_2)^{-1}(1 - t)\,I\{a_2 < t \le 1\}, \qquad t \in [0,1],$$
where $c \in \mathbb{R}$ and $a_2 \in (a_1, 1)$ are fixed. Let $X(t) = \xi\phi(t)$ and $T(t) = t - (2U - 1)s(t)$ with $U$ and $\xi$ as before, and choose $M > |c|/\min\{(a_2 - a_1), (1 - a_2)\}$. This ensures that $T$ satisfies (a) and (b). Define
$$Y(t) = \xi\phi(t) + V_1 s(t)\{\theta_1 + 2\theta_2 t + 3\theta_3 t^2 + 3\delta(t - a_1)_+^2\} + V_2 s^2(t)\{\theta_2 + 3\theta_3 t + 3\delta(t - a_1)_+\} + V_3 s^3(t),$$
where $V_1 = \xi(1 - 2U)$, $V_2 = \xi(2U - 1)^2$ and $V_3 = \xi(1 - 2U)^3(\theta_3 + \delta)$, so that $Y(t) = \xi\phi(t - (2U - 1)s(t))$ expanded in powers of $s(t)$. Note that $s$ is a linear spline with knots at $a_1$ and $a_2$. Also, $p_1(t) := \theta_1 + 2\theta_2 t + 3\theta_3 t^2 + 3\delta(t - a_1)_+^2$ and $p_2(t) := \theta_2 + 3\theta_3 t + 3\delta(t - a_1)_+$ are splines (quadratic and linear, respectively) with knots at $a_1$. Hence, these can be considered as elements of the cubic spline space $\mathcal{S}_1$ with knots at $a_1$ and $a_2$. So, by repeated application of Theorem 3.1 in Mørken (1991), the functions $\phi$, $s p_1$, $s^2 p_2$ and $s^3$ are elements of the space $\mathcal{S}_2$ of cubic splines with a finite set of knots (including $a_1$ and $a_2$). So, both $X$ and $Y$ lie in $\mathcal{S}_2 \supset \mathcal{S}_1$. If we assume that $\phi(0) \ne 0$, then $\phi$ is linearly independent of $s p_1$, $s^2 p_2$ and $s^3$ (since these three functions equal zero at $t = 0$), so that $Y$ is of rank at least two. Now, it can be checked that $\widetilde{Y}(t) := Y(T^{-1}(t)) = X(t)$. Thus, two distinct processes $X$ and $Y$ can be warped (by the maps $\mathrm{Id}$ and $T$, respectively) to produce the same process.

If we choose $a_1 = 0$, i.e., take $\phi$ to be a cubic polynomial (which also lies in $\mathcal{S}_1$ trivially), then we can choose $s$ to be a spline on $[0,1]$ of degree $\ge 1$, with $M > \|s'\|_\infty$. Then, for the same $Y$, the conclusion of the above counterexample holds.

Counterexample 5.

Our last counterexample illustrates that even a priori knowledge of landmarks does not help rectify identifiability if the rank 1 condition is violated. Let $X(t) = \xi t(1 - t)$, $t \in [0,1]$, so that the latent process has a unique maximum at $t = 1/2$. A priori knowledge of existence of a unique maximum in synchronized space can be utilized to carry out a landmark/peak alignment of the warped curves. Let us denote the vector space of functions with unique maximum at $t = 1/2$ by $\mathcal{U}$, and the vector space of functions proportional to the bell-shaped curve $f(t) = t(1 - t)$ by $\mathcal{S}_f$. Obviously, $X \in \mathcal{S}_f \subset \mathcal{U}$. Let $T$ be any warp map independent of $\xi$ and satisfying (a) and (b). Define a new warp map $S$ as follows:
$$S(t) = 2tT(1/2)\,I\{0 \le t \le 1/2\} + \{T(1/2) + (2t - 1)[1 - T(1/2)]\}\,I\{1/2 < t \le 1\}.$$
Note that $S$ satisfies (a) and (b). Define $Y(t) = \xi T^{-1}(S(t))[1 - T^{-1}(S(t))]$, $t \in [0,1]$. It can be checked that the process $Y$ has a unique maximum at $t_0$, where $t_0$ satisfies $T^{-1}(S(t_0)) = 1/2$, equivalently, $t_0 = S^{-1}(T(1/2))$. However, from the construction of $S$, it is easy to check that $S^{-1}(T(1/2)) = 1/2$. So, $Y \in \mathcal{U}$. Defining $\widetilde{X} = X \circ T^{-1}$ and $\widetilde{Y} = Y \circ S^{-1}$, it follows that $\widetilde{X} = \widetilde{Y}$ although $X$ and $Y$ are different processes. Further, although $X \in \mathcal{S}_f$, it holds that $Y \notin \mathcal{S}_f$ provided $S \ne T$, and $Y$ has rank at least two. This counterexample (without explicit constructions of the latent processes or of the warp maps) is mentioned in Kneip and Ramsay (2008).

What we learn from these counterexamples is that identifiability crucially rests upon constructing a synchronised space of processes $\mathcal{S}$ (contained within continuous processes on $[0,1]$) and a warp map space of processes $\mathcal{T}$ (contained within strictly monotone homeomorphisms onto $[0,1]$ with identity expectation) such that:

(I) Warping causes the latent process to exit the synchronised space, i.e. $X \in \mathcal{S}$ but $\widetilde{X} \notin \mathcal{S}$.
(II) There exists a unique process $X \in \mathcal{S}$ such that $\widetilde{X} = X \circ T^{-1}$ for some random $T \in \mathcal{T}$.

Theorem 1 informs us that such a construction is possible by taking $\mathcal{S}$ to essentially be $C^1$ rank 1 non-constant processes, and otherwise not restricting $\mathcal{T}$ except for a $C^1$ assumption. The counterexamples demonstrate that allowing higher ranks can have a severe effect on identifiability, even if $\mathcal{S}$ is modeled more concretely, or indeed if $\mathcal{T}$ is restricted to be smoother. In light of this, we will introduce the terminology of "identifiable regime" to mean the pair $(\mathcal{S}, \mathcal{T})$ implied by the context of Theorem 1. Deviations from this regime will be generally termed as an "unidentifiable regime":

Definition 1 (Identifiable Regime).
We define the identifiable regime to involve latent synchronised processes $X \in \mathcal{S}$, warp maps $T \in \mathcal{T}$, and warped processes $\widetilde{X}(t) = X(T^{-1}(t))$, where:

(I1) The synchronised process space is $\mathcal{S} = \{X \in C^1[0,1] : X(t) = \xi\phi(t)\}$, for $\xi$ a real-valued random variable of finite variance and $\phi \in C^1([0,1])$ a deterministic function of unit $L^2$-norm, whose derivative vanishes at most on a countable subset of $[0,1]$.
(I2) The warp map space is $\mathcal{T} = \{T \in C^1[0,1] : E[T] = \mathrm{Id} \text{ and } T \text{ strictly increasing homeomorphism}\}$.

With identifiability clarified, we now turn to nonparametric methods of estimation. Our goal will be to construct methods that perform well in the identifiable regime, remain stable under small departures (e.g. effectively rank 1 rather than precisely rank 1 models), and do not rely on tuning (which adds a layer of arbitrariness and in any case was seen to be unavailing). For these, we will require the notion of local variation measure, introduced in the next section.
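To make Definition 1 concrete, the following sketch draws data from one instance of the identifiable regime. All specific choices are assumptions made for illustration and are not taken from the paper: the shape function `phi`, the one-parameter warp family in `make_warp`, the Gaussian amplitudes, and the bisection routine `invert` used to evaluate $T^{-1}$.

```python
import math
import random

def phi(t):
    # Unit L2-norm on [0, 1]; its derivative vanishes only at t = 1/4 and t = 3/4.
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * t)

def make_warp(a):
    # Hypothetical warp family: T(t) = t - a * sin(2 pi t) / (2 pi), with |a| < 1,
    # so that T(0) = 0, T(1) = 1 and T'(t) = 1 - a * cos(2 pi t) > 0.
    return lambda t: t - a * math.sin(2.0 * math.pi * t) / (2.0 * math.pi)

def invert(T, y, tol=1e-12):
    # Bisection inverse of a strictly increasing map of [0, 1] onto itself.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if T(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n_curves, grid, rng):
    """Draw (xi_i, T_i) and return the warped curves X_i(T_i^{-1}(t)) on the grid."""
    warps, curves = [], []
    for _ in range(n_curves):
        xi = rng.gauss(0.0, 1.0)               # finite-variance amplitude
        T = make_warp(rng.uniform(-0.8, 0.8))  # E[a] = 0, hence E[T] = Id
        warps.append(T)
        curves.append([xi * phi(invert(T, t)) for t in grid])
    return warps, curves

rng = random.Random(0)
grid = [k / 50 for k in range(51)]
warps, curves = simulate(500, grid, rng)

# Largest deviation of the sample mean of the warp maps from the identity.
mean_dev = max(abs(sum(T(t) for T in warps) / len(warps) - t) for t in grid)

if __name__ == "__main__":
    print("max |mean(T)(t) - t| over the grid:", round(mean_dev, 4))
```

In this sketch condition (I1) holds by construction of `phi`, and condition (I2) holds because each sampled map is strictly increasing with fixed endpoints and the warp parameter has mean zero; `mean_dev` only checks the latter empirically.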

3. Tuning-Free Methodology

Recall that the total variation of a continuous function $h : [0,1] \to \mathbb{R}$ measures the total distance swept by the ordinate $y = h(x)$ of its graph, as the abscissa $x$ moves from 0 to 1. By distorting functions “in the $x$-domain” through an increasing homeomorphism, phase variation will not affect the total amount of variation accrued over the interval $[0,1]$. However, it will redistribute this total variation over the subintervals of $[0,1]$. This redistribution can be measured by focussing on local variation:

Definition 2 (Local Variation Distribution). Given any real function $h \in C([0,1])$, we define

$$J_h(t) = \sup_{K \in \mathcal{K}_t} \sum_{k=1}^{|K|-1} |h(\tau_{k+1}) - h(\tau_k)| \qquad (2)$$

where $K = \{\tau_1, \tau_2, \ldots, \tau_{|K|}\}$ is a partition of $[0,t]$ and $\mathcal{K}_t$ is the collection of all finite partitions of $[0,t]$. Noting that $J_h(1)$ is the total variation of $h$, define the local variation distribution as $F_h(t) = J_h(t)/J_h(1)$.

Remark 2. Recall that when $h \in C^1([0,1])$, it holds that $J_h(t) = \int_0^t |h'(u)|\,du$. The general definition comes in handy under discrete observation, this one under continuous observation.

We now show that, in the identifiable regime, warping affects the local variation of the underlying process in a rather predictable manner – one that can be used to motivate estimators. We will write $\tilde F = F_{\tilde X}$ and $F = F_X$ for simplicity.

Lemma 1 (Local Variations and Warp Maps). When $\tilde X = X \circ T^{-1}$ falls under the Identifiable Regime (1), $F$ and $\tilde F$ are strictly monotone almost surely, $E\{\tilde F^{-1}\} = F^{-1} = F_\phi^{-1}$, and $T = \tilde F^{-1} \circ F = \tilde F^{-1} \circ F_\phi$.

Remark 3. Even under the unidentifiable regime, we have $T = \tilde F^{-1} \circ F$. However, in this case, $F$ is not deterministic, unlike in the identifiable regime, and we have $E\{\tilde F^{-1} \mid X\} = F^{-1}$ almost surely, so that $E\{\tilde F^{-1}\} = E\{F^{-1}\}$.

Remark 4. In the language of transportation of measure, Lemma 1 says that the warp map pushes forward the original local variation distribution to the warped local variation distribution, in fact optimally so in terms of quadratic transportation cost; and that the synchronised local variation measure is the Fréchet mean of the (random) warped local variation measure in Wasserstein distance.

Remark 5.

The local variation measure can also be seen through the prism of area-under-the-curve criteria discussed by Liu and Müller (2004). These authors use these criteria to assign the time synchronization maps by utilizing the observed warped data. They derive a registration procedure based on data-driven parametric modelling of the warp maps. We, on the other hand, aim to extract the time synchronization maps from the observed warped data by using the local variation measure. Thus, no modelling of the warp maps is necessary – our goal is a method that is fully data-driven and completely non-parametric.
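The relation between local variation and warping stated in Lemma 1 can be checked numerically. The sketch below, assuming NumPy, approximates $F_h$ by using the observation grid as the single partition in Definition 2; the template `phi`, the warp map `T` and the function name `local_variation_cdf` are illustrative assumptions rather than objects taken from the text.

```python
import numpy as np

def local_variation_cdf(values):
    """Approximate F_h at the grid points from the values of h on that grid.

    The grid itself is used as the partition in Definition 2, so the result
    is exact only in the limit of a fine grid.
    """
    increments = np.abs(np.diff(values))
    J = np.concatenate(([0.0], np.cumsum(increments)))   # J_h on the grid
    return J / J[-1]                                      # F_h = J_h / J_h(1)

grid = np.linspace(0.0, 1.0, 2001)
phi = lambda s: np.sqrt(2.0) * np.sin(np.pi * s)          # illustrative template
T = grid - np.sin(2 * np.pi * grid) / (4 * np.pi)         # illustrative warp map
T_inv = np.interp(grid, T, grid)                          # T^{-1} on the grid

F = local_variation_cdf(phi(grid))                        # F for X = phi
F_tilde = local_variation_cdf(phi(T_inv))                 # F~ for X~ = X o T^{-1}

# T = F~^{-1} o F is equivalent to F~ = F o T^{-1}; compare both sides.
gap = np.max(np.abs(F_tilde - np.interp(T_inv, grid, F)))
```

On a fine grid `gap` should be close to zero, reflecting that warping only redistributes the total variation along the time axis.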

Now suppose we have an i.i.d. sample $\{\tilde X_i : i = 1, 2, \ldots, n\}$ of randomly warped functional data that we wish to register, i.e. we wish to construct nonparametric estimators of the $\{X_i\}_{i=1}^n$ and the $\{T_i\}_{i=1}^n$ on the basis of $\{\tilde X_i\}_{i=1}^n$. If we expect the data to (at least approximately) conform to the identifiable regime (1), we can rely on Lemma 1 as inspiration for tuning-free methodology. We would like to emphasize that this methodology will be applicable whatever the “true model”, of course, but the point is for it to be accurate under the identifiable regime, and stable when mildly departing from identifiability. We construct such methodology under all three different observation regimes on $\{\tilde X_i\}_{i=1}^n$: complete observations (Section 3.1), discrete noiseless observations (Section 3.2), and discrete observations with measurement error (Section 3.3). We then study the performance under identifiability/unidentifiability theoretically in Section 4 and numerically in Section 5.

Assuming the functions $\{\tilde X_i\}$ are fully observed, we may proceed as follows:

Step 1: Set $\hat F = \left( n^{-1} \sum_{i=1}^n \tilde F_i^{-1} \right)^{-1}$, noting that the $\{\tilde F_i\}$ are immediately available by complete observation of the $\{\tilde X_i\}$. Note that under the identifiable regime (1), $\hat F$ estimates $F_\phi$.
Step 2: Estimate the warp map $T_i$ by $\hat T_i = \tilde F_i^{-1} \circ \hat F$, and the registration map $T_i^{-1}$ by $\hat T_i^{-1}$.
Step 3: Register the observed warped functional data, by means of $\hat X_i = \tilde X_i \circ \hat T_i$.

If we suspect to be in the identifiable regime (1), we may also want to estimate the pairs $\{\phi, \xi_i\}$. In this case, the obvious additional steps will be:

Step 4: Compute the empirical covariance operator, say, $\widehat{\mathscr{K}}_r$ of the registered data $\{\hat X_i\}$ and estimate $\phi$ by the leading eigenfunction $\hat\phi$ of $\widehat{\mathscr{K}}_r$ (as a convention, assume that this estimator is aligned with the true $\phi$, i.e., $\langle \hat\phi, \phi \rangle \ge 0$).
Step 5: Estimate $\xi_i$ by $\hat\xi_i = \langle \hat X_i, \hat\phi \rangle$.

Remark 6.

The above algorithm can be viewed as a non-parametric version of the pairwise registration procedure by Tang and Müller (2008) albeit at the level of local variation measures rather than the original curves. Consider the data to be $\tilde F_1, \ldots, \tilde F_n$. Since $\tilde F_i = F_i \circ T_i^{-1}$, we have a standard warping problem at the level of variation measures. Now suppose that we apply the pairwise registration procedure to this new data set as follows:

$$\hat g_{ji} = \arg\min_{h \in \mathcal{C}} \int_0^1 \left[ \tilde F_j(h(t)) - \tilde F_i(t) \right]^2 dt,$$

where the minimization is conditional on $\tilde F_i$ and $\tilde F_j$, and $\mathcal{C}$ is the set of strictly monotone homeomorphisms on $[0,1]$. This corresponds to choosing the shape penalty parameter $\lambda = 0$ (see p. 878 in Tang and Müller (2008)) and not placing any structural assumption of the pairwise warping function $g_{ji}$, i.e., the above minimization is non-parametric. It is now easy to see that $\hat g_{ji} = \tilde F_j^{-1} \circ \tilde F_i$. So, by equation (7) in Tang and Müller (2008), it follows that the pairwise registration estimator of $T_i$ is

$$\hat T_{i,p} = \left( n^{-1} \sum_{j=1}^n \hat g_{ji} \right)^{-1} = \left( n^{-1} \sum_{j=1}^n \tilde F_j^{-1} \circ \tilde F_i \right)^{-1} = \tilde F_i^{-1} \circ \hat F,$$

which is precisely the estimator in the previous algorithm.
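Steps 1 to 3 of the fully observed algorithm can be sketched as follows, assuming NumPy and curves recorded on a dense common grid as a stand-in for full observation. The function names are hypothetical, and linear interpolation via `np.interp` is an implementation convenience assumed here, not part of the procedure in the text. The sketch uses the identity $\hat T_i^{-1} = \hat F^{-1} \circ \tilde F_i = n^{-1} \sum_j \tilde F_j^{-1} \circ \tilde F_i$ from Remark 6, so that no auxiliary grid of quantile levels is needed.

```python
import numpy as np

def local_variation_cdf(values):
    """Grid approximation of the local variation distribution F_h."""
    inc = np.abs(np.diff(values))
    J = np.concatenate(([0.0], np.cumsum(inc)))
    return J / J[-1]

def register(X_warped, grid):
    """Tuning-free registration of the rows of X_warped (shape n x m)."""
    n = X_warped.shape[0]
    F_tilde = np.array([local_variation_cdf(x) for x in X_warped])

    def F_hat_inv(u):
        # Step 1: F_hat^{-1} is the average of the quantile functions F~_j^{-1}.
        return np.mean([np.interp(u, F_tilde[j], grid) for j in range(n)], axis=0)

    T_hat = np.empty_like(X_warped, dtype=float)
    X_hat = np.empty_like(X_warped, dtype=float)
    for i in range(n):
        T_hat_inv = F_hat_inv(F_tilde[i])                   # T_hat_i^{-1} on the grid
        T_hat[i] = np.interp(grid, T_hat_inv, grid)         # Step 2: invert it
        X_hat[i] = np.interp(T_hat[i], grid, X_warped[i])   # Step 3: X~_i o T_hat_i
    return T_hat, X_hat
```

No bandwidth or penalty enters the sketch; the only numerical choice is the resolution of the grid. If some curve is locally constant, its $\tilde F_i$ has flat stretches and the interpolated inverses are then only one of several possible conventions.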

In the discretely observed setting, the $\tilde X_i$'s are not fully observed. Instead, we observe point evaluations

$$\tilde X_{i,d} = (\tilde X_i(t_1), \tilde X_i(t_2), \ldots, \tilde X_i(t_r)), \quad i = 1, \ldots, n.$$

Here, $0 \le t_1 < t_2 < \ldots < t_r \le 1$ is a grid over $[0,1]$, assumed asymptotically homogeneous in that $\max_{1 \le j \le r-1} (t_{j+1} - t_j) = O(r^{-1})$ as $r \to \infty$. The latent discrete process is denoted by $X_{i,d} = (X_i(t_1), X_i(t_2), \ldots, X_i(t_r))$.

Our strategy will be to mimic Steps 1–5 from the fully observed setup. Since the $X_i$'s are no longer fully observed, though, in order to have versions of the $F_i$ and $\tilde F_i$, we will draw inspiration from the general definition of the local variation distribution (Equation 2 in Definition 2). First, define

$$F_{i,d}(t) = \sum_{j \in I_t} |X_i(t_{j+1}) - X_i(t_j)| \Big/ \sum_{j=1}^{r-1} |X_i(t_{j+1}) - X_i(t_j)|$$

for $t \in [0,1]$ and each $i = 1, 2, \ldots, n$, where $I_t$ is the set of all $j$'s satisfying $t_{j+1} \le t$. Note that because we only observe each curve over the grid $0 \le t_1 < t_2 < \ldots < t_r \le 1$, we have replaced the supremum over all grids in Equation 2 of Definition 2 by just this one (the finest grid we get to observe). Clearly, $F_{i,d}$ has jump discontinuities at the grid points $t_j$'s, is càdlàg, and satisfies $F_{i,d}(t) = 0$ for $t \in [0, t_2)$ and $F_{i,d}(t) = 1$ for $t \in [t_r, 1]$.

For the (discretely) observable warped process, we define

$$\tilde F_{i,d}(t) = \sum_{j \in I_t} |\tilde X_i(t_{j+1}) - \tilde X_i(t_j)| \Big/ \sum_{j=1}^{r-1} |\tilde X_i(t_{j+1}) - \tilde X_i(t_j)|. \qquad (3)$$

The $\tilde F_{i,d}$'s also have jump discontinuities at the grid points, and are càdlàg.

Under the identifiable regime, in particular, we would have $F_{i,d}(t) = F_d(t)$ for all $i = 1, 2, \ldots, n$, where

$$F_d(t) = \sum_{j \in I_t} |\phi(t_{j+1}) - \phi(t_j)| \Big/ \sum_{j=1}^{r-1} |\phi(t_{j+1}) - \phi(t_j)|.$$

Its jumps are at most of size $a_r = \max_{1 \le j \le r-1} |\phi(t_{j+1}) - \phi(t_j)| / \sum_{j=1}^{r-1} |\phi(t_{j+1}) - \phi(t_j)|$. Moreover, in the identifiable regime,

$$\tilde F_{i,d}(t) = \sum_{j \in I_t} |\phi(s_{i,j+1}) - \phi(s_{i,j})| \Big/ \sum_{j=1}^{r-1} |\phi(s_{i,j+1}) - \phi(s_{i,j})|,$$

where $s_{i,j} = T_i^{-1}(t_j)$ for each $i$ and $j$ are unobserved random variables. The maximum jump size of $\tilde F_{i,d}$ is $A_{i,r} = \max_{1 \le j \le r-1} |\phi(s_{i,j+1}) - \phi(s_{i,j})| / \sum_{j=1}^{r-1} |\phi(s_{i,j+1}) - \phi(s_{i,j})|$.

With the general definitions of $F_{i,d}$ and $\tilde F_{i,d}$ in place, we can now adapt Steps 1–5 to the discrete case. In what follows, the generalized inverse of a function $G$ is denoted by $G^{-}$, i.e., $G^{-}(t) = \inf\{u : G(u) \ge t\}$. The first two steps will remain invariant, except for the fact that they will now employ the discrete local variation measures. This means that we will not require any tuning parameters or smoothness assumptions to estimate the warp and registration maps.
The registration itself (the last three steps) will require some smoothing, of course, if it is to make sense:

Step 1*: Set $\hat F_d = \{n^{-1} \sum_{i=1}^n \tilde F_{i,d}^{-}\}^{-}$ and $\hat F_d^* = n^{-1} \sum_{i=1}^n \tilde F_{i,d}^{-}$. Note that under the identifiable regime (1), $\hat F_d$ mimics $F_d$.
Step 2*: Predict the random warp map $T_i$ by $\hat T_{i,d} = \tilde F_{i,d}^{-} \circ \hat F_d$ and the registration map $T_i^{-1}$ by $\hat T_{i,d}^* = \hat F_d^* \circ \tilde F_{i,d} = \{n^{-1} \sum_{j=1}^n \tilde F_{j,d}^{-}\} \circ \tilde F_{i,d}$.
Step 3*: Since the $\tilde X_i$'s are observed discretely, we do not have information about their values between grid points. Thus, we first smooth each of the $\tilde X_{i,d}$ using the Nadaraya-Watson kernel regression estimator for an appropriately chosen kernel $k$ and bandwidth $h$, denoting the resulting smoothed functions by $X_i^\dagger$,

$$X_i^\dagger(t) = \sum_{j=1}^r k\left(\frac{t - t_j}{h}\right) \tilde X_i(t_j) \Big/ \sum_{j=1}^r k\left(\frac{t - t_j}{h}\right).$$

Define $\hat X_i^*(t) = X_i^\dagger(\hat T_{i,d}(t))$, $i = 1, 2, \ldots, n$, to be the registered functional observations and write $\bar X_{r*} = n^{-1} \sum_{i=1}^n \hat X_i^*$ for their mean.

As in the fully observed situation, if we suspect to be in the identifiable regime (1), we estimate the pairs $\{\phi, \xi_i\}$ as follows:

Step 4*: Compute the empirical covariance operator $\widehat{\mathscr{K}}_{r*}$ of the registered curves $\hat X_i^*$, and use its leading eigenfunction $\hat\phi_*$ as the estimator of $\phi$ (again, assume the convention that the sign is correctly identified, i.e., $\langle \hat\phi_*, \phi \rangle \ge 0$).
Step 5*: Finally, estimate $\xi_i$ by $\hat\xi_{i*} = \langle \hat X_i^*, \hat\phi_* \rangle$ for each $i \ge 1$.

The procedure also applies when the grid over which the $\tilde X_i$'s are observed, say, $0 \le t_{i,1} < t_{i,2} < \ldots < t_{i,r_i} \le 1$, differs with $i$. The reason for this compatibility is the fact that our approach considers only one curve at a time. We formulate it in the notationally simpler case of a common grid, in order to alleviate the notation in the statement of our asymptotic results in Section 4.

As mentioned earlier, $\tilde F_{i,d}$ is a step function with jump discontinuities at the grid points. In particular, $\tilde F_{i,d}(t) = 0$ for $t \in [0, t_2)$ and $\tilde F_{i,d}(t) = 1$ for $t \in [t_r, 1]$. Thus, $\tilde F_{i,d}^{-}(0) = 0$ and $\tilde F_{i,d}^{-}(1) = t_r$, which is less than 1 if $t_r < 1$, i.e., the grid does not include the right end-point. In this case, $\hat F_d(t)$ and thus $\hat T_{i,d}(t)$ is properly defined only for $t \in [0, t_r]$. Also, $\tilde F_{i,d}^{-}(u) \le t_r$ and equality holds iff $u \in (\tilde F_{i,d}(t_{r-1}), 1]$. Thus,

$$\hat F_d(t_r) = \inf\left\{u : n^{-1} \sum_{i=1}^n \tilde F_{i,d}^{-}(u) \ge t_r\right\} = \inf\{u : \tilde F_{i,d}^{-}(u) = t_r \ \forall\, i = 1, 2, \ldots, n\} = \inf\{u : u \in \cap_{i=1}^n (\tilde F_{i,d}(t_{r-1}), 1]\} = \max_{1 \le i \le n} \tilde F_{i,d}(t_{r-1}).$$

Then, $\hat T_{i,d}(t_r) = \tilde F_{i,d}^{-}(\hat F_d(t_r)) = \tilde F_{i,d}^{-}(\max_{1 \le j \le n} \tilde F_{j,d}(t_{r-1})) = t_r$. One can then extend $\hat T_{i,d}(t)$ to the whole of $[0,1]$ by, e.g., linearly interpolating between $(t_r, \hat T_{i,d}(t_r)) = (t_r, t_r)$ and $(1, 1)$. This practical modification, in case $t_r < 1$, enjoys the same asymptotic properties as the originally defined estimator (Section 4), since the effect of the modification is asymptotically negligible due to the homogeneity assumptions on the grid.

Similarly, $\hat F_d^*(u) = n^{-1} \sum_{i=1}^n \tilde F_{i,d}^{-}(u) = t_r$ iff $u \in \cap_{i=1}^n (\tilde F_{i,d}(t_{r-1}), 1] = (\max_{1 \le i \le n} \tilde F_{i,d}(t_{r-1}), 1]$. So, in case $t_r < 1$, we have $\hat T_{i,d}^*(1) = \hat F_d^*(\tilde F_{i,d}(1)) = \hat F_d^*(1) = t_r < 1$. This is not a problem since this estimator is not used in the registration procedure and the problem disappears asymptotically anyway, just as described above.

We conclude this section by noting that, since the estimates $\hat T_{i,d}$ of the warp maps do not involve any smoothing and are obtained from compositions of step functions, the resulting registered curves will not be very smooth. This will be particularly noticeable if the number of grid points is small. Note that even in that case, the estimated mean function will be smoother if the sample size is moderately large. If one is interested in obtaining a smooth registration of the sample curves, the following procedure may be adopted. First, we produce smooth versions of the $\hat T_{i,d}$ by some non-parametric smoothing procedure, e.g., polynomial splines of a fixed degree $m$, and call these new estimates $\hat T_{i,s}$, say. Then, we plug in these smoothed estimates of the warp functions and define the new registered observations as $\hat X_i^*(t) := X_i^\dagger(\hat T_{i,s}(t))$. It is well-known that a spline smoothed estimate of a smooth function converges to that function in the $L^2[0,1]$ sense provided the oscillations of the function go to zero as the number of knots grows to infinity (see Theorem 6.27 in Schumaker (2007)). The latter holds for the $\hat T_{i,d}$'s since they lie in $L^2[0,1]$ (see equation (2.121) in Theorem 2.59 in Schumaker (2007)). Thus, this modified estimator will also provide consistent registration.

It can often happen that the discretely observed functional data are additionally contaminated by measurement error. In this case, one has to suitably adapt the registration procedure. In the presence of measurement error, we observe $Y_{i,d} = \tilde X_{i,d} + e_i$, where $\tilde X_{i,d}$ was defined in Section 3.2, and $e_i = (\epsilon_{i,1}, \epsilon_{i,2}, \ldots, \epsilon_{i,r})$ with the $\{\epsilon_{i,j} : j = 1, 2, \ldots, r,\ i = 1, 2, \ldots, n\}$ being a collection of i.i.d. error variables with zero mean and variance $\sigma^2$, independent of the processes and warp maps.

We will modify the registration procedure as follows. First, construct a non-parametric function estimator of $\tilde X_i'$, which is the derivative of the warped process $\tilde X_i$, using the observation $Y_{i,d}$ for each $i$, and call this estimator $\hat X_{i,w}^{(1)}(\cdot)$. Define analogues of the $\tilde F_i$'s as

$$\tilde F_{i,w}(t) = \int_0^t |\hat X_{i,w}^{(1)}(u)|\,du \Big/ \int_0^1 |\hat X_{i,w}^{(1)}(u)|\,du, \quad t \in [0,1].$$

Note that unlike the discrete observation case described in the previous section, we now have fully functional versions of $\tilde X_i'$ for each $i$, which allows us to mimic the algorithm in the fully observed scenario in Section 3.1.

Step 1**: Set $\hat F_e = \left( n^{-1} \sum_{i=1}^n \tilde F_{i,w}^{-1} \right)^{-1}$. Under the identifiable regime (1), in particular, $\hat F_e$ estimates $F_\phi$.
Step 2**: Predict the warp map $T_i$ by $\hat T_{i,e} = \tilde F_{i,w}^{-1} \circ \hat F_e$, and the registration map by $\hat T_{i,e}^{-1}$.
Step 3**: Construct non-parametric function estimators of the $\tilde X_i$'s using the $Y_{i,d}$'s, and call them $\hat X_{i,w}(\cdot)$'s. Define $\hat X_{i,e}^*(t) = \hat X_{i,w}(\hat T_{i,e}(t))$, $i = 1, 2, \ldots, n$, to be the registered functional observations.

If we suspect to be in the identifiable regime (1), we estimate the pairs $\{\phi, \xi_i\}$ as follows:

Step 4**: Write $\bar X_{e*} = n^{-1} \sum_{i=1}^n \hat X_{i,e}^*$ for the mean of the registered observations and let $\widehat{\mathscr{K}}_{e*}$ denote their empirical covariance operator. Take its leading eigenfunction, denoted by $\hat\phi_{e*}$, as the estimator of $\phi$ (assuming the same sign convention as earlier).
Step 5**: Finally, estimate $\xi_i$ by $\hat\xi_{i*,e} = \langle \hat X_{i,e}^*, \hat\phi_{e*} \rangle$ for each $i \ge 1$.

We will use a local polynomial estimator with kernel $k_1(\cdot)$ and bandwidth $h_1$ for finding $\hat X_{i,w}^{(1)}$. We will then use a local linear estimator with kernel $k_2(\cdot)$ and bandwidth $h_2$ for estimating $\hat X_{i,w}$.
These choices are motivated by the advantages of local polynomial estimators in dealing with boundary effects (see, e.g., Fan and Gijbels (1996) and Wand and Jones (1995) for further details on various smoothing techniques). More details on the choices of smoothing parameters are given in Remark 4 after Theorem 5.
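The discrete ingredients of Section 3.2, namely the step function of equation (3), the generalized inverse $G^{-}$, and the Nadaraya-Watson smoother of Step 3*, can be sketched as below, assuming NumPy. All function names are hypothetical, and the Epanechnikov kernel is an assumed example of a kernel supported on $[-1, 1]$; the text does not prescribe a particular kernel.

```python
import numpy as np

def step_heights(values):
    """Heights of the discrete local variation step function at t_1, ..., t_r."""
    inc = np.abs(np.diff(values))
    cum = np.concatenate(([0.0], np.cumsum(inc)))
    return cum / cum[-1]

def eval_step(grid, heights, t):
    """Evaluate the cadlag step function at the points t in [0, 1]."""
    idx = np.searchsorted(grid, t, side="right") - 1   # last grid point <= t
    return np.where(idx >= 0, heights[np.maximum(idx, 0)], 0.0)

def generalized_inverse(grid, heights, u):
    """G^-(u) = inf{t : G(t) >= u} for the step function with these heights."""
    idx = np.searchsorted(heights, u, side="left")     # first height >= u
    out = grid[np.minimum(idx, len(grid) - 1)]
    return np.where(u <= 0, 0.0, out)                  # G(0) >= 0 always holds

def nadaraya_watson(grid, values, t, h):
    """Nadaraya-Watson smoother with an Epanechnikov kernel (assumed choice).

    The bandwidth h must be large enough that every t has a grid point
    within distance h, otherwise the denominator is zero.
    """
    z = (t[:, None] - grid[None, :]) / h
    w = np.where(np.abs(z) <= 1, 0.75 * (1 - z ** 2), 0.0)
    return (w @ values) / w.sum(axis=1)
```

With these pieces, $\hat F_d^*$ of Step 1* is the pointwise average of `generalized_inverse` over the sample, and the value of the generalized inverse at $u = 1$ is the last grid point $t_r$, in line with the boundary discussion above.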

4. Asymptotic Theory

We next study the asymptotic properties of the estimators obtained above. We develop separate results for each of the three observation regimes considered (full observation, discrete observation, discrete observation with measurement errors). In what follows, the space $C^1[0,1]$ is equipped with the norm $|||f|||_{C^1} = \|f\|_\infty + \|f'\|_\infty$, where $\|\cdot\|_\infty$ is the usual sup-norm. The 2-Wasserstein distance between distributions $G_1$ and $G_2$ will be denoted by $d_W(G_1, G_2) = \sqrt{\int_0^1 \left( G_1^{-1}(u) - G_2^{-1}(u) \right)^2 du}$.

We first focus on the identifiable regime as given in Definition 1. Our first two results concern the fully observed case, as described in Section 3.1. Write $\mu = E(X) = E(\xi)\phi$, and $\mathscr{K} = \mathrm{Cov}(X) = E(X \otimes X) - \mu \otimes \mu$, where $(f \otimes g)h = \langle g, h \rangle f$ for any triple $f, g, h \in L^2[0,1]$. Let $|||\cdot|||_{\mathrm{tr}}$ denote the trace norm for operators on $L^2[0,1]$. The covariance kernel of $X$ is denoted by $K(\cdot, \cdot)$ and the empirical covariance kernel of the $\hat X_i$'s is denoted by $\hat K_r(\cdot, \cdot)$.

Theorem 2 (Strong Consistency – Fully Observed Case). Further to the assumptions in Definition 1, assume also that $\phi'$ is Hölder continuous with exponent $\alpha \in (0, 1]$. Then, the estimators in Section 3.1 satisfy the following asymptotic results, where convergence is always with probability one:

(a) $d_W(\hat F, F_\phi) \to 0$ as $n \to \infty$.
(b) $\|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ and $\|\hat T_i - T_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c) $\|\hat X_i - X_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(d) $d_W(\hat F_i, F_\phi) \to 0$ as $n \to \infty$ for each $i \ge 1$, where $\hat F_i$ is the local variation measure associated with $\hat X_i$.
(e) $\|\bar X_r - \mu\|_\infty \to 0$ as $n \to \infty$, where $\bar X_r = n^{-1} \sum_{i=1}^n \hat X_i$.
(f) $|||\widehat{\mathscr{K}}_r - \mathscr{K}|||_{\mathrm{tr}} \to 0$ and $\|\hat K_r - K\|_\infty = \sup_{s,t \in [0,1]} |\hat K_r(s,t) - K(s,t)| \to 0$ as $n \to \infty$. Moreover, $\|\hat\phi - \phi\|_\infty \to 0$ and $|\hat\xi_i - \xi_i| \to 0$ as $n \to \infty$ for each $i \ge 1$.

Furthermore, if we additionally assume that $E(\|T'\|_\infty) < \infty$ and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$ (call this “Condition 1”), then the following stronger results hold with probability one, in lieu of (b), (c), and (e):

(b') $|||\hat T_i^{-1} - T_i^{-1}|||_{C^1} \to 0$ and $|||\hat T_i - T_i|||_{C^1} \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c') $|||\hat X_i - X_i|||_{C^1} \to 0$ as $n \to \infty$ for each $i \ge 1$.
(e') $|||\bar X_r - \mu|||_{C^1} \to 0$ as $n \to \infty$, where $\bar X_r = n^{-1} \sum_{i=1}^n \hat X_i$.

Some remarks are in order:

Remark 7.

1. The strong consistency results in Theorem 2 do not require that $\xi_i$ and $T_i$ are independent.

2. Uniformity: It is observed from the proof of the uniform convergence of $\hat T_i^{-1}$ in part (b) of the above theorem that $\max_{1 \le i \le n} \|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ as $n \to \infty$ almost surely. Under Condition 1, the same conclusion is true now with the finer $C^1[0,1]$ norm $\|f\|_\infty + \|f'\|_\infty$ introduced above. The convergence in part (d) also holds uniformly for all $i = 1, 2, \ldots, n$.

3. Fisher Consistency: Write $\bar T = n^{-1} \sum_{i=1}^n T_i$. It can be directly verified that $\hat F^{-1} = \bar T \circ F_\phi^{-1}$ so that $\hat F = F_\phi \circ \bar T^{-1}$. Also, $\hat T_i = T_i \circ \bar T^{-1}$, $\hat T_i^{-1} = \bar T \circ T_i^{-1}$, and $\hat X_i = \xi_i\, \phi \circ \bar T^{-1}$ for each $i$. Further,

$$\widehat{\mathscr{K}}_r = n^{-1} \sum_{i=1}^n (\hat X_i - \bar X_r) \otimes (\hat X_i - \bar X_r) = \left\{ n^{-1} \sum_{i=1}^n \xi_i^2 - \bar\xi^2 \right\} (\phi \circ \bar T^{-1}) \otimes (\phi \circ \bar T^{-1}),$$

where $\bar\xi = n^{-1} \sum_{i=1}^n \xi_i$. Thus, $\hat\phi = (\phi \circ \bar T^{-1}) / \|\phi \circ \bar T^{-1}\|_2$, and $\hat\xi_i = \langle \hat X_i, \hat\phi \rangle = \xi_i \|\phi \circ \bar T^{-1}\|_2$. Since all of the above estimators are measurable functions of the sample averages of the $T_i$'s, the $\xi_i$'s and the $\xi_i^2$'s, it follows that all of the above estimators are Fisher consistent for their population counterparts.

4. An Example:

The condition inf t Pr , s T p t q ě δ ą δ can be relaxed to inf t Pr , s T p t q ě δ i almost surely for i.i.d. positive random variables δ i providedwe assume that E p δ ´ q ă 8 . An example of random warp functions that satisfy inf t Pr , s T p t q ě δ ą ζ p t q “ t and for k ‰ ζ k p t q “ t ´ sin p πkt q{p| k | πβ q for some β ą

0. If K is an integer-valued, symmetric randomvariable, then E p ζ K q “ Id . For a ﬁxed J ě

2, let t K j u Jj “ be i.i.d. integer-valued, symmetricrandom variables, and t U j u J ´ j “ be i.i.d. U nif r , s random variables independent of the K j ’s.Deﬁne T p t q “ U p q ζ K p t q ` ř J ´ j “ p U p j q ´ U p j ´ q q ζ K j p t q ` p ´ U p J ´ q q ζ K J p t q . Then, T is a strictlyincreasing homeomorphism on r , s , T P C r , s surely, E p T q “ Id . Further, it can be easilyshown that inf t Pr , s T p t q ě ´ β ´ . Thus, the condition inf t Pr , s T p t q ě δ ą β “ p ´ δ q ´ .Further to strong consistency, we also derive weak convergence of the estimators: Theorem 3 (Weak Convergence – Fully Observed Case) . Further to assumptions in Deﬁnition 1, assumealso that φ is H¨older continuous with exponent α P p , s , that ξ i and T i are independent for each i , andthat E p|| T || q ă 8 . Then, the estimators in Section 3.1 satisfy the following asymptotic results,(a) nd W p p F , F φ q converges weakly as n Ñ 8 .(b) ? n p p T ´ i ´ T ´ i q and ? n p p T i ´ T i q converge weakly in the C r , s topology as n Ñ 8 for each i ě .(c) ? n p p X i ´ X i q converges weakly in the C r , s topology as n Ñ 8 for each i ě .(d) nd W p p F i , F φ q converges weakly as n Ñ 8 for each i ě .(e) ? n p X r ´ µ q converges weakly to a zero mean Gaussian distribution in the C r , s topology as n Ñ 8 .(f ) ? n p x K r ´ K q converges weakly in the topology of Hilbert-Schmidt operators, and ? n p p K r ´ K q converges weakly in the C pr , s q topology as n Ñ 8 . In both cases, the limits are zero meanGaussian distributions. Moreover, ? n p p φ ´ φ q converges weakly to a zero mean Gaussian distributionin the C r , s topology, and ? n p p ξ i ´ ξ i q converges weakly as n Ñ 8 for each i ě . Since C pr , s k q is a stronger topology than L pr , s k q for any ﬁnite k “ , , . . . 
, it follows that theweak convergence results in the above theorem which hold in the C pr , s k q topology also hold in the L pr , s k q topology by virtue of the continuous mapping theorem.We shall now study some the asymptotic properties of the estimators in the discrete observation setup(without measurement error). Theorem 4 (Limit Theory – Discretely Observed Case Without Measurement Error) . Further tothe conditions of Theorem 3, assume that φ P C r , s , ş | φ p u q| ´ (cid:15) ă 8 for some (cid:15) ą , and that inf t Pr , s T p u q ě δ ą almost surely for a deterministic constant δ . Deﬁne α “ (cid:15) {p ` (cid:15) q . Assume that ξ i and T i are independent for each i (only for the weak convergence statements). The kernel k p¨q is assumedto be supported on r´ , s . If h “ h p n q “ o p n ´ { q and r “ r p n q satisﬁes r ąą n { α as n Ñ 8 , then theestimators introduced in Section 3.2 satisfy(a) d W p p F ˚ d , F φ q Ñ as n Ñ 8 almost surely, and d W p p F ˚ d , F φ q “ O P p n ´ q as n Ñ 8 .(b) || p T ˚ i,d ´ T ´ i || Ñ and || p T i,d ´ T i || Ñ as n Ñ 8 almost surely. Further, ? n p p T ˚ i,d ´ T ´ i q and ? n p p T i,d ´ T i q converge weakly in the L r , s topology as n Ñ 8 for each i ě . . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations (c) || p X ˚ i ´ X i || Ñ as n Ñ 8 almost surely, and ? n p p X ˚ i ´ X i q converges weakly in the L r , s topology as n Ñ 8 for each i ě .(d) d W p p F ˚ i , F φ q Ñ as n Ñ 8 almost surely, and d W p p F ˚ i , F φ q “ O P p n ´ q as n Ñ 8 for each i ě .(e) || X r ˚ ´ µ || Ñ as n Ñ 8 almost surely, and ? n p X r ˚ ´ µ q converges weakly in the L r , s topology as n Ñ 8 .(f ) ||| x K r ˚ ´ K ||| Ñ as n Ñ 8 almost surely, and ? n p x K r ˚ ´ K q converges weakly in the topologyof Hilbert-Schmidt operators. Further, || p K r ˚ ´ K || Ñ as n Ñ 8 , and ? n p p K r ˚ ´ K q convergesweakly in the L pr , s q topology as n Ñ 8 . Moreover, || p φ ˚ ´ φ || Ñ as n Ñ 8 almost surely,and ? 
n p p φ ˚ ´ φ q converges weakly in the L r , s topology. Also, | p ξ i ˚ ´ ξ i | Ñ as n Ñ 8 almostsurely, and ? n p p ξ i ˚ ´ ξ i q converges weakly as n Ñ 8 for each i ě .In all the weak convergence results stated above, the limits are identical to the corresponding limitsobtained in the fully observed scenario in Theorem 3. Remark 8.

1. As in the fully observed setting in Theorem 2, the strong consistency results in thediscrete, noiseless observation setting in Theorem 4 do not require ξ i and T i to be independent.2. The asymptotic results remain valid in the case where the grid over which the r X i ’s are observed,say, 0 ď t i, ă t i, ă . . . ă t i,r i ď

1, diﬀers with i . The proof, however, will be notationally quitecumbersome. In this case, the requirement on the grid will be as follows: max ď j ď r i ´ p t j ` ´ t j q “ O p r ´ i q as r i Ñ 8 for each i , and r r n : “ min ď i ď n r i satisﬁes r r n ąą n { α as n Ñ 8 .3. The choice of h in Theorem 4 is an under-smoothing choice. It is made on account of the absenceof measurement errors in the observations, which enables us to under-smooth the data withoutdamaging ? n -consistency. This is unlike what happens in classical non-parametric regression dueto the presence of errors in that scenario. Also, the boundary points inﬂate the bias of the Nadaraya-Watson estimator to an order of h (the same order as that obtained in Theorem 4 for all points).However, these issues are of no consequence in this scenario. It is also natural to under-smoothin this situation since appropriate under-smoothing retains the features of the curves better andallows estimation at a parametric rate even under non-parametric smoothing. If instead of theNadaraya-Watson estimator, one uses a local linear estimator with bandwidth h , then the bias isof order h (even at the boundaries). In this case, h has to be o p n ´ { q to achieve parametric ratesof convergence, which is again an under-smoothing choice. Thus, the choice of smoothing methoddoes not play a crucial role in this setup.4. Unlike Theorem 3, the weak convergence results are all in the L topology. This is because unlikethe fully observed case, the estimators involved are not continuous functions in r , s . We couldnot consider the weaker D r , s topology since not all estimators will be c`adl`ag functions. However,we still retain the strong consistency results in parts (b), (c) and (e) in the sup norm similar toTheorem 2. This is due to the fact that those estimators are uniformly bounded almost surely,and thus have ﬁnite sup-norm. Further, in all cases, there is no issue with the measurability of thesupremum.5. 
The condition φ P C r , s can be relaxed to requiring that φ is Lipschitz continuous. Moreover,the requirement ş | φ p u q| ´ (cid:15) ă 8 for some (cid:15) ą φ isbounded away from zero on r , s , in which case one can choose α “

1. Consider the case when φ P C r , s and let t P p , q be such that φ p t q “

0. If φ p t q ą

0, then we can choose an interval A δ “ p t ´ δ, t ` δ q Ă p , q such that inf u P A δ | φ p u q| ě β ą

0. Then, a ﬁrst order Taylor expansionyields ş A δ | φ p t q| ´ (cid:15) dt ď β ´ (cid:15) ş A δ | t ´ t | ´ (cid:15) dt ă 8 for any (cid:15) ă

1. Here, we have used the fact that ş δ t ´ (cid:15) dt ă 8 for any δ ą (cid:15) ă

1. Thus, if none of the zeros of φ and φ coincide, then thecondition ş | φ p u q| ´ (cid:15) ă 8 holds for any (cid:15) ă

1. In general, if φ P C m r , s for some m ě

2, and m be the least integer between 2 and m such that none of the zeros of φ and φ p m q coincide, then ş | φ p u q| ´ (cid:15) ă 8 holds for any (cid:15) ă {p m ´ q .We ﬁnally study the asymptotic properties of the estimators in the modiﬁed registration procedureemployed when one has contamination by measurement error (described in Section 3.3). . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations Theorem 5 (Limit Theory – Measurement Error Case) . In addition to the assumptions of Theorem3, assume that φ P C r , s , ş | φ p u q| ´ (cid:15) du ă 8 for some (cid:15) ą . Deﬁne α “ (cid:15) {p ` (cid:15) q . Assume that ξ i and T i are independent for each i . Suppose that T P C r , s a.s. and inf t Pr , s T p u q ě δ ą almostsurely for a deterministic constant δ . The kernels k p¨q and k p¨q are assumed to be supported on r´ , s ,symmetric and continuously diﬀerentiable. The errors t (cid:15) ij u are assumed to be a.s. bounded. Also assumethat E t| ξ | ´ α {p ´ α q u ă 8 as well as E p|| T p l q || q ă 8 for l “ , , . The bandwidths satisfy h , h Ñ , rh , rh Ñ 8 . Then, the estimators in Section 3.3 satisfy the following properties.(a) d W p p F e , F φ q “ O P p h α ` p rh q ´ α ` n ´ q as n Ñ 8 .(b) Both || p T ´ i,e ´ T ´ i || and || p T i,e ´ T i || are O P p h α ` p rh q ´ α { ` n ´ { q as n Ñ 8 .(c) || p X ˚ i,e ´ X i || “ O P p h α ` p rh q ´ α { ` h ` p rh q ´ { ` n ´ { q as n Ñ 8 .(d) || X e ˚ ´ µ || “ O P p h α ` p rh q ´ α { ` h ` p rh q ´ { ` n ´ { q as n Ñ 8 .(e) ||| x K e ˚ ´ K ||| “ O P p h α ` p rh q ´ α { ` h ` p rh q ´ { ` n ´ { q as n Ñ 8 . Consequently, || p φ e ˚ ´ φ || and | p ξ i ˚ ,e ´ ξ | have the same rates of convergence for each ﬁxed i . Remark 9.

1. Analogous rates of convergence can also be obtained if one uses diﬀerent non-parametricsmoothing techniques than the ones in the theorem. One may, e.g., use a Nadaraya-Watson estima-tor in Step 3** with boundary kernels to alleviate the boundary bias problem that is well-knownfor this estimator (see, e.g., Wand and Jones (1995)). Also, to estimate r X i , one may use higherorder local polynomials with even orders. However, these will be computationally more intensiveas well as need additional smoothness assumptions on the latent process and the warp maps.2. It is observed in the above theorem that the rates of convergence are slower than the parametricrates achieved in the earlier settings due to the non-parametric smoothing steps involved – especiallythe estimation of derivatives, which is known to have quite slow rates of convergence. Further, thecontributions of the two smoothing steps in the convergence rates are clear. It is well known in locallinear regression that the optimal rate for h is r ´ { and that for h is r ´ { . With these rates, wehave d W p p F e , F φ q “ O P p r ´ α { ` n ´ q , and the remaining quantities are O P p r ´ α { ` n ´ { q . Thus,parametric rates of convergence is achieved if r ą n { α .3. Let β “ α {p ´ α q and observe that β ă α ă

1. The condition E t| ξ | ´ β u ă 8 in Theorem5 is obviously satisﬁed if | ξ | is bounded away from zero. Suppose that ξ has a continuous density f ξ , say, either on r , or on p´8 , in which case it is assumed to be symmetric about zero. Ifsup y Pr ,a q f ξ p y q ă 8 for some a ą

0, then it is easy to show that E t| ξ | ´ β u ă 8 if β ă ô (cid:15) ă β P r , q , then this expectation is ﬁniteif sup y Pr ,a q y ´ f ξ p y q ă 8 . As emphasized before (Section 3.1), our procedure can be used whether or not the latent process falls inthe identiﬁable regime of Deﬁnition 1. In this section, we carry out a theoretical analysis of the stabilityof our registration procedure when the distribution of the latent process deviates from the identiﬁableregime. Since identiﬁability is lost, it is clear that consistency is no longer achievable. However, we canquantify how much the estimators deviate from their population counterparts, at least asymptotically.Since the model is in general unidentiﬁable, strictly speaking there is no unique setting corresponding tothe law of the data. For this reason, as a convention, we will assume that a “true” underlying distributionis known and ﬁxed. For simplicity of exposition, we focus on the rank two case. This will be seen to carrythe essence of the underlying eﬀects, as we discuss in the third point of Remark 10. To obtain moretransparent results, we focus on the case where the underlying functions are completely observable ascontinuous objects.Let X i “ ξ i φ ` ξ i φ for i “ , , . . . , n , where ξ i and ξ i are uncorrelated. Let µ “ E p X q “ E p ξ q φ ` E p ξ q φ . Denote γ l “ V ar p ξ l q and Y il “ r ξ il ´ E p ξ il qs{ γ il for l “ ,

2. Then,

$$X_i = \mu + \gamma_1 Y_{i1}\phi_1 + \gamma_2 Y_{i2}\phi_2 \qquad (4)$$

gives the Karhunen-Loève expansion of $X_i$. The (random) local variation distribution induced by $X_i$ is $F_i(t) = \int_0^t |X_i'(u)|\,du \big/ \int_0^1 |X_i'(u)|\,du$ for $t \in [0,1]$. Note that, contrary to the rank one case, where $\mu$ did not play a role in $F_i$ (due to cancellation of the term $\xi$ from the numerator and the denominator), here it cannot be neglected. We will later see that it plays a role in the performance of the estimators. Defining $\eta = \gamma_2/\gamma_1$, which is the square root of the inverse of the condition number, it follows that

$$F_i(t) = \frac{\int_0^t |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}.$$

The local variation distribution induced by the observed warped data $\tilde X_i = X_i \circ T_i^{-1}$ is given by

$$\tilde F_i(t) = \frac{\int_0^t |\tilde X_i'(u)|\,du}{\int_0^1 |\tilde X_i'(u)|\,du} = \frac{\int_0^{T_i^{-1}(t)} |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = F_i(T_i^{-1}(t)).$$

The idea is that if, under suitable conditions, the $F_i$'s manifest small variability, then the registration procedure will work quite well. We will illustrate two different situations where this is the case. The estimators of the population parameters will be the same as those considered earlier. The next theorem gives bounds on the estimation errors.

Theorem 6.

In the setting of Model 4, deﬁne Z i “ ş | X i p u q ´ µ p u q| du { ş | X i p u q| du if µ ‰ , η ş | Y i φ p u q| du { ş | Y i φ p u q ` ηY i φ p u q| du if µ “ , for i “ , , . . . , n. If µ ‰ , assume that ş | µ p u q| ´ (cid:15) du ă 8 for some (cid:15) ą , and if µ “ , assume that ş | φ p u q| ´ (cid:15) du ă 8 for some (cid:15) ą . Set α “ (cid:15) {p ` (cid:15) q . Suppose that assumption (I2) from Deﬁnition (1) holds and thatfor each i “ , , φ i lie in C r , s with the derivative being α i -H¨older continuous for some α i P r , s .Assume that X i and T i are independent for each i . Also assume that E p Z α q ă 8 . Then:(a) lim sup n Ñ8 || p T ´ i ´ T ´ i || ď const. t E p Z α q` Z i u , and lim sup n Ñ8 || p T i ´ T i || ď const. || T i || t Z αi ` E α p Z α qu almost surely, where the constant term is uniform in i .(b) lim sup n Ñ8 || p X i ´ X i || ď O P p qt E p Z α q ` Z i u almost surely. Remark 10.

1. Theorem 6 reveals that if the $Z_i$ are small, the effect of misspecification is also small. Here are two such cases:

(a) When $\mu' \neq 0$, $Z_i = \int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du \big/ \int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du$. So, in this case, if $|\gamma_1^{-1}\mu'|$ has a large enough contribution compared to $|Y_{i1}\phi_1' + \eta Y_{i2}\phi_2'|$ for all $i$, then the $Z_i$'s are small.

(b) On the other hand, if $\mu' = 0$, then if $\eta$ is small, i.e., the condition number of the process is large (which essentially implies that the process is "close" to a rank one process provided $E(\xi_2) = 0$), then the $Z_i$'s are small. This can be compared to the minimum eigenvalue registration principle of Ramsay and Silverman (2005), where one tries to find the warp function that minimises the second eigenvalue of the cross-product matrix between the target function and the registered function. Assume that $E(\xi_{i1}) = E(\xi_{i2}) = 0$ and, without loss of generality, that $\gamma_1 = 1$. If in reality the true unobserved curves are rank one, i.e., the $\xi_{i1}\phi_1$ component, and we observe warped versions of the rank two curves $X_i$, then (in the population case) correct registration is achieved by $T_i$ if the minimum eigenvalue, namely $\gamma_2^2 = \eta^2$, of the expected cross-product matrix equals zero. Thus, in the empirical case, if $\eta$ is close to zero, we may expect $\hat T_i$ to be close to $T_i$ and consequently expect the registration procedure to have good performance.

2. Bounds similar to those in (a) and (b) of Theorem 6 can also be obtained for the mean, the covariance, the $\gamma_l$'s and the $\phi_l$'s, as well as the principal components $Y_{il}$. We do not include them in the statement of the theorem because they need more complicated conditions involving the parameters.

3. General (possibly infinite) rank situation: Let $X_i = \mu + \sum_{j=1}^{M} \gamma_j Y_{ij}\phi_j$ for some $M \le \infty$, where the $\{Y_{ij} : j = 1, 2, \ldots, M\}$ are uncorrelated with zero mean and unit variance. Without loss of generality, we assume that $\gamma_1 > \gamma_2 > \ldots \ge 0$. The errors in estimation when $\mu' \neq 0$ remain the same as in Theorem 6. When $\mu' = 0$, we define

$$Z_i = \frac{\eta \int_0^1 |Y_{i2}\phi_2'(u) + \sum_{k \ge 3} \delta_k Y_{ik}\phi_k'(u)|\,du}{\int_0^1 |Y_{i1}\phi_1'(u) + \eta\,[Y_{i2}\phi_2'(u) + \sum_{k \ge 3} \delta_k Y_{ik}\phi_k'(u)]|\,du}$$

for $i = 1, 2, \ldots, n$, where $\delta_k = \gamma_k/\gamma_2$ for $k \ge 3$. In this case, under the conditions of Theorem 6, the bounds in that theorem still hold true. Note that $\delta_k \le 1$ for all $k \ge 3$. So, in the general case, the performance of the registration procedure studied in the paper will only depend on how small $\eta$ is and does not in general depend on the values of the $\delta_k$'s (or the $\gamma_j$'s for $j \ge 3$). In other words, only the behaviour of the second frequency component relative to the first one matters (which elucidates the role of $\delta$ in the standard model, i.e. Equation 1, whose role is precisely to tune this behaviour). Of course, the magnitude of the error in estimation for the same value of $\eta$ will now differ from the rank 2 case because of the presence of the additional terms. We have investigated these issues in a simulation study in Section 5.3 (see, in particular, Figure 6).

4. In the setup of the infinite rank latent model considered in (3), we now compare the bounds obtained in Theorem 6 to those obtained by Tang and Müller (2008). Denoting $\sum_{j=1}^{M} \gamma_j Y_{ij}\phi_j = \kappa W_i$, it follows that the latent model is exactly the same as considered in that paper (see p. 877 with $\delta$ there replaced by $\kappa$). So, if $\mu' \neq 0$, it follows that $Z_i = \kappa \int_0^1 |W_i'(u)|\,du \big/ \int_0^1 |\mu'(u) + \kappa W_i'(u)|\,du = O_P(\kappa)$, which is similar to the bound obtained in Tang and Müller (2008). Our analysis nevertheless refines the results of Tang and Müller (2008) in the sense that it reveals the impact of $\mu$ on the asymptotic bias: larger magnitudes of $\mu'$ yield smaller asymptotic bias. Further refinements can be offered by differentiating between the cases $\mu' \neq 0$ and $\mu' = 0$. Specifically, when $\mu' = 0$, it can be shown that $Z_i = \int_0^1 |W_i'(u) - Y_{i1}\phi_1'(u)|\,du \big/ \int_0^1 |W_i'(u)|\,du$. Thus, in this case, the error bounds on the warp maps in Theorem 6 do not depend on $\kappa$. This is to be expected for the following reason. Note that $\mu' = 0$ means that the latent process in this case is $X(t) = c + \kappa W(t)$ for a constant $c$, and hence the warped process is $\tilde X(t) = c + \kappa W(T^{-1}(t))$. Thus, the warped version of the process $X$ differs from the warped version of the process $W$ only by a constant shift and a scale factor. Ideally, any proper registration procedure should be invariant with respect to these transformations since they do not affect the time scale. This is clearly true for our procedure. We should thus get the same estimates of the warp maps if we work with the warped process $W(T^{-1}(t))$ (which does not involve $\kappa$) instead of $\tilde X$.
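The shift and scale invariance noted at the end of Remark 10 can be checked numerically. The following is a minimal sketch in Python (NumPy), not the authors' implementation: the curve `w`, the inverse warp `t ** 2`, and the constants `c` and `kappa` are hypothetical choices made only for illustration, and derivatives and integrals are approximated by finite differences and the trapezoid rule.

```python
import numpy as np

def local_variation_cdf(x, t):
    """Approximate F(t) = int_0^t |x'(u)| du / int_0^1 |x'(u)| du on the grid t."""
    abs_dx = np.abs(np.gradient(x, t))                      # |x'| by finite differences
    steps = 0.5 * (abs_dx[1:] + abs_dx[:-1]) * np.diff(t)   # trapezoid increments
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    return cum / cum[-1]

def w(u):
    # Hypothetical smooth curve standing in for W.
    return np.sin(2 * np.pi * u) + 0.3 * np.cos(5 * np.pi * u)

t = np.linspace(0.0, 1.0, 1001)
t_inv = t ** 2            # hypothetical inverse warp T^{-1}: increasing, fixes 0 and 1
c, kappa = 4.0, 0.25      # hypothetical shift and scale

F_w = local_variation_cdf(w(t), t)                           # unwarped W
F_w_warped = local_variation_cdf(w(t_inv), t)                # W(T^{-1}(t))
F_x_warped = local_variation_cdf(c + kappa * w(t_inv), t)    # c + kappa * W(T^{-1}(t))

# Shift/scale invariance: both warped curves share one local variation distribution.
gap_shift_scale = np.max(np.abs(F_x_warped - F_w_warped))
# Warping acts by composition: F_tilde = F o T^{-1}, up to discretisation error.
gap_composition = np.max(np.abs(F_w_warped - np.interp(t_inv, t, F_w)))
print(gap_shift_scale, gap_composition)
```

Because the constant $c$ vanishes on differentiation and $\kappa$ cancels between numerator and denominator, the first gap is at the level of rounding error; the second gap reflects only the discretisation of the derivative and the integral. Any estimator built solely from the warped local variation distributions therefore cannot distinguish $\tilde X$ from $W \circ T^{-1}$.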

5. Numerical Experiments

We now carry out simulation experiments to probe the finite-sample performance of our registration procedure. First we treat the case of a well-specified identifiable regime without error, and then separately the case when there are measurement errors in the observations. Finally, we consider the setup when the rank of the latent process is more than one (departure from identifiability). In all cases, we have compared the performance of the proposed registration method to the continuous monotone registration (CMR) method by Ramsay and Li (1998), the pairwise registration (PW) technique of Tang and Müller (2008) and registration using the Fisher-Rao metric (FRM) studied in Srivastava et al. (2011). The CMR procedure is implemented using the "register.fd" function in the R package fda. The PW procedure is implemented using the Matlab codes in the

PACE package. The FRM method is implemented using the "time_warping" function in the R package fdasrvf. The tuning parameters in the PW method are always chosen to be the default ones since the other choices were found to be computationally extremely intensive. For the CMR procedure, we compared its performance by using different numbers of B-spline basis functions in the structure of the warp maps (see Ramsay and Li (1998)). This varies their complexity. However, we found that the best performance was obtained when the warp maps are simple. As will be seen in the simulations, the registration procedures involving structural assumptions on warp maps and consequently more tuning parameters (CMR and PW) encounter difficulties in several of the models considered, which is probably due to the mis-specification of the true warping mechanism.

Let $X(t) = \xi\phi(t)$, $t \in [0,1]$, and consider two models:
Model 1: ξ „ N p . , q , φ p t q “ exp t cos p πt ´ π qu ;
Model 2: ξ „ ` Beta p , q , φ p t q “ t ´ p t ´ . q u cos p πt q .
In either case, the sample size is $n = 50$ and the curves are observed at $r = 101$ equally spaced points in $[0,1]$. The warp maps are chosen according to point (3) of Remark 7 with the parameters J “ K “ V V , where V „ P oisson p q , P p V “ ˘ q “ { V independent of V , and β “ . p T i,d ’s is the Epanechnikov kernel on r´ , s . For both models, the bandwidths used in the registration procedure were chosen to under-smooth the data so that the features (maxima, minima, etc.) are not smeared out. In order to provide smooth registered curves, we have smoothed the $\hat T_{i,d}$'s using cubic splines with 11 equi-spaced knots on $[0,1]$, prior to synchronising the data.

Figure 2 shows the plots of the true, warped and registered data curves; the true, warped and registered means; and the true, warped and registered leading eigenfunctions under Model 1 and Model 2. Figure 2 suggests that the procedure studied in this paper has been able to adequately register the discretely observed and warped sample curves. Moreover, it is clear that the cross-sectional mean and the leading eigenfunction of the warped curves differ from the true mean and leading eigenfunction in either amplitude or phase (under either model), while the registration procedure corrects the problem, and the resulting estimates (whether smoothed or raw) are very close to the true functions.

Under both models, the estimates of the mean and the leading eigenfunction obtained using the proposed registration procedure are closer to the true functions than those of all the other methods considered. This is more prominent under Model 2 (see the bottom two rows in Fig. 2), where the estimates of the leading eigenfunction obtained by all of the other competing procedures are far from the true eigenfunction. Also, the registered functions obtained using the CMR and the PW methods do not resemble the true functions (see Figures 8 and 9).
The above facts show that for small sample sizes, even under no measurement error, some of the well-known registration procedures may yield unsatisfactory results, while the proposed procedure works well in these cases.

We now consider the situation when the warped observations under an identifiable rank one model have been observed with measurement errors. As observed in our theoretical study in Section 4.1, the rate of convergence will be much slower than in the case when there is no measurement error. For our simulations, we thus keep the same two models as in Section 5.1 but increase the sample size to n “ p´ . , . q while those under Model 2 are i.i.d. Unif p´ . , . q . The bandwidths for the smoothing steps involved in the registration procedure are chosen using the built-in cross-validation bandwidth choice function "regCVBwSelC" in the locpol package in the R software. Figures 3 and 4 show the plots of the unobserved true rank one curves, the warped curves that are observed with error and the registered curves. They also contain the plots of the mean function and the leading eigenfunction of the true, warped and registered data under the two models. It is observed that even subject to measurement error contamination, the proposed registration procedure is able to adequately register the curves. In particular, under Model 2, the means as well as the leading eigenfunctions of the true and the registered curves are quite close. We also performed the registration procedure with a Nadaraya-Watson estimator (without boundary kernels) for obtaining an estimate of the $\tilde X_i$'s (see Step 3**). The performance was not that different from the one using a local linear estimator.

Only the FRM procedure fares similarly to the proposed one when estimating the leading eigenfunction under both models. However, the PW method yields estimates of the mean quite similar to those of the proposed and the FRM methods under each of the two models. Both the CMR and the PW methods fail to produce adequately registered curves, as is seen from Figures 10 and 11. The improvement in the performance of the FRM technique under Model 2 with error compared to the case without error considered in the previous subsection is perhaps due to the increased sample size, which compensates for the measurement error.

[Figure 2: panels "Original curves", "Warped curves", "Registered curves", "Mean" and leading eigenfunction, plotted against t on [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]
Fig 2. Plots of the true, warped and registered data curves (using our procedure) along with the estimated mean and leading eigenfunction under Model 1 (top two rows) and Model 2 (bottom two rows) without measurement error, obtained using our procedure as well as some other methods.

[Figure 3: panels "Original curves", "Warped curves with error", "Registered curves", "Mean" and leading eigenfunction, plotted against t on [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]
Fig 3. Plots of the true, warped and registered data curves (using our procedure) along with the estimated mean and leading eigenfunction under Model 1 with measurement error, obtained using our procedure as well as some other methods.

We next carry out experiments to probe the performance of the registration procedure in a rank 2 and a rank 3 setting; these correspond to an unidentifiable regime. The model considered in the rank 2 case is $X = \xi_1\phi_1 + \xi_2\phi_2$ with ξ „ N p . , q , ξ „ N p´ . , . q , φ p t q “ ? p πt q and φ p t q “ ? p πt q , $t \in [0,1]$. In the rank 3 case, we consider $X = \xi_1\phi_1 + \xi_2\phi_2 + \xi_3\phi_3$ with the same choices of $\xi_j$ and $\phi_j$ as above for j “ , ξ „ N p . , p . q q and φ p t q “ ? p πt q . The warp maps are the same as those considered in the simulation study in Section 5. The plots of the true curves, the warped curves and the registered curves are provided in Figure 5 for the rank 2 and the rank 3 models. The unidentifiable setting has to be interpreted as follows: in light of Theorem 1 and the ensuing counter-examples, there may be other models that could have generated the (statistically) same data. Consequently, strictly speaking, we cannot really talk about good or bad performance, as there may be several equally valid "ground truths" to compare to. But the way we have constructed the unidentifiable simulation setting is by means of a mild departure from an identifiable model. Therefore, we can arbitrarily consider that the latter identifiable model is the truth and investigate whether the registration procedure is stable to the said mild departure. A more detailed investigation of stability is pursued later in this subsection.

It is observed that the registration procedure performs quite well and aligns the peak (present in the true curves) adequately under both models (see Figure 5). Further, the two smaller troughs near the end-points present in the rank 3 model are also reasonably aligned (see the plots in the third row in Figure 5). However, except for the FRM procedure, the other two competing methods completely fail in registering the data curves (see Figures 12 and 13 in the Supplementary material).
Also, unlike our procedure, the registered curves using the FRM procedure seem to lack the two troughs present in the original curves near the boundary points for the rank 3 model. For each of the two models, the mean seems to be estimated very well based on the registered curves using our procedure. The other procedures follow suit. A similar statement is also true for the first eigenfunction under these two models. However, there is more bias in the estimate of the second eigenfunction under the rank 2 model for all of the registration procedures. Under the rank 3 model, the CMR and the PW methods are not fully able to capture the shape of the second eigenfunction, while our procedure and the FRM method do. The third eigenfunction under this model is somewhat reasonably estimated only by our procedure.

[Figure 4: panels "Original curves", "Warped curves with error", "Registered curves", "Mean" and leading eigenfunction, plotted against t on [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]
Fig 4. Plots of the true, warped and registered data curves (using our procedure) along with the estimated mean and leading eigenfunction under Model 2 with measurement error, obtained using our procedure as well as some other methods.

In order to probe the breakdown point of the proposed registration procedure in the rank ą L -error in estimation of the data curves, i.e., the median of $\|\hat X_i - X_i\| / \|X_i\|$, $i = 1, 2, \ldots, n$, and consider a threshold of 10% error as a criterion for good performance. The models are generated similarly to the earlier simulation. For the rank 2 case, let X “ ξ ,c φ ` ξ ,c,r φ , where ξ „ N p c, q , ξ „ N p´ c, r q , where c P r . , s and r P r . , . s . The choices of $c$ and $r$ ensure that we include both approximately rank 1 models ($c$ and $r$ close to zero) as well as proper rank 2 models (large values of $r$). Similarly, for the rank 3 case, let X “ ξ ,c φ ` ξ ,c,r φ ` ξ ,c,r φ , where ξ „ N p c, r q . Figure 6 shows a plot of the relative L -errors under these classes of models, for various combinations of the parameters $c$ and $r$. It is seen that when $c$ is large, the performance of the registration procedure is good, which conforms with our theoretical arguments in Theorem 6. In fact, for this class of rank 2 models, the maximum L error does not exceed 12 . c is small, the allowable range of $r$ values for good performance is much greater in the rank 2 setup compared to the rank 3 setup (cf. (c) in Remark 10). In fact, in the rank 3 setup, the error is more than 10% for all $r$ in the range considered when c ď . 2. Further, the maximum L error is now 29 .

[Figure 5: panels "Original curves", "Warped curves", "Registered curves", "Mean" and eigenfunctions, plotted against t on [0, 1]; legend: True, Estm., Warp, PW, CMR, FRM.]
Fig 5. Plots of the true, warped and registered data curves along with the means and eigenfunctions of the true, warped and the registered data using our method and some other procedures under the rank 2 (top two rows) and the rank 3 models (bottom three rows).

[Figure 6: panels "rank 2 model" and "rank 3 model"; axes: c and r.]
Fig 6. Level-plots of the relative L errors under the rank 2 and the rank 3 classes of models.
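The error criterion behind Figure 6, the median over curves of $\|\hat X_i - X_i\| / \|X_i\|$ compared against a 10% threshold, can be written out as follows. This is a minimal sketch rather than the authors' code: it assumes the norm is the $L^2$ norm on $[0,1]$, approximated by the trapezoid rule, and the function name `median_relative_l2_error` and the toy data are hypothetical.

```python
import numpy as np

def median_relative_l2_error(x_hat, x_true, t):
    """Median over curves i of ||x_hat_i - x_i|| / ||x_i||, with L2 norms on the grid t.

    x_hat, x_true: arrays of shape (n_curves, len(t)).
    """
    def l2_norm(f):
        sq = f ** 2
        # Trapezoid rule applied row by row to the squared curve.
        return np.sqrt(np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(t), axis=1))
    return float(np.median(l2_norm(x_hat - x_true) / l2_norm(x_true)))

# Hypothetical toy data: three rank-one curves and "estimates" that are 5% too large.
t = np.linspace(0.0, 1.0, 101)
x_true = np.outer([1.0, 2.0, 3.0], np.sin(np.pi * t))
x_hat = 1.05 * x_true

err = median_relative_l2_error(x_hat, x_true, t)
meets_threshold = err <= 0.10     # the 10% criterion used in the text
print(err, meets_threshold)
```

Using the median rather than the mean keeps the summary insensitive to a few badly registered curves, which matters when the criterion is evaluated over a grid of $(c, r)$ values.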

6. Data Analysis

In this section, we illustrate the performance of our registration procedure on a data set of growth curves of Tribolium beetle larvae, collected and analysed by Irwin and Carter (2013). Each curve represents the mass measurement (in milligrams) as a function of the age of the larvae since hatching (in days). Their analysis of Tribolium growth suggests that these beetles' growth patterns differ from those of other animals with determinate growth (that is, growth that is contained in certain life stages). Usually, the longer the growth period, the larger the maximal mass attained (see Irwin and Carter (2014), and references therein). In Tribolium, however, it seems that beetles that tend to grow faster, and thus have a shorter growth period, also tend to attain larger size (e.g. Figure 7, top left). See Irwin and Carter (2013) for more details and background. This observation suggests that the Tribolium data could be well-suited for a phase-amplitude analysis under a latent rank 1 model that has been warped: one expects that correcting for different "growth clocks" (phase variation) should yield curves that are roughly of unimodal amplitude variation, due to final mass. Conversely, it suggests a potential latent model that produces rank 1 vertical variation related only to final mass, and horizontal variation due to growth timing (i.e. how this total final mass is accumulated in time).

For our analysis, we have only considered the part of the dataset where there were at least 10 discrete measurements per individual curve, which results in a sample size of 159. Also, not all larvae were recorded on the same day, so the number of observations differed across individuals. Since there are relatively few measurements (maximum 12) per individual larva, we smoothed each observation vector as a pre-processing step. This was done using the built-in function splinefun in the R software with the method monoH.FC, which uses the monotone Hermite spline interpolation proposed by Fritsch and Carlson (1980) (since the curves are expected to be approximately increasing).

As is typically the case with growth curves, one expects that, if unaccounted for, the lurking phase variation would give the impression of several modes of amplitude variation. The aim of our analysis is thus to register the curves, estimate the warp maps, estimate the mean of the registered curves, and carry out an eigenanalysis of the registered data.

It is indeed observed that prior to any registration, the data present at least two substantial modes of amplitude variation, with the first three principal components explaining 78 . .

85% of the total variation, respectively. However, after registration using our method, the empirical covariance operator is almost precisely of rank 1, with the leading principal component explaining 99.72% of the total variation. Interestingly, the mean of the registered data has the same shape as the leading eigenfunction and is in fact roughly equal to 776 times the leading eigenfunction. This can be seen as a model diagnostic, corroborating the model: if the rank 1 model were correct, then after registration one would expect to have a single mode of amplitude variation and a mean in the span of the corresponding eigenfunction (see the discussion after Counterexample 1).

[Figure 7: panels "Warped discrete data", "Warped smoothed curves", "Registered curves", "Warp maps" (legend: Estm. warp, Mean estm. warp), "Mean" (legend: Estm. mean, Warp. mean, PW, CMR, FRM) and leading eigenfunction (legend: Estm. eigfun, Warp. eigfun, PW, CMR, FRM); horizontal axis: Age.]
Fig 7. Plots in the first row are those of the Tribolium data, the smoothed curves and the registered curves using our procedure. The first plot in the second row shows the estimated warp maps, where the dotted line is the identity map. The other two plots in the second row show the means and the leading eigenfunctions of the warped and the registered data using our procedure and some other registration methods.

Figure 7 shows the plots of the actual data, the monotone spline smoothed data and the registered data, as well as the plot of the estimated warp maps and the average warp map, which is very close to the identity. It also shows the plots of the mean and the leading eigenfunction of the warped and the registered data. Although the means of the warped and the registered data are very close, there are substantial qualitative differences between the corresponding eigenfunctions. The eigenfunction of the registered data shows that the variation in growth pattern essentially starts at about 8 days after hatching. Between ages 10-16 days post hatching, there is a notable increase in the growth variation, and it somewhat recedes after that age. These periods are in fact compatible with biologically interpretable phases of growth: the larvae enter an "instar" (a distinct growth period between exoskeleton moults) characterised by exponential growth at around day 7-8; then, around day 17, they enter the "wandering phase" and begin losing weight in preparation for pupation.

The performance of the FRM technique is very similar to that of the proposed procedure and results in an almost rank one registration. However, the CMR and the PW procedures do not yield a rank one registration, although the estimated means are very similar to that obtained by our procedure, as can be seen by comparing Figure 7 with Figure 14. The difference lies in the registered curves and the estimate of the leading eigenfunction. The latter shows some artifacts which do not conform to the biological explanation provided earlier, e.g., the presence of flat regions in the estimated eigenfunction during the "instar" phase of exponential growth, as well as the growth spurt towards the end, where the larvae would actually enter the "wandering phase".

Appendix – Proofs of Formal Statements

Proof of Lemma 1.

Since $X(t) = \xi\phi(t)$, $t \in [0,1]$, we have

$$F(t) = \frac{\int_0^t |X'(u)|\,du}{\int_0^1 |X'(u)|\,du} = \frac{\int_0^t |\phi'(u)|\,du}{\int_0^1 |\phi'(u)|\,du} = F_\phi(t)$$

by Definition 2. Next, $\tilde X(t) = \xi\phi(T^{-1}(t))$, so that $\tilde X'(t) = \xi\phi'(T^{-1}(t)) / T'(T^{-1}(t))$. Thus, using the strict monotonicity of $T$, we have

$$\tilde F(t) = \frac{\int_0^t |\tilde X'(u)|\,du}{\int_0^1 |\tilde X'(u)|\,du} = \frac{\int_0^t |\phi'(T^{-1}(u))| / T'(T^{-1}(u))\,du}{\int_0^1 |\phi'(T^{-1}(u))| / T'(T^{-1}(u))\,du}.$$

A standard change-of-variable argument and the fact that $T$ is a bijection with $T(0) = 0$ and $T(1) = 1$ yield $\tilde F(t) = \int_0^{T^{-1}(t)} |\phi'(u)|\,du \big/ \int_0^1 |\phi'(u)|\,du = F_\phi(T^{-1}(t))$. So, $\tilde F = F_\phi \circ T^{-1}$, equivalently, $T = \tilde F^{-1} \circ F_\phi \Leftrightarrow T \circ F_\phi^{-1} = \tilde F^{-1}$. Using the assumption that $E(T) = \mathrm{Id}$, we now have $E(\tilde F^{-1}) = F_\phi^{-1}$.

Proof of Theorem 1.

Note that the map $f \mapsto f'$ from $(C^1[0,1], |||\cdot|||_1)$ to $(C[0,1], \|\cdot\|_\infty)$ is Lipschitz. Thus, $\tilde X_1 \overset{d}{=} \tilde X_2$ implies that $\tilde X_1' \overset{d}{=} \tilde X_2'$. Consider the random probability measure given by
$$\Psi_1(A) = \frac{\int_A |\tilde X_1'(u)|\,du}{\int_{[0,1]} |\tilde X_1'(u)|\,du}$$
for $A$ in the Borel $\sigma$-field of $[0,1]$. Similarly, $\Psi_2(A) = \int_A |\tilde X_2'(u)|\,du / \int_{[0,1]} |\tilde X_2'(u)|\,du$. We equip the space $\mathcal P$ of diffuse probability measures on $[0,1]$ with the $L^2$-Wasserstein metric (see, e.g., Villani (2003)) given by $d_W(\mu,\nu) = \|F_\nu^{-1} - F_\mu^{-1}\|_2$, where $F_\mu$ and $F_\nu$ are the distribution functions associated with the probability measures $\mu$ and $\nu$.

Now for any $f_1, f_2 \in C^1[0,1]$ satisfying $\int_0^1 |f_i'(u)|\,du > 0$, $i = 1, 2$, consider the measure $\mu_i$ with density $|f_i'(s)| / \int_0^1 |f_i'(u)|\,du$ for $i = 1, 2$. The condition $\int_0^1 |f'(u)|\,du > 0$ is equivalent to $f \neq$ const. Since $\mu_1$ and $\mu_2$ are supported on the bounded set $[0,1]$, it follows from Proposition 7.10 in Villani (2003) that $d_W^2(\mu_1,\mu_2) \le c\, d_{TV}(\mu_1,\mu_2)$ for a constant $c > 0$, where $d_{TV}(\cdot,\cdot)$ is the total variation distance. It now follows that
$$d_W^2(\mu_1,\mu_2) \le c \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds$$
$$\le c \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_1'(u)|\,du} \right| ds + c \int_0^1 \left| \frac{|f_2'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds$$
$$\le 2c\, \frac{\int_0^1 |f_1'(s) - f_2'(s)|\,ds}{\int_0^1 |f_1'(s)|\,ds} \le 2c\, \frac{\|f_1' - f_2'\|_\infty}{\int_0^1 |f_1'(s)|\,ds} \le 2c\, \frac{|||f_1 - f_2|||_1}{\int_0^1 |f_1'(s)|\,ds}.$$
Thus, the embedding $H : f \mapsto \mu_f$ is continuous when the domain, say $\mathcal A$, is restricted to the set of all non-constant functions in $C^1[0,1]$. But the set $\mathcal A^c$ is a one dimensional linear subspace spanned by the constant function $f \equiv 1$, and this implies that $\mathcal A^c$ is a Borel measurable subset of $C^1[0,1]$. So, $\mathcal A$ is a Borel measurable subset of $C^1[0,1]$. Equip $\mathcal A$ with the Borel $\sigma$-field induced from $C^1[0,1]$. Since $P(\tilde X_1 \in \mathcal A^c) = 0$, we have that $H(\tilde X_1)$ is a valid random probability measure on $[0,1]$. Note that for any Borel subset $A$ of $[0,1]$, we have $H(\tilde X_1)(A) = \Psi_1(A)$. Thus, for any Borel subset $B$ of $\mathcal P$, we have
$$P(H(\tilde X_1) \in B) = P(\tilde X_1 \in H^{-1}(B)) = P(\tilde X_2 \in H^{-1}(B)) = P(H(\tilde X_2) \in B).$$
The first equality follows from the continuity of $H$ on $\mathcal A$ and the fact that $P(\tilde X_1 \in \mathcal A^c) = 0$, while the second holds because $\tilde X_1$ and $\tilde X_2$ have the same distribution by assumption. So, $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$ as random probability measures.

Next, note that the random measures $H(\tilde X_i)$, $i = 1, 2$, have strictly increasing cdfs almost surely. Proposition 2 in Panaretos and Zemel (2016) states that for each $i = 1, 2$, the map $\gamma \mapsto E\{d_W^2(H(\tilde X_i), \gamma)\}$ admits a unique minimizer given by $E\{\tilde F_i^{-1}\}$, where $\tilde F_{\Psi_i}$ is the random distribution function of the random measure $H(\tilde X_i)$ and $\tilde F_i^{-1}$ is its quantile function. Since $\tilde X_i = \xi_i \phi_i(T_i^{-1})$ with $T_i$ being a strictly increasing homeomorphism on $[0,1]$, it follows from the change-of-variable formula that
$$H(\tilde X_i)(A) = \Psi_i(A) = \frac{\int_{T_i^{-1}(A)} |\phi_i'(u)|\,du}{\int_{[0,1]} |\phi_i'(u)|\,du}.$$
Thus, $\tilde F_{\Psi_i} = F_{\phi_i} \circ T_i^{-1}$, equivalently, $\tilde F_i^{-1} = T_i \circ F_{\phi_i}^{-1}$, where $F_{\phi_i}$ is the cdf associated with the (deterministic) probability measure $\Phi_i(A) = \int_A |\phi_i'(u)|\,du / \int_{[0,1]} |\phi_i'(u)|\,du$. Note that $F_{\phi_i}$ is continuous and strictly increasing since $\phi_i'$ is zero only on a countable set for $i = 1, 2$. Since $E(T_i) = \mathrm{Id}$, it follows that the minimizer $E\{\tilde F_i^{-1}\}$ equals $F_{\phi_i}^{-1}$, that is, it is the measure with cdf $F_{\phi_i}$, for $i = 1, 2$. But since $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$, it now follows that $F_{\phi_1} = F_{\phi_2}$. Also, $T_i = \tilde F_i^{-1} \circ F_{\phi_i}$, equivalently, $T_i^{-1} = F_{\phi_i}^{-1} \circ \tilde F_{\Psi_i}$. Using the above facts and the result obtained in the previous paragraph, it now follows that $T_1 \overset{d}{=} T_2$.

We next claim that the joint distributions of $(\tilde X_i, T_i^{-1})$, $i = 1, 2$, are equal. To see this, consider the map $H_1 : f \mapsto (f, H(f))$ defined from $\mathcal A$ to $\mathcal A \otimes \mathcal P$, with the latter being equipped with the induced product topology and the induced product $\sigma$-field. It follows from the same arguments used to prove the continuity of $H$ that $H_1$ is continuous. Thus, for Borel subsets $G_1$ and $G_2$ of $C[0,1]$, we have
$$P(\tilde X_1 \in G_1, T_1^{-1} \in G_2) = P(\tilde X_1 \in G_1, F_{\phi_1}^{-1} \circ \tilde F_{\Psi_1} \in G_2) = P(\tilde X_1 \in G_1, \tilde F_{\Psi_1} \in F_{\phi_1}(G_2))$$
$$= P(H_1(\tilde X_1) \in G_1 \times F_{\phi_1}(G_2)) = P(\tilde X_1 \in H_1^{-1}(G_1 \times F_{\phi_1}(G_2)))$$
$$= P(\tilde X_2 \in H_1^{-1}(G_1 \times F_{\phi_2}(G_2))) \quad [\text{since } F_{\phi_1} = F_{\phi_2}]$$
$$= P(H_1(\tilde X_2) \in G_1 \times F_{\phi_2}(G_2)) = P(\tilde X_2 \in G_1, \tilde F_{\Psi_2} \in F_{\phi_2}(G_2))$$
$$= P(\tilde X_2 \in G_1, F_{\phi_2}^{-1} \circ \tilde F_{\Psi_2} \in G_2) = P(\tilde X_2 \in G_1, T_2^{-1} \in G_2).$$

Next, note that $X_i = \tilde X_i \circ T_i$ is the true unobserved process. It is easy to show that the map $(f,g) \mapsto f \circ g$ from $C[0,1] \otimes C[0,1]$ into $C[0,1]$ is continuous. Thus, using the observation in the previous paragraph, we have $X_1 \overset{d}{=} X_2$ as random elements in $C[0,1]$. It follows from the equality of distributions that their covariance operators are equal, and thus the corresponding eigenfunctions are equal. Now, the covariance operator of $X_i$ is given by $\mathrm{Var}(\xi_i)\,\phi_i \otimes \phi_i$. Since $X_i = \xi_i\phi_i$ is a rank one process, the equality of the covariance operators implies that $\phi_1 = \pm\phi_2$ (since $\|\phi_1\|_2 = \|\phi_2\|_2 = 1$). Finally, $X_1 \overset{d}{=} X_2$ implies that $\xi_1 = \langle X_1, \phi_1\rangle \overset{d}{=} \langle X_2, \phi_1\rangle = \langle X_2, \pm\phi_2\rangle = \pm\xi_2$.

Proof of Theorem 2.

First observe that the $T_i$'s are also i.i.d. random elements in $C[0,1]$. Moreover, since $T$ is strictly increasing and positive, we have $E(\|T\|_\infty) = E(T(1)) = 1 < \infty$. Thus, by the strong law for Banach space valued random elements (see, e.g., Theorem 2.4 in Bosq (2000)), it follows that $\bar T \to E(T) = \mathrm{Id}$ as $n \to \infty$ almost surely. In addition, if $E(\|T'\|_\infty) < \infty$, implying that $E(|||T|||_1) < \infty$, then the almost sure convergence $\bar T \to E(T) = \mathrm{Id}$ holds in $C^1[0,1]$.

(a) Since $\hat F^{-1} = \bar T \circ F_\phi^{-1}$, using Theorem 2.18 in Villani (2003), we get that
$$d_W(\hat F, F_\phi) = \|\hat F^{-1} - F_\phi^{-1}\|_2 = \left\{ \int_0^1 \left| \hat F^{-1}(F_\phi(t)) - t \right|^2 F_\phi(dt) \right\}^{1/2} = \left\{ \int_0^1 \left| \bar T(t) - t \right|^2 F_\phi(dt) \right\}^{1/2} \le \|\bar T - \mathrm{Id}\|_\infty \to 0$$
as $n \to \infty$.

(b) Since each $T_i$ is a strictly increasing bijection on $[0,1]$, we have
$$\|\hat T_i^{-1} - T_i^{-1}\|_\infty = \sup_{t\in[0,1]} \left| \bar T(T_i^{-1}(t)) - T_i^{-1}(t) \right| = \|\bar T - \mathrm{Id}\|_\infty \to 0$$
as $n \to \infty$. Since both $\hat T_i^{-1}$ and $T_i^{-1}$ are strictly increasing homeomorphisms, the uniform convergence of $\hat T_i$ to $T_i$ follows as a consequence of the above uniform convergence.

Suppose now that Condition 1 holds. We have discussed towards the beginning of the proof that in this case $|||\bar T - \mathrm{Id}|||_1 \to 0$ as $n \to \infty$ almost surely. In view of the first half of part (b) of the theorem along with the definition of the $|||\cdot|||_1$ norm, it is enough to show the uniform convergence of the derivatives. Since each $T_i$ is a strictly increasing bijection on $[0,1]$, so is $\bar T$ for every $n \ge 1$. First note that
$$\|(\hat T_i^{-1})' - (T_i^{-1})'\|_\infty = \sup_{t\in[0,1]} \left| (\bar T \circ T_i^{-1})'(t) - (T_i^{-1})'(t) \right| = \sup_{t\in[0,1]} \left| \frac{\bar T'(T_i^{-1}(t))}{T_i'(T_i^{-1}(t))} - \frac{1}{T_i'(T_i^{-1}(t))} \right| = \sup_{t\in[0,1]} \left| \frac{\bar T'(t) - 1}{T_i'(t)} \right| \le \delta^{-1} \|\bar T' - \mathbf 1\|_\infty,$$
where $\mathbf 1$ is the constant function taking value 1. It thus follows from an earlier bound that
$$|||\hat T_i^{-1} - T_i^{-1}|||_1 \le \|\bar T - \mathrm{Id}\|_\infty + \delta^{-1} \|\bar T' - \mathbf 1\|_\infty \le \max(1, \delta^{-1})\, |||\bar T - \mathrm{Id}|||_1 \to 0$$
as $n \to \infty$. Next note that $\bar T'(t) = n^{-1}\sum_{i=1}^n T_i'(t) \ge n^{-1}\sum_{i=1}^n \inf_{s\in[0,1]} T_i'(s) \ge \delta$, so that $\inf_{t\in[0,1]} \bar T'(t) \ge \delta > 0$. Hence,
$$\|\hat T_i' - T_i'\|_\infty = \sup_{t\in[0,1]} \left| (T_i \circ \bar T^{-1})'(t) - T_i'(t) \right| = \sup_{t\in[0,1]} \left| \frac{T_i'(\bar T^{-1}(t))}{\bar T'(\bar T^{-1}(t))} - T_i'(t) \right| = \sup_{t\in[0,1]} \left| \frac{T_i'(t)}{\bar T'(t)} - T_i'(\bar T(t)) \right|$$
$$\le \sup_{t\in[0,1]} \left| \frac{T_i'(t)}{\bar T'(t)} - \frac{T_i'(\bar T(t))}{\bar T'(t)} \right| + \sup_{t\in[0,1]} \left| \frac{T_i'(\bar T(t))}{\bar T'(t)} - T_i'(\bar T(t)) \right| \le \delta^{-1} \sup_{t\in[0,1]} \left| T_i'(t) - T_i'(\bar T(t)) \right| + \delta^{-1} \|T_i'\|_\infty \|\bar T' - \mathbf 1\|_\infty.$$
Since $T_i'$ is continuous on $[0,1]$, it is uniformly continuous. This and the fact that $\|\bar T - \mathrm{Id}\|_\infty \to 0$ as $n \to \infty$ almost surely imply that $\sup_{t\in[0,1]} |T_i'(t) - T_i'(\bar T(t))| \to 0$ as $n \to \infty$ almost surely. Combining this fact with the uniform convergence of $\bar T'$ to $\mathbf 1$, we get that $|||\hat T_i - T_i|||_1 \to 0$ as $n \to \infty$ almost surely.

(c) Note that
$$\|\hat X_i - X_i\|_\infty = |\xi_i| \sup_{t\in[0,1]} |\phi(\bar T^{-1}(t)) - \phi(t)| = |\xi_i| \sup_{t\in[0,1]} |\phi(\bar T(t)) - \phi(t)| \to 0$$
as $n \to \infty$, since $\|\bar T - \mathrm{Id}\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\phi$ is continuous on $[0,1]$ and hence uniformly continuous.

Suppose now that Condition 1 holds. Then, as before,
$$\|\hat X_i' - X_i'\|_\infty = |\xi_i| \sup_{t\in[0,1]} \left| \frac{\phi'(\bar T^{-1}(t))}{\bar T'(\bar T^{-1}(t))} - \phi'(t) \right| = |\xi_i| \sup_{t\in[0,1]} \left| \frac{\phi'(t)}{\bar T'(t)} - \phi'(\bar T(t)) \right|$$
$$\le |\xi_i| \sup_{t\in[0,1]} \left| \frac{\phi'(t)}{\bar T'(t)} - \frac{\phi'(\bar T(t))}{\bar T'(t)} \right| + |\xi_i| \sup_{t\in[0,1]} \left| \frac{\phi'(\bar T(t))}{\bar T'(t)} - \phi'(\bar T(t)) \right| \le |\xi_i|\, \delta^{-1} \sup_{t\in[0,1]} |\phi'(t) - \phi'(\bar T(t))| + |\xi_i|\, \|\phi'\|_\infty\, \delta^{-1} \|\bar T' - \mathbf 1\|_\infty.$$
Using similar arguments as earlier, we conclude that $\|\hat X_i' - X_i'\|_\infty \to 0$, and hence $|||\hat X_i - X_i|||_1 \to 0$, as $n \to \infty$ almost surely.

(d) Observe that since $\hat X_i = \xi_i\, \phi \circ \bar T^{-1} = X_i \circ \bar T^{-1}$, it follows from the change-of-variable formula that $\hat F_i = F_\phi \circ \bar T^{-1}$. Thus,
$$d_W(\hat F_i, F_\phi) = \|\hat F_i^{-1} - F_\phi^{-1}\|_2 = \|\bar T \circ F_\phi^{-1} - F_\phi^{-1}\|_2 = \left\{ \int_0^1 \left| \bar T(t) - t \right|^2 F_\phi(dt) \right\}^{1/2} \le \|\bar T - \mathrm{Id}\|_\infty \to 0$$
as $n \to \infty$.

(e) Observe that
$$\|\bar X_r - \mu\|_\infty = \left\| n^{-1}\sum_{i=1}^n (\hat X_i - X_i) + n^{-1}\sum_{i=1}^n X_i - \mu \right\|_\infty \le n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_\infty + \left\| n^{-1}\sum_{i=1}^n X_i - \mu \right\|_\infty.$$
Since the $X_i$'s are i.i.d. random elements in $C[0,1]$ with $E(\|X_1\|_\infty) = E(|\xi|)\,\|\phi\|_\infty < \infty$, we conclude from the strong law for Banach space valued random elements that $\| n^{-1}\sum_{i=1}^n X_i - \mu \|_\infty \to 0$ as $n \to \infty$ almost surely. Also, from the proof of part (c), we have that
$$n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_\infty = \sup_{t\in[0,1]} |\phi(\bar T(t)) - \phi(t)| \times n^{-1}\sum_{i=1}^n |\xi_i| = \sup_{t\in[0,1]} |\phi(\bar T(t)) - \phi(t)| \times \{E(|\xi|) + o(1)\}$$
as $n \to \infty$ almost surely. Thus, using similar arguments as in part (c) of the theorem, we obtain $n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_\infty \to 0$ as $n \to \infty$ almost surely. Combining the above facts, we conclude that $\|\bar X_r - \mu\|_\infty \to 0$ as $n \to \infty$ almost surely.

Note that since $X_i = \xi_i\phi$, it follows that $\| n^{-1}\sum_{i=1}^n X_i' - \mu' \|_\infty \to 0$ as $n \to \infty$ almost surely. Now, suppose that Condition 1 holds. A similar decomposition as above yields
$$\|\bar X_r' - \mu'\|_\infty \le n^{-1}\sum_{i=1}^n \|\hat X_i' - X_i'\|_\infty + \left\| n^{-1}\sum_{i=1}^n X_i' - \mu' \right\|_\infty.$$
The proof of part (c) implies that
$$n^{-1}\sum_{i=1}^n \|\hat X_i' - X_i'\|_\infty \le \delta^{-1} \left( n^{-1}\sum_{i=1}^n |\xi_i| \right) \left\{ \sup_{t\in[0,1]} |\phi'(t) - \phi'(\bar T(t))| + \|\phi'\|_\infty \|\bar T' - \mathbf 1\|_\infty \right\}.$$
The right-hand term above converges to zero as $n \to \infty$ almost surely.
The result is now established upon combining the above facts.

(f) Straightforward algebraic manipulations yield
$$\widehat{\mathscr K}_r = n^{-1}\sum_{i=1}^n (\hat X_i - \bar X_r) \otimes (\hat X_i - \bar X_r) = n^{-1}\sum_{i=1}^n (X_i - \bar X) \otimes (X_i - \bar X) + n^{-1}\sum_{i=1}^n (\hat X_i - X_i) \otimes (\hat X_i - X_i) - (\bar X - \bar X_r) \otimes (\bar X - \bar X_r)$$
$$\qquad + n^{-1}\sum_{i=1}^n \left\{ (\hat X_i - X_i) \otimes (X_i - \bar X) + (X_i - \bar X) \otimes (\hat X_i - X_i) \right\}.$$
Denote $\widehat{\mathscr K} = n^{-1}\sum_{i=1}^n (X_i - \bar X) \otimes (X_i - \bar X)$. Then,
$$|||\widehat{\mathscr K}_r - \widehat{\mathscr K}||| \le \frac{2}{n}\sum_{i=1}^n \|\hat X_i - X_i\|_2\, \|X_i - \bar X\|_2 + \frac{1}{n}\sum_{i=1}^n \|\hat X_i - X_i\|_2^2 + \|\bar X - \bar X_r\|_2^2.$$
Using the Cauchy-Schwarz inequality, we have $n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_2 \|X_i - \bar X\|_2 \le \{ n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_2^2 \}^{1/2} \{ n^{-1}\sum_{i=1}^n \|X_i - \bar X\|_2^2 \}^{1/2}$, and $n^{-1}\sum_{i=1}^n \|X_i - \bar X\|_2^2 = O(1)$ as $n \to \infty$ almost surely. It follows from the arguments in the proof of part (c) of the theorem that
$$n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_2^2 \le n^{-1}\sum_{i=1}^n \|\hat X_i - X_i\|_\infty^2 \le \sup_{t\in[0,1]} |\phi(\bar T(t)) - \phi(t)|^2 \left( n^{-1}\sum_{i=1}^n |\xi_i|^2 \right),$$
and the right hand side is $o(1)$ as $n \to \infty$ almost surely since $E(|\xi|^2) < \infty$. Further, $\|\bar X - \bar X_r\|_2 = o(1)$ as $n \to \infty$ almost surely. Thus, $|||\widehat{\mathscr K}_r - \widehat{\mathscr K}||| = o(1)$ as $n \to \infty$ almost surely.

The proof of the uniform convergence of $\hat K_r(s,t)$ to $K(s,t)$ is obtained by use of a decomposition of $\hat K_r(s,t)$ similar to the one used above, noting that $\hat K(s,t)$ converges uniformly to $K(s,t)$ (by the strong law of large numbers in $C([0,1]^2)$), and the fact that all the other bounds hold in the supremum norm.

Next, note that $\hat\phi(t) = \hat\lambda^{-1} \int_0^1 \hat K_r(s,t)\hat\phi(s)\,ds$ and $\phi(t) = \lambda^{-1} \int_0^1 K(s,t)\phi(s)\,ds$ for all $t \in [0,1]$, where $|\hat\lambda - \lambda| \le |||\widehat{\mathscr K}_r - \mathscr K||| \to 0$ as $n \to \infty$ almost surely. Also, $\|\hat\phi - \phi\|_2 \le 2\sqrt{2}\,\lambda^{-1} |||\widehat{\mathscr K}_r - \mathscr K||| \to 0$ as $n \to \infty$ almost surely.
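So,
$$|\hat\phi(t) - \phi(t)| \le \left| \hat\lambda^{-1} \int_0^1 \hat K_r(s,t)\hat\phi(s)\,ds - \hat\lambda^{-1} \int_0^1 K(s,t)\hat\phi(s)\,ds \right| + \left| \hat\lambda^{-1} \int_0^1 K(s,t)\hat\phi(s)\,ds - \hat\lambda^{-1} \int_0^1 K(s,t)\phi(s)\,ds \right|$$
$$\qquad + \left| \hat\lambda^{-1} \int_0^1 K(s,t)\phi(s)\,ds - \lambda^{-1} \int_0^1 K(s,t)\phi(s)\,ds \right|$$
$$\le \hat\lambda^{-1} \|\hat K_r - K\|_\infty + \hat\lambda^{-1} \|K\|_\infty \|\hat\phi - \phi\|_2 + \left| (\hat\lambda^{-1} - \lambda^{-1})\, \lambda\, \phi(t) \right|$$
$$\le (\lambda^{-1} + o(1)) \left\{ \|\hat K_r - K\|_\infty + \|K\|_\infty \|\hat\phi - \phi\|_2 \right\} + |\lambda - \hat\lambda|\, (\lambda^{-1} + o(1))\, \|\phi\|_\infty$$
as $n \to \infty$ almost surely. Thus, $\|\hat\phi - \phi\|_\infty \to 0$ as $n \to \infty$ almost surely.

Finally,
$$|\hat\xi_i - \xi_i| = |\langle \hat X_i, \hat\phi\rangle - \langle X_i, \phi\rangle| \le |\langle \hat X_i - X_i, \hat\phi\rangle| + |\langle X_i, \hat\phi - \phi\rangle| \le \|\hat X_i - X_i\|_2 + \|X_i\|_2\, \|\hat\phi - \phi\|_2 \to 0$$
as $n \to \infty$ almost surely.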
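The identities behind parts (a) and (b) can be checked numerically. The following is a minimal simulation sketch, not the authors' implementation: the template `phi`, the one-parameter warp family `warp`, the distributions of `a` and `xi`, and the grid sizes are all illustrative assumptions. It builds warped rank-one curves, computes their local variation distribution functions, averages the quantile functions, and compares $\sup_t |\hat T_i^{-1}(t) - T_i^{-1}(t)|$ with $\|\bar T - \mathrm{Id}\|_\infty$.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2001)   # evaluation grid on [0, 1]
p = np.linspace(0.0, 1.0, 5001)   # probability levels for the quantile functions


def phi(u):
    # Hypothetical template curve; its derivative vanishes only at isolated points.
    return np.sin(2.0 * np.pi * u)


def warp(a, u):
    # Hypothetical warp T(u) = u + a sin(2 pi u) / (2 pi): an increasing bijection
    # of [0, 1] for |a| < 1, with E(T) = Id whenever E(a) = 0.
    return u + a * np.sin(2.0 * np.pi * u) / (2.0 * np.pi)


def local_variation_cdf(x):
    # Normalised cumulative total variation of a curve sampled on the grid t.
    cum = np.concatenate(([0.0], np.cumsum(np.abs(np.diff(x)))))
    return cum / cum[-1]


def simulate(n):
    a = rng.uniform(-0.6, 0.6, size=n)   # assumed warp parameters, mean zero
    xi = rng.normal(1.0, 0.5, size=n)    # assumed random amplitudes
    t_inv = np.empty((n, t.size))        # true inverse warps T_i^{-1}
    cdfs = np.empty((n, t.size))         # local variation cdfs of the warped curves
    quantiles = np.empty((n, p.size))    # their quantile functions
    for i in range(n):
        t_inv[i] = np.interp(t, warp(a[i], t), t)
        cdfs[i] = local_variation_cdf(xi[i] * phi(t_inv[i]))
        quantiles[i] = np.interp(p, cdfs[i], t)
    q_hat = quantiles.mean(axis=0)       # pooled (averaged) quantile function
    q_true = np.interp(p, local_variation_cdf(phi(t)), t)
    # Estimated inverse warps: pooled quantile function composed with each cdf.
    t_inv_hat = np.array([np.interp(cdfs[i], p, q_hat) for i in range(n)])
    err_quantile = np.max(np.abs(q_hat - q_true))
    err_warp = np.max(np.abs(t_inv_hat - t_inv))
    tbar_dev = np.max(np.abs(warp(a.mean(), t) - t))   # sup |T_bar - Id|
    return err_quantile, err_warp, tbar_dev


err_quantile, err_warp, tbar_dev = simulate(400)
print(err_quantile, err_warp, tbar_dev)
```

Under these assumptions the three printed quantities should be of the same small order, up to grid interpolation error, which mirrors the role played by $\|\bar T - \mathrm{Id}\|_\infty$ throughout the proof.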

Proof of Theorem 3.

We have | T p t q ´ T p s q| ď || T || | s ´ t | and by assumption E p|| T || q ă 8 . So,by the CLT for i.i.d. C r , s valued random elements (see, e.g, Theorem 2.4 Bosq (2000)), we have ? n p T ´ Id q d Ñ Y for a zero mean Gaussian random element Y in C r , s .(a) From the proof of part (a) of Theorem 2, one has that d W p p F , F φ q “ ş | T p t q ´ t | F φ p dt q . Now, itis easy to check that the map C r , s Q f Ñ ş | f p t q| F φ p dt q is continuous. The result follows from thecontinuous mapping theorem.(b) Note that for each ﬁxed i ě

1, we have ? n p p T ´ i ´ T ´ i q “ U n ˝ V n , where U n “ ? n p T ´ Id q and V n “ T ´ i . We will ﬁrst derive the weak limit conditional on T i “ t i . From the previous paragraph, itfollows that conditional on T i “ t i , U n “ ? n p n ´ t i ` n ´ ř j ‰ i T j ´ Id q d Ñ Y , and V n , being a constantsequence, converges conditionally in probability to t ´ i as n Ñ 8 . So, by Theorem 4.4 in Billingsley(1968), conditional on T i “ t i , we have p U n , V n q d Ñ p

Y, t ´ i q in the C r , s topology. Using the fact thatthe map p f, g q ÞÑ f ˝ g is continuous in C pr , s q (see, e.g., p. 155 in Billingsley (1968)), it follows fromthe continuous mapping theorem that conditional on T i “ t i , ? n p p T ´ i ´ T ´ i q d Ñ Y ˝ t ´ i as n Ñ 8 for each ﬁxed i ě

1. Thus, by the Dominated Convergence Theorem, the unconditional distribution of ? n p p T ´ i ´ T ´ i q converges weakly as n Ñ 8 for each ﬁxed i ě ? n p p T i ´ T i q “ ? n p T i ˝ T ´ ´ T i q , we will as earlier ﬁrst derive itsweak limit conditional on T i “ t i . Now, using the fact that T i P C r , s almost surely, we have p T i p s q ´ t i p s q “ t i p T ´ p s qq ´ t i p s q “ t i p s ` T ´ p s q ´ s q ´ t i p s q“ p T ´ p s q ´ s q ˆ t i p s ` β p T ´ p s q ´ s qq for some β P r , s (possibly depending on s and i ). Thus, ? n p p T i ´ t i q “ t? n p T ´ ´ Id qu ˆ t i p¨ ` o P p qq “ t? n p Id ´ T q ˝ T ´ u ˆ t i p¨ ` o P p qq where the o P p q term is uniform in s since || T ´ ´ Id || Ñ n Ñ 8 almost surely. Using similararguments as in the above proof and noting that || T ´ Id || as n Ñ 8 almost surely, we deduce that ? n p p T i ´ t i q d Ñ Y ˆ t i as n Ñ 8 . Thus, by the Dominated Convergence Theorem, the unconditional . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations distribution of ? n p p T i ´ T i q converges weakly as n Ñ 8 for each ﬁxed i ě i ě p X i p s q ´ X i p s q “ ξ i t φ p T ´ p s qq ´ φ p s qu “ ξ i tp T ´ p s q ´ s q φ p s ` β p T ´ p s q ´ s qquñ ? n p p X i ´ X i q “ ξ i t? n p Id ´ T q ˝ T ´ u ˆ φ p¨ ` o P p qq , where β P r , s , and the o P p q term is uniform in s as earlier. Similar arguments as in part (b) aboveyield ? n p p X i ´ X i q d Ñ ξ i Y ˆ φ as n Ñ 8 for each ﬁxed i ě ? n p X r ´ µ q “ ? n n ´ n ÿ i “ ξ i φ ˝ T ´ ´ E p ξ q φ + “ ? n n ´ n ÿ i “ p ξ i ´ E p ξ qq + φ ˝ T ´ ` E p ξ q? n ! φ ˝ T ´ ´ φ ) d Ñ N p , V ar p ξ qq φ ` E p ξ q Y ˆ φ , which follows from similar arguments as in part (c) and the independence of the ξ i ’s and the T i ’s.(f) For the ﬁrst part, note that x K r “ n ´ n ÿ i “ p p X i ´ X r q b p p X i ´ X r q“ n ´ n ÿ i “ p p X i ´ µ q b p p X i ´ µ q ´ p X r ´ µ q b p X r ´ µ q“ S ` S , say . 
Now, some straightforward manipulations yield S “ n ´ n ÿ i “ t ξ i φ ˝ T ´ ´ E p ξ q φ u b t ξ i φ ˝ T ´ ´ E p ξ q φ u“ n ´ n ÿ i “ t ξ i ´ E p ξ qu p φ ˝ T ´ q b p φ ˝ T ´ q ` E p ξ qp φ ˝ T ´ ´ φ q b p φ ˝ T ´ ´ φ q` n ´ E p ξ q n ÿ i “ t ξ i ´ E p ξ qu ” p φ ˝ T ´ q b p φ ˝ T ´ ´ φ q ` p φ ˝ T ´ ´ φ q b p φ ˝ T ´ q ı . So, ? n p S ´ K q“ ? n n ´ n ÿ i “ t ξ i ´ E p ξ qu p φ ˝ T ´ q b p φ ˝ T ´ q ´ K + “ ? n n ´ n ÿ i “ t ξ i ´ E p ξ qu p φ ˝ T ´ q b p φ ˝ T ´ q ´ V ar p ξ q φ b φ + “ ? n n ´ n ÿ i “ “ t ξ i ´ E p ξ qu ´ V ar p ξ q ‰ p φ ˝ T ´ q b p φ ˝ T ´ q` V ar p ξ q ” p φ ˝ T ´ q b p φ ˝ T ´ q ´ φ b φ ı ` E p ξ qp φ ˝ T ´ ´ φ q b p φ ˝ T ´ ´ φ q . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations ` n ´ E p ξ q n ÿ i “ t ξ i ´ E p ξ qu ” p φ ˝ T ´ q b p φ ˝ T ´ ´ φ q ` p φ ˝ T ´ ´ φ q b p φ ˝ T ´ q ı+ The ﬁrst term on the right hand side of the above equality converges in distribution to N p , E t ξ ´ E p ξ qu q φ b φ since T Ñ Id as n Ñ 8 almost surely. For the latter reason, the third and the fourthterms converge to zero in probability as n Ñ 8 . For the second term, note that p φ ˝ T ´ q b p φ ˝ T ´ q ´ φ b φ “ p φ ˝ T ´ ´ φ q b φ ` p φ ˝ T ´ q b p φ ˝ T ´ ´ φ q . Thus, by similar arguments as in part (c) earlier, and the continuity of the mapping p f, g q ÞÑ f b g from L pr , s q to the space of Hilbert Schmidt operators, we have that the second term converges indistribution to V ar p ξ qtp Y ˆ φ q b φ ` φ b p Y ˆ φ qu . Combining the above observations and the factthat ? nS Ñ ? n p x K r ´ K q d Ñ N p , E t ξ ´ E p ξ qu q φ b φ ` V ar p ξ qtp Y ˆ φ q b φ ` φ b p Y ˆ φ qu as n Ñ 8 .In order to prove the weak convergence of the empirical process t? n p p K r p s, t q ´ K p s, t qq : s, t P r , su in C pr , s q , we follow the same decomposition as in the proof of the weak convergence of the operatorsin the Hilbert Schmidt topology. Now, note that the proof of part (c) of the theorem implies that theempirical process t? 
n p φ p T ´ p t qq ´ φ p t qq : t P r , su in C r , s converges in distribution to the process t Y p t q φ p t q : t P r , su in C r , s . This fact and the same arguments as in part (f) yield t? n p p K r p s, t q ´ K p s, t qq : s, t P r , su d Ñ t Zφ p s q φ p t q ` V ar p ξ qr Y p s q φ p s q φ p t q ` Y p t q φ p t q φ p s qs : s, t P r , su as n Ñ 8 , where Z „ N p , E t ξ ´ E p ξ qu q does not depend on s, t .For the weak convergence of p φ , ﬁrst note that x K r “ n ´ ř ni “ p ξ i ´ ξ q p φ ˝ T ´ q b p φ ˝ T ´ q . Thus, p φ “ p φ ˝ T ´ q{|| φ ˝ T ´ || . Now, p φ ´ φ “ φ ˝ T ´ || φ ˝ T ´ || ´ φ “ φ ˝ T ´ ´ φ || φ ˝ T ´ || ´ φ p|| φ ˝ T ´ || ´ q|| φ ˝ T ´ || “ φ ˝ T ´ ´ φ || φ ˝ T ´ || ´ φ p|| φ ˝ T ´ || ´ q|| φ ˝ T ´ || p|| φ ˝ T ´ || ` q“ φ ˝ T ´ ´ φ || φ ˝ T ´ || ´ φ p|| φ ˝ T ´ ´ φ || ` x φ ˝ T ´ ´ φ, φ yq|| φ ˝ T ´ || p|| φ ˝ T ´ || ` q . Using the weak convergence of ? n p φ ˝ T ´ ´ φ q to Y ˆ φ in the C r , s topology, we have that ? n p p φ ´ φ q d Ñ Y ˆ φ ´ ˆ x Y ˆ φ , φ y φ “ Y ˆ φ ´ x Y ˆ φ , φ y φ as n Ñ 8 in the C r , s topology.Finally, for the weak convergence of the p ξ i ’s, observe that ? n p p ξ i ´ ξ i q “ ? n tx p X i ´ X i , p φ ´ φ y ` x p X i ´ X i , φ y ` x X i , p φ ´ φ yu“ ? n t ξ i xp φ ˝ T ´ ´ φ q , p p φ ´ φ qy ` ξ i xp φ ˝ T ´ ´ φ q , φ y ` ξ i x φ, p p φ ´ φ qyu . Using the independence of ξ i and the T j ’s, and using the asymptotic distributions obtained above andin part (c), it follows that ? n p p ξ i ´ ξ i q d Ñ ξ i tx Y ˆ φ , φ y ` x φ, p Y ˆ φ ´ ´ t|| Y ˆ φ ` φ || ´ u φ qyu as n Ñ 8 . . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations In order to prove Theorem 4, we will ﬁrst prove a few crucial results.

Proposition 1.

Assume that $\phi \in C^2[0,1]$ and $\inf_{t\in[0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$. Then, for each $i \ge 1$, we have $\sum_{j=1}^{r-1} |\phi(s_{i,j+1}) - \phi(s_{i,j})| = \int_0^1 |\phi'(u)|\,du + B_{1,r}$ almost surely, where $B_{1,r} = O(r^{-1})$ almost surely with the $O(1)$ term being uniform in $i$. Further, $\sum_{j\in I_t} |\phi(s_{i,j+1}) - \phi(s_{i,j})| = \int_0^{T_i^{-1}(t)} |\phi'(u)|\,du + B_{2,r}(t)$ for all $t \in [0,1]$ almost surely, where $\|B_{2,r}\|_\infty = O(r^{-1})$ almost surely with the $O(1)$ term being uniform in $i$. Consequently, we have $\sum_{j=1}^{r-1} |\phi(t_{j+1}) - \phi(t_j)| = \int_0^1 |\phi'(u)|\,du + B_{3,r}$ and $\sum_{j\in I_t} |\phi(t_{j+1}) - \phi(t_j)| = \int_0^t |\phi'(u)|\,du + B_{4,r}(t)$ for all $t \in [0,1]$ almost surely, where $B_{3,r} = O(r^{-1})$ and $\|B_{4,r}\|_\infty = O(r^{-1})$ almost surely.

Proof of Proposition 1. First, let us define $t_0 = 0$ and $t_{r+1} = 1$ if $t_1 > 0$ and $t_r < 1$. Then, $\{t_j : 0 \le j \le r+1\}$ is a partition of $[0,1]$. Consider the sum $S_i = \sum_{j=0}^{r} |\phi(s_{i,j+1}) - \phi(s_{i,j})|$ and note that by a Taylor expansion, $S_i = \sum_{j=0}^{r} (s_{i,j+1} - s_{i,j})\, |\phi'(\tilde s_{i,j})|$, where $\tilde s_{i,j} \in [s_{i,j}, s_{i,j+1}]$. The right hand side is a Riemann sum approximation of $\int_0^1 |\phi'(u)|\,du$ with $\{s_{i,j} = T_i^{-1}(t_j) : 0 \le j \le r+1\}$ as the partition of $[0,1]$, since $T_i$ is a strictly increasing bijection. Thus, writing $\Delta = \max_{0\le j\le r} (s_{i,j+1} - s_{i,j})$, we have
$$\left| S_i - \int_0^1 |\phi'(u)|\,du \right| \le \sup\{ \big| |\phi'(t)| - |\phi'(s)| \big| : s,t \in [0,1] \text{ and } |t-s| \le \Delta \} \le \sup\{ |\phi'(t) - \phi'(s)| : s,t \in [0,1] \text{ and } |t-s| \le \Delta \} \le \|\phi''\|_\infty\, \Delta.$$
Now for any $0 \le j \le r$, we have
$$s_{i,j+1} - s_{i,j} = T_i^{-1}(t_{j+1}) - T_i^{-1}(t_j) = \frac{t_{j+1} - t_j}{T_i'(T_i^{-1}(\tilde t_j))},$$
for some $\tilde t_j \in [t_j, t_{j+1}]$. Using the assumption of the proposition and that on the grid, it now follows that $\Delta = \max_{0\le j\le r} (s_{i,j+1} - s_{i,j}) \le \delta^{-1} O(r^{-1})$ uniformly in $i$. Thus, $| S_i - \int_0^1 |\phi'(u)|\,du | \le \|\phi''\|_\infty\, \delta^{-1} O(r^{-1})$. To complete the first part of the proof, note that $\sum_{j=1}^{r-1} |\phi(s_{i,j+1}) - \phi(s_{i,j})|$ differs from $S_i$ by at most two terms, and both of these terms are $O(r^{-1})$ uniformly over $i$ by the same arguments as those for $S_i$.

For the second part, fix any $t \in [0,1]$. Defining $B_{2,r}(0) = 0$, there is nothing to prove when $t = 0$. For $t > 0$, define $t_0 = 0$. If $j^*$ is the largest $j$ for which $t_{j+1} \le t$, define $t_{j^*+1} = t$ if $t_{j^*+1} < t$ (that is, replace the last grid point by $t$). Note that $j^*$ depends on $t$. Then, $\{t_j : 0 \le j \le j^*+1\}$ is a partition of $[0,t]$, and hence $\{s_{i,j} = T_i^{-1}(t_j) : 0 \le j \le j^*+1\}$ is a partition of $[0, T_i^{-1}(t)]$. Define $R_i(t) = \sum_{j=0}^{j^*} |\phi(s_{i,j+1}) - \phi(s_{i,j})|$. Then, by similar arguments as earlier, we have
$$\left| R_i(t) - \int_0^{T_i^{-1}(t)} |\phi'(u)|\,du \right| \le \|\phi''\|_\infty\, \delta^{-1} \max_{0\le j\le j^*} (s_{i,j+1} - s_{i,j}) = B_{2,r}(t), \text{ say}.$$
Thus, $\|B_{2,r}\|_\infty \le O(r^{-1})$ uniformly over $i$. The proof is completed upon noting that $R_i(t)$ differs from $\sum_{j\in I_t} |\phi(s_{i,j+1}) - \phi(s_{i,j})|$ by at most two terms, and both of them are $O(r^{-1})$ uniformly over $i$ by the same argument as before.

The last statement of the proposition is an immediate corollary for the case $T = \mathrm{Id}$ almost surely.

Note that the $B_{l,r}$'s are not continuous functions, but we can still define their $\|\cdot\|_\infty$ norms as all of them are uniformly bounded functions on $[0,1]$. The following corollary is a consequence of Proposition 1 and the fact that $\int_0^1 |\phi'(u)|\,du \in (0,\infty)$.

Corollary 1.

Under the assumptions of Proposition 1, we have $\tilde F_{i,d}(t) = \tilde F_i(t) + C_{1,r}(t)$ for all $t \in [0,1]$ almost surely for each $i \ge 1$, where $\|C_{1,r}\|_\infty = O(r^{-1})$ almost surely uniformly over $i$. Further, $F_d(t) = F_\phi(t) + C_{2,r}(t)$ for all $t \in [0,1]$, where $\|C_{2,r}\|_\infty = O(r^{-1})$.

Lemma 2.

Assume that $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$. Then, $|F_\phi^{-1}(s) - F_\phi^{-1}(t)| \le C_\phi\, |t-s|^{\epsilon/(1+\epsilon)}$, where $C_\phi^{1+\epsilon} = \int_0^1 |\phi'(u)|^{-\epsilon}\,du$. In other words, $F_\phi^{-1}$ is $\alpha$-Hölder continuous for $\alpha = \epsilon/(1+\epsilon)$.

Proof of Lemma 2.

Note that the assumption in the statement of the lemma implies that φ ą r , s . This fact along with Zarecki’s theorem on theinverse of an absolutely continuous function (see, e.g., p. 271 in Natanson (1955)) applied to the function F φ yields that F ´ φ is absolutely continuous on r , s . Thus, F ´ φ p t q “ ş t r F φ p F ´ φ p u qqs ´ du . Now, usingH¨older’s inequality and some algebraic manipulations, we obtain | F ´ φ p s q ´ F ´ φ p t q| ď || φ || | t ´ s | { p ˆż | φ p u q| ´ q ` du ˙ { q . To complete the proof, choose q “ ` (cid:15) , which implies that p “ p ` (cid:15) q{ (cid:15) . Proposition 2.

Assume that the conditions of Proposition 1 and Lemma 2 hold. Let $\alpha = \epsilon/(1+\epsilon)$ as in Lemma 2. Then, for each $i \ge 1$,

(a) $\tilde F_i^{-1}$ is $\alpha$-Hölder continuous almost surely.

(b) $\tilde F_{i,d}^{-1}(t) = \tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t)$ for all $t \in [0,1]$ almost surely, where $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely uniformly over $i$.

Proof of Proposition 2. (a) Using the definition of $\tilde F_i$, it follows that
$$|\tilde F_i^{-1}(s) - \tilde F_i^{-1}(t)| = |T_i(F_\phi^{-1}(s)) - T_i(F_\phi^{-1}(t))| \le \|T_i'\|_\infty\, |F_\phi^{-1}(s) - F_\phi^{-1}(t)| \le \|T_i'\|_\infty\, C_\phi\, |s-t|^\alpha,$$
where the last inequality follows from Lemma 2. This completes the proof of part (a).

(b) As mentioned earlier, $\tilde F_{i,d}$ is a càdlàg step function with maximum jump discontinuities given by $A_{i,r}$. Thus, if $t \in (\tilde F_{i,d}(t_j), \tilde F_{i,d}(t_{j+1})]$ for any $1 \le j \le r-1$, it follows that $\tilde F_{i,d}(\tilde F_{i,d}^{-1}(t)) = \tilde F_{i,d}(t_{j+1}) = t + q_{i,j,r}(t)$, where $q_{i,j,r}(t) = \tilde F_{i,d}(t_{j+1}) - t$. So, $|q_{i,j,r}(t)| \le \tilde F_{i,d}(t_{j+1}) - \tilde F_{i,d}(t_j) \le A_{i,r}$, where $A_{i,r}$ is the maximum step size of $\tilde F_{i,d}$ defined earlier. Now, from arguments similar to those used in Proposition 1, it follows that $A_{i,r} = O(r^{-1})$ uniformly in $i$. Thus, $\tilde F_{i,d}(\tilde F_{i,d}^{-1}(t)) = t + Q_r(t)$ for all $t \in [0,1]$ almost surely, where $\|Q_r\|_\infty = O(r^{-1})$ almost surely uniformly over $i$.

From Corollary 1 (a consequence of Proposition 1), we know that $\tilde F_{i,d}(s) = \tilde F_i(s) + C_{1,r}(s)$ for all $s \in [0,1]$ almost surely, where $\|C_{1,r}\|_\infty = O(r^{-1})$ almost surely uniformly over $i$. Letting $s = \tilde F_{i,d}^{-1}(t)$, we now have
$$t + Q_r(t) = \tilde F_i(\tilde F_{i,d}^{-1}(t)) + C_{1,r}(\tilde F_{i,d}^{-1}(t))$$
for all $t$ almost surely. Re-arranging terms, we obtain
$$\tilde F_{i,d}^{-1}(t) = \tilde F_i^{-1}(t + Q_{1,r}(t))$$
for all $t \in [0,1]$ almost surely, where $Q_{1,r}(t) = Q_r(t) - C_{1,r}(\tilde F_{i,d}^{-1}(t))$. Thus, $\|Q_{1,r}\|_\infty = O(r^{-1})$ almost surely uniformly over $i$. Now, using part (a), we can conclude that $\tilde F_{i,d}^{-1}(t) = \tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t)$ for all $t \in [0,1]$ almost surely, where $D_{1,r}(t) = C_\phi |Q_{1,r}(t)|^\alpha$ satisfies $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely uniformly over $i$.

Proof of Theorem 4. (a) Note that
$$\hat F_d^*(t) = n^{-1}\sum_{i=1}^n \tilde F_{i,d}^{-1}(t) = n^{-1}\sum_{i=1}^n \left\{ \tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t) \right\} = \hat F^{-1}(t) + \left( n^{-1}\sum_{i=1}^n \|T_i'\|_\infty D_{1,r}(t) \right) = \hat F^{-1}(t) + D_{2,r}(t)$$
for all $t \in [0,1]$ almost surely, where $\|D_{2,r}\|_\infty = O(r^{-\alpha})$ almost surely since $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely and $n^{-1}\sum_{i=1}^n \|T_i'\|_\infty = E(\|T'\|_\infty) + o(1)$ almost surely. Thus, it follows from Theorem 2.18 in Villani (2003) that
$$d_W(\hat F_d, F_\phi) = \|\hat F_d^* - F_\phi^{-1}\|_2 \le \|\hat F^{-1} - F_\phi^{-1}\|_2 + \|D_{2,r}\|_2 \le d_W(\hat F, F_\phi) + O(r^{-\alpha})$$
almost surely.
Combining the above statement with part (a) of Theorem 2 and 3 completes the proof ofpart (a) of Theorem 4.(b) Next, note that p T ˚ i,d p t q “ n ´ ÿ l “ r F ´ l,d p r F i,d p t qq “ n ´ n ÿ l “ ! r F ´ l p r F i,d p t qq ` || T i || D ,r p r F i,d p t qq ) . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations “ n ´ n ÿ l “ r F ´ l p r F i p t q ` C ,r p t qq ` n ´ n ÿ i “ || T i || D ,r p r F i,d p t qq“ n ´ n ÿ l “ ” r F ´ l p r F i p t qq ` ! r F ´ l p r F i p t q ` C ,r p t qq ´ r F ´ l p r F i p t qq )ı ` n ´ n ÿ i “ || T i || D ,r p r F i,d p t qq“ p T ´ i p t q ` n ´ n ÿ l “ ! r F ´ l p r F i p t q ` C ,r p t qq ´ r F ´ l p r F i p t qq ) ` n ´ n ÿ i “ || T i || D ,r p r F i,d p t qq , for all t P r , s almost surely. By part (a) of Proposition 2, we have |t r F ´ l p r F i p t q` C ,r p t qq´ r F ´ l p r F i p t qqu| ď|| T i || D ,r p t q for all t P r , s almost surely, where || D ,r || “ O p r ´ α q almost surely uniformly over i . Thus, sup t Pr , s n ´ ř ni “ |t r F ´ l p r F i p t q ` C ,r p t qq ´ r F ´ l p r F i p t qqu| ď t E p|| T || q ` o p qu O p r ´ α q almostsurely. Similar arguments yield sup t Pr , s n ´ ř ni “ || T i || | D ,r p r F i,d p t qq| ď t E p|| T || q ` o p qu O p r ´ α q al-most surely. Thus, p T ˚ i,d p t q “ p T ´ i p t q ` D ,r p t q , (5)for all t P r , s almost surely, where || D ,r || “ O p r ´ α q almost surely uniformly over i . Consequently, || p T ˚ i,d ´ T ´ i || ď || p T ´ i ´ T ´ i || ` O p r ´ α q almost surely, where the O p q term is uniform over i . This along with part (b) of Theorem 2 shows that || p T ˚ i,d ´ T ´ i || Ñ n Ñ 8 almost surely for all i ě

1. Equation (5) implies that ? n p p T ˚ i,d ´ T ´ i q “? n p p T ´ i ´ T ´ i q ` O p? nr ´ α q in L r , s . This in conjunction with part (b) of Theorem 3 proves that ? n p p T ˚ i,d ´ T ´ i q has the same asymptotic distribution as ? n p p T ´ i ´ T ´ i q in the L r , s topology.Next we consider p T i,d p t q “ r F ´ i,d p p F d p t qq “ r F ´ i p p F d p t qq ` || T i || D ,r p p F d p t qq for all t P r , s almostsurely (from part (b) of Proposition 2). Note that p F d p t q “ t n ´ ř nl “ r F ´ l,d u ´ p t q “ t G n ` D ,r u ´ p t q , where G n p s q “ n ´ ř nl “ r F ´ l p s q and D ,r p s q “ n ´ ř nl “ || T l || D ,r p s q . Thus, || D ,r || “ O p r ´ α q . Also notethat G n is a strictly increasing homeomorphism on r , s . Deﬁne r G n,r “ G n ` D ,r “ n ´ ř nl “ r F ´ l,d sothat r G n,r is an increasing function (not necessarily strictly increasing) from r , s onto r , s . In fact,since each r F ´ l,d is left continuous and has right limits (being the generalized inverse of the c`adl`ag function r F l,d ), r G n,r is also left continuous and has right limits.If t P p r G n,r p v q , r G n,r p v `qs for some v P r , s with r G n,r p v `q ą r G n,r p v q , then r G n,r p p F d p t qq “ r G n,r p r G ´ n,r p t qq “ r G n,r p v q “ t ` p r G n,r p v q ´ t q . Now, | r G n,r p v q ´ t | ď | r G n,r p v `q ´ r G n,r p v q| “ | G n p v `q ´ G n p v q ` D ,r p v `q ´ D ,r p v q| “ | D ,r p v `q ´ D ,r p v q| “ O p r ´ α q uniformly in t almost surely, where the penultimate equalityfollows from the continuity of G n . So, in these cases, G n p p F d p t qq “ r G n,r p p F d p t qq ´ D ,r p p F d p t qq “ t ` O p r ´ α q uniformly in t almost surely, i.e., t “ G n p p F d p t qq ` O p r ´ α qq uniformly in t almost surely.Next, suppose that for some v ă v , we have r G n,r p v q “ r G n,r p v q , r G n,r p v q ă r G n,r p v q for v ă v and r G n,r p v q ą r G n,r p v q for v ą v . 
If t “ r G n,r p v q “ r G n,r p v q , then r G n,r p p F d p t qq “ t if v is a continuity pointof r G n,r . If not, then this is already taken care of in the previous paragraph. In the former case, we have t “ G n p p F d p t qq ` O p r ´ α q uniformly over t almost surely.Finally, if t is a point of both continuity and strict increment of r G n,r , then r G n,r p p F d p t qq “ t as well,which implies that t “ G n p p F d p t qq ` O p r ´ α q uniformly over t almost surely. Thus, all possibilities areexhausted. Let us denote the O p r ´ α q term by D ,r p¨q .Now note that G ´ n “ p n ´ ř nl “ r F ´ l q ´ “ p n ´ ř nl “ T l ˝ F ´ φ q ´ “ F φ ˝ T ´ . Thus, it follows from ourwork above that p F d p t q “ F φ t T ´ p t ´ D ,r p t qqu . Recall that r F ´ i “ T i ˝ F ´ φ and that p T i,d p t q “ r F ´ i p p F d p t qq`|| T i || D ,r p p F d p t qq for all t P r , s almost surely as obtained earlier. Since p F d p t q “ F φ t T ´ p t ´ D ,r p t qqu , itfollows from the decomposition of p T i,d p t q that p T i,d p t q “ T i t T ´ p t ´ D ,r p t qqu ` || T i || D ,r p p F d p t qq for all t Pr , s almost surely. Since inf t Pr , s T p t q ě δ ą

0, it follows that inf t Pr , s T p t q ě n ´ ř nl “ inf t Pr , s T l p t q ě . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations δ ą

0. So, by Taylor expansion, we have T i t T ´ p t ´ D ,r p t qqu “ T i p T ´ p t qq ` || T i || D ,r p t q for all t P r , s almost surely, where || D ,r || “ O p r ´ α q almost surely, where the O p q term is uniform over i .Combining the above ﬁndings, we arrive at p T i,d p t q “ r F ´ i p p F d p t qq ` || T i || D ,r p p F d p t qq “ r F ´ i p G ´ n p t q ` D ,r p t qq ` || T i || D ,r p p F d p t qq“ T i p T ´ p t qq ` || T i || D ,r p t q ` || T i || D ,r p p F d p t qq , where the last equality follows from the discussion in the previous paragraph. Since || D ,r || “ O p r ´ α q almost surely uniformly over i , we obtain p T i,d p t q “ p T i p t q ` || T i || D ,r p t q for all t P r , s almost surely, where || D r, || “ O p r ´ α q almost surely uniformly over i . Consequently, || p T i,d ´ T i || ď || p T i ´ T i || ` O p q r ´ α , almost surely. Combined with part (b) of Theorem 2, this shows that || p T i,d ´ T i || Ñ n Ñ 8 almostsurely for all i ě

1. Equation (6) implies that ? n p p T i,d ´ T i q “ ? n p p T i ´ T i q ` O p? nr ´ α q in L r , s . This inconjunction with part (b) of Theorem 3 proves that ? n p p T i,d ´ T i q has the same asymptotic distributionas ? n p p T i ´ T i q in the L r , s topology. This completes the proof of part (b) of Theorem 4.(c) Next we register the warped functional observations. As mentioned earlier, since the warped observa-tions are only recorded over a discrete grid, the registration algorithm in the fully observed case will notwork. So, as a pre-processing step, we need to ﬁrst smooth the warped discrete observations. We do thisby using the Nadaraya-Watson kernel regression estimator as follows. Let k p¨q be any kernel supportedon r´ , s and choose a bandwidth parameter h ą

0. Then, the smooth version of p X i,d is given by X : i p t q “ ř rj “ k ´ t ´ t j h ¯ r X i p t j q ř rj “ k ´ t ´ t j h ¯ “ ξ i ř rj “ k ´ t ´ t j h ¯ φ p T ´ i p t j qq ř rj “ k ´ t ´ t j h ¯ , t P r , s . Now, note that | X : i p t q ´ r X i p t q| “ ˇˇˇˇˇˇ ξ i ř rj “ k ´ t ´ t j h ¯ t φ p T ´ i p t j qq ´ φ p T ´ i p t qqu ř rj “ k ´ t ´ t j h ¯ ˇˇˇˇˇˇ ď || φ || δ ´ | ξ i | ř rj “ k ´ t ´ t j h ¯ | t j ´ t | ř rj “ k ´ t ´ t j h ¯ ď c | ξ i | h, for all t P r , s almost surely, where c is a constant not depending on i and t . The ﬁrst inequality abovefollows from arguments similar to those used in the proof of Theorem 1. The second inequality followsform the fact that k p¨q is supported on r´ , s so that only those j ’s in the numerator for which | t j ´ t | ď h will contribute to the sum. Thus, || X : i ´ r X i || ď c | ξ i | h almost surely.We register the warped discrete observation r X i,d by deﬁning p X ˚ i “ X : i ˝ p T i,d for each 1 ď i ď n .Observe that | p X ˚ i p t q ´ p X i p t q| ď | p X ˚ i p t q ´ r X i p p T i,d p t qq| ` | r X i p p T i,d p t qq ´ p X i p t q|ď || X : i ´ r X i || ` | ξ i | | φ p T ´ i p p T i,d p t qqq ´ φ p T ´ i p p T i p t qqq|ď c | ξ i | h ` | ξ i | | φ p T ´ i p p T i p t q ` || T i || D ,r p t qqq ´ φ p T ´ i p p T i p t qqq|ď c | ξ i | h ` | ξ i | || T i || | D ,r p t q| || φ || δ ´ ď O p q| ξ i |p h ` || T i || r ´ α q (6)for all t P r , s almost surely, where the O p q term is uniform in i and t . The last two inequalities abovefollow from a ﬁrst order Taylor expansion and the fact that || D ,r || “ O p r ´ α q almost surely uniformlyover i . Hence, || p X ˚ i ´ p X i || “ O p q| ξ i ||p h ` || T i || r ´ α q . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations almost surely. In conjunction with part (c) of Theorem 2, this shows that || p X ˚ i ´ X i || Ñ n Ñ 8 almost surely for all i ě

1. Equation (6) implies that ? n p p X ˚ i ´ X i q “ ? n p p X i ´ X i q ` O p? n p h ` r ´ α qq in L r , s . Invoking part (c) of Theorem 3 thus establishes that ? n p p X ˚ i ´ X i q has the same asymptoticdistribution as ? n p p X i ´ X i q in the L r , s topology. This completes the proof of part (c) of Theorem 4.(d) Next, deﬁne the random measure induced by p X ˚ i as p F ˚ i p t q “ ÿ j P I t | p X ˚ i p t j ` q ´ p X ˚ i p t j q| N r ´ ÿ j “ | p X ˚ i p t j ` q ´ p X ˚ i p t j q|“ ÿ j P I t | p X : i p p T i,d p t j ` qq ´ p X : i p p T i,d p t j qq| N r ´ ÿ j “ | p X : i p p T i,d p t j ` qq ´ p X : i p p T i,d p t j qq|“ $&% ÿ j P I t | r X i p p T i,d p t j ` qq ´ r X i p p T i,d p t j qq| ` O p h q| ξ i | ,.- N r ´ ÿ j “ | r X i p p T i,d p t j ` qq ´ r X i p p T i,d p t j qq| ` O p h q| ξ i | + for all t P r , s almost surely, where the O p q term is uniform in i and t , and the last equality followsfrom the fact that || X : i ´ r X i || ď c | ξ i | h almost surely. Also note that by deﬁnition of r X i , the term | ξ i | cancels from the numerator and the denominator.Using the fact that p T i,d p t q “ p T i p t q`|| T i || D ,r p t q with || D ,r || “ O p r ´ α q almost surely, and argumentssimilar to those used in the proof of Proposition 1, one obtains p F ˚ i p t q “ p F p t q ` O p qp h ` || T i || r ´ α q for all t P r , s almost surely, where the O p q term is uniform in i and t almost surely. Now, using Lemma2 and arguments similar to those used in the proof of part (b) of Proposition 2, we have p p F ˚ i q ´ p t q “ p F ´ p t q ` O p q r ´ α p h ` || T i || r ´ α q for all t P r , s almost surely, where the O p q term is uniform in i and t almost surely. Thus, d W p p F ˚ i , F φ q “ ||p p F ˚ i q ´ ´ F ´ φ || ď || p F ´ ´ F ´ φ || ` O p q r ´ α p h ` r ´ α q“ d W p p F , F φ q ` O p q r ´ α p h ` r ´ α q almost surely. 
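The discrete local variation measure used in part (d) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the index set $I_t$ is assumed to collect the increments to the left of $t$, and the distance between two measures is computed as the $L^2$ distance between their quantile functions (the 2-Wasserstein distance on the line), which is an assumption since the norm subscript is not legible in the text above.

```python
import numpy as np

def local_variation_cdf(values):
    """Discrete local variation measure of a curve sampled on t_1 < ... < t_r.

    Entry j is the share of the total absolute variation accumulated up to t_j,
    so the result starts at 0 and ends at 1 (assumed convention for I_t).
    """
    increments = np.abs(np.diff(values))
    total = increments.sum()
    if total == 0:
        raise ValueError("constant curve: total variation is zero")
    return np.concatenate(([0.0], np.cumsum(increments) / total))

def quantile_function(grid, cdf, probs):
    """Inverse of a non-decreasing cdf by linear interpolation.

    Exact inversion is only guaranteed where the cdf is strictly increasing.
    """
    return np.interp(probs, cdf, grid)

def quantile_distance(grid, cdf_a, cdf_b, n_probs=501):
    """L2 distance between quantile functions on a uniform probability grid."""
    probs = np.linspace(0.0, 1.0, n_probs)
    gap = quantile_function(grid, cdf_a, probs) - quantile_function(grid, cdf_b, probs)
    return float(np.sqrt(np.mean(gap ** 2)))
```

Because the measure is a ratio of variations, rescaling a curve by a non-zero constant leaves it unchanged, which mirrors the cancellation of $|\xi_i|$ noted in the proof.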
Combining the above statement with part (d) of Theorems 2 and 3 completes the proofof part (d) of Theorem 4.(e) Next, deﬁne X r ˚ “ n ´ ř ni “ p X ˚ i . Since || p X ˚ i ´ p X i || “ O p q| ξ i ||p h ` || T i || r ´ α q almost surely, itfollows that ||p X r ˚ ´ µ q ´ p X r ´ µ q|| ď n ´ n ÿ i “ || p X ˚ i ´ p X i || ď O p qt h ` r ´ α n ´ n ÿ i “ || T i || uď O p qp h ` r ´ α q (7)almost surely since E p|| T || q ă 8 . Along with part (e) of Theorem 2, this shows that || X r ˚ ´ µ || Ñ n Ñ 8 almost surely. Equation (7) implies that ? n p X r ˚ ´ µ q “ ? n p X r ´ µ q ` O p? n p h ` r ´ α qq in L r , s . So by part (e) of Theorem 3 we see that ? n p X r ˚ ´ µ q has the same asymptotic distribution as ? n p X r ´ µ q in the L r , s topology, and the proof of part (e) of Theorem 4 is complete.(f) Next, we consider the empirical covariance operator of the p X ˚ i ’s which we will denote by x K r ˚ “ n ´ ř ni “ p p X ˚ i ´ X r ˚ q b p p X ˚ i ´ X r ˚ q . Recall S “ n ´ ř ni “ p p X i ´ µ q b p p X i ´ µ q from the proof of part (f)of Theorem 3. Now, some straightforward manipulations yield x K r ˚ “ S ` n ´ n ÿ i “ p p X ˚ i ´ p X i q b p p X ˚ i ´ p X i q ´ p X r ˚ ´ µ q b p X r ˚ ´ µ q . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations ` n ´ n ÿ i “ tp p X ˚ i ´ p X i q b p p X i ´ µ q ` p p X i ´ µ q b p p X ˚ i ´ p X i qu“ S ` W ´ W ` W , say . Note that ||| W ||| ď n ´ ř ni “ || p X ˚ i ´ p X i || ď O p qt h n ´ ř ni “ | ξ i | ` r ´ α n ´ ř ni “ || T i || u “ O p qp h ` r ´ α q almost surely. Next, from the previous paragraph, it follows that ||| W ||| ď || X r ˚ ´ µ || ď O p qp h ` r ´ α q ` || X r ´ µ || . Moreover, ||| W ||| ď n ´ ř ni “ || p X ˚ i ´ p X i || || p X i ´ µ || ď O p q n ´ ř ni “ t h | ξ i | `|| T i || r ´ α u|| p X i ´ µ || almost surely. 
Observe that n ´ n ÿ i “ | ξ i | || p X i ´ µ || “ n ´ n ÿ i “ | ξ i | || ξ i φ ˝ T ´ ´ E p ξ q φ || ď n ´ n ÿ i “ | ξ i | | ξ i ´ E p ξ q| || φ ˝ T ´ || ` n ´ n ÿ i “ | ξ i | | E p ξ q| || φ ˝ T ´ ´ φ || . Since || φ ˝ T ´ ´ φ || Ñ O p q almost surely, andthe second term is o p q almost surely. Similar arguments show that n ´ ř ni “ || T i || || p X i ´ µ || “ O p q almost surely. Thus, ||| W ||| ď O p qp h ` r ´ α q almost surely. Also, S in the proof of part (f) of Theorem3 satsiﬁes ||| S ||| “ O P p n ´ q . Combining the above facts and using the decomposition of x K r in the proofof part (f) of Theorem 3, it follows that x K r ˚ “ S ` O p qp h ` r ´ α ` || X r ´ µ || q “ x K r ` O p qp h ` r ´ α ` || X r ´ µ || q (8)almost surely. This along with part (f) of Theorem 2 shows that ||| x K r ˚ ´ K ||| Ñ n Ñ 8 almostsurely. By part (e) of Theorem 3, it follows that ? n || X r ´ µ || “ O P p q as n Ñ 8 . So, equation (8)implies that ? n p x K r ˚ ´ K q “ ? n p x K r ´ K q ` O p? n p h ` r ´ α qq in L r , s . This in conjunction with part(f) of Theorem 3 proves that ? n p x K r ˚ ´ K q has the same asymptotic distribution as ? n p x K r ´ K q inthe Hilbert-Schmidt topology.For the convergence of the empirical covariance kernel p K r ˚ p s, t q “ n ´ ř ni “ r p X ˚ i p s q ´ X r ˚ p s qsr p X ˚ i p t q ´ X r ˚ p t qs , we follow the same decomposition as above for the case of the operator. Noting the all thebounds used for that proof remain valid in the sup-norm and using the same arguments, we arrive that p K r ˚ p s, t q “ p K r p s, t q ` O p qp h ` r ´ α ` || X r ´ µ || q (9)for all s, t P r , s almost surely, where the O p q term is uniform in s, t almost surely. This along withpart (f) of Theorem 2 shows that || p K r ˚ ´ K || Ñ n Ñ 8 almost surely. Equation (9) implies that t? n p p K r ˚ p s, t q´ K p s, t qq : s, t P r , su “ t? n p p K r p s, t q´ K p s, t qq : s, t P r , su` O p? n p h ` r ´ α qq in L r , s with the O p q term being uniform in s, t . 
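For readers who wish to experiment with the empirical covariance kernel appearing in part (f), a minimal sketch on a uniform grid follows. It is an illustration under stated assumptions rather than the paper's code: the function names are hypothetical, the curves are assumed to be stored as an $n \times r$ array, and the integral operator is discretised by a simple Riemann sum.

```python
import numpy as np

def empirical_covariance_kernel(curves):
    """Empirical covariance kernel of n curves sampled on a common grid.

    curves has shape (n, r); the result has shape (r, r) and uses the 1/n
    normalisation, as in the empirical covariance of the proof.
    """
    centred = curves - curves.mean(axis=0)
    return centred.T @ centred / curves.shape[0]

def leading_eigenpair(kernel, grid_step):
    """Leading eigenvalue and eigenfunction of the discretised integral operator.

    The operator is approximated by kernel * grid_step (Riemann sum), and the
    eigenfunction is rescaled to have unit discrete L2 norm.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(kernel * grid_step)
    eigenfunction = eigenvectors[:, -1] / np.sqrt(grid_step)
    return eigenvalues[-1], eigenfunction
```

In a rank-one model of the form $X_i = \xi_i\varphi$, the leading eigenvalue returned by this sketch is the sample variance of the $\xi_i$, and the eigenfunction matches $\varphi$ up to sign.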
This in conjunction with part (f) of Theorem 3 proves that t? n p p K r ˚ p s, t q ´ K p s, t qq : s, t P r , su has the same asymptotic distribution as t? n p p K r p s, t q ´ K p s, t qq : s, t P r , su in the L pr , s q topology.To prove the strong consistency and the weak convergence of the estimated eigenfunction, we will useperturbation bounds for compact operators (see, e.g., Ch. 5 of Hsing and Eubank (2015)). The leadingeigenfunction p φ ˚ of x K r ˚ satisﬁes the inequality || p φ ˚ ´ φ || ď ? λ ´ ||| x K r ˚ ´ K ||| Ñ n Ñ 8 almostsurely. Further, Theorem 5.1.8 of Hsing and Eubank (2015), speciﬁcally equation (5.27), implies that ? n p p φ ˚ ´ φ q has the same asymptotic distribution (in L r , s ) as that of S ? n p x K r ˚ ´ K q φ , where, inour setup, S “ ´ λ ´ p I ´ φ b φ q with λ “ V ar p ξ q being the leading eigenvalue of K , and I being theidentity operator. Thus, from the results already establishes, it follows that the asymptotic distributionof ? n p p φ ˚ ´ φ q is that of ´ λ ´ p I ´ φ b φ q? n p x K r ´ K q φ . Using the expression of the asymptoticdistribution of ? n p x K r ´ K q obtained in part (f) of Theorem 3 and some simple calculations, it followsthat the asymptotic distribution of ? n p p φ ˚ ´ φ q is that of Y ˆ φ ´ x Y ˆ φ , φ y φ , which is the same as inTheorem 3.The proof of the strong consistency and the weak convergence of p ξ i ˚ follows in direct analogy to thatof p ξ i upon using part (c) and the above facts. The proof of part (f) of Theorem 4 is now complete. . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations Proof of Theorem 5.

First observe that | r F i,w p t q ´ r F i p t q| ď ˇˇˇˇˇ ş t | p X p q i,w p u q| du ş | p X p q i,w p u q| du ´ ş t | p X p q i,w p u q| du ş | r X i p u q| du ˇˇˇˇˇ ` ˇˇˇˇˇ ş t | p X p q i,w p u q| du ş | r X i p u q| du ´ ş t | r X i p u q| du ş | r X i p u q| du ˇˇˇˇˇ ď ş | p X p q i,w p u q ´ r X i p u q| du ş | r X i p u q| du ď || p X p q i,w ´ r X i || | ξ i | ş | φ p u q| du “ d φ | ξ i | ´ A i,r , say . ñ || r F i,w ´ r F i || ď d φ | ξ i | ´ A i,r . (10)Since the term A i,r will be key for our proof, we will ﬁrst bound E t A i,r u . To achieve this, we willﬁrst provide bounds on E t A i,r | ξ i , T i u using standard tools from non-parametric regression. So, we willhave to estimate the MSE for the regression problem Y ij “ ξ i φ p T ´ i p t j qq ` (cid:15) ij and integrate this MSEover u P r , s , when ξ i and T i are ﬁxed. The expression for the MSE in the deterministic design caseis the same as the conditional MSE (given design points) in the random design case with the designdistribution being uniform on r , s . Next, observe that V ar p p X i,w p u q| ξ i , T i q does not depend on ξ i and T i and is thus uniform over i (since the (cid:15) ij ’s are i.i.d.). For u P r h , ´ h s , the expression of thisvariance is given in p. 137 in Wand and Jones (1995) and equals O pp rh q ´ q , where the O p q termdepends on k , is bounded and is uniform over u P r h , ´ h s . Next, we have to take into account theboundary points. Let u “ αh for some α P r , q . It follows from a similar analysis that even in this case, V ar p p X i,w p u q| ξ i , T i q “ O pp rh q ´ q , where the O p q term is integrable over α P r , q (see, e.g. pp. 244-247in Schimek (2000)). Similar estimates also hold for t P r ´ h , s , say t “ ´ αh . Hence, we get that V ar p p X i,w p u q| ξ i , T i q “ O pp rh q ´ q for all u P r , s with the O p q term being integrable over u P r , s .Next we consider the bias. 
In our case the degree of the ﬁtted polynomial is one more than the degreeof derivative estimated. Thus, applying Taylor’s formula and using the expressions in Thm. 9.1 andpp. 244-247 in Schimek (2000), we have | Bias p p X i,w p u q| ξ i , T i q| “ || r X p q i || O p h q ` || r X p q i || o p h q for all u P r , s . Here, the O p q and o p q terms are non-random and are integrable in u P r , s . So, using themoment assumptions on the sup-norm of the derivatives of T , the independence of the ξ i ’s and the T i ’salong with the assumption that inf t Pr , s T p t q ě δ ą

0, it follows that E t A i,r u “ O p h q ` O pp rh q ´ q (11)where the O p q terms are bounded and do not depend on i (the r X i ’s are i.i.d). This also implies (usingMarkov’s inequality) that n ´ n ÿ i “ A i,r “ O P p h ` p rh q ´ q (12)We will now proceed with the rest of the proof. First, let u i,t “ r F ´ i,w p t q . From (10), it follows that r F i p u i,t q “ t ´ r A i,r p t q , where || r A i,r || ď d φ | ξ i | ´ A i,r . Thus, using part (a) of Proposition 2, it follows that | r F ´ i,w p t q´ r F ´ i p t q| “ | u i,t ´ r F ´ i p t q| “ r F ´ i p t ´ r A i,r p t qq´ r F ´ i p t q| ď || T i || c φ | ξ i | ´ α A αi,r for a constant c φ . So, || r F ´ i,w ´ r F ´ i || ď || T i || c φ | ξ i | ´ α A αi,r . Thus, p F ´ e “ n ´ ř ni “ r F ´ i,w “ n ´ ř ni “ r F ´ i ` r B r “ p F ´ ` r B r , where || r B r || ď c φ n ´ ř ni “ || T i || | ξ i | ´ α A αi,r . Deﬁne R r “ n ´ ř ni “ || T i || | ξ i | ´ α A αi,r . By H¨older’s inequality,the law of large numbers, independence of T i ’s and ξ i ’s, and (12), we get that R r ď « n ´ n ÿ i “ || T i || {p ´ α q8 | ξ i | ´ α {p ´ α q ﬀ ´ α { « n ´ n ÿ i “ A i,r ﬀ α { ñ R r “ O P p h α ` p rh q ´ α { q (13)(a) Since d W p p F e , F φ q “ || p F ´ e ´ F ´ φ || ď || p F ´ e ´ p F ´ || ` || p F ´ ´ F ´ φ || ď R r ` d W p p F , F φ q , theproof follows using part (a) of Theorem 3 and (13). . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations (b) Note that p T ´ i,e p t q “ p F ´ e p r F i,w p t qq “ p F ´ p r F i,w p t qq ` r B r p r F i,w p t qq using statements proved earlier. Now,arguments in the proof of part (b) of Theorem 3 along with (10) yield p F ´ p r F i,w p t qq “ p T ´ i p t q ` r C r p t q ,where || C r || ď const.R r . Thus, r T ´ i,e “ r T ´ i ` r C ,r , where || r C ,r || ď const.R r . 
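The construction used in parts (a) and (b), namely averaging the quantile functions of the warped local variation measures and then composing, can be sketched numerically as follows. This is a hedged illustration only: the function name and the interpolation-based inversion are assumptions, and each input distribution function is assumed to be strictly increasing on the grid.

```python
import numpy as np

def estimate_warps(grid, cdfs, n_probs=501):
    """Estimate warp maps from warped local variation distribution functions.

    cdfs: sequence of arrays, each a strictly increasing cdf sampled on grid.
    Returns (warps, inverse_warps), both of shape (n, len(grid)), following
        pooled quantile  = average of the individual quantile functions,
        warp_i           = (i-th quantile function) o (pooled cdf),
        inverse warp_i   = (pooled quantile function) o (i-th cdf).
    """
    probs = np.linspace(0.0, 1.0, n_probs)
    # Individual quantile functions, obtained by swapping the interpolation axes.
    quantiles = np.array([np.interp(probs, cdf, grid) for cdf in cdfs])
    pooled_quantile = quantiles.mean(axis=0)
    # Invert the pooled quantile function to get the pooled cdf on the grid.
    pooled_cdf = np.interp(grid, pooled_quantile, probs)
    warps = np.array([np.interp(pooled_cdf, cdf, grid) for cdf in cdfs])
    inverse_warps = np.array([np.interp(cdf, probs, pooled_quantile) for cdf in cdfs])
    return warps, inverse_warps
```

As a sanity check, if the underlying measure is uniform and the two warps are $t^2$ and $2t - t^2$ (whose average is the identity), the sketch recovers both warps up to interpolation error.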
The proof of the ﬁrststatement in part (b) of this theorem now follows using part (b) of Theorem 3 and (13).Next consider p T i,e p t q “ r F ´ i,w p p F e p t qq “ r F ´ i p p F e p t qq ` r C ,r,i p t q , where || r C ,r,i || ď || T i || c φ | ξ i | ´ α A αi,r from statements proved earlier. Note that if p F e p t q “ v then t “ p F ´ e p v q “ p F ´ p v q ` r C ,r p v q , where || r C ,r || ď R r . So, p F e p t q “ v “ p F p t ´ r C ,r p v qq “ F φ p T ´ p t ´ r C ,r p v qqq . Noting that r F ´ i “ T i ˝ F ´ φ , weget that r F ´ i p p F e p t qq “ T i p T ´ p t ´ r C ,r p v qqq “ T i p T ´ p t qq ` || T i || r C ,r p v q “ r F ´ i p p F p t qq ` || T i || r C ,r p v q “ p T i p t q ` || T i || r C ,r p v q , where || r C ,r || ď R r . This follows from arguments similar to those used earlierusing the smoothness of T and the assumption that inf t Pr , s T p t q ě δ ą

0. Thus, we ﬁnally have || p T i,e ´ p T i || ď const. t|| T i || R r ` || T i || | ξ i | ´ α A αi,r u . (14)The proof of the second statement of part (b) of this theorem is now completed via part (b) of Theorem3, (11) and (13).For proving part (c) of the theorem we will ﬁrst have to control E t|| p X i,w ´ r X i || | ξ i , T i u for each i . Recallthat p X i,w p t q “ r r ÿ j “ t p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t q Y ij p s p t ; h q p s p t ; h q ´ p s p t ; h q , where k ,h p u q “ h ´ k p u { h q and p s l p t ; h q “ r ´ ř rj “ p t j ´ t q l k ,h p t j ´ t q for l “ , ,

2. Call thedenominator p f p t q , which is deterministic. We will ﬁrst analyse the term q Y i,w p t q which is deﬁned like p X i,w p t q but with r X i p t j q in place of Y ij . Deﬁne q Z i,w p t q “ p X i,w p t q ´ q Y i,w p t q .Using Taylor’s formula, we get that r X i p t j q “ r X i p t q ` p t j ´ t q r X i p t q ` ´ p t j ´ t q r X i p t q ` ´ p t j ´ t q r X p q i p r t i,j q , where r t i,j lies between t and t j . Plugging-in this expansion in the deﬁnition of q Y i,w p t q , wehave q Y i,w p t q “ r X i p t q ` r X i p t q p s p t ; h q ´ p s p t ; h q p s p t ; h q p s p t ; h q p s p t ; h q ´ p s p t ; h q` r r ÿ j “ t p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t qp t j ´ t q r X p q i p r t i,j q p s p t ; h q p s p t ; h q ´ p s p t ; h q“ r X i p t q ` Q i, p t ; h q ` Q i, p t ; h q , sayfor all t P r , s . Note that the term involving r X i p t q vanishes, which plays a crucial role in putting thelocal linear estimator at an advantage over other standard non-parametric regression estimators nearthe boundary of the data set. Thus, | p X i,w p t q ´ r X i p t q| ď | q Y i,w p t q ´ r X i p t q| ` | q Z i,w p t q| ď | Q i, p t ; h q| `| Q i, p t ; h q| ` | q Z i,w p t q| .By approximations of Riemann sums, we have p s l p t ; h q “ h l ş ´ u l k p u q du ` O pp rh q ´ q uniformly for t P r h , ´ h s . Also, for t P r , h q , say, t “ αh with α P r , q , we have p s l p t ; h q “ h l ş ´ α u l k p u q du ` O pp rh q ´ q uniformly for α P r , q . The same estimate also holds for t P p ´ h , s , say, t “ ´ αh . Deﬁne µ l,α “ ş ´ α u l k p u q du for l “ , ,

2. These estimates imply that for t P r h , ´ h s , wehave | Q i, p t ; h q| ď ´ || r X i || t h ş ´ u k p u q du ` O pp rh q ´ qu . Further, for boundary points, we have | Q i, p t ; h q| ď ´ || r X i || t h | B α | ` O pp rh q ´ qu for α P r , q , where B α “ r µ ,α ´ µ ,α µ ,α s{r µ ,α µ ,α ´ µ ,α s . In both case, the O p q terms are non-random (hence does not depend on i ) and uniform over choicesof t . Note that the leading term in the squared bias term obtainable from the previous bias expressionis an upper bound for the coeﬃcient of the squared bias term in the general result obtained in Thm. 3.3in Fan and Gijbels (1996). It can be shown using similar arguments that | Q i, p t ; h q| ď || r X p q i || o p h q ,where the o p q term is non-random and uniform over t P r , s . Note that for α “

1, which correspond to . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations t P r h , ´ h s , we have B α “ ş ´ u k p u q du by the symmetry of the kernel. Further, it can be shown thatthe denominator (which is positive by the Cauchy-Schwarz inequality) in the deﬁnition of B α is a strictlyincreasing function of α P r , s and hence its inﬁmum is achieved at α “

0, where it takes the value ş u k p u q du ş k p u q du ´p ş uk p u q du q “ : a ą k . Thus sup α Pr , s | B α | ď sup α Pr , s | µ ,α ´ µ ,α µ ,α |{ a ă 8 as the numerator is uniformlybounded in α . Hence, || q Y i,w ´ r X i || ď ´ || r X i || t h sup α Pr , s | B α | ` O pp rh q ´ qu ` || r X p q i || o p h q ď|| r X i || t O p h q ` O pp rh q ´ qu ` || r X p q i || o p h q , where the O p q and the o p q terms are non-random (andhence do not depend on i ).We next control E t|| q Z i,w || u . Observe that this does not depend on r X i and hence does not dependon i (the errors are i.i.d.). Now, E $&% sup t Pr , s ˇˇˇˇˇ r r ÿ j “ t p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t q (cid:15) ij p s p t ; h q p s p t ; h q ´ p s p t ; h q ˇˇˇˇˇ ,.- ď E sup t Pr , s r r ÿ j “ t p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t q (cid:15) ij r p s p t ; h q p s p t ; h q ´ p s p t ; h qs + ` E $&% r ÿ j ‰ j (cid:15) ij (cid:15) ij sup t Pr , s t p s p t ; h q ´ p s p t ; h qp t j ´ t qut p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t q k ,h p t j ´ t qr p s p t ; h q p s p t ; h q ´ p s p t ; h qs ,.- ď M r ´ sup t Pr , s r r ÿ j “ t p s p t ; h q ´ p s p t ; h qp t j ´ t qu k ,h p t j ´ t qr p s p t ; h q p s p t ; h q ´ p s p t ; h qs “ M p rh q ´ sup t Pr , s p s p t ; h q r s p t ; h q ` p s p t ; h q r s p t ; h q ´ p s p t ; h q p s p t ; h q r s p t ; h qr p s p t ; h q p s p t ; h q ´ p s p t ; h qs . (15)The second term on the right hand side of the ﬁrst inequality vanishes due to the uncorrelatedness ofthe errors and the fact that the t j ’s are non-random. The bound for the ﬁrst term follows from the a.s.boundedness of the errors, say with bound M . Here, r s l p t ; h q “ r ´ ř rj “ p t j ´ t q l h ´ k tp t j ´ t q{ h u , whichis a deﬁnition similar to p s l p t ; h q but with a new “kernel” k . 
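The local linear smoother recalled in this proof, built from the moment sums $\hat{s}_l(t;h) = r^{-1}\sum_{j}(t_j - t)^l k_h(t_j - t)$, can be written in a few lines. The sketch below is illustrative and makes assumptions not found in the text: the function names are hypothetical and the Epanechnikov kernel is chosen only as one example of a kernel supported on $[-1,1]$.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an illustrative choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def local_linear(t_eval, t_obs, y_obs, h, kernel=epanechnikov):
    """Local linear regression estimate at the points t_eval.

    Uses s_l(t; h) = mean_j (t_j - t)^l k_h(t_j - t) for l = 0, 1, 2 and the
    weights {s_2 - s_1 (t_j - t)} k_h(t_j - t) / (s_2 s_0 - s_1^2).
    """
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    diff = t_obs[None, :] - t_eval[:, None]          # t_j - t, shape (m, r)
    weights = kernel(diff / h) / h                   # k_h(t_j - t)
    s0 = weights.mean(axis=1)
    s1 = (diff * weights).mean(axis=1)
    s2 = (diff ** 2 * weights).mean(axis=1)
    numerator = ((s2[:, None] - s1[:, None] * diff) * weights * y_obs[None, :]).mean(axis=1)
    return numerator / (s2 * s0 - s1 ** 2)
```

The estimator reproduces linear functions exactly, including at the boundary points, which is the numerical counterpart of the first-derivative term vanishing in the expansion above.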
As earlier, by Riemann sum approximations,we have r s l p t ; h q “ h l ş ´ α u l k p u q du ` O pp rh q ´ q for α P r , s with the O p q term being uniform on t P r , s . Deﬁne ν l,α “ ş ´ α u l k p u q du . Then, p s p t ; h q r s p t ; h q ` p s p t ; h q r s p t ; h q ´ p s p t ; h q p s p t ; h q r s p t ; h qr p s p t ; h q p s p t ; h q ´ p s p t ; h qs “ µ ,α ν ,α ` µ ,α ν ,α ´ µ ,α µ ,α ν ,α r µ ,α µ ,α ´ µ ,α s ` O pp rh q ´ q “ C α ` O pp rh q ´ q , say , for all α P r , s , where the O p q term is uniform over t P r , s . Note that the expression of C α is the sameas the coeﬃcient of the variance term in the general result obtained in Thm. 3.3 in Fan and Gijbels (1996)(with necessary adaptations). Using (15), it now follows that E t|| q Z i,w || u ď M t sup α Pr , s C α up rh q ´ ` o pp rh q ´ q “ O pp rh q ´ q . Hence, using the assumptions in the theorem and the bounds on || q Y i,w ´ r X i || obtained earlier as well as the previous bound, it follows that E t|| p X i,w ´ r X i || u “ O p h q ` O pp rh q ´ q , (16)where the O p q terms are bounded and do not depend in i . Thus, using Markov’s inequality, we have n ´ n ÿ i “ || p X i,w ´ r X i || “ O P t h ` p rh q ´ { u . (17) . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations (c) Recall that p X ˚ i,e p t q “ p X i,w p p T i,e p t qq . Thus, using (14) we have | p X ˚ i,e p t q ´ p X i p t q| ď | p X i,w p p T i,e p t qq ´ r X i p p T i,e p t qq| ` | r X i p p T i,e p t qq ´ r X i p p T i p t qq|ď || p X i,w ´ r X i || ` || r X i || || p T i,e ´ p T i || ñ || p X ˚ i,e ´ p X i || ď || p X i,w ´ r X i || ` const. | ξ i | || T i || t R r ` | ξ i | ´ α A αi,r u . (18)The proof of part (c) of this theorem now follows from (11), (13), (16) and part (c) of Theorem 3.(d) Observe that by (18), we have || X e ˚ ´ n ´ n ÿ i “ p X i || ď n ´ n ÿ i “ || p X i,w ´ r X i || ` const. R r ˜ n ´ n ÿ i “ | ξ i | || T i || ¸ ` n ´ n ÿ i “ | ξ i | ´ α || T i || A αi,r + . 
The third term on the right hand side can be bounded using H¨older’s inequality and (12) as earlier. Thebounds on the ﬁrst two terms are given by (17) and (13), respectively. The proof of this part of thetheorem is now completed upon using these bounds along with part (e) of Theorem 3.(e) For the proof of this part of theorem, we will use a decomposition of x K e ˚ similar to that of x K r in the proof of part (f) of Theorem 3. In the same notation, we obtain the following boundson W , W and W . First, note that ||| W ||| ď n ´ ř ni “ || p X ˚ i,e ´ p X i || ď n ´ ř ni “ || p X i,w ´ r X i || ` const.n ´ ř ni “ ξ i || T i || t R r ` | ξ i | ´ α A αi,r u . Applying H¨older’s inequality and using (12), (13) and (16),we get that ||| W ||| “ O P t h `p rh q ´ ` h α `p rh q ´ α u . Next, using part (d) of this theorem and part (e)of Theorem 3, it follows that ||| W ||| ď || X e ˚ ´ µ || ď || X e ˚ ´ n ´ ř ni “ p X i || ` || n ´ ř ni “ p X i ´ µ || “ O P t h α `p rh q ´ α ` h `p rh q ´ ` n ´ u . In a similar manner, ||| W ||| ď n ´ ř ni “ || p X ˚ i,e ´ p X i || || p X i ´ µ || “ O P t h ` p rh q ´ { ` h α ` p rh q ´ α { u by the Cauchy-Schwarz inequality and the bounds obtained earlier.So, using part (f) of Theorem 3, we have ||| x K e ˚ ´ K ||| “ O P t h ` p rh q ´ { ` h α ` p rh q ´ α { ` n ´ { u .The bounds for the leading eigenvalue and eigenfunction follow directly by standard bounds in the theoryof perturbation of operators. Proof of Theorem 6.

First assume that µ ‰

0. Then, deﬁne G p t q “ ş t | γ ´ µ p u q| du { ş | γ ´ µ p u q| du “ ş t | µ p u q| du { ş | µ p u q| du and r G i p t q “ G p T ´ i p t qq for t P r , s and i “ , , . . . , n . Some algebraic manipu-lations yield | F i p t q ´ G p t q|ď ş t | Y i φ p u q ` ηY i φ p u q| du ş | γ ´ µ p u q ` Y i φ p u q ` ηY i φ p u q| du ` ˇˇˇˇˇ ş t | γ ´ µ p u q| du ş | γ ´ µ p u q ` Y i φ p u q ` ηY i φ p u q| du ´ ş t | γ ´ µ p u q| du ş | γ ´ µ p u q| du ˇˇˇˇˇ ď ş | Y i φ p u q ` ηY i φ p u q| du ş | γ ´ µ p u q ` Y i φ p u q ` ηY i φ p u q| du “ Z i . Thus, || F i ´ G || ď Z i almost surely for each i . So || r F i ´ r G i || “ sup t Pr , s | F i p T ´ i p t qq ´ G p T ´ i p t qq| “ sup t Pr , s | F i p t q ´ G p t q| ď Z i , where the last equality holds because T i is a bijection on r , s .Next, let c i “ F ´ i p t q and c “ G ´ p t q . So, t “ F i p c i q “ G p c q . Also, G p c q ´ G p c i q “ G p c q ´ F i p c i q ` F i p c i q ´ G p c i q “ F i p c i q ´ G p c i q so that | G p c q ´ G p c i q| ď || F i ´ G || ď Z i . The conditions of the theoremand arguments as in Lemma 2 earlier show that G ´ is α -H¨older continuous for α “ (cid:15) {p ` (cid:15) q . Thus, fora ﬁnite, positive constant C µ , we have | F ´ i p t q ´ G ´ p t q| “ | c i ´ c | “ | G ´ p G p c i qq ´ G ´ p G p c qq| ď C µ | G p c i q ´ G p c q| α ď C µ Z αi . Thus, || F ´ i ´ G ´ || ď C µ Z αi almost surely. Consequently, || r F ´ i ´ r G ´ i || “ sup t Pr , s | T i p F ´ i p t qq ´ T i p G ´ p t qq| ď || T i || || F ´ i ´ G ´ || ď C µ || T i || || Z αi almost surely. Further, || p F ´ ´ p G ´ || ď n n ÿ i “ || r F ´ i ´ r G ´ i || ď C µ n n ÿ i “ || T i || Z αi ď C µ E p|| T || q E p Z α q , . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations as n Ñ 8 almost surely. 
Here, the last inequality follows from the moment assumptions in the theorem,the Cauchy-Schwarz inequality, the strong law of large numbers and the fact that the Y il ’s (and hencethe X i ’s) are independent of the T i ’s. Thus, | p T ´ i p t q ´ T i p t q| “ | p F ´ p F i p T ´ i p t qqq ´ T ´ i p t q|ď | p F ´ p F i p T ´ i p t qqq ´ p G ´ p F i p T ´ i p t qqq| ` | p G ´ p F i p T ´ i p t qqq ´ p G ´ p G p T ´ i p t qqq|` | p G ´ p G p T ´ i p t qqq ´ T ´ i p t q|ď || p F ´ ´ p G ´ || ` | T p G ´ p F i p T ´ i p t qqqq ´ T p G ´ p G p T ´ i p t qqqq|` | T p G ´ p G p T ´ i p t qqqq ´ T ´ i p t q|ď || p F ´ ´ p G ´ || ` || T || C µ | F i p T ´ i p t qq ´ G p T ´ i p t qq| α ` | T p T ´ i p t qq ´ T ´ i p t q|ď || p F ´ ´ p G ´ || ` C µ n ´ n ÿ j “ || T j || + || F i ´ G || ` || T ´ Id || ď const. (cid:32) E p Z α q ` Z i ` || T ´ Id || ( , ñ || p T ´ i ´ T ´ i || ď const. (cid:32) E p Z α q ` Z i ` || T ´ Id || ( as n Ñ 8 almost surely, where the constant term is uniform in i .Next, let t “ p F ´ p u q . Then, n ´ ř ni “ T i p F ´ i p u qq “ t . Let t ˚ “ n ´ ř ni “ T i p G ´ p u qq “ T p G ´ p u qq “ p G ´ p u q so that u “ p G p t ˚ q . Note that p F p t q ´ p G p t q “ p F p t q ´ p G p t ˚ q ` p G p t ˚ q ´ p G p t q “ p G p t ˚ q ´ p G p t q “ G p T ´ p t ˚ qq ´ G p T ´ p t qq . Thus, using the assumptions in the theorem and arguments similar to thoseused in the proof of part (b) of Theorem 2, we have | p F p t q ´ p G p t q| “ | G p T ´ p t ˚ qq ´ G p T ´ p t qq| ď || G || | T ´ p t ˚ q ´ T ´ p t q|ď || G || δ ´ | t ˚ ´ t |ď || G || δ ´ n ´ n ÿ i “ ˇˇ T i p F ´ i p u qq ´ T i p G ´ p u qq ˇˇ ď || G || δ ´ C µ n ´ n ÿ i “ || T i || Z αi ď const.E p|| T || q E p Z α qñ || p F ´ p G || ď const.E p Z α q as n Ñ 8 almost surely. 
Therefore, | p T i p t q ´ T i p t q| “ | r F ´ i p p F p t qq ´ T i p t q|ď | r F ´ i p p F p t qq ´ r G ´ i p p F p t qq| ` | r G ´ i p p F p t qq ´ r G ´ i p p G p t qq| ` | r G ´ i p p G p t qq ´ T i p t q|ď || r F ´ i ´ r G ´ i || ` | T i p G ´ p p F p t qq ´ T i p G ´ p p G p t qq| ` | T i p G ´ p p G p t qq ´ T i p t q|ď || r F ´ i ´ r G ´ i || ` || T i || C µ | p F p t q ´ p G p t q| α ` | T i p T ´ p t qq ´ T i p t q|ď || r F ´ i ´ r G ´ i || ` || T i || C µ || p F ´ p G || α ` || T i || || T ´ ´ Id || “ || r F ´ i ´ r G ´ i || ` || T i || C µ || p F ´ p G || α ` || T i || || T ´ Id || ď const. || T i || (cid:32) Z αi ` E α p Z α q ` || T ´ Id || ( ñ || p T i ´ T i || ď const. || T i || (cid:32) Z αi ` E α p Z α q ` || T ´ Id || ( as n Ñ 8 almost surely, where the constant term is uniform in i .Next, note that p X i “ r X i ˝ p T i “ X i ˝ T ´ i ˝ p T i “ µ ˝ T ´ i ˝ p T i ` γ Y i φ ˝ T ´ i ˝ p T i ` γ Y i φ ˝ T ´ i ˝ p T i . So, | p X i p t q ´ X i p t q| ď | µ p T ´ i p p T i p t qqq ´ µ p t q| ` γ | Y i | | φ p T ´ i p p T i p t qqq ´ φ p t q| . Chakraborty and V. M. Panaretos/Functional Registration and Local Variations ` γ | Y i | | φ p T ´ i p p T i p t qqq ´ φ p t q|ď | T ´ i p p T i p t qq ´ t | (cid:32) || µ || ` γ | Y i | || φ || ` γ | Y i | || φ || ( ñ || p X i ´ X i || ď || p T ´ i ´ T ´ i || (cid:32) || µ || ` γ | Y i | || φ || ` γ | Y i | || φ || ( ď O P p q (cid:32) E p Z α q ` Z i ` || T ´ Id || ( as n Ñ 8 almost surely, where the O P p q term is independent on n .Next, consider the case when µ “

0. Then, deﬁne G p t q “ ş t | φ p u q| du { ş | φ p u q| du . Some algebraicmanipulations yield | F i p t q ´ G p t q| “ ˇˇˇˇˇ ş t | Y i φ p u q ` ηY i φ p u q| du ş | Y i φ p u q ` ηY i φ p u q| du ´ ş t | φ p u q| du ş | φ p u q| du ˇˇˇˇˇ ď η ş | Y i φ p u q| du ş | Y i φ p u q ` ηY i φ p u q| du “ Z i . Similar arguments as in the case of µ ‰ Acknowledgements

We are grateful to Dr. Kristen Irwin (EPFL) for kindly sharing and discussing her Tribolium data set.

References

Billingsley, P. (1968). Convergence of probability measures. John Wiley & Sons, Inc., New York-London-Sydney. MR0233396

Bosq, D. (2000). Linear processes in function spaces. Lecture Notes in Statistics. Springer-Verlag, New York. Theory and applications. MR1783138

Claeskens, G., Silverman, B. W. and Slaets, L. (2010). A multiresolution approach to time warping achieved by a Bayesian prior-posterior transfer fitting strategy. J. R. Stat. Soc. Ser. B Stat. Methodol.

Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Monographs on Statistics and Applied Probability. Chapman & Hall, London. MR1383587

Fritsch, F. N. and Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM J. Numer. Anal.

Gervini, D. and Gasser, T. (2004). Self-modelling warping functions. J. R. Stat. Soc. Ser. B Stat. Methodol.

Gervini, D. and Gasser, T. (2005). Nonparametric maximum likelihood estimation of the structural mean of a sample of curves. Biometrika.

Hadjipantelis, P. Z., Aston, J. A. D., Müller, H. G. and Evans, J. P. (2015). Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese. J. Amer. Statist. Assoc.

Härdle, W. and Marron, J. S. (1990). Semiparametric comparison of regression curves. Ann. Statist.

Hsing, T. and Eubank, R. (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester. MR3379106

Irwin, K. and Carter, P. (2013). Constraints on the evolution of function-valued traits: a study of growth in Tribolium castaneum. Journal of evolutionary biology.

Irwin, K. and Carter, P. (2014). Artificial selection on larval growth curves in Tribolium: correlated responses and constraints. Journal of evolutionary biology.

James, G. M. (2007). Curve alignment by moments. Ann. Appl. Stat.

Kneip, A. and Gasser, T. (1992). Statistical tools to analyze data representing a sample of curves. Ann. Statist.

Kneip, A. and Ramsay, J. O. (2008). Combining registration and fitting for functional models. J. Amer. Statist. Assoc.

Kneip, A., Li, X., MacGibbon, K. B. and Ramsay, J. O. (2000). Curve registration by local regression. Canad. J. Statist.

Lila, E. and Aston, J. A. D. (2017). Functional and Geometric Statistical Analysis of Textured Surfaces with an application to Medical Imaging. Tech. Report arXiv:1707.00453v1.

Liu, X. and Müller, H.-G. (2004). Functional convex averaging and synchronization for time-warped random curves. J. Amer. Statist. Assoc.

Marron, J. S., Ramsay, J. O., Sangalli, L. M. and Srivastava, A. (2015). Functional data analysis of amplitude and phase variation. Statist. Sci.

Mørken, K. (1991). Some identities for products and degree raising of splines. Constr. Approx.

Natanson, I. P. (1955). Theory of functions of a real variable. Frederick Ungar Publishing Co., New York. Translated by Leo F. Boron with the collaboration of Edwin Hewitt. MR0067952

Panaretos, V. M. and Zemel, Y. (2016). Amplitude and phase variation of point processes. Ann. Statist.

Pigoli, D., Hadjipantelis, P. Z., Coleman, J. S. and Aston, J. A. D. (2017). The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages. Tech. Report arXiv:1507.07587v2.

Ramsay, J. O. and Li, X. (1998). Curve registration. J. R. Stat. Soc. Ser. B Stat. Methodol.

Ramsay, J. O. and Silverman, B. W. (2005). Functional data analysis, second ed. Springer Series in Statistics. Springer, New York. MR2168993

Rønn, B. B. (2001). Nonparametric maximum likelihood estimation for shifted curves. J. R. Stat. Soc. Ser. B Stat. Methodol.

Schimek, M. G., ed. (2000). Smoothing and regression. Wiley Series in Probability and Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York. Approaches, computation, and application, A Wiley-Interscience Publication. MR1795148

Schumaker, L. L. (2007). Spline functions: basic theory, third ed. Cambridge Mathematical Library. Cambridge University Press, Cambridge. MR2348176

Srivastava, A., Wu, W., Kurtek, S., Klassen, E. and Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. Tech. Report arXiv:1103.3817v2.

Tang, R. and Müller, H.-G. (2008). Pairwise curve synchronization for functional data. Biometrika.

Villani, C. (2003). Topics in optimal transportation. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI. MR1964483

Wand, M. P. and Jones, M. C. (1995). Kernel smoothing. Monographs on Statistics and Applied Probability. Chapman and Hall, Ltd., London. MR1319818

Wang, K. and Gasser, T. (1997). Alignment of curves by dynamic time warping. Ann. Statist.

Wang, K. and Gasser, T. (1999). Synchronizing sample curves nonparametrically. Ann. Statist.

SUPPLEMENTARY MATERIAL

Fig 8. Plots of the registered data curves using some other procedures under Model without measurement error. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

Fig 9. Plots of the registered data curves using some other procedures under Model without measurement error. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

Fig 10. Plots of the registered data curves using some other procedures under Model in the presence of measurement error. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

Fig 11. Plots of the registered data curves using some other procedures under Model in the presence of measurement error. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

Fig 12. Plots of the registered data curves using some other procedures under the rank model. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

Fig 13. Plots of the registered data curves using some other procedures under the rank model. [Panels: CMR, FRM, PW; horizontal axis t from 0.0 to 1.0.]

[Panels: CMR, FRM, PW; horizontal axis Age, ticks from 1 to 22.]