Intrinsic Riemannian Functional Data Analysis for Sparse Longitudinal Observations
Zhenhua Lin, Lingxuan Shao and Fang Yao
National University of Singapore and Peking University
Abstract
A novel framework is developed to intrinsically analyze sparsely observed Riemannian functional data. It features four innovative components: a frame-independent covariance function, a smooth vector bundle termed the covariance vector bundle, a parallel transport and a smooth bundle metric on the covariance vector bundle. The introduced intrinsic covariance function links estimation of the covariance structure to smoothing problems that involve raw covariance observations derived from sparsely observed Riemannian functional data, while the covariance vector bundle provides a rigorous mathematical foundation for formulating these smoothing problems. The parallel transport and the bundle metric together make it possible to measure fidelity of fit to the covariance function. They also play a critical role in quantifying the quality of estimators for the covariance function. As an illustration, based on the proposed framework, we develop a local linear smoothing estimator for the covariance function, analyze its theoretical properties, and provide numerical demonstration via simulated and real datasets. The intrinsic feature of the framework makes it applicable to not only Euclidean submanifolds but also manifolds without a canonical ambient space.
MSC 2020 subject classifications: primary 62R10; secondary 62R30.
Keywords: Intrinsic covariance function, vector bundle, parallel transport, diffusion tensor, smoothing, Fréchet mean.
∗ [email protected]. Research is partially supported by NUS startup grant R-155-000-217-133.
† [email protected].
‡ [email protected]. Research is partially supported by National Natural Science Foundation of China with a Key grant 11931001 and a General grant 11871080, and the Key Laboratory of Mathematical Economics and Quantitative Finance (Peking University), Ministry of Education.

1 Introduction

Functional data are nowadays commonly encountered in practice and have been extensively studied in the literature; for instance, see the monographs Ramsay and Silverman (2005); Ferraty and Vieu (2006); Hsing and Eubank (2015); Kokoszka and Reimherr (2017), as well as the survey papers Wang et al. (2016) and Aneiros et al. (2019), for a comprehensive treatment of functional data analysis. These classic endeavors study functional data whose functions are real- or vector-valued, and thus are challenged by data of functions that do not take values in a vector space. Such data emerge recently, partially due to the rapid development of modern technologies. For example, in the longitudinal study of diffusion tensors, the tensor measured at a time point is represented by a 3 × 3 symmetric positive-definite (SPD) matrix; in particular, the usual Euclidean distance on it suffers from the "swelling effect" which introduces artificial and undesirable inflation of variability in data analysis (Arsigny et al., 2007). Specialized distance functions (Pennec et al., 2006; Dryden et al., 2009) or metrics (Moakher, 2005; Arsigny et al., 2007; Lin, 2019) are required to alleviate or completely eliminate the swelling effect. These metrics turn the space of SPD matrices of a fixed dimension into a nonlinear Riemannian manifold. Data in the form of Riemannian-manifold-valued functions are termed Riemannian functional data and modeled by Riemannian random processes, which are random processes taking values in Riemannian manifolds (Lin and Yao, 2019).
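As a concrete illustration of the swelling effect and of how a non-Euclidean metric avoids it, the following minimal Python sketch (ours, not from the paper; the matrices are hypothetical choices for illustration) compares the Euclidean and Log-Euclidean means of two SPD matrices:

```python
import numpy as np
from scipy.linalg import expm, logm

# Two SPD matrices, both with determinant 1
A = np.diag([10.0, 0.1])
B = np.diag([0.1, 10.0])

euclidean_mean = (A + B) / 2                         # arithmetic average
log_euclidean_mean = expm((logm(A) + logm(B)) / 2)   # average under the Log-Euclidean metric

print(np.linalg.det(euclidean_mean))      # ~25.5: "swelling" of the determinant
print(np.linalg.det(log_euclidean_mean))  # 1.0: no artificial inflation
```

The determinant of the Euclidean average far exceeds that of either matrix, the hallmark of the swelling effect, whereas the Log-Euclidean mean preserves it.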
In addition to infinite dimensionality, statistical analysis of Riemannian functional data is primarily challenged by the lack of vector structure of the Riemannian manifold. Consequently, the aforementioned classic works on functional data, which are geared to Euclidean or vector spaces, do not directly apply to Riemannian functional data. This challenge is also shared by non-functional manifold-valued data analysis, on which there exists vast literature. For example, fundamentals related to the Fréchet mean, a generalization of the mean of random variables or vectors for random elements in a Riemannian manifold, were studied in depth by Bhattacharya and Patrangenaru (2003, 2005); Afsari (2011); Schötz (2019); Pennec (2019), while regression for manifold-valued data was investigated by Pelletier (2006); Shi et al. (2009); Steinke et al. (2010); Fletcher (2013); Hinkle et al. (2014); Cornea et al. (2017), and more broadly, for metric-space-valued data, by Hein (2009); Faraway (2014); Petersen and Müller (2019); Lin and Müller (2019), among many others.

In contrast, the literature on Riemannian functional data is scarce and emerges only in recent years. Su et al. (2014) first represented each trajectory by its normalized velocity curve (called square-root vector field) and then transported the velocity vectors into a common tangent space. While Zhang et al. (2018) specifically considered spherical trajectories and developed a data transformation geared to the spherical geometry, Dai and Müller (2018) studied trajectories on a more general manifold but required the manifold to sit in a Euclidean ambient space. Lin and Yao (2019) developed an intrinsic framework for Riemannian functional data that eliminates the need of an ambient space. Dubey and Müller (2020) investigated metric-space-valued functional data via the device of metric covariance, where each function takes values in a general metric space that includes the Riemannian manifold as a special case. All of these works assume fully observed functional data. The development by Dai et al. (2020) targets discretely and noisily recorded Riemannian functional data, but again requires a Euclidean ambient space.

The mean and covariance functions are two of the most fundamental concepts in functional data analysis, as many downstream analyses depend on them. Therefore, it is of particular importance to generalize them to Riemannian functional data. For the mean function, the generalized counterpart is the well-established Fréchet mean function that is adopted in Dai and Müller (2018); Dai et al. (2020); Lin and Yao (2019) as well as this paper. The real difficulty comes from modeling the covariance structure. To tackle the nonlinearity of the Riemannian manifold, a strategy commonly employed in the aforementioned works is to transform data from the manifold into tangent spaces via Riemannian logarithmic maps, and then to model the covariance via the transformed data. Specifically, at each time point, the associated observations are transformed into the tangent space at the Fréchet mean at that time point. Although tangent spaces of a manifold are linear spaces and thus provide the desired vector structure, there is one issue to resolve:
Different tangent spaces are distinct vector spaces and thus their tangent vectors are incomparable, but the covariance involves random tangent vectors from different tangent spaces.
More specifically, the value of the covariance function at a pair (s, t) of time points involves observations at both s and t, and in the manifold setting, the observations at these time points are often transformed into tangent vectors of distinct tangent spaces; see Figure 1.

Figure 1: Illustration of random tangent vectors in different tangent spaces. The curve across the manifold M represents the Fréchet mean function µ defined in (1). The two parallelograms represent the tangent spaces at µ(s) and µ(t), respectively. A (random) observation associated with the time s is often transformed into the tangent space T_{µ(s)}M at µ(s); the transformed observation is represented by the tangent vector U in the figure. Similarly, the tangent vector V in the tangent space T_{µ(t)}M at µ(t) represents a transformed observation associated with the time point t. The tangent vectors U and V are incomparable, since they reside in distinct tangent spaces. The vectors e_1 and e_2 represent an orthonormal basis of T_{µ(s)}M while e'_1 and e'_2 form an orthonormal basis of T_{µ(t)}M. The tangent vectors U and V can be respectively represented by their (random) coefficients with reference to the bases {e_1, e_2} and {e'_1, e'_2}. The coefficients are real-valued random vectors and directly comparable; e.g., their covariance can be defined in the usual way.

One approach to overcome the above issue, referred to as the ambient approach in this paper, is to place the tangent spaces in a larger common vector space. This approach, adopted in Dai and Müller (2018); Dai et al. (2020), is particularly useful when the manifold comes with a natural Euclidean ambient space, as the ambient space serves as the common space for the transformed data, but is less appropriate when the manifold does not have a canonical Euclidean ambient space. In addition, operations performed in the ambient space may violate the intrinsic geometry of the manifold and produce undesirable consequences. Another approach, adopted by Lin and Yao (2019), takes a completely intrinsic perspective by treating the transformed data as realizations from a random element in a delicately constructed Hilbert space and then modeling the covariance by the covariance operator of the random element. Since the intrinsic strategy does not refer to an ambient space, it does not suffer from the aforementioned shortcomings of the ambient strategy. However, estimation of the covariance operator in Lin and Yao (2019) requires fully or densely observed functional data.

In light of the major advantage of the intrinsic perspective, it is desirable to model and estimate the covariance structure for sparsely observed Riemannian functional data without reference to an ambient space, which turns out to be rather challenging. For sparse functional data, a common strategy that is well established in the Euclidean setting is to smooth the discrete and noisy raw covariance function (Yao et al., 2005; Li and Hsing, 2010; Zhang and Wang, 2016). However, there are fundamental difficulties in extending this seemingly simple strategy to the manifold setting. First, as previously mentioned, the covariance function involves tangent vectors from different tangent spaces, so that an appropriate definition of covariance between two incomparable random tangent vectors is in order. A possible way is to fix an orthonormal basis for each tangent space and represent the random tangent vectors by their coefficients with respect to the corresponding bases; see Figure 1. The bases are collectively referred to as an orthonormal frame. Then the covariance of two random tangent vectors can be defined as the covariance matrix of their coefficients, which are real-valued vectors and thus comparable.
The resulting covariance function is indeed the covariance function of the coefficients of the transformed Riemannian random process with respect to the frame, and can be estimated via smoothing the coefficients of the raw covariance functions with respect to the same frame. However, the estimate obtained by this approach is not invariant to the frame, i.e., different frames result in distinct estimates that cannot be linked via an affine transformation, because most smoothing methods are not invariant to the frame; see Remark 3.1 for further discussion. Second, for the smoothing strategy to work, the underlying covariance function shall possess certain regularity of smoothness, such as continuity or differentiability. Without reference to an ambient space or a frame, it is rather challenging to define and quantify such regularity. This problem is unique to sparsely observed data; when data are fully observed or sufficiently dense so that each trajectory can be individually recovered, the sample covariance operator serves as an estimate for the covariance structure (Lin and Yao, 2019), which requires no smoothing.

The major contribution of this paper is to develop a novel framework to intrinsically model and estimate the covariance when Riemannian functional data are sparsely and noisily recorded. The framework features four innovative components.

• First, a frame-independent intrinsic covariance function is developed. This is made possible by considering the covariance of two random tangent vectors in different tangent spaces as a linear operator that maps one tangent space into the other. This covariance function does not require reference to a frame and thus is fundamentally different from the covariance function of coefficients with respect to a frame in Lin and Yao (2019).

• Second, to extend the idea of smoothing a raw covariance function (Yao et al., 2005) to the manifold setting, we construct a novel smooth vector bundle from the manifold, termed the covariance vector bundle, to provide an appropriate mathematical foundation for intrinsic quantification of the regularity of the proposed covariance function, such as continuity, differentiability and smoothness. For example, it makes statements like "find a smooth covariance function that minimizes the mean squared error" sensible. In addition, covariance function estimation amounts to smoothing data located in a smooth vector bundle. Although there is a rich literature on smoothing Riemannian manifold-valued data, the study of smoothing data in a vector bundle is still in its infancy.

• Third, a parallel transport on the covariance vector bundle is developed from the intrinsic geometry of the manifold, which also induces a covariant derivative on the bundle. The covariant derivative allows intrinsic definition of derivatives of a function taking values in the covariance vector bundle. Such derivatives are often needed when one analyzes theoretical properties of a smoothing procedure. The parallel transport also enables one to move the raw observations into a common vector space in which classic smoothing methods apply.

• Fourth, a smooth bundle metric is constructed and plays an essential role in measuring the fidelity of fit to data during estimation and in quantifying the quality of an estimator. It is derived from the intrinsic geometry of the underlying Riemannian manifold and utilizes the Hilbert–Schmidt inner product of linear operators between two potentially different Hilbert spaces.
Such an inner product, mathematically well established (e.g., Definition 2.3.3 and Proposition B.0.7 by Prévôt and Röckner, 2007), is less seen in statistics; the common one is defined for operators that map a Hilbert space into itself.

The frame-independent covariance function and the covariance vector bundle together pave the way for intrinsically smoothing the observed raw covariance function, while the parallel transport and the bundle metric are critical for developing an estimation procedure for sparsely observed Riemannian functional data. As an illustration, we propose an estimator for the covariance function based on local linear smoothing and establish the point-wise and uniform convergence rates of the estimator, while emphasizing that other smoothing methods can also be accommodated by the proposed framework.
2 Covariance Vector Bundle
In this section we briefly review concepts from Riemannian manifolds that are essential for our development, while referring readers to the introductory text by Lee (1997) for further exposition.

Let M be a d-dimensional topological manifold, which by definition is a topological space such that, for each point in M, there is a homeomorphism between a neighborhood of the point and an open ball of R^d. (A function between two topological spaces is a homeomorphism if it is continuous and bijective, and its inverse is also continuous.) A smooth atlas on M is a collection of pairs (U_α, φ_α) that are indexed by an index set J and satisfy the following axioms:

• Each U_α is an open subset of M and ⋃_{α∈J} U_α = M, i.e., the domains U_α together cover M.

• Each φ_α is a homeomorphism between U_α and the open set φ_α(U_α) = {φ_α(x) ∈ R^d : x ∈ U_α} of R^d.

• For each pair α, β ∈ J, if U_α ∩ U_β ≠ ∅, then the transition map φ_α ∘ φ_β^{-1} : φ_β(U_α ∩ U_β) → φ_α(U_α ∩ U_β), illustrated in Figure 2, is smooth, i.e., infinitely differentiable; we say φ_α and φ_β are compatible.

Figure 2: Illustration of the chart and transition map.

The pair (U_α, φ_α), or sometimes φ_α itself, is called a chart (or coordinate map). Two atlases are compatible if their union is again an atlas (satisfying the above axioms). An atlas is maximal if it contains any other atlas compatible with it. The space M together with a smooth maximal atlas is called a smooth manifold, or simply manifold in this paper. Every point in a d-dimensional manifold is associated with a d-dimensional vector space, called the tangent space at the point. In addition, any chart (U_α, φ_α) gives rise to a basis for the tangent space at each point in U_α, and the basis smoothly varies with the point within U_α. Tangent spaces at different points of a manifold are conceptually distinct spaces, so that their elements, called tangent vectors, are incomparable; only tangent vectors from the same tangent space are comparable.

A Riemannian manifold is a manifold equipped with a Riemannian metric ⟨·,·⟩ which defines an inner product ⟨·,·⟩_p on the tangent space T_pM at each point p ∈ M, with the associated norm denoted by ∥v∥_p = √⟨v, v⟩_p for v ∈ T_pM. The metric, which smoothly varies with p, induces a distance function d_M on M and turns the manifold into a metric space. A geodesic in a metric space is a curve of which every sufficiently small segment is the shortest path connecting the endpoints of the segment. For v ∈ T_pM, suppose that there is a geodesic γ(t) defined on [0, 1] with γ(0) = p such that γ'(0) = v. Here, the derivative γ'(0) of a smooth curve, well defined via the atlas of the manifold, is a tangent vector at γ(0) and represents the velocity of the curve γ at t = 0.
The exponential map Exp_p at p ∈ M is then defined by Exp_p(v) = γ(1). In turn, for each v ∈ T_pM, γ_v(t) := Exp_p(tv) defines a geodesic. The inverse of Exp_p, when it exists, is called the Riemannian logarithmic map at p and denoted by Log_p; see the left panel of Figure 3 for a graphical illustration.

In statistical analysis, it is still sometimes desirable to compare the incomparable tangent vectors from different tangent spaces. To this end, one may transport the tangent vectors into a common tangent space in which tangent vectors can be directly compared by vector subtraction. For a Riemannian manifold, there is a unique (parallel) transport associated with the Riemannian metric. (The unique parallel transport is indeed determined by the Levi–Civita connection.) In this paper, we exclusively consider parallel transport along shortest geodesics between two points y and z, denoted by P_{zy}, which moves tangent vectors from the tangent space T_yM to T_zM in a smooth way and meanwhile preserves the inner product; see the right panel of Figure 3 for a graphical illustration.

Figure 3: Left: Illustration of the tangent space, geodesic, Riemannian exponential map and logarithmic map, where γ_j(0) = p, γ'_j(0) = v_j, γ_j(1) = q_j, q_j = Exp_p v_j and v_j = Log_p q_j, for j = 1, 2. Right: Illustration of parallel transport. The tangent vector u at p is parallelly transported along geodesics γ_1 and γ_2 to p_1 and p_2, resulting in tangent vectors u_1 and u_2, respectively.
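To make these maps concrete, here is a minimal Python sketch (ours, not from the paper) of the exponential map, logarithmic map, parallel transport along minimizing geodesics, and an orthonormal tangent basis for the unit sphere S² embedded in R³; later sketches reuse these helpers:

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map Exp_p(v) on the unit sphere S^2."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Logarithmic map Log_p(q) on S^2 (q not antipodal to p)."""
    w = q - np.dot(p, q) * p                                  # projection onto T_p S^2
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))       # geodesic distance
    return theta * w / nw

def sphere_transport(p, q, u):
    """Parallel transport P_{qp} of u in T_p S^2 to T_q S^2 along the minimizing geodesic."""
    v = sphere_log(p, q)
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return u.copy()
    e = v / nv
    # The component of u along the geodesic direction rotates; the rest is unchanged.
    return u + np.dot(e, u) * ((np.cos(nv) - 1.0) * e - np.sin(nv) * p)

def tangent_basis(p):
    """An orthonormal basis (e_1, e_2) of T_p S^2."""
    a = np.array([1.0, 0.0, 0.0]) if abs(p[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = a - np.dot(a, p) * p
    e1 = e1 / np.linalg.norm(e1)
    return e1, np.cross(p, e1)

p, q = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
print(np.allclose(sphere_exp(p, sphere_log(p, q)), q))   # True: Exp_p(Log_p q) = q
u = np.array([0.0, 1.0, 0.0])                            # u ∈ T_p S^2
print(np.dot(sphere_transport(p, q, u), q))              # ~0: result lies in T_q S^2
```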
A smooth vector bundle, denoted by π : E → M or simply E, consists of a base smooth manifold M, a smooth manifold E called the total space, and a smooth bundle projection π, such that for every p ∈ M, the fiber π^{-1}(p) is a k-dimensional real vector space, and there is an open neighborhood U ⊂ M of p and a diffeomorphism Φ : π^{-1}(U) → U × R^k satisfying the property that (π ∘ Φ^{-1})(z, v) = z for all z ∈ U and v ∈ R^k, and the map v ↦ Φ^{-1}(z, v) is a linear isomorphism between R^k and π^{-1}(z). (A diffeomorphism is a smooth bijective function between two smooth manifolds.) The map Φ is called a local trivialization.
Figure 4: Illustration of the vector bundle. The closed curve at the bottom represents the base manifold M and the figure on the left represents the total space E, where each vertical line represents a fiber. The thickened segment U ⊂ M represents an open subset of the manifold M, while Φ is a local trivialization defined on π^{-1}(U), which is highlighted in gray in the total space.

As graphically illustrated in Figure 4, a vector bundle locally resembles the product space U × R^k for some integer k. A prominent example of a vector bundle is the space composed of the union of all tangent spaces of a manifold, which is called the tangent bundle of the manifold, where the tangent space at each point is a fiber. To identify different fibers, one can introduce a parallel transport P on a vector bundle along a geodesic curve γ on the base manifold. Such a parallel transport must satisfy the following axioms: 1) P_{pp} is the identity map on π^{-1}(p) for all p ∈ M; 2) P_{γ(t)γ(u)} ∘ P_{γ(u)γ(s)} = P_{γ(t)γ(s)}; and 3) the dependence of P on γ, s and t is smooth. The parallel transport P introduced previously for a Riemannian manifold is indeed a parallel transport on the tangent bundle. In Section 2.4 we will construct a new type of vector bundle and a parallel transport on it. If for each fiber in a smooth vector bundle there is an inner product and the inner product smoothly varies from fiber to fiber, then the inner products are collectively referred to as a smooth bundle metric. The aforementioned Riemannian metric is indeed a smooth bundle metric on the tangent bundle.

Functional data in which each function takes values in a Riemannian manifold are termed Riemannian functional data and modeled by the Riemannian random process (Lin and Yao, 2019). Specifically, let M be a d-dimensional smooth Riemannian manifold and X an M-valued random process indexed by a compact domain T ⊂ R, i.e., X : T × Ω → M, where Ω is the sample space of the underlying probability space. In reality, measurements of X are often corrupted by noise. To accommodate this common practice, we assume that the actual observable process is Y, which is indexed by the same domain T.

The process X is said to be of second order if, for each t ∈ T, F(p, t) = E d²_M(X(t), p) < ∞ for some p ∈ M and hence for all p ∈ M due to the triangle inequality. The minimizer of F(p, t), if it exists, is called the
Fréchet mean of X(t) and denoted by µ(t), i.e.,

µ(t) := arg min_{p∈M} F(p, t).    (1)

The concept of the Fréchet mean generalizes the mean from the Euclidean space to the Riemannian manifold and plays an important role in the analysis of data residing in a Riemannian manifold. Under fairly general conditions, the Fréchet mean exists and is unique (Bhattacharya and Patrangenaru, 2003; Sturm, 2003; Afsari, 2011), for instance, when the manifold is of nonpositive sectional curvature (p.146, Lee, 1997) or data are located in a small subspace of the manifold. Formally, we make the following assumption.

Assumption 2.1.
The Fréchet mean functions of X and Y exist and are unique.

As the manifold M is not a vector space, it is challenging to directly study the processes X and Y. A common strategy to circumvent this difficulty is to transform them into tangent spaces, in which the vector structure can facilitate the analysis, via Riemannian logarithmic maps. This requires an additional assumption to ensure the well-posedness of the Riemannian logarithmic maps. For simplicity, we assume the following sufficient condition, which can be relaxed by a delicate formulation via the cut locus. (Roughly speaking, the cut locus of a point p in a manifold is the collection of points q of the manifold such that Log_p q is not uniquely defined.)

Assumption 2.2.
There exists a geodesically convex subset Q ⊂ M such that X(t), Y(t) ∈ Q for all t ∈ T. (A subset of a Riemannian manifold is geodesically convex if for any two points in the subset there is a unique shortest geodesic that is contained in the subset and connects the points.)

If the manifold is simply connected and of nonpositive sectional curvature, Q can be taken to be M and thus the above assumption becomes superfluous. Examples of manifolds of this kind include hyperbolic manifolds, tori and the space of symmetric positive-definite matrices endowed with the affine-invariant metric (Moakher, 2005), Log-Euclidean metric (Arsigny et al., 2007) or Log-Cholesky metric (Lin, 2019). An example of Q for Riemannian manifolds whose sectional curvature is not nonpositive is the positive orthant Q = {(x_1, ..., x_{k+1}) ∈ S^k : x_j ≥ 0, j = 1, ..., k+1} in the hypersphere S^k = {(x_1, ..., x_{k+1}) ∈ R^{k+1} : x_1² + ⋯ + x_{k+1}² = 1}, which has applications in compositional data analysis (Dai and Müller, 2018), where k is a positive integer. Under Assumptions 2.1 and 2.2, the Riemannian logarithmic maps Log_{µ(t)}{X(t)} and Log_{µ(t)}{Y(t)} are well defined. In addition, we can further model the observed process by Y(t) = Exp_{µ(t)}(Log_{µ(t)}{X(t)} + ε(t)), where ε(t) ∈ T_{µ(t)}M represents the random noise in the tangent space, is independent of X, and satisfies E ε(t) = 0 and Exp_{µ(t)} ε(t) ∈ Q. We shall see later that the mean functions of X and Y are the same.

Now we are ready to model sparsely observed Riemannian functional data. First, the sample functions X_1, ..., X_n are considered as i.i.d. realizations of X. However, accessible are their noisy copies Y_1, ..., Y_n, rather than X_1, ..., X_n. To further accommodate the practice that functions are often only recorded at discrete points, we assume each Y_i is only observed at m_i time points T_{i,1}, ..., T_{i,m_i} ∈ T with T_{ij} i.i.d. ∼ T, where T is a random variable in T. Specifically, the observations are {(T_{ij}, Y_{ij}) ∈ T × M : 1 ≤ i ≤ n, 1 ≤ j ≤ m_i} with Y_{ij} = Exp_{µ(T_{ij})}(Log_{µ(T_{ij})}{X_i(T_{ij})} + ε_{ij}), where the centered random elements ε_{ij} ∈ T_{µ(T_{ij})}M are independent of each other and also independent of {X_i : 1 ≤ i ≤ n} and {T_{kl} : 1 ≤ k ≤ n, 1 ≤ l ≤ m_k} ∖ {T_{ij}}. In addition, {T_{kl} : 1 ≤ k ≤ n, 1 ≤ l ≤ m_k} is independent of {X_i : 1 ≤ i ≤ n}. The assumed independence and identical distributions in the above can be relaxed at the cost of much heavier technicalities. Such relaxation does not lead to additional insight and thus is not pursued in this article.
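For intuition, the following small Python sketch (our own illustration, reusing the sphere helpers above; the mean curve and amplitude process are hypothetical choices, not the paper's design) generates sparse, noisy observations {(T_ij, Y_ij)} on S² under this model:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sparse_sphere(n, m, mu, noise_sd=0.05):
    """Generate {(i, T_ij, Y_ij)} with Y_ij = Exp_{µ(T_ij)}(Log_{µ(T_ij)} X_i(T_ij) + ε_ij)."""
    data = []
    for i in range(n):
        m_i = rng.poisson(m) + 1                   # at least one observation per subject
        A = 0.2 * rng.standard_normal(2)           # subject-specific amplitudes for X_i
        for t in np.sort(rng.uniform(0.0, 1.0, m_i)):
            p = mu(t)
            e1, e2 = tangent_basis(p)              # a frame, used only to synthesize data
            u = A[0] * np.sin(np.pi * t) * e1 + A[1] * np.cos(np.pi * t) * e2
            eps = noise_sd * (rng.standard_normal() * e1 + rng.standard_normal() * e2)
            data.append((i, t, sphere_exp(p, u + eps)))
    return data

mu = lambda t: np.array([np.cos(t), np.sin(t), 0.0])   # a toy mean curve on S^2
obs = sample_sparse_sphere(n=20, m=4, mu=mu)
```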
In addition to the Fréchet mean function, the covariance structure of Riemannian functional data is essential for downstream analysis, for instance, functional principal component analysis. In Lin and Yao (2019) the covariance structure is modeled by the covariance operator of Log_{µ(·)}X(·) from the random element perspective (Chapter 7, Hsing and Eubank, 2015) and also by the covariance function of Log_{µ(·)}X(·) with respect to a frame. (Recall that a frame specifies a basis for each tangent space.) The covariance operator is not computationally friendly to sparse data, while the frame-dependent covariance function is not compatible with most smoothing methods; see Remark 3.1 for details.

To develop a frame-independent intrinsic concept of the covariance function from the perspective of stochastic processes, we first revisit the covariance between two centered random vectors U and V. When they are in a common Euclidean space, it is classically defined as the matrix E(UV^⊺). When U and V are in different general inner product spaces U and V, a matrix representation of the covariance is definable if one picks an orthonormal basis for each of U and V. To eliminate the dependence on the orthonormal bases, we take an operator perspective and treat the covariance C of U and V as a linear operator between U and V characterized by

⟨Cu, v⟩_V := E(⟨U, u⟩_U ⟨V, v⟩_V), ∀u ∈ U, v ∈ V,

where ⟨·,·⟩_U and ⟨·,·⟩_V denote the inner products of U and V, respectively. To simplify the notation, we write C = E(U ⊗ V).

Observe that Log_{µ(·)}X(·) (Log_µ X for short) is a random vector field along the curve µ with E(Log_µ X) = 0. Since Log_{µ(s)}X(s) ∈ T_{µ(s)}M and Log_{µ(t)}X(t) ∈ T_{µ(t)}M, and both T_{µ(s)}M and T_{µ(t)}M are Hilbert spaces, we define the covariance function for X at time (s, t) by

C(s, t) := E{Log_{µ(s)}X(s) ⊗ Log_{µ(t)}X(t)}.    (2)

This covariance function is clearly independent of any frame or coordinate system. This feature fundamentally and distinctly separates (2) from the frame-dependent covariance function (5) defined in Lin and Yao (2019) for the coordinate of Log_µ X with respect to a frame along the mean function. Moreover, (2) can be viewed as the intrinsic covariance function of the covariance operator C proposed in Lin and Yao (2019). Specifically, under some measurability or continuity assumption on X and the condition that E ∫_T ∥Log_{µ(t)}X(t)∥²_{µ(t)} dt < ∞, the process Log_µ X can be regarded as a random element in the Hilbert space T(µ) := {Z : Z(·) ∈ T_{µ(·)}M, ∫_T ⟨Z(t), Z(t)⟩_{µ(t)} dt < ∞} endowed with the inner product ⟪Z_1, Z_2⟫_µ := ∫_T ⟨Z_1(t), Z_2(t)⟩_{µ(t)} dt for Z_1, Z_2 ∈ T(µ). The covariance operator C : T(µ) → T(µ) for X can be defined by

⟪Cu, v⟫_µ := E(⟪Log_µ X, u⟫_µ ⟪Log_µ X, v⟫_µ) for u, v ∈ T(µ).    (3)

The following theorem, which generalizes Theorem 7.4.3 of Hsing and Eubank (2015) to Riemannian random processes, shows that the proposed covariance function induces the covariance operator C.
Theorem 2.1. Let C(·,·) and C be defined in (2) and (3), respectively. Suppose that X is mean-square continuous, i.e., lim_{k→∞} E d²_M(X(t_k), X(t)) = 0 for any t ∈ T and any sequence {t_k} in T converging to t. Also assume that X is jointly measurable, i.e., X : T × Ω → M is measurable with respect to the product σ-field on T × Ω, where Ω is the sample space of the underlying probability space. Then under Assumptions 2.1 and 2.2, for each t ∈ T and u ∈ T(µ), we have

(Cu)(t) = ∫_T C(s, t)u(s) ds.
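The operator view in (2) can be made concrete in coordinates: with orthonormal bases fixed in T_{µ(s)}M and T_{µ(t)}M, the operator E(U ⊗ V) is represented by the matrix E(VU^⊺), and changing a basis only rotates this matrix, leaving frame-independent quantities such as the Hilbert–Schmidt norm unchanged. A minimal numerical sketch (ours, with an arbitrary synthetic distribution for U and V):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50_000, 2
U = rng.standard_normal((n, d))          # coefficients of U in a basis of T_{µ(s)}M
V = U @ np.array([[1.0, 0.5], [0.0, 0.8]]).T + 0.1 * rng.standard_normal((n, d))

C = V.T @ U / n                          # matrix of E(U ⊗ V) w.r.t. the chosen bases

# Re-expressing U in a rotated orthonormal basis changes the matrix to C R
# (an orthogonal change of coordinates) but not its Hilbert-Schmidt norm.
a = 0.7
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
C_rot = V.T @ (U @ R) / n
print(np.allclose(C_rot, C @ R))                             # True
print(np.isclose(np.linalg.norm(C), np.linalg.norm(C_rot)))  # True: frame-independent
```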
Figure 5: Illustration of the vector bundle L. The thick bending parallelogram represents the product manifold M × M and the vertical lines represent fibers. The value of C(s, t) is located within the fiber L(µ(s), µ(t)) at the point (µ(s), µ(t)) ∈ M × M.

In light of this result, in the sequel we use the same notation C to denote both the covariance operator and the covariance function in (2). The proposed covariance function enables us to estimate the covariance operator C through estimating C(s, t) for each (s, t) ∈ T × T in a frame-independent fashion. The frame-independent feature is of particular importance to deriving a frame-invariant estimate in the more practical scenario that only discrete and noisy observations are available so that smoothing is desirable; see Section 3 for details.

To estimate the covariance function in (2), it seems rather intuitive to perform smoothing over the raw covariance function Ĉ_{i,jk} := Log_{µ̂(T_{ij})}Y_{ij} ⊗ Log_{µ̂(T_{ik})}Y_{ik}, where µ̂ is an estimate of µ to be detailed in Section 3. The first challenge encountered is that these raw observations Ĉ_{i,jk} do not reside in a common vector space. This also gives rise to the second challenge of defining the critical concept of smoothness of the function C and its estimate. To circumvent these difficulties, we consider the spaces L(p, q) consisting of all linear maps from T_pM to T_qM, and their disjoint union L = ⋃_{(p,q)∈M×M} L(p, q). Then the Ĉ_{i,jk} are encompassed by the space L, and in addition, the covariance function C is now viewed as an L-valued function. Although the space L is not a vector space, so that smoothness is not definable in the classic sense, we observe that L comes with a canonical smooth structure induced by the manifold M, and continuity, differentiability and smoothness relevant to statistics can be defined with reference to this smooth structure, as follows.

We first observe that L is a vector bundle on M × M, with π : L → M × M defined by π(L(p, q)) = (p, q) being the bundle projection and L(p, q) being the fiber attached to the point (p, q) ∈ M × M; see Figure 5 for a graphical illustration. To define the smooth structure on L induced by the manifold M, let {(U_α, φ_α) : α ∈ J} for an index set J be an atlas of M. Recall that each chart (U_α, φ_α) gives rise to a smoothly varying basis of T_pM for each p ∈ U_α. Such a basis is denoted by B_{α,1}(p), ..., B_{α,d}(p). For (p, q) ∈ U_α × U_β, the tensor products B_{α,j}(p) ⊗ B_{β,k}(q), j, k = 1, ..., d, form a basis for the space L(p, q). Each element v ∈ L(p, q) is then identified with its coefficients v_{jk} with respect to this basis, i.e., v = ∑_{j,k=1}^d v_{jk} B_{α,j}(p) ⊗ B_{β,k}(q). For each
U_α × U_β, we define the map ϕ_{α,β}(p, q, ∑_{j,k=1}^d v_{jk} B_{α,j}(p) ⊗ B_{β,k}(q)) = (φ_α(p), φ_β(q), v_{11}, v_{12}, ..., v_{dd}) ∈ R^{2d+d²} for (p, q) ∈ U_α × U_β. The collection {(π^{-1}(U_α × U_β), ϕ_{α,β}) : (α, β) ∈ J × J} indeed is a smooth atlas that turns L into a smooth manifold. Moreover, L is a smooth vector bundle with the projection map π and the local trivializations Φ_{α,β} : π^{-1}(U_α × U_β) → U_α × U_β × R^{d²} defined as Φ_{α,β}(p, q, ∑_{j,k=1}^d v_{jk} B_{α,j}(p) ⊗ B_{β,k}(q)) = (p, q, v_{11}, v_{12}, ..., v_{dd}).
Theorem 2.2. The collection {(π^{-1}(U_α × U_β), ϕ_{α,β}) : (α, β) ∈ J × J} is a smooth atlas on L. With this atlas, L is a smooth vector bundle with the smooth projection map π and smooth local trivializations Φ_{α,β}. In addition, any compatible atlas of the manifold M gives rise to the same smooth vector bundle L.

With the above smooth structure, the covariance function C in (2), viewed as an L-valued function, is said to be κ-times continuously differentiable in (s, t) if (µ(s), µ(t)) ∈ U_α × U_β implies that ϕ_{α,β}(µ(s), µ(t), C(s, t)) is κ-times continuously differentiable in (s, t), where we recall that {(π^{-1}(U_α × U_β), ϕ_{α,β}) : (α, β) ∈ J × J} is a smooth atlas on L. From this perspective, the constructed vector bundle L provides a framework to rigorously define the regularity of C. In this framework, estimating the covariance function C amounts to smoothing the discrete raw observations Ĉ_{i,jk} in the vector bundle L.

Although the vector bundle L provides a qualitative framework for defining differentiability or other smoothness regularity, it does not provide a quantitative characterization. Roughly speaking, the smooth vector bundle L allows one to check whether C is differentiable or smooth, but not to measure how rapidly C changes relative to (s, t). In other words, derivatives that quantify the rate of change of the function C at a given pair (s, t), and that are consistent across all compatible atlases for L, require an additional structure, as follows. We first introduce the parallel transport on the covariance vector bundle L to identify different fibers and to compare elements from different fibers. Suppose that (p_1, q_1), (p_2, q_2) ∈ M × M and γ(t) = (γ_p(t), γ_q(t)) is a shortest geodesic connecting (p_1, q_1) to (p_2, q_2). The parallel transport P_{(p_1,q_1)}^{(p_2,q_2)} from the fiber L(p_1, q_1) to the fiber L(p_2, q_2) along the curve γ(t) is naturally constructed from the parallel transport operators P_{p_2p_1} and P_{q_2q_1} on M by

(P_{(p_1,q_1)}^{(p_2,q_2)} C)(u) := P_{q_2q_1}(C(P_{p_1p_2} u)),    (4)

where C ∈ L(p_1, q_1) and u ∈ T_{p_2}M. To distinguish between the parallel transport on the manifold and the one on the vector bundle L, notationally we use the calligraphic symbol P for the manifold and the script symbol P for the bundle. The parallel transport P further determines a covariant derivative on the bundle. For a definition of covariant derivatives, see Chapter 4 (specifically, Page 50) of Lee (1997).
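A sketch of the bundle transport (4) on S², built from the manifold-level transport in the earlier sphere sketch (our illustration; elements of L(p, q) are represented as callables on tangent vectors, and the points and rank-one operator below are hypothetical examples):

```python
import numpy as np

def bundle_transport(C, p1, q1, p2, q2):
    """Transport C ∈ L(p1, q1) to L(p2, q2) via (4): u ↦ P_{q2 q1}(C(P_{p1 p2} u))."""
    return lambda u: sphere_transport(q1, q2, C(sphere_transport(p2, p1, u)))

# Example: a rank-one element of L(p1, q1), u ↦ ⟨u, a⟩ b with a ∈ T_{p1}, b ∈ T_{q1}.
p1, q1 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
p2, q2 = np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])
a, _ = tangent_basis(p1)
b, _ = tangent_basis(q1)
C = lambda u: np.dot(u, a) * b
C_moved = bundle_transport(C, p1, q1, p2, q2)    # an element of L(p2, q2)
```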
Theorem 2.3. For a tangent vector V of M × M at (p, q), the map ∇_V defined by

∇_V W := lim_{h→0} [P_{γ(h)}^{γ(0)} W(γ(h)) − W(γ(0))]/h = (d/dt) P_{γ(t)}^{γ(0)} W(γ(t)) |_{t=0} for all differentiable sections W    (5)

is a covariant derivative in the direction of V, where γ is a geodesic in M × M with initial point γ(0) = (p, q) and initial velocity γ'(0) = V, and a section is any function W : M × M → L satisfying W(p, q) ∈ L(p, q) for all (p, q) ∈ M × M.

The covariant derivative of a section W can be viewed as the first derivative of the section. It quantifies the rate and direction of change of W at each point in M × M. This applies to the covariance function C since it can be viewed as a section along the surface µ × µ : T × T → M × M. For example, the "partial derivative" ∂C(s, t)/∂s |_{s=s_0} of C(s, t) with respect to s is defined as the covariant derivative of C(s, t) in the direction V = η'(s_0), with η(s) = (µ(s), µ(t)) being a curve on M × M. Furthermore, since the derivative
∂C(s, t)/∂s is again a section of the vector bundle, one can define the partial derivatives of ∂C(s, t)/∂s, which can be regarded as second derivatives of C. Higher-order derivatives can be defined in a recursive way.

Figure 6: Illustration of classic differentiation (left) and general covariant derivative (right).

To further illustrate the parallel transport and the induced covariant derivative on the vector bundle π : L → M × M, consider a simple example in which M = R; the bundle L is then parameterized by (x, y, z) ∈ R² × R. Let g : M × M → L be a smooth section. For visualization, we fix y = 0 and set f(x) = g(x, 0). For the smooth function f(x) shown in Figure 6, the classic definition of the derivative of f(x) at x_0 is

∂f(x)/∂x |_{x=x_0} := lim_{x→x_0} [f(x) − f(x_0)]/(x − x_0).

From the perspective of the vector bundle, each point x on the x-axis is attached with a fiber R_x which is simply a copy of the z-axis, i.e., of R. Since f(x) ∈ R_x while f(x_0) ∈ R_{x_0}, the operation f(x) − f(x_0) would not be well defined if we did not identify R_x with R_{x_0}. The identification between R_x and R_{x_0} is canonical, and is nothing else but parallelly transporting R_x to R_{x_0}. This inspiring observation applies to general manifolds and covariant derivatives. Specifically, the covariant derivative is defined by parallelly transporting f(x) from the fiber F_x into the fiber F_{x_0} and then performing differentiation therein, i.e.,

∂f(x)/∂x |_{x=x_0} := lim_{x→x_0} [P_{x}^{x_0}(f(x)) − f(x_0)]/(x − x_0) ∈ F_{x_0}.
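In this flat case the parallel transport between fibers is the identity, so the transported difference quotient collapses to the classic derivative; a one-line numerical sanity check (ours, with a hypothetical section f):

```python
import numpy as np

f = lambda x: x**3                      # a section of the trivial bundle over M = R
x0, h = 1.0, 1e-6
# P_x^{x0} is the identity on R, so the covariant derivative reduces to the
# ordinary difference quotient.
print(np.isclose((f(x0 + h) - f(x0)) / h, 3 * x0**2, atol=1e-4))  # True
```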
Finally, when smoothing the raw covariance function Ĉ_{i,jk}, one needs to quantify the discrepancy between the data and the fit. Such discrepancy is often measured by a distance function or inner product on the data space. Fortunately, the vector bundle L comes with a natural bundle metric. Specifically, for any (p, q) ∈ M × M, the inner product G_{(p,q)} : L(p, q) × L(p, q) → R is defined as the Hilbert–Schmidt inner product, i.e.,

G_{(p,q)}(L_1, L_2) = ∑_{k=1}^d ⟨L_1 e_k, L_2 e_k⟩_q for L_1, L_2 ∈ L(p, q),    (6)

where e_1, ..., e_d denotes an orthonormal basis of T_pM. One can show that the definition (6) does not depend on the choice of the orthonormal basis. In fact, G is a smooth bundle metric and the parallel transport (4) defines an isometry between any two fibers, as asserted by the following result.

Theorem 2.4. The metric defined in (6) is a vector bundle metric that smoothly varies with (p, q) ∈ M × M and is preserved by the parallel transport in (4).

The inner product (6) is defined for linear operators that map a Hilbert space, such as T_pM, to another, potentially different, Hilbert space, such as T_qM. The inner product of this type, although mathematically well established (e.g., Definition 2.3.3 and Proposition B.0.7 by Prévôt and Röckner, 2007), is less seen in statistics.
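Continuing the sphere sketches, the Hilbert–Schmidt metric (6) and the isometry property asserted in Theorem 2.4 can be checked numerically (our illustration, reusing tangent_basis and the bundle_transport, C and C_moved objects defined above):

```python
import numpy as np

def hs_norm(C, p, q):
    """Hilbert-Schmidt norm of C ∈ L(p, q) via an orthonormal basis of T_p S^2;
    by Theorem 2.4 the value does not depend on the basis choice."""
    e1, e2 = tangent_basis(p)
    return np.sqrt(sum(np.dot(C(e), C(e)) for e in (e1, e2)))

# The bundle transport (4) preserves the metric (6):
print(np.isclose(hs_norm(C, p1, q1), hs_norm(C_moved, p2, q2)))  # True
```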
The metric G also induces a norm, denoted by ∥·∥_{G(p,q)} or simply ∥·∥_G, on each fiber L(p, q). This norm in turn defines a distance on each fiber L(p, q) by ∥A − B∥_{G(p,q)} for A, B ∈ L(p, q), which is an integral part of the loss function in (7) for estimating the covariance function.

The smooth vector bundle L, together with the covariant derivative (5) and the bundle metric (6), termed the covariance vector bundle in this paper, paves the way for estimation of the covariance function (2) from sparsely observed Riemannian functional data. The smooth structure and the covariant derivative together provide an intrinsic mechanism to quantify the regularity of C. For example, they make meaningful the statement that the second derivatives of C(s, t) are continuous. In the Euclidean case, statements of this kind are often adopted as assumptions that are fundamental to theoretical analysis of estimators derived from a smoothing method. The developed vector bundle and covariant derivative now enable us to extend such assumptions to the manifold setting, as demonstrated in Section 4.2 where we analyze the theoretical properties of our estimators proposed in Section 3 for the covariance function C. The parallel transport (4) and the bundle metric (6) allow an intrinsic measure of the discrepancy of objects in the covariance vector bundle. Such a measure is critical for finding an estimator for C and quantifying the quality of the estimator, as illustrated in the following sections.

3 Estimation

The first step is to estimate the mean function, for which we adopt the local linear regression method proposed by Petersen and Müller (2019) and also employed by Dai et al. (2020), as follows. Define the local weight function ŵ(T_{ij}, t, h_µ) = σ̂_0^{-2}(t) K_{h_µ}(T_{ij} − t){û_2(t) − û_1(t)(T_{ij} − t)}, where û_k(t) = ∑_i λ_i ∑_j K_{h_µ}(T_{ij} − t)(T_{ij} − t)^k, σ̂_0²(t) = û_0(t)û_2(t) − û_1²(t), and K_{h_µ}(·) = K(·/h_µ)/h_µ for a kernel function K with bandwidth h_µ > 0.
The estimate µ̂ is defined as the minimizer of the weighted function Q̂_n(y, t) = ∑_{1≤i≤n} λ_i ∑_{1≤j≤m_i} ŵ(T_{ij}, t, h_µ) d²_M(Y_{ij}, y), i.e., µ̂(t) = arg min_{y∈M} Q̂_n(y, t), where the weights {λ_i}_{1≤i≤n} are subject-specific and satisfy ∑_{i=1}^n λ_i m_i = 1.
For the Euclidean case M = R, the objective function Q̂_n(y, t) coincides with the sum of squared error loss used in Zhang and Wang (2016). Two popular choices for λ_i are λ_i = 1/∑_{i=1}^n m_i (Yao et al., 2005), which assigns equal weight to each observation, and λ_i = 1/(nm_i) (Li and Hsing, 2010), which assigns equal weight to each subject. Further choices are discussed in Zhang and Wang (2018).
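A compact Python sketch of this step (ours; it pools all observations with λ_i = 1/∑ m_i, uses an Epanechnikov kernel, and computes the weighted Fréchet mean on S² by Riemannian gradient descent, reusing the sphere helpers and the simulated obs from earlier sketches):

```python
import numpy as np

def kern(u, h):
    """Epanechnikov kernel K_h(u) = K(u/h)/h, supported on [-1, 1]."""
    x = u / h
    return 0.75 * np.maximum(1.0 - x**2, 0.0) / h

def local_weights(Tobs, t, h):
    """Local linear weights ŵ(T_ij, t, h_µ) with λ_i = 1/Σ m_i (pooled)."""
    lam = 1.0 / Tobs.size
    k = kern(Tobs - t, h)
    u0, u1, u2 = (lam * np.sum(k * (Tobs - t)**r) for r in (0, 1, 2))
    return lam * k * (u2 - u1 * (Tobs - t)) / (u0 * u2 - u1**2)

def frechet_mean_sphere(Y, w, iters=100, step=0.5):
    """Minimize Σ_l w_l d²(Y_l, y) on S²; the Riemannian gradient of the
    objective at y is -2 Σ_l w_l Log_y(Y_l), so we ascend along its negative."""
    y = Y[0] / np.linalg.norm(Y[0])
    for _ in range(iters):
        y = sphere_exp(y, step * sum(wl * sphere_log(y, Yl) for wl, Yl in zip(w, Y)))
    return y

Tobs = np.array([t for _, t, _ in obs])
Yobs = [y for _, _, y in obs]
mu_hat = frechet_mean_sphere(Yobs, local_weights(Tobs, 0.5, 0.3))  # µ̂(0.5)
```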
Given the parallel transport introduced in Section 2.4, we are allowed to move the raw observations Ĉ_{i,jk} from different fibers into the same fiber and employ classic local linear smoothing on the transported observations. Recall that the raw covariance is defined by

Ĉ_{i,jk} := Log_{µ̂(T_{ij})}Y_{ij} ⊗ Log_{µ̂(T_{ik})}Y_{ik} ∈ L(µ̂(T_{ij}), µ̂(T_{ik})).

For the above to be well defined, similar to Assumption 2.1, we assume the existence and uniqueness of the empirical mean function µ̂. Such a condition is not needed for manifolds of nonpositive sectional curvature, or may be replaced by a convexity condition on the distance function when Assumption 2.2 holds.
Assumption 3.1.
The estimated mean function µ̂(t) exists and is unique for each t ∈ T.

To estimate C(s, t), the nearby raw observations Ĉ_{i,jk} are parallelly transported into the fiber L(µ̂(s), µ̂(t)), and the estimate Ĉ(s, t) is set to Ĉ(s, t) = β̂_0 with

(β̂_0, β̂_1, β̂_2) = arg min_{β_0,β_1,β_2 ∈ L(µ̂(s),µ̂(t))} ∑_i ν_i ∑_{j≠k} ∥P_{(µ̂(T_{ij}),µ̂(T_{ik}))}^{(µ̂(s),µ̂(t))} Ĉ_{i,jk} − β_0 − β_1(T_{ij} − s) − β_2(T_{ik} − t)∥²_{G(µ̂(s),µ̂(t))} K_{h_C}(s − T_{ij}) K_{h_C}(t − T_{ik}),    (7)

where h_C > 0 is a bandwidth, P_{(µ̂(T_{ij}),µ̂(T_{ik}))}^{(µ̂(s),µ̂(t))} is the parallel transport along minimizing geodesics defined in (4), and the weights {ν_i}_{1≤i≤n} are subject-specific and satisfy ∑_{i=1}^n ν_i m_i(m_i − 1) = 1.
Similar to the estimation of the mean function, two popular choices for the weights are ν_i = 1/∑_{i=1}^n m_i(m_i − 1) (Yao et al., 2005), which assigns equal weight to each observation, and ν_i = 1/(nm_i(m_i − 1)) (Li and Hsing, 2010), which assigns equal weight to each subject. Further options are studied in Zhang and Wang (2018).

The objective function in (7) involves only intrinsic concepts and thus is fundamentally different from the objective function in (5) of Dai et al. (2020), in which the raw observations Ĉ_{i,jk} are computed in an ambient space. In addition, the quantities Ĉ_{i,jk} in (7) are frame-independent and thus the resulting estimator is invariant to the frame. (For the computational purpose, a frame might be adopted, but the resulting estimator is independent of the choice of the frame, since the objective function in (7) does not depend on any frame.) This frame-independent feature makes our estimator distinct from the hypothetical estimator discussed in the following Remark 3.1, in which a frame is essential and the produced estimator is not invariant to the frame.
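Once the raw covariances are transported into the common fiber L(µ̂(s), µ̂(t)) and expressed in fixed orthonormal bases as d × d matrices, the minimization (7) becomes an entrywise weighted least squares problem; a minimal sketch (ours, with equal weights ν and the kernel kern from the mean-estimation sketch):

```python
import numpy as np

def smooth_cov(C_raw, S, T, s, t, h):
    """Local linear solution of (7) in coordinates. C_raw has shape (N, d, d):
    transported raw covariances; S[l], T[l] are the time pairs (T_ij, T_ik)."""
    w = kern(S - s, h) * kern(T - t, h)                   # product kernel weights
    X = np.column_stack([np.ones_like(S), S - s, T - t])  # local linear design
    WX = w[:, None] * X
    # Entrywise weighted least squares; the first row of B holds β̂_0 = Ĉ(s, t).
    B = np.linalg.solve(X.T @ WX, WX.T @ C_raw.reshape(len(S), -1))
    return B[0].reshape(C_raw.shape[1:])
```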
Remark 3.1. An "obvious" estimator for C might be obtained by utilizing a frame along µ̂(·) and the coefficient process of Lin and Yao (2019). Specifically, fix a frame along µ̂, which determines an orthonormal basis of T_{µ̂(t)}M for each t ∈ T. Then Log_{µ̂(T_{ij})}Y_{ij} can be represented by its coefficient vector ĉ_{ij} with respect to the frame, and Ĉ_{i,jk} is also represented by the observed coefficient matrix ĉ_{ij}ĉ_{ik}^⊺. Local linear smoothing or other smoothing methods can be applied to these matrices to yield an estimated coefficient matrix at any pair (s, t) of time points, and the corresponding estimate Ĉ(s, t) is recovered from the estimated coefficient matrix and the frame. However, this estimate is not invariant to the frame, i.e., different frames give rise to different estimates Ĉ(s, t). The reason is that smoothing methods optimize certain objective functions of the observations, which are the frame-dependent coefficient matrices ĉ_{ij}ĉ_{ik}^⊺ in this context, while most objective functions, like the sum of squared errors, are not invariant to the frame. Note that in (7) this issue is avoided by using a frame-free objective function.
Remark 3.2. One might attempt to endow L with a distance ρ so that the estimation is turned into a regression problem with a metric-space-valued response and the local linear method of Petersen and Müller (2019) can be adopted. Such a distance is expected to have the following properties:

• The distance ρ on L coincides with the fiber metric G for any two points on the same fiber. Specifically, for L_1, L_2 ∈ L(p, q), ρ²(L_1, L_2) = G_{(p,q)}(L_1 − L_2, L_1 − L_2).

• The distance ρ on the zero section W_0(p, q) = 0 ∈ L(p, q) coincides with the geodesic distance on M × M. Specifically, for (p_1, q_1), (p_2, q_2) ∈ M × M, ρ(W_0(p_1, q_1), W_0(p_2, q_2)) = d_{M×M}((p_1, q_1), (p_2, q_2)).

• When M is a Euclidean space, especially when M = R, the estimate derived from Petersen and Müller (2019) under the distance ρ coincides with the classic estimate, i.e., the estimate derived from the same method but applied to the observations Ĉ_{i,jk} ∈ R that are treated as real-valued responses.
However, such a distance ρ does not exist. On one hand, the positive-definiteness of the distance requires that ρ(Ĉ_{i_1,j_1k_1}, Ĉ_{i_2,j_2k_2}) ≠ 0 as long as Ĉ_{i_1,j_1k_1}, Ĉ_{i_2,j_2k_2} ∈ L reside in different fibers, i.e., when µ̂(T_{i_1j_1}) ≠ µ̂(T_{i_2j_2}) or µ̂(T_{i_1k_1}) ≠ µ̂(T_{i_2k_2}). On the other hand, when M = R, the quantities Ĉ_{i_1,j_1k_1} and Ĉ_{i_2,j_2k_2} are treated as real numbers and thus their distance could be zero even when µ̂(T_{i_1j_1}) ≠ µ̂(T_{i_2j_2}) or µ̂(T_{i_1k_1}) ≠ µ̂(T_{i_2k_2}).

Once an estimate Ĉ of the covariance function C is obtained, according to Theorem 2.1, the intrinsic Riemannian functional principal component analysis proposed in Lin and Yao (2019) can be adopted. Specifically, eigenvalues λ̂_k and eigenfunctions ψ̂_k of Ĉ can be obtained by eigen-decomposition of Ĉ, e.g., via the method described in Section 2.3 of Lin and Yao (2019). For estimation of the scores ξ_{ik} = ⟪X_i, ψ_k⟫ in the intrinsic Karhunen–Loève expansion Log_µ X_i = ∑_{k=1}^∞ ξ_{ik}ψ_k proposed in Lin and Yao (2019), numerical approximation to the integral ⟪X_i, ψ_k⟫ is infeasible when the data are sparse. In the Euclidean setting, this issue is addressed by the technique of principal analysis through conditional expectation (PACE, Yao et al., 2005). The technique was also adopted by Dai et al. (2020) for their ambient approach to Riemannian functional data analysis on sparsely observed data. To adapt this technique in our intrinsic framework, for each T_{µ̂(T_{ij})}M, we fix an orthonormal basis B_{ij,1}, ..., B_{ij,d}; in Proposition 3.1 we will show that the computed scores do not depend on the choice of the basis. Then, the observations Log_{µ̂(T_{ij})}Y_{ij} and the estimated eigenfunctions ψ̂_k(T_{ij}) can be represented by their respective coordinate vectors z_{ij} and g_{k,ij} with respect to the basis. Similarly, the estimated covariance function Ĉ(T_{ij}, T_{il}) at (T_{ij}, T_{il}) can be represented by a matrix C_{i,jl} of coefficients. By treating the vectors z_{ij} as R^d-valued observations, the best linear unbiased predictor (BLUP) of ξ_{ik} is given by

ξ̂_{ik} = λ̂_k g_{k,i}^⊺ Σ_i^{-1} z_i,    (8)

where g_{k,i} = (g_{k,i1}^⊺, ..., g_{k,im_i}^⊺)^⊺, z_i = (z_{i1}^⊺, ..., z_{im_i}^⊺)^⊺ and Σ_i = σ̂² I + (C_{i,jl})_{1≤j,l≤m_i}, with σ̂² = ∑_{i=1}^n ∑_{j=1}^{m_i} (ndm_i)^{-1} tr{z_{ij}z_{ij}^⊺ − Ĉ(T_{ij}, T_{ij})}. The following invariance principle shows that the scores ξ̂_{ik} in (8) are invariant to the choice of the bases B_{ij,1}, ..., B_{ij,d}. This extends the invariance principle of Lin and Yao (2019) from the full/dense design to the sparse design.
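A direct transcription of (8) (our sketch; the inputs are the coordinate representations described above, stacked over the m_i observation times of one subject):

```python
import numpy as np

def pace_scores(z_i, g_i, C_i, sigma2, lam):
    """BLUP scores (8) for one subject.
    z_i: stacked coordinates, shape (m_i*d,); g_i: array (K, m_i*d) of stacked
    eigenfunction coordinates; C_i: (m_i*d, m_i*d) block matrix of Ĉ(T_ij, T_il)
    coefficients; sigma2: σ̂²; lam: (K,) estimated eigenvalues."""
    Sigma = sigma2 * np.eye(z_i.size) + C_i
    rhs = np.linalg.solve(Sigma, z_i)       # Σ_i^{-1} z_i
    return lam * (g_i @ rhs)                # ξ̂_ik = λ̂_k g_{k,i}^T Σ_i^{-1} z_i
```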
Proposition 3.1. The principal component scores ξ̂_{ik} in (8) do not depend on the choice of the orthonormal bases {(B_{ij,1}, ..., B_{ij,d}) : i = 1, ..., n, j = 1, ..., m_i}.

4 Asymptotic Properties

The pointwise convergence rate of the estimate µ̂(t) is established in Petersen and Müller (2019), while the uniform convergence rate is derived by Dai et al. (2020). For completeness, we include them here, and establish a new local uniform result that is used in the theoretical analysis of the covariance estimator, as follows. First, we require the following assumptions, where condition (b) may be replaced with tail and moment conditions on the distributions of Y and X at the cost of much heavier technicalities.

Assumption 4.1.
(a) The Riemannian manifold M is complete and simply connected. (A manifold M is simply connected if and only if it is path-connected and, for any two continuous paths γ_1 and γ_2 with the same start and end points, γ_1 can be continuously deformed into γ_2, i.e., there exists a continuous function γ : [0,1] × [0,1] → M such that γ(s, 0) = γ_1(s) and γ(s, 1) = γ_2(s) for s ∈ [0,1]. Here, a manifold is path-connected if for each pair of points there exists a continuous path between them.)

(b) There exists a compact subset K ⊂ M such that Pr{X(t), Y(t) ∈ K for all t ∈ T} = 1.

(c) The domain T is a compact interval.

(d) The probability density function f(t) of T is twice differentiable and satisfies inf_{t∈T} f(t) > 0.

(e) The conditional density function g_y(t) of T given Y is twice differentiable and bounded uniformly over y ∈ M.

(f) The kernel function K is smooth, symmetric and compactly supported on [−1, 1].

To state the assumption on the regularity of the mean function, we define u_k(t) := E{K_{h_µ}(T − t)(T − t)^k}, σ_0²(t) := u_0(t)u_2(t) − u_1²(t), and w(T, t, h_µ) := σ_0^{-2}(t) K_{h_µ}(T − t)[u_2(t) − u_1(t)(T − t)]. Also, define µ̃(t) := arg min_{y∈M} Q̃_n(y, t), where Q̃_n(y, t) := E{w(T, t, h_µ) d²_M(Y, y)}. In the above, the dependency of Q̃_n(y, t) and µ̃(t) on the sample size n arises from the parameter h_µ = h_µ(n). Let F*(y, t) = E d²_M(Y(t), y). The following imposed regularity on the mean function or related quantities is adapted from Petersen and Müller (2019) and is specialized to the Riemannian manifold.

Assumption 4.2.
(a) The mean curve µ(t) is twice continuously differentiable. The minimizer µ̃ exists and is unique.

(b) For any δ > 0,

lim inf_{n→∞} inf_{d_M(y,µ(t))>δ, t∈T} {F*(y, t) − F*(µ(t), t)} > 0, and lim inf_{n→∞} inf_{d_M(y,µ̃(t))>δ, t∈T} {Q̃_n(y, t) − Q̃_n(µ̃(t), t)} > 0.

(c) There exist η_1 > 0 and C_1 > 0 such that for all t ∈ T and all y with d_M(y, µ(t)) < η_1,

lim inf_{n→∞} F*(y, t) − F*(µ(t), t) − C_1 d²_M(y, µ(t)) ≥ 0.

(d) There exist η_2 > 0 and C_2 > 0 such that for all t ∈ T and all y with d_M(y, µ̃(t)) < η_2,

lim inf_{n→∞} Q̃_n(y, t) − Q̃_n(µ̃(t), t) − C_2 d²_M(y, µ̃(t)) ≥ 0.

The following proposition states the point-wise and uniform convergence rates of the estimated mean function, where the point-wise rate can be derived by extending the proof in Petersen and Müller (2019) and the uniform rate is established in Dai et al. (2020). Note that for a clear exposition, we assume m_i ≡ m, while emphasizing that the extension to more general cases is technically straightforward but notationally complicated (Zhang and Wang, 2016).
Proposition 4.1.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1 and 4.2 hold. If h_µ → 0 and nmh_µ → ∞, then for any fixed t ∈ T,

d_M(µ(t), µ̂(t)) = O_p(h_µ² + n^{-1/2} + (nmh_µ)^{-1/2}).

If h_µ → 0 and nmh_µ/log n → ∞, then

sup_{t∈T} d_M(µ(t), µ̂(t)) = O_p(h_µ² + {log n/(nmh_µ)}^{1/2} + (log n/n)^{1/2}).

To derive the point-wise convergence rate of the estimator Ĉ(s, t) in the next subsection, we require a local convergence property of the estimator µ̂. The following Proposition 4.2, which is new in the literature, shows that the local uniform convergence rate is the same as the point-wise rate in Proposition 4.1, and differs from the global uniform convergence rate, which has an additional log n factor. The reason for this phenomenon is that E{K_{h_µ}(T − t)} = O(1) for each fixed t, but E{sup_{t∈T} K_{h_µ}(T − t)} ≍ 1/h_µ → ∞. Therefore, the additional log n factor is needed to offset this explosion in the case of global uniform convergence. In the local case, if h = O(h_µ) and thus E{sup_{τ:|τ−t|≤h} K_{h_µ}(T − τ)} = O(h/h_µ + 1) = O(1), then no offset is required. The proposition also directly implies the point-wise rate in Proposition 4.1. Its proof can be found in the supplementary material.

Proposition 4.2.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1 and 4.2 hold. If h_µ → 0 and nmh_µ → ∞, then for any fixed t and h = O(h_µ),

sup_{τ:|τ−t|≤h} d_M(µ(τ), µ̂(τ)) = O_p(h_µ² + n^{-1/2} + (nmh_µ)^{-1/2}).

To analyze the asymptotic properties of the covariance estimate Ĉ, we start with the following assumption on the regularity of the covariance function C. As discussed in Section 2.4, such a regularity condition in the manifold setting is made precise and meaningful by the constructed covariance vector bundle L and the covariant derivative ∇ in (5).

Assumption 4.3.
The covariance function C is twice differentiable and its second derivatives are continuous.

To study the asymptotic properties of the estimator Ĉ, one of the major challenges that is not encountered in the Euclidean setting of Zhang and Wang (2016) or the ambient case of Dai et al. (2020) is to deal with the parallel transport in (7). It turns out that we need to quantify the discrepancy between a tangent vector and the parallelly transported one along a geodesic quadrilateral. We address this issue by the following lemma, which might be of independent interest. The proof of the lemma, given in the appendix, utilizes holonomy theory, which seems relatively new in statistics.

Lemma 4.1.
For a compact subset G ⊂ M, there exists a constant c > 0 depending only on G such that for all p_1, p_2, q_1, q_2, y ∈ G,

∥P_{p_1q_1}P_{q_1q_2}Log_{q_2}y − P_{p_1p_2}Log_{p_2}y∥_{p_1} ≤ c(d_M(p_1, q_1) + d_M(p_2, q_2)).

With the above regularity condition and lemma, the following theorem establishes the point-wise convergence rate of Ĉ.
Theorem 4.1.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1, 4.2 and 4.3 hold. If h µ → , h C = O ( h µ ) , and min { nmh µ , nm h C } → ∞ , then for a fixed t ∈ T , ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( h µ + h C + n + nmh µ + nm h C ) . (9)The rate in the above theorem matches the point-wise rate in the Euclidean setting of Zhang and Wang(2016) in the case of m = ⋯ = m n = m . Unlike Zhang and Wang (2016) who assumed that the meanfunction is known in their analysis, we do not need such assumption thanks to the local uniform rate ofthe mean function stated in Proposition 4.2. In our analysis, the local uniform rate can not be replacedwith the global uniform rate in Proposition 4.1 without introducing an additional log n factor. Although thecondition h C = O ( h µ ) is required in order to utilize Proposition 4.2, it does not limit the convergence rate,as a proper choice of h µ and h C leads to the following rates that still match the rates of Zhang and Wang(2016). Corollary 4.1.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1, 4.2 and 4.3 hold. (a)
When m ≍ n / or m ≫ n / , with h µ ≍ h C ≍ n − / , one has ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( n ) . (b) When m ≪ n / , with h µ ≍ h C ≍ n − / m − / , one has ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( n / m / ) . Like the Euclidean case, a phase transition is observed at m ≍ n / . With a proper choice of h µ and h C , if m grows at least as fast as n / , it does not impact the convergence rate. Otherwise, the samplingrate m becomes an integrable part of the convergence rate of ˆ C . In particular, when m ≪ n / , the choice h µ ≍ n − / m − / is required to respect the condition h C = O ( h µ ) . This choice is strictly larger than theoptimal choice h µ ≍ ( nm ) − / that is implied by Proposition 4.1 in the case of m ≪ n / . This suggests thatoversmoothing in the mean function estimation is required in order to reach the optimal point-wise rate ofthe covariance estimator when m ≪ n / .The following results establish the uniform convergence rate of the covariance estimator ˆ C . Theorem 4.2.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1, 4.2 and 4.3 hold. If h µ → , nmh µ / log n → ∞ and nm h C / log n → ∞ , then sup ( s,t )∈T ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( h µ + h C + log nn + log nnmh µ + log nnm h C ) . (10) Corollary 4.2.
Suppose that Assumptions 2.1, 2.2, 3.1, 4.1, 4.2 and 4.3 hold. (a)
When m ≍ n / or m ≫ n / , with h µ ≍ h C ≍ n − / , one has sup ( s,t )∈T ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( log nn ) . SIMULATION STUDIES
Intrinsic Riemannian Functional Data Analysis
When m ≪ n / , with h µ ≍ n − / m − / ( log n ) / and h C ≍ n − / m − / ( log n ) / , one has sup ( s,t )∈T ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G ( µ ( s ) ,µ ( t )) = O p ( ( log n ) / n / m / ) . The rates in the above again match the uniform rates of Zhang and Wang (2016). It also coincides withthe rate in Dai and Müller (2018). It is interesting to see that, when m ≪ n / , the choice of h µ in thecorollary is the same as the optimal choice implied by Proposition 4.1, which suggests that no oversmoothingis needed in order to reach the optimal uniform rate for the covariance estimator ˆ C . This is because, thelocal uniform result of Proposition 4.2 and thus the condition h C = O ( h µ ) are not required, as the role ofProposition 4.2 in the analysis is now played by Proposition 4.1. We consider three different manifolds for illustrating the numerical properties of the proposed covarianceestimator (7) in Section 3; the numerical performance of the mean estimator can be found in Dai et al.(2020). Namely, they are the two-dimensional unit sphere S , the manifold Sym + LC of symmetric positive-definite 2 × + AF of symmetricpositive-definite 2 × + LC and Sym + AF sharethe same collection of matrices, they are endowed with different Riemannian metric tensors and thus havefundamentally different Riemannian geometry. We set T = [ , ] . The sampling rate m i is randomly sampledfrom Poisson ( m ) +
2, where Poisson ( m ) is a Poisson distribution with parameter m . Conditional on m i , T i , . . . , T im i are i.i.d. sampled from the uniform distribution Uniform ( , ) . The random process X and itsmean and covariance functions are described below. Sphere S . We parameterize S = {( x, y, z ) ∈ R ∶ x + y + z = } by the polar coordinate system x ( u, v ) = cos ( u ) y ( u, v ) = sin ( u ) cos ( v ) z ( u, v ) = sin ( u ) sin ( v ) for u ∈ [ , π ] and v ∈ [ , π ) . This coordinate system also gives rise to a local chat φ ∶ U → [ , π ) × [ , π ) on U = S /{(− , , )} . Let B ( t ) = ∂φ∂u ∣ µ ( t ) and B ( t ) = ∂φ∂v ∣ µ ( t ) . The random process X is then given by X ( t ) = Exp µ ( t ) ( tZ B ( t ) + tZ B ( t )) with Z , Z i.i.d. ∼ Uniform (− . , . ) . The mean curve µ of X is µ ( t ) = ( u ( t ) , v ( t )) = ( , πt / ) in the abovepolar coordinate. The covariance function is C( s, t ) = st I under the frame ( B , B ) , where I denotes the2 × Y ij = Exp µ ( T ij ) {( T ij Z i + ε ij ) B ( T ij ) + ( T ij Z i + ε ij ) B ( T ij )} , Note that the extra term 1 /( nmh C ) in Dai and Müller (2018) is dominated by 1 / n + /( nm h C ) due to the inequality ofarithmetic and geometric means, i.e., √ ab ≤ ( a + b )/ Lin, Shao and Yao where Z i , Z i i.i.d. ∼ Uniform (− . , . ) , and ε ij i.i.d. ∼ Uniform (− a, a ) with a chosen to make SNR = ∶= E ∫ T ∥ Log µ ( t ) X ( t )∥ µ ( t ) d t E ∫ T ∥ ε ( t )∥ µ ( t ) d t . (11) Manifold
Sym + LC . The population X is set to be X ( t ) = ⎛⎝ e t + tZ t + tZ e t + tZ ⎞⎠ ⎛⎝ e t + tZ t + tZ e t + tZ ⎞⎠ with Z , Z i.i.d. ∼ Uniform (− . , . ) . The mean µ is a geodesic with µ ( ) = I and the covariance C( s, t ) can be represented as a 3 × Y ij = ⎛⎝ e T ij + T ij Z i + ε ij T ij + T ij Z i + ε ij e T ij + T ij Z i + ε ij ⎞⎠ ⎛⎝ e T ij + T ij Z i + ε ij T ij + T ij Z i + ε ij e T ij + T ij Z i + ε ij ⎞⎠ where Z ,i , Z ,i , Z ,i i.i.d. ∼ Uniform (− . , . ) and ε ij i.i.d. ∼ Uniform (− a, a ) with a set to satisfy SNR = Manifold
Sym + AF . The random process X ( t ) is set to X ( t ) = ⎛⎜⎜⎜⎝ e t + tZ + e t + tZ √ e t + tZ − √ e t + tZ √ e t + tZ − √ e t + tZ e t + tZ + e t + tZ ⎞⎟⎟⎟⎠ , for Z , Z i.i.d. ∼ Uniform (− . , . ) . The mean µ ( t ) is a geodesic starting at I , and the covariance C( s, t ) = E { Log µ ( s ) X ( s ) ⊗ Log µ ( t ) X ( t )} can be represented as a 3 × Y ij = ⎛⎜⎜⎜⎝ e T ij + T ij Z i + ε ij + e T ij + T ij Z i + ε ij √ e T ij + T ij Z i + ε ij − √ e T ij + T ij Z i + ε ij √ e T ij + T ij Z i + ε ij − √ e T ij + T ij Z i + ε ij e T ij + T ij Z i + ε ij + e T ij + T ij Z i + ε ij ⎞⎟⎟⎟⎠ , where Z i , Z i i.i.d. ∼ Uniform (− . , . ) and ε ij i.i.d. ∼ Uniform (− a, a ) with a set to satisfy SNR = n = , ,
400 and m = , , , h µ and h C are selected by GCVproposed in Dai et al. (2020). Estimation quality is measured by relative mean uniform integrated error(rMUIE) and relative root mean integrated squared error (rRMISE), defined byrMUIE ∶= E sup s,t ∈T ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G sup s,t ∈T ∥C( s, t )∥ G , rRMISE ∶= { E ∫ T ∥ P ( µ ( s ) ,µ ( t ))( ˆ µ ( s ) , ˆ µ ( t )) ˆ C( s, t ) − C( s, t )∥ G d s d t } / {∫ T ∥C( s, t )∥ G } / . (12)20 APPLICATION
Intrinsic Riemannian Functional Data Analysis n rMUIE m = m = m = m = S
100 23.46(9.98) 21.01(8.60) 18.48(6.35) 18.12(6.32)200 18.23(7.22) 16.68(6.04) 15.08(4.86) 14.01(4.37)400 15.21(4.14) 13.64(4.45) 12.29(3.37) 11.79(3.31)Sym + LC
100 29.94(14.75) 26.09(9.62) 22.50(5.78) 22.45(6.85)200 22.85(7.04) 20.37(7.07) 17.44(5.39) 16.44(4.02)400 17.69(7.74) 15.37(3.84) 13.94(3.05) 13.60(2.61)Sym + AF
100 25.26(13.02) 22.18(11.07) 20.31(8.00) 18.15(5.95)200 20.15(8.86) 17.14(6.61) 14.09(5.00) 13.49(4.83)400 15.82(7.03) 14.39(5.66) 12.74(4.37) 12.61(3.55)Table 2: rRMISE and its Monte Carlo standard errors under different settingsmanifold n rRMISE m = m = m = m = S
100 20.95(2.15) 19.36(2.50) 16.95(1.67) 16.94(1.97)200 16.33(1.56) 15.34(1.51) 14.28(1.19) 13.20(1.43)400 13.82(1.12) 12.80(1.19) 11.22(1.02) 10.97(1.10)Sym + LC
100 26.32(2.42) 23.33(1.98) 21.20(1.57) 20.98(1.71)200 20.76(1.58) 17.92(1.39) 16.56(1.15) 16.12(10.2)400 16.20(1.15) 14.37(0.90) 13.55(0.82) 13.18(0.93)Sym + LC
100 20.96(2.42) 18.97(1.95) 18.84(1.92) 17.58(1.72)200 16.67(1.61) 15.23(1.21) 13.22(1.35) 12.70(1.29)400 13.20(1.19) 12.87(1.22) 11.17(0.98) 10.84(1.00)The results, summarized in Tables 1 and 2, show that the estimation errors in terms of both rMUIE andrRMISE decrease as n or m increases, and thus demonstrate the effectiveness of the proposed estimationmethod. A phase transition phenomenon is also observed: When m is increased from 5 to 10 or 20, theerrors in terms of both rMUIE and rRMISE decrease substantially, while when m is further increased to30, the decrease in errors is marginal. This phenomenon, predicted by our theoretical analysis in Section 4,suggests that for a fixed sample size, when m = m =
10 the errors are primarily due to the low samplingrate m , while when m =
30 or higher the errors are mainly contributed by the limited sample size.
We apply the proposed framework to analyze longitudinal diffusion tensors from Alzheimer’s Disease Neu-roimaging Initiative (ADNI). Diffusion tensor imaging (DTI), a special kind of diffusion-weighted magneticresonance imaging, has been extensively adopted in brain science to investigate white matter tractography.In a DTI image, each brain voxel is associated with a 3 × http://adni.loni.usc.edu/ Lin, Shao and Yao and scientific research related to brain diseases. From a statistical perspective, diffusion tensors are modeledas random elements in Sym +⋆ ( ) , and have been studied extensively, such as Fillard et al. (2005); Arsignyet al. (2006); Lenglet et al. (2006); Pennec et al. (2006); Fletcher and Joshi (2007); Dryden et al. (2009); Zhuet al. (2009); Pennec (2020), among many others. In these works Sym +⋆ ( ) is endowed with a Riemannianmetric or a non-Euclidean distance that aims to alleviate or completely eliminate swelling effect (Arsignyet al., 2007). However, none of them consider the longitudinal aspect of diffusion tensors.We focus on the hippocampus, a brain region that plays an important role in memory and is central toAlzheimer’s disease (Lindberg et al., 2012), and only include in the study subjects with at least four properlyrecorded DTI images. This results in a sample of n =
177 subjects with age ranging from 55.2 to 93.5.Among them, 42 subjects are cognitively normal (CN), while the others (AD) developed one of early mildcognitive impairment, mild cognitive impairment, late mild cognitive impairment and Alzheimer’s disease.On average, there are m = . +⋆ ( ) with the Log-Cholesky metric (Lin, 2019) and turnit into a Riemannian manifold of nonpositive sectional curvature. Under the Log-Cholesky framework thatavoids swelling effect and meanwhile enjoys computational efficiency, the Fréchet mean of the tensors insidehippocampus is calculated for each DTI scan, which represents a coarse-grain summary of hippocampaldiffusion tensors. As we shall see below, this averaged mean tensor is already capable of illuminating somedifferences of the diffusion dynamics between the AD and CN groups.For the Fréchet mean trajectories, the selected bandwidths via leave-one-out cross-validation are 4.2 and5.7 for the AD and CN groups, respectively. The resulting trajectories, depicted in Figure 7, where eachtensor is visualized as an ellipsoid whose volume corresponds to the determinant of the tensor, suggest that,overall the averaged hippocampal diffusion tensor remains rather stable for the CN group; the tensors at age55.2 and 93.5 that markedly depart from the others could be due to boundary effect, i.e., there are relativelyless data around the two boundary time points. In contrast, for the AD group, the dynamic tensor variesmore substantially, and the diffusion (measured by the determinant of tensors and indicated by volumeof ellipsoids) seems larger. Also, the mean trajectory of the AD group exhibits slightly lower fractionalanisotropy at each time point. Fractional anisotropy, defined for each 3 × A by FA = ¿``(cid:192) ( ρ − ¯ ρ ) + ( ρ − ¯ ρ ) + ( ρ − ¯ ρ ) ρ + ρ + ρ where ρ , ρ , ρ are eigenvalues of A and ¯ ρ = ( ρ + ρ + ρ )/
3, describes the degree of anisotropy of diffusionof water molecules. It is close to zero unless movement of the water molecules is constrained by structuressuch as white matter fibers. The below-normal fractional anisotropy might then suggest some damage onthe hippocampal structure for the AD group.For the covariance function, the selected bandwidth is 3.5 for the AD group and 4.5 for the CN group.Shown in Figure 8 are the first three intrinsic Riemannian functional principal components that are mappedon Sym +⋆ ( ) via the Riemannian exponential maps Exp ˆ µ ( t ) . They respectively account for 40.2%, 22.2%and 7.0% of variance for the AD group, and 40.7%, 19.4% and 8.0% of variance for the CN group. Thesecomponents, compared side by side in Figure 8, exhibit different patterns between the two cohorts. Forinstance, the Riemannian functional principal components of the AD group show relatively larger diffusionand more dynamics over time. In addition, they exhibit relatively lower fractional anisotropy, which suggeststhat individual diffusion tensor trajectories in the AD group tend to deviate from their mean trajectory alongthe direction with below-normal fractional anisotropy.22 PROOFS
Intrinsic Riemannian Functional Data Analysis Figure 7: Mean functions. Top: AD group; bottom: CN group. The color encodes fractional anisotropy.In summary, via the proposed framework we find that there are differences in longitudinal developmentof hippocampus between cognitively normal subjects and patients with Alzheimer’s disease. In the aboveanalysis, the averaged hippocampal diffusion tensors do not capture the rich spatial information of all tensorswithin the hippocampus. To account for such information, all hippocampal diffusion tensors shall be takeninto consideration by being modeled as an Sym +⋆ ( ) -valued function defined on the hippocampal regionwhich is a three-dimensional domain of R . Along with the temporal dynamics, for each subject there arespatiotemporal Riemannian manifold-valued data, with the sparseness along the temporal direction. Ourframework can be extended to analyze such data, but the extension requires substantial development and isleft for future study. A Proofs
Proof of Theorem 2.1.
According to the definition of C( s, t ) , for any u, v ∈ T ( µ ) , we have ⟪∫ T C( s, ⋅) u ( s ) d s, v ⟫ µ = ∫ T ∫ T ⟨C( s, t ) u ( s ) , v ( t )⟩ µ ( t ) d s d t = ∫ T ∫ T E {⟨ Log µ ( s ) X ( s ) , u ( s )⟩ µ ( s ) ⟨ Log µ ( t ) X ( t ) , v ( t )⟩ µ ( t ) } d s d t = E {∫ T ⟨ Log µ ( s ) X ( s ) , u ( s )⟩ µ ( s ) d s ∫ T ⟨ Log µ ( t ) X ( t ) , v ( t )⟩ µ ( t ) d t }= E (⟪ Log µ X, u ⟫ µ ⟪ Log µ X, v ⟫ µ )=⟪ C u, v ⟫ µ . According to Riesz representation theory, ( C u )( t ) = ∫ T C( s, t ) u ( s ) d s . Proof of Theorem 2.2.
To see that {( π − ( U α × U β ) , ϕ α,β ) ∶ ( α, β ) ∈ J } is a smooth atlas, it is sufficient tocheck the smoothness of the transition maps. Suppose that ( p, q, ∑ dj,k = v jk B α,j ( p ) ⊗ B β,k ( q )) ∈ π − ( U α × U β ) ∩ π − ( U ˜ α × U ˜ β ) is also represented by ( p, q, ∑ dj,k = ˜ v jk B ˜ α,j ( p ) ⊗ B ˜ β,k ( q )) . The transformation from thecoefficient vector v = ( v , v , . . . , v dd ) to ˜ v = ( ˜ v , ˜ v , . . . , ˜ v dd ) is smooth, since ˜ v = {J ⊺ α ( p ) ⊗ J ⊺ β ( q )} v and J ⊺ α ( p ) , J ⊺ β ( q ) and their Kronecker product J ⊺ α ( p ) ⊗ J ⊺ β ( q ) are respectively smooth in p , q and ( p, q ) , where J α (⋅) denotes the Jacobian matrix that transforms the basis { B α, (⋅) , . . . , B α,d (⋅)} into { B ˜ α, (⋅) , . . . , B ˜ α,d (⋅)} .According to the vector bundle construction lemma (Lemma 5.5, Lee, 2002), it is sufficient to check thatwhen U ∶= ( U α × U β ) ∩ ( U ˜ α × U ˜ β ) ≠ ∅ for some indices α, β, ˜ α, ˜ β , the composite map Φ α,β ○ Φ − α, ˜ β from U × R d to itself has the form Φ α,β ○ Φ − α, ˜ β = ( p, q, J ( p, q ) v ) for a smooth map J ∶ U → GL ( d , R ) , where GL ( d , R ) isthe collection of invertible real d × d matrices. From above discussion, we have J ( p, q ) = J ⊺ α ( p ) ⊗ J ⊺ β ( q ) issmooth in ( p, q ) . In addition, J ( p, q ) ∈ GL ( d , R ) since both J ⊺ α ( p ) and J ⊺ α ( q ) are invertible and so is their234 Lin, Shao and Yao
A PROOFS
Figure 8: The first principal component of AD group (Row 1) and CN group (Row 2), the second principalcomponent of AD group (Row 3) and CN group (Row 4), and the third principal component of AD group(Row 5) and CN group (Row 6). The color encodes fractional anisotropy.Kronecker product. Note that the vector bundle construction lemma also asserts that any compatible atlasfor M gives rise to the same smooth structure on L . Proof of Theorem 2.3.
One can show that the parallel transport defined in (4) is a genuine parallel transportsatisfying the property of Definition A.54 of Rodrigues and Capelas de Oliveira (2016) on the vector bundle.Then the conclusion directly follows from Definitions A.55 and A.57 of Rodrigues and Capelas de Oliveira(2016) and the remarks right below them.
Proof of Theorem 2.4.
We first show that the definition (6) is invariant to the choice of orthonormal bases.To this end, fix an orthonormal basis in T q M , and suppose that { ˜ e , . . . , ˜ e d } is another orthonormal basisin T p M and is related to { e , . . . , e d } by a d × d unitary matrix O . Let A and A be the respective matrixrepresentation of L and L under the basis { e , . . . , e d } . Then their matrix representation under the basis { ˜ e , . . . , ˜ e d } is ˜ A = O A and ˜ A = O A , respectively. The inner product G p,q ( L , L ) is then calculated bytr ( ˜ A ⊺ ˜ A ) = tr ( A ⊺ O ⊺ O A ) = tr ( A ⊺ A ) , which shows that G p,q ( L , L ) is invariant to the choice of bases in T p M . Its invariance to the choice of bases in T q M can be proved in a similar fashion.The smoothness of G can be established by an argument similar to the one leading to Theorem 2.2 inconjunction with smoothness of the trace of matrices. To see that the parallel transport (4) preserves thebundle metric and thus defines isometries among fibers of L , i.e., for any L , L ∈ L ( p , q ) , G ( p ,q ) ( L , L ) = G ( p ,q ) ( P ( p ,q )( p ,q ) L , P ( p ,q )( p ,q ) L ) , suppose that { e , . . . , e d } is an orthogonal basis of T p M . Then {P p p e , . . . , P p p e d } is an orthogonal basis of24 PROOFS
Intrinsic Riemannian Functional Data Analysis T p M . Therefore, G ( p ,q ) ( P ( p ,q )( p ,q ) L , P ( p ,q )( p ,q ) L ) = d ∑ k = ⟨( P ( p ,q )( p ,q ) L )(P p p e k ) , ( P ( p ,q )( p ,q ) L )(P p p e k )⟩ q = d ∑ k = ⟨P q q [ L ( e k )] , P q q [ L ( e k )]⟩ q = d ∑ k = ⟨ L ( e k ) , L ( e k )⟩ q = G ( p ,q ) ( L , L ) , which completes the proof. Proof of Proposition 3.1.
Suppose that ( ˜ B ij, , . . . , ˜ B ij,d ) is another orthonormal basis for T ˆ µ ( T ij ) M , and O ij is the unitary matrix that relates ( B ij, , . . . , B ij,d ) to ( ˜ B ij, , . . . , ˜ B ij,d ) . Then the coefficient vectors ˜ z ij and˜ g k,ij of Log ˆ µ ( T ij ) Y ij and ˆ ψ k ( T ij ) under the basis ( ˜ B ij, , . . . , ˜ B ij,d ) are linked to z ij and g k,ij by ˜ z ij = O ij z ij and ˜ g k,ij = O ij g k,ij , respectively. Similarly, ˜ C i,jl is linked to C i,jl by ˜ C i,jl = O ij C i,jl O ⊺ il . More concisely, ifwe put O i = ⎛⎜⎜⎜⎜⎜⎝ O i O i ⋱ O im i ⎞⎟⎟⎟⎟⎟⎠ , then ˜ z i = O i z i , ˜ g k,i = O i g k,i and ˜Σ i = O i Σ i O ⊺ i , which are the counterpart of z i , g k,i and Σ i under the bases ( ˜ B ij, , . . . , ˜ B ij,d ) , respectively. Note that ˜Σ − i = ( O i Σ i O ⊺ i ) − = O −⊺ i Σ − i O − i = O i Σ − i O ⊺ i since O ij are unitarymatrices and thus O − i = O ⊺ i . Now we see that ˜ g ⊺ k,i ˜Σ − i ˜ z i = g ⊺ k,i O ⊺ i O i Σ − i O ⊺ i O i z i = g ⊺ k,i Σ − i z i , which clearlyimplies that the scores ˆ ξ ik calculated under the bases ( ˜ B ij, , . . . , ˜ B ij,d ) is identical to the one computed underthe bases ( B ij, , . . . , B ij,d ) . Proof of Lemma 4.1.
Notice that ∥P p q P q q Log q y − P p p Log p y ∥ p =∥P q p P p p P p q P q q Log q y − P q p Log p y ∥ q ≤∥P q p P p p P p q P q q Log q y − Log q y ∥ q + ∥ Log q y − P q p Log p y ∥ q , where the equality follows from the fact that parallel transport preserves the inner product. Note that theoperator P q p P p p P p q P q q moves a tangent vector parallelly along a geodesic quadrilateral defined by the thepoints p , p , q , q . The holonomy theory (Eq (6), Nichols et al., 2016) and the compactness of G suggeststhat there exists a constant c > G , such that for any v ∈ T q M with ∥ v ∥ q ≤ diam (G) , ∥P q p P p p P p q v − v ∥ q ≤ c ∥ Log q p ∥ q = c d M ( p , q ) , ∥P q p P p q P q q v − v ∥ q ≤ c ∥ Log q p ∥ q = c d M ( p , q ) , which further imply that ∥P q p P p p P p q P q q v − v ∥ q = ∥(P q p P p p P p q )(P q p P p q P q q ) v − v ∥ q ≤ ∥(P q p P p p P p q )(P q p P p q P q q ) v − (P q p P p q P q q ) v ∥ q + ∥(P q p P p q P q q ) v − v ∥ q ≤ c ( d M ( p , q ) + d M ( p , q )) . Lin, Shao and Yao
A PROOFS
According to Theorem 3 in Pennec (2019), we have ∥ Log q y − P q p Log p y ∥ q ≤ c ∥ Log q p ∥ q ≤ c d M ( p , q ) for some constant c > G . The proof is then completed by taking c = c + c . Proof of Proposition 4.2.
The proposition follows directly from Claims A.1 and A.2, and the observationsup τ ∶∣ τ − t ∣< h d M ( µ ( τ ) , ˆ µ ( τ )) ≤ sup τ ∶∣ τ − t ∣< h d M ( ˜ µ ( τ ) , µ ( τ )) + sup τ ∶∣ τ − t ∣< h d M ( ˆ µ ( τ ) , ˜ µ ( τ )) . Claim A.1.
If the assumptions in Proposition 4.2 hold, then sup t ∈T d M ( µ ( t ) , ˜ µ ( t )) = O ( h µ ) . To establish the above claim, we first show thatlim n →∞ sup t ∈T d M ( ˜ µ ( t ) , µ ( t )) = . (13)To this end, we observe that u k ( t ) = E { K h µ ( T − t ) × ( T − t ) k } = ∫ K h µ ( s − t ) × ( s − t ) k f ( s ) d s = ∫ K ( u ) × u k h kµ × f ( t + uh µ ) d u = h kµ f ( t ) K k + h k + µ f ′ ( t ) K ,k + + O ( h k + µ ) , (14) τ y,k ( t ) ∶= E { K h µ ( T − t ) × ( T − t ) k ∣ Y = y } = h kµ g y ( t ) K k + h k + µ g ′ y ( t ) K ,k + + O ( h k + µ ) , (15)where K a,k ∶= ∫ K a ( u ) × u k du and the O ( h k + µ ) terms are uniform in t ∈ T . Simple computation andAssumption 4.1 suggest that there exists a constant c > n ,sup t ∈T ,y ∈K ∣ u ( t ) τ y, ( t )− u ( t ) τ y, ( t ) σ ( t ) − g y ( t ) f ( t ) ∣ < c h µ . (16)In addition, if F T,Y denotes the joint distribution of ( T, Y ) , F Y is the distribution of Y , and F Y ∣ T , F T ∣ Y are conditional distributions, then d F Y ∣ T ( t, z )/ d F Y ( z ) = g z ( t )/ f ( t ) , as shown in the proof of Theorem 3 ofPetersen and Müller (2019). This implies that E { d M ( Y ( t ) , y )} = F ( y, t ) = ∫ d M ( z, y ) d F Y ∣ T ( t, z ) = ∫ d M ( z, y ) g z ( t ) f ( t ) d F Y ( z ) = E { d M ( Y, y ) g Y ( t ) f ( t ) } . Based on the above, we deduce that˜ Q n ( y, t ) − E { d M ( Y ( t ) , y )}= E { w ( T, t, h µ ) d M ( Y, y )} − E { d M ( Y, y ) g Y ( t ) f ( t ) }= E { E { w ( T, t, h µ ) d M ( Y, y ) ∣ Y }} − E { d M ( Y, y ) g Y ( t ) f ( t ) }= E { d M ( Y, y ) ∫ T g Y ( s ) σ ( t ) K h µ ( s − t ) × [ u ( t ) − u ( t )( s − t )] d s } − E { d M ( Y, y ) g Y ( t ) f ( t ) }= E [ d M ( Y, y ){ u ( t ) τ Y, ( t )− u ( t ) τ Y, ( t ) σ ( t ) − g Y ( t ) f ( t ) }] , (17)26 PROOFS
Intrinsic Riemannian Functional Data Analysis t ∈T ,y ∈K ∣ ˜ Q n ( y, t ) − E { d M ( Y ( t ) , y )}∣ ≤ c diam (K) h µ . (18)According to Assumption 4.2(b), for any δ > n →∞ inf d M ( y, ˜ µ ( t ))> δ ,t ∈T { ˜ Q n ( y, t ) − ˜ Q n ( ˜ µ ( t ) , t )} > . Then there exist (cid:15) > n ≥ n > n ,inf d M ( y, ˜ µ ( t ))> δ ,t ∈T { ˜ Q n ( y, t ) − ˜ Q n ( ˜ µ ( t ) , t )} > (cid:15) and sup t ∈T ,y ∈K ∣ ˜ Q n ( y, t ) − E { d M ( Y ( t ) , y )}∣ < (cid:15) . These imply thatinf d M ( y, ˜ µ ( t ))> δ E { d M ( Y ( t ) , y )} ≥ inf d M ( y, ˜ µ ( t ))> δ ˜ Q n ( y, t ) − (cid:15) > ˜ Q n ( ˜ µ ( t ) , t ) + (cid:15) > E { d M ( Y ( t ) , ˜ µ ( t ))} + (cid:15) . This further implies that the Fréchet mean of Y ( t ) , which is the minimizer of F ∗ ( y, t ) = E { d M ( Y ( t ) , y )} asa function of y , is within distance δ of ˜ µ ( t ) . Since X ( t ) and Y ( y ) share the same Fréchet mean µ ( t ) , weconclude that d M ( ˜ µ ( t ) , µ ( t )) < δ for all t ∈ T , and thus (13) follows.Now, according to Assumption 4.2(d) and (13), we deduce that˜ Q n ( µ ( t ) , t ) − ˜ Q n ( ˜ µ ( t ) , t ) > C d M ( µ ( t ) , ˜ µ ( t )) . On one hand, since µ ( t ) is the minimizer of E { d M ( Y ( t ) , y )} , we have E { d M ( Y ( t ) , ˜ µ ( t ))}− E { d M ( Y ( t ) , µ ( t ))} ≥
0. Thus, [ ˜ Q n ( µ ( t ) , t ) − E { d M ( Y ( t ) , µ ( t ))}] − [ ˜ Q n ( ˜ µ ( t ) , t ) − E { d M ( Y ( t ) , ˜ µ ( t ))}] > C d M ( µ ( t ) , ˜ µ ( t )) . On the other hand, by (17), [ ˜ Q n ( µ ( t ) , t ) − E { d M ( Y ( t ) , µ ( t ))}] − [ ˜ Q n ( ˜ µ ( t ) , t ) − E { d M ( Y ( t ) , ˜ µ ( t ))}]≤ E [{ d M ( Y, µ ( t )) − d M ( Y, ˜ µ ( t ))} × { u ( t ) τ Y, ( t )− u ( t ) τ Y, ( t ) σ ( t ) − g Y ( t ) f ( t ) }]≤ c h µ d M ( µ ( t ) , ˜ µ ( t )) × diam (K) . Therefore, sup t ∈T d M ( µ ( t ) , ˜ µ ( t )) ≤ C − c diam (K) h µ = O ( h µ ) , as needed. Claim A.2.
If the assumptions in Proposition 4.2 hold, then for any fixed t , sup τ ∶∣ τ − t ∣< h d M ( ˆ µ ( t ) , ˜ µ ( t )) = O p ( h − µ m − n − + n − ) . The proof of this claim is adapted from the proof of Theorem 4 of Petersen and Müller (2019), with extracare for the supremum sup τ ∶∣ τ − t ∣< h . We divide the proof of the claim into two steps. In the first step, weshow that sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) = o p ( ) , (19)where B ( t ; h ) = { τ ∶ ∣ τ − t ∣ < h } . In the second step, we derive the precise convergence rate.278 Lin, Shao and Yao
A PROOFS
Step 1:
Recall that ˆ w ( T ij , t, h µ ) = σ K h µ ( T ij − t ) × [ ˆ u ( t ) − ˆ u ( t )( T ij − t )] and w ( T, t, h µ ) = σ K h µ ( T − t ) × [ u ( t ) − u ( t )( T − t )] . Thenˆ Q n ( y, t ) − ˜ Q n ( y, t ) = ∑ i λ i ∑ j ˆ w ( T ij , t, h µ ) d M ( Y ij , y ) − E { w ( T, t, h µ ) d M ( Y, y )}= ∑ i λ i ∑ j { ˆ w ( T ij , t, h µ ) d M ( Y ij , y ) − w ( T ij , t, h µ ) d M ( Y ij , y )}+ ∑ i λ i ∑ j w ( T ij , t, h µ ) d M ( Y ij , y ) − E { w ( T, t, h µ ) d M ( Y, y )} . (20)To deal with the first term of (20), we observe that ∣ ˆ w ( T ij , t, h µ ) − w ( T, t, h µ )∣ ≤ ∣ ˆ u ( t ) ˆ σ ( t ) − u ( t ) σ ( t ) ∣ K h µ ( T ij − t ) + ∣ ˆ u ( t ) ˆ σ ( t ) − u ( t ) σ ( t ) ∣( T ij − t ) K h µ ( T ij − t ) . (21)From (14) and the symmetry of the kernel function K , we havesup τ ∈ B ( t ; h ) u ( τ ) = sup τ ∈ B ( t ; h ) f ( τ ) + O ( h µ ) , sup τ ∈ B ( t ; h ) u ( τ ) = sup τ ∈ B ( t ; h ) f ′ ( τ ) K , h µ + O ( h µ ) , sup τ ∈ B ( t ; h ) u ( τ ) = sup τ ∈ B ( t ; h ) f ( τ ) K , h µ + O ( h µ ) . The estimator ˆ u k ( t ) is unbiased for u k ( t ) and E sup τ ∈ B ( t ; h ) ∣ ˆ u k ( τ ) − u k ( τ )∣ ≤ ∑ i ∑ j λ i E sup τ ∈ B ( t ; h ) K h µ ( T ij − τ ) × ( T ij − τ ) k = mn ∫ sup τ ∈ B ( t ; h ) K h µ ( s − τ ) × ( s − τ ) k f ( s ) d s = O ( h k − µ mn ) . With these, simple computation leads tosup τ ∈ B ( t ; h ) ∣ ˆ u ( τ ) ˆ σ ( τ ) − u ( τ ) σ ( τ ) ∣ = O p ( h − µ m − n − ) and sup τ ∈ B ( t ; h ) ∣ ˆ u ( τ ) ˆ σ ( τ ) − u ( τ ) σ ( τ ) ∣ = O p ( h − µ m − n − ) . (22)Besides, E sup τ ∈ B ( t ; h ) K h µ ( T ij − τ ) = O ( hh µ + ) = O ( ) , E sup τ ∈ B ( t ; h ) K h µ ( T ij − τ ) × ( T ij − τ ) = O (( hh µ + ) h µ ) = O ( h µ ) . Combining them with (21) and (22), we conclude that the first term of (20) has the ratesup τ ∈ B ( t ; h ) ,y ∈K ∑ i λ i ∑ j { ˆ w ( T ij , τ, h µ ) − w ( T ij , τ, h µ )} d M ( Y ij , y ) = O p ( h − µ m − n − ) . PROOFS
Intrinsic Riemannian Functional Data Analysis g ( y, τ ) = m m ∑ j = w ( T j , τ, h µ ) d M ( Y j , y ) , which is Lipschitz continuous in ( y, τ ) , and consider the class of random variables { g ( y, τ ) ∶ y ∈ K , τ ∈ B ( t ; h )} . The elements in the class are viewed as measurable functions mapping the sample space Ω into R . The classhas an envelope function given by H = diam (K) m m ∑ j = sup τ ∈ B ( t ; h ) ∣ w ( T j , τ, h µ )∣ . From the analysis in Step 1 one can show that E { H } = O ( mh + ) . Now, according to Theorems 2.7.11 and2.14.2 in van der Vaart and Wellner (1996) E ∣ sup y ∈K τ ∈ B ( t ; h ) ∑ i λ i ∑ j [ w ( T ij , τ, h µ ) d M ( Y ij , y ) − E { w ( T, τ, h µ ) d M ( Y, y )}]∣ = O ( n − m − h − µ + n − ) . Combining this with the result for the first term, we deduce that,sup y ∈K sup τ ∈ B ( t ; h ) ∣ ˆ Q n ( y, τ ) − ˜ Q n ( y, τ )∣ = O p ( h − µ m − n − + n − ) = o p ( ) . Now (19) follows from the argument in the proof of Lemma 2 in Petersen and Müller (2019), if we verifythat for any κ > δ → lim sup n →∞ Pr ⎧⎪⎪⎨⎪⎪⎩ sup d M ( y ,y )< δ,τ ∈ B ( t ; h ) ∣{ ˆ Q n ( y , τ ) − ˜ Q n ( y , τ )} − { ˆ Q n ( y , τ ) − ˜ Q n ( y , τ )}∣ > κ ⎫⎪⎪⎬⎪⎪⎭ = . One can show that sup τ ∈ B ( t ; h ) ∣ ˆ ω ( T ij , τ, h µ )∣ = O p ( ) and E sup τ ∈ B ( t ; h ) ∣ ω ( T, τ, h µ )∣ = O ( ) , which furtherimply that sup τ ∈ B ( t ; h ) ∣ ˆ Q n ( y , τ ) − ˆ Q n ( y , τ )∣ = O p ( d M ( y , y )) and sup τ ∈ B ( t ; h ) ∣ ˜ Q n ( y , τ ) − ˜ Q n ( y , τ )∣ = O ( d M ( y , y )) , and thus the above equation is verified. Step 2:
Define D ( Y ij , y, t ) ∶= d M ( Y ij , y ) − d M ( Y ij , ˜ µ ( t )) and S n ( y, t ) = ˆ Q n ( y, t ) − ˜ Q n ( y, t ) , then ∣ S n ( y, t ) − S n ( ˜ µ ( t ) , t )∣=∣ ∑ i λ i ∑ j ˆ w ( T ij , t, h µ )[ d M ( Y ij , y ) − d M ( Y ij , ˜ µ ( t ))] − E { w ( T, t, h µ )[ d M ( Y ij , y ) − d M ( Y ij , ˜ µ ( t ))]}∣=∣ ∑ i λ i ∑ j ˆ w ( T ij , t, h µ ) D ( Y ij , y, t ) − E { w ( T, t, h µ ) D ( Y, y, t )}∣≤ ∑ i λ i ∑ j ∣ ˆ w ( T ij , t, h µ ) − w ( T ij , t, h µ )∣ × ∣ D ( Y ij , y, t )∣ (23) + ∑ i λ i ∑ j ∣ w ( T ij , t, h µ ) D ( Y ij , y, t ) − E { w ( T ij , t, h µ ) D ( Y ij , y, t )}∣ . (24)Since sup { d M ( y, ˜ µ ( τ ))< δ,τ ∈ B ( t ; h )} ∣ D ( Y ij , y, τ )∣ ≤ δ × diam (K) , (23) can be bounded bysup d M( y, ˜ µ ( τ ))< δ τ ∈ B ( t ; h ) ∑ i λ i ∑ j ∣ ˆ w ( T ij , τ, h µ ) − w ( T ij , τ, h µ )∣ × ∣ D ( Y ij , y, τ )∣ = O p ( δh − µ m − n − ) . Lin, Shao and Yao
A PROOFS
Thus, for R <
0, if we define the event B R ∶= ⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩ sup d M( y, ˜ µ ( τ ))< δ τ ∈ B ( t ; h ) ∑ i λ i ∑ j ∣ ˆ w ( T ij , τ, h µ ) − w ( T ij , τ, h µ )∣ × ∣ D ( Y ij , y, τ )∣ < Rδh − µ m − n − ⎫⎪⎪⎪⎪⎬⎪⎪⎪⎪⎭ , then B R satisfies lim R →∞ lim sup n →∞ Pr { B R } = . To bound (24), we again utilize Theorems 2.7.11 and 2.14.2 in van der Vaart and Wellner (1996). Define g ( y, τ ) = m m ∑ j = w ( T j , τ, h µ ) d M ( Y j , y ) , which is Lipschitz continuous in ( y, τ ) , and consider the class of random variables { g ( y, τ ) − g ( ˜ µ ( τ ) , τ ) ∶ d M ( y, ˜ µ ( τ )) < δ and τ ∈ B ( t ; h )} . The elements in the class are viewed as measurable functions mapping the sample space Ω into R . The classhas an envelope function given by H = δm m ∑ j = sup τ ∈ B ( t ; h ) ∣ w ( T ij , τ, h µ )∣ . One can show that E { H } = O ( δ ( mh + )) . By the aforementioned theorems in van der Vaart and Wellner(1996), for sufficiently small δ > E ∣ sup d M( y, ˜ µ ( τ ))< δ τ ∈ B ( t ; h ) ∑ i λ i ∑ j [ w ( T ij , τ, h µ ) D ( Y ij , y, τ ) − E { w ( T ij , τ, h µ ) D ( Y ij , y, τ )}]∣ = O ( δn − m − h − µ + δn − ) . Thus, there exists a constant c > E ⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩ I B R sup d M( y, ˜ µ ( τ ))< δ τ ∈ B ( t ; h ) ∣ S n ( y, τ ) − S n ( ˜ µ ( t ) , τ )∣⎫⎪⎪⎪⎪⎬⎪⎪⎪⎪⎭ ≤ ( c + R )( δn − m − h − µ + δn − ) . According to Assumption 4.2(d), for sufficiently large n so that d M ( ˆ µ ( t ) , ˜ µ ( t )) < η for all t ∈ T , we have˜ Q n ( ˆ µ ( t ) , t ) − ˜ Q n ( ˜ µ ( t ) , t ) ≥ C d M ( ˆ µ ( t ) , ˜ µ ( t )) . Since ˆ µ ( t ) is the minimizer of ˆ Q n ( y, t ) , ˆ Q n ( ˜ µ ( t ) , t ) − ˆ Q n ( ˆ µ ( t ) , t ) ≥
0. Thus for sufficiently large n so that d M ( ˆ µ ( t ) , ˜ µ ( t )) < η for all t ∈ T , we have the following inequality S n ( ˜ µ ( t ) , t ) − S n ( ˆ µ ( t ) , t ) ≥ C d M ( ˆ µ ( t ) , ˜ µ ( t )) . For any M >
0, define a n = n − m − h − µ + n − , B j ∶= { y ∶ j M a n ≤ sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) ≤ j + M a n } PROOFS
Intrinsic Riemannian Functional Data Analysis B C ∶= { sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) > η } . Then for any R > j satisfying η < j + M a n ≤ η ,Pr { sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) > M a n }≤ j ≤ j ∑ j = Pr ( B j ∩ B R ) + Pr ( B C ∩ B R ) + Pr ( Ω / B R )≤ j ≤ j ∑ j = Pr ⎛⎝ B j ∩ { sup τ ∈ B ( t ; h ) ∣ S n ( ˜ µ ( τ ) , τ ) − S n ( ˆ µ ( τ ) , τ )∣ > C ( j M a n ) } ∩ B R ⎞⎠ + Pr { B C } + Pr ( Ω / B R )≤ j ≤ j ∑ j = Pr ( I B R sup d M ( y, ˜ µ ( τ ))≤ j + Ma n ,τ ∈ B ( t ; h ) ∣ S n ( ˜ µ ( τ ) , τ ) − S n ( y, τ )∣ > C ( j M a n ) }) + Pr { B C } + Pr ( Ω / B R )≤( C + R ) ∑ j = j + Ma n C ( j Ma n ) + Pr { B C } + Pr ( Ω / B R )≤ ( C + R ) C M + Pr { B C } + Pr ( Ω / B R ) . According to (19), lim n →∞ Pr { B C } =
0. Together with lim R →∞ lim sup n →∞ Pr { B R } =
1, it implies that forany (cid:15) >
0, there exist n and R such that for any n > n ,Pr { Ω / B R } < (cid:15) { B C } < (cid:15) . For M > ( C + R ) (cid:15)C , we havePr ⎧⎪⎪⎨⎪⎪⎩ sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) > M a n ⎫⎪⎪⎬⎪⎪⎭ ≤ ( C + R ) C M + Pr { B C } + Pr ( Ω / B R ) ≤ (cid:15). Therefore, lim M →∞ lim sup n →∞ Pr ⎧⎪⎪⎨⎪⎪⎩ sup τ ∈ B ( t ; h ) d M ( ˆ µ ( τ ) , ˜ µ ( τ )) > M a n ⎫⎪⎪⎬⎪⎪⎭ = . This yields the desired convergence rate of Claim A.2 and completes the proof of Proposition 4.2.
Proof of Theorem 4.1.
We divide the proof into three steps. In the first step, we show that the convergencerate depends on some terms of the numerator of (27). In the second step and third step, we address themseparately.
Step 1.
Define γ ∶= ˆ µ to simplify notation in the sequel. Since the parallel transport P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) preservesthe fiber metric according to Theorem 2.4, P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) ˆ C( s, t ) = arg β min β ,β ,β ∈ L ( µ ( s ) ,µ ( t )) { ∑ i ν i ∑ j ≠ k ∥ P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − β − β ( T ij − s ) − β ( T ik − t )∥ G ( µ ( s ) ,µ ( t )) K h C ( s − T ij ) K h C ( t − T ik )} . Simple computation shows that P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) ˆ C( s, t ) − C( s, t ) = ( S S − S ) R − ( S S − S S ) R + ( S S − S S ) R ( S S − S ) S − ( S S − S S ) S + ( S S − S S ) S , (25)312 Lin, Shao and Yao
A PROOFS where R ab and S ab are defined as R ab ∶= ∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) ( T ij − sh C ) a ( T ik − th C ) b { P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − C( s, t )} ,S ab ∶= ∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) ( T ij − sh C ) a ( T ik − th C ) b , with $ ( s ′ , t ′ ) = K h C ( s − s ′ ) K h C ( t − t ′ ) for s ′ , t ′ ∈ T .Define ˜ C i,jk ∶= P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) ( Log µ ( T ij ) Y ij ⊗ Log µ ( T ik ) Y ik ) ∈ L ( µ ( s ) , µ ( t )) . We consider the decomposi-tion P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − C( s, t ) = { P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − ˜ C i,jk } + { ˜ C i,jk − C( s, t )} (26)and split R ab into R ab = R ab, + R ab, accordingly, where R ab, ∶= ∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) ( T ij − sh C ) a ( T ik − th C ) b { P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − ˜ C i,jk } ,R ab, ∶= ∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) ( T ij − sh C ) a ( T ik − th C ) b { ˜ C i,jk − C( s, t )} . Combining (25) and (26), we can see that P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) ˆ C( s, t ) − C( s, t ) = ( S S − S )[ R , + R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S ]( S S − S ) S − ( S S − S S ) S + ( S S − S S ) S − ( S S − S S )[ R , + R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S ]( S S − S ) S − ( S S − S S ) S + ( S S − S S ) S + ( S S − S S )[ R , + R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S ]( S S − S ) S − ( S S − S S ) S + ( S S − S S ) S . (27)According to Lemma 4 in Zhang and Wang (2016),sup s,t ∈T ∣ S ab − E S ab ∣ = o p ( ) and sup s,t ∈T E S ab ≍ . (28)Therefore, the convergence rate of (25) depends on R ab, and R ab, − ∂ s C( s, t ) h C S a + ,b − ∂ t C( s, t ) h C S a,b + . Step 2.
In this step, we will derive the rate of R ab, in Equation (27). According to the definitionof P in (4), the first part in Equation (26) is (P µ ( s ) γ ( s ) P γ ( s ) γ ( T ij ) Log γ ( T ij ) Y ij ) ⊗ (P µ ( t ) γ ( t ) P γ ( t ) γ ( T ik ) Log γ ( T ik ) Y ik ) −(P µ ( s ) µ ( T ij ) Log µ ( T ij ) Y ij ) ⊗ (P µ ( t ) µ ( T ik ) Log µ ( T ik ) Y ik ) . Then according to Assumption 4.1(b) and Lemma 4.1, its rateis ∥ P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − ˜ C i,jk ∥ G = O ( sup τ ∶∣ τ − s ∣< h C or ∣ τ − t ∣< h C d M ( γ ( τ ) , µ ( τ ))) . By Proposition 4.2, we conclude that R ab, = O P ( h µ + √ n + nmh µ ) . Step 3.
In this step, we first analyze the term R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S in (27), which equalsto U ∶= ∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) { ˜ C i,jk − C( s, t ) − ∂ s C( s, t )( T ij − s ) − ∂ t C( s, t )( T ik − t )} . PROOFS
Intrinsic Riemannian Functional Data Analysis T = { T ij ∶ i = , . . . , n, j = , . . . , m i } and observe that E ( ˜ C i,jk ∣ T ) = P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) C( T ij , T ik ) . In addition, since C is twice differentiable and the parallel transport P is depicted by a partial differentialequation, we have the following Taylor expansion at ( s, t ) , P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) C( T ij , T ik )= C( s, t ) + ∂ s C( s, t )( T ij − s ) + ∂ t C( s, t )( T ik − t ) + O ( h C ) (29)for all T ij , T ik such that ∣ T ij − s ∣ < h C and ∣ T ik − t ∣ < h C , where O ( h C ) is uniform over all T ij and T ik due toAssumption 4.3 and the compactness of K . Then we further deduce that E ( U ) = E ⎧⎪⎪⎨⎪⎪⎩ E ⎡⎢⎢⎢⎢⎣∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) { ˜ C i,jk − C( s, t ) − ∂ s C( s, t )( T ij − s ) − ∂ t C( s, t )( T ik − t )} ∣ T ⎤⎥⎥⎥⎥⎦⎫⎪⎪⎬⎪⎪⎭= E ⎡⎢⎢⎢⎢⎣∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) { P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) C( T ij , T ik ) − C( s, t ) − ∂ s C( s, t )( T ij − s ) − ∂ t C( s, t )( T ik − t )}⎤⎥⎥⎥⎥⎦= E ⎧⎪⎪⎨⎪⎪⎩∑ i ν i ∑ j ≠ k $ ( T ij , T ik ) × O ( h C )⎫⎪⎪⎬⎪⎪⎭ = O ( h C ) . Next, the independence of trajectories implies that E ∥ U − E U ∥ G = n ∑ i = v i E XXXXXXXXXXX∑ j ≠ k ( U i,jk − E U i,jk )XXXXXXXXXXX G ≤ n ∑ i = v i E XXXXXXXXXXX∑ j ≠ k U i,jk XXXXXXXXXXX G , (30)where U i,jk = $ ( T ij , T ik ){ ˜ C i,jk − P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) C( T ij , T jk )} . To ease notation, we denote ∆ i,jk = ˜ C i,jk − P ( µ ( s ) ,µ ( t ))( µ ( T ij ) ,µ ( T ik )) C( T ij , T ik ) and H ( s , s , t , t ) = K h C ( s − s ) K h C ( s − s ) K h C ( t − t ) K h C ( t − t ) . Then, wehave XXXXXXXXXXX∑ j ≠ k U i,jk XXXXXXXXXXX G = ∑ j ≠ k H ( T ij , T ij , T ik , T ik )∥ ∆ i,jk ∥ G + ∑ j ∑ k ≠ k H ( T ij , T ij , T ik , T ik ) G ( µ ( s ) ,µ ( t )) ( ∆ i,jk , ∆ i,jk )+ ∑ k ∑ j ≠ j H ( T ij , T ij , T ik , T ik ) G ( µ ( s ) ,µ ( t )) ( ∆ i,j k , ∆ i,j k )+ ∑ j ≠ j ,k ≠ k H ( T ij , T ij , T ik , T ik ) G ( µ ( s ) ,µ ( t )) ( ∆ i,j k , ∆ i,j k )≤ c K ∑ j ≠ k H ( T ij , T ij , T ik , T ik ) + c K ∑ j ∑ k ≠ k H ( T ij , T ij , T ik , T ik )+ c K ∑ k ∑ j ≠ j H ( T ij , T ij , T ik , T ik ) + c K ∑ j ≠ j ,k ≠ k H ( T ij , T ij , T ik , T ik ) , where c K > K . This implies that E XXXXXXXXXXX∑ j ≠ k U i,jk XXXXXXXXXXX G ≤ c K c K,f ( m h − C + m h − C + m ) ≤ c K c K,f ( m h − C + m ) , (31)334 Lin, Shao and Yao
REFERENCES where c K,f > K and the density function f of T . Combining(30), (31) and v i ≡ /( nm ( m − )) , we deduce that E ∥ U − E U ∥ G = O ( n − + n − m − h − C + n − m − h − C ) . Thisresult, together with E U = O ( h C ) and Markov inequality, shows that R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S = O P ( h C + n − / + n − / m − h − C ) .Similar arguments can show that the terms R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S and R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S in (27) are of the same order. The equation (9) is then obtained by inserting the results inSteps 2 and 3 into Step 1. Proof of Theorem 4.2.
Similar to the proof of Theorem 4.1, we only need to consider the uniform rate of theterm R , + R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S in (27).Due to boundedness of K and Lemma 4.1, we havesup s,t ∥P ( µ ( s ) ,µ ( t ))( γ ( s ) ,γ ( t )) P ( γ ( s ) ,γ ( t ))( γ ( T ij ) ,γ ( T ik )) ˆ C i,jk − ˜ C i,jk ∥ G ≤ C sup τ d M ( γ ( τ ) , µ ( τ )) . Therefore, according to (28) and Proposition 4.1 we deduce thatsup s,t ∥ R , ∥ G ≤ c ( sup s,t ∣ S ∣) ( sup τ d M ( γ ( τ ) , µ ( τ )) = O p ⎛⎝ h µ + ¿``(cid:192) log nnmh µ + log nn ⎞⎠ for a universal constant c > K .The uniform convergence rates of R , − ∂ s C( s, t ) h C S − ∂ t C( s, t ) h C S and other similar terms areobtained by the same arguments as in Theorem 5.2 of Zhang and Wang (2016), except that no truncationargument is needed due to Assumption 4.1(b). References
Afsari, B. (2011). Riemannian L p center of mass: Existence, uniqueness, and convexity. Proceedings of theAmerican Mathematical Society , 139(2):655–673.Aneiros, G., Cao, R., Fraiman, R., Genest, C., and Vieu, P. (2019). Recent advances in functional dataanalysis and high-dimensional statistics.
Journal of Multivariate Analysis , 170:3–9.Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2006). Log-Euclidean metrics for fast and simplecalculus on diffusion tensors.
Magnetic Resonance in Medicine , 56(2):411–421.Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2007). Geometric means in a novel vector spacestructure on symmetric positive-definite matrices.
SIAM Journal of Matrix Analysis and Applications ,29(1):328–347.Bhattacharya, R. and Patrangenaru, V. (2003). Large sample theory of intrinsic and extrinsic sample meanson manifolds. I.
The Annals of Statistics , 31(1):1–29.Bhattacharya, R. and Patrangenaru, V. (2005). Large sample theory of intrinsic and extrinsic sample meanson manifolds. II.
The Annals of Statistics , 33(3):1225–1259.Cornea, E., Zhu, H., Kim, P., and Ibrahim, J. G. (2017). Regression models on Riemannian symmetricspaces.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 79(2):463–482.Dai, X., Lin, Z., and Müller, H.-G. (2020+). Modeling longitudinal data on riemannian manifolds.
Biomet-rics , page to appear.Dai, X. and Müller, H.-G. (2018). Principal component analysis for functional data on Riemannian manifoldsand spheres.
Annals of Statistics , 46:3334–3361. 34
EFERENCES
Intrinsic Riemannian Functional Data Analysis
The Annals of Applied Statistics , 3(3):1102–1123.Dubey, P. and Müller, H.-G. (2020). Functional models for time-varying random objects.
Journal of theRoyal Statistical Society: Series B (Statistical Methodology) , 82(2):275?–327.Faraway, J. J. (2014). Regression for non-Euclidean data using distance matrices.
Journal of AppliedStatistics , 41(11):2342–2357.Ferraty, F. and Vieu, P. (2006).
Nonparametric Functional Data Analysis: Theory and Practice . Springer-Verlag, New York.Fillard, P., Arsigny, V., Ayache, N., and Pennec, X. (2005). A Riemannian framework for the processing oftensor-valued images. In
International Workshop on Deep Structure, Singularities, and Computer Vision ,pages 112–123.Fletcher, P. T. (2013). Geodesic regression and the theory of least squares on riemannian manifolds.
Inter-national journal of computer vision , 105(2):171–185.Fletcher, T. and Joshi, S. (2007). Riemannian geometry for the statistical analysis of diffusion tensor data.
Signal Processing , 87:250–262.Hein, M. (2009). Robust nonparametric regression with metric-space valued output. In
Advances in NeuralInformation Processing Systems , pages 718–726.Hinkle, J., Fletcher, P. T., and Joshi, S. (2014). Intrinsic polynomials for regression on Riemannian manifolds.
Journal of Mathematical Imaging and Vision , 50(1-2):32–52.Hsing, T. and Eubank, R. (2015).
Theoretical foundations of functional data analysis, with an introductionto linear operators . John Wiley & Sons.Kokoszka, P. and Reimherr, M. (2017).
Introduction to Functional Data Analysis . Chapman and Hall/CRC.Lee, J. M. (1997).
Riemannian Manifolds: An Introduction to Curvature . Springer-Verlag, New York.Lee, J. M. (2002).
Introduction to Smooth Manifolds . Springer, New York.Lenglet, C., Rousson, M., Deriche, R., and Faugeras, O. (2006). Statistics on the manifold of multivariatenormal distributions: Theory and application to diffusion tensor MRI processing.
Journal of MathematicalImaging and Vision , 25(3):423–444.Li, Y. and Hsing, T. (2010). Uniform convergence rates for nonparametric regression and principal componentanalysis in functional/longitudinal data.
The Annals of Statistics , 38(6):3321–3351.Lin, Z. (2019). Riemannian geometry of symmetric positive definite matrices via cholesky decomposition.
SIAM Journal on Matrix Analysis and Applications , 40(4):1353–1370.Lin, Z. and Müller, H.-G. (2019). Total variation regularized Frćhet regression for metric-space valued data. arxiv .Lin, Z. and Yao, F. (2019). Intrinsic Riemannian functional data analysis.
The Annals of Statistics ,47(6):3533–3577.Lindberg, O., Walterfang, M., Looi, J. C., Malykhin, N., Östberg, P., Zandbelt, B., Styner, M., Velakoulis,D., Örndahl, E., Cavallin, L., and Wahlund, L.-O. (2012). Shape analysis of the hippocampus in alzheimer’sdisease and subtypes of frontotemporal lobar degeneration.
Journal of Alzheimer’s Disease , 30(2):355–365.Moakher, M. (2005). A differential geometry approach to the geometric mean of symmetric positive-definitematrices.
SIAM Journal on Matrix Analysis and Applications , 26(3):735–747.Nichols, David, A., Vines, and Justin (2016). Properties of an affine transport equation and its holonomy.
General Relativity and Gravitation , 48(127). 356
Lin, Shao and Yao
REFERENCES
Pelletier, B. (2006). Non-parametric regression estimation on closed Riemannian manifolds.
Journal ofNonparametric Statistics , 18(1):57–67.Pennec, X. (2019). Curvature effects on the empirical mean in riemannian and affine manifolds: a non-asymptotic high concentration expansion in the small-sample regime. arxiv .Pennec, X. (2020). Manifold-valued image processing with spd matrices. In
Riemannian Geometric Statisticsin Medical Image Analysis , pages 75–134. Elsevier.Pennec, X., Fillard, P., and Ayache, N. (2006). A Riemannian framework for tensor computing.
InternationalJournal of Computer Vision , 66(1):41–66.Petersen, A. and Müller, H.-G. (2019). Fréchet regression for random objects with Euclidean predictors.
TheAnnals of Statistics , 47(2):691–719.Prévôt, C. and Röckner, M. (2007).
A Concise Course on Stochastic Partial Differential Equations . Springer.Ramsay, J. O. and Silverman, B. W. (2005).
Functional Data Analysis . Springer Series in Statistics. Springer,New York, 2nd edition.Rodrigues, W. A. and Capelas de Oliveira, E. (2016).
The Many Faces of Maxwell, Dirac and EinsteinEquations . Springer.Schötz, C. (2019). Convergence rates for the generalized Fréchet mean via the quadruple inequality. arxiv .Shi, X., Styner, M., Lieberman, J., Ibrahim, J. G., Lin, W., and Zhu, H. (2009). Intrinsic regression modelsfor manifold-valued data. In
Medical Image Computing and Computer-Assisted Intervention - MICCAI ,volume 12, pages 192–199.Steinke, F., Hein, M., and Schölkopf, B. (2010). Nonparametric regression between general Riemannianmanifolds.
SIAM Journal on Imaging Sciences , 3(3):527–563.Sturm, K.-T. (2003). Probability measures on metric spaces of nonpositive curvature. In
Heat kernels andanalysis on manifolds, graphs, and metric spaces (Paris, 2002), vol. 338 of Contemporary Mathematics ,pages 357–390. American Mathematical Society, Providence, RI.Su, J., Kurtek, S., Klassen, E., and Srivastava, A. (2014). Statistical analysis of trajectories on riemannianmanifolds: bird migration, hurricane tracking and video surveillance.
The Annals of Applied Statistics ,8(1):530–552.van der Vaart, A. and Wellner, J. (1996).
Weak Convergence and Empirical Processes: With Applications toStatistics . Springer, New York.Wang, J.-L., Chiou, J.-M., and Müller, H.-G. (2016). Review of functional data analysis.
Annual Review ofStatistics and Its Application , 3:257–295.Yao, F., Müller, H.-G., and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data.
Journalof the American Statistical Association , 100(470):577–590.Zhang, X. and Wang, J. L. (2016). From sparse to dense functional data and beyond.
The Annals ofStatistics , 44:2281–2321.Zhang, X. and Wang, J. L. (2018). Optimal weighting schemes for longitudinal and functional data.
Statisticsand Probability Letters , 138:165–170.Zhang, Z., Klassen, E., and Srivastava, A. (2018). Phase-amplitude separation and modeling of sphericaltrajectories.
Journal of Computational and Graphical Statistics , 27(1):85–97.Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009). Intrinsic regression models forpositive-definite matrices with applications to diffusion tensor imaging.