Statistical Inference on the Hilbert Sphere with Application to Random Densities
SStatistical Inference on the Hilbert Spherewith Application to Random Densities
Xiongtao Dai ∗ Department of Statistics, Iowa State University, Ames, Iowa 50011 USA
Abstract
The infinite-dimensional Hilbert sphere S ∞ has been widely employed to modeldensity functions and shapes, extending the finite-dimensional counterpart. We con-sider the Fr´echet mean as an intrinsic summary of the central tendency of data lyingon S ∞ . To break a path for sound statistical inference, we derive properties of theFr´echet mean on S ∞ by establishing its existence and uniqueness as well as a root- n central limit theorem (CLT) for the sample version, overcoming obstructions frominfinite-dimensionality and lack of compactness on S ∞ . Intrinsic CLTs for the esti-mated tangent vectors and covariance operator are also obtained. Asymptotic andbootstrap hypothesis tests for the Fr´echet mean based on projection and norm arethen proposed and are shown to be consistent. The proposed two-sample tests areapplied to make inference for daily taxi demand patterns over Manhattan modeledas densities, of which the square roots are analyzed on the Hilbert sphere. Numericalproperties of the proposed hypothesis tests which utilize the spherical geometry arestudied in the real data application and simulations, where we demonstrate that thetests based on the intrinsic geometry compare favorably to those based on an extrinsicor flat geometry. Keywords:
Intrinsic Mean, Functional Data Analysis, Hilbert Geometry, Riemannian Man-ifold, Large Sample Property ∗ [email protected] a r X i v : . [ m a t h . S T ] J a n Introduction
We aim to develop statistical theory and inferential methods for analyzing a sample ofrandom elements taking values on the Hilbert sphere S ∞ = { f ∈ H | (cid:107) f (cid:107) = 1 } , where H is a separable infinite-dimensional Hilbert space with inner product (cid:104)· , ·(cid:105) and norm (cid:107)·(cid:107) .The spherical Hilbert geometry gives rise to invariance properties and efficient calculationof geometric quantities, for instance the geodesics, which promotes much of its prevailingapplications to model different data objects: • Probability distributions supported on a compact interval N is represented by den-sity functions in Dens( N ) = { y : N → R | (cid:82) N y ( s ) ds = 1 , y ( s ) > , s ∈ N } . To analyzethe densities in a reparametrization-invariant fashion, Srivastava et al. (2007) proposed anonparametric Fisher–Rao metric with a reparametrization-invariant property, generalizingthe well-known finite-dimensional version (Rao, 1945). The Fisher–Rao metric is conve-niently expressed and calculated through representing the densities by the correspondingsquare root densities in X = { x : N → R | x ( s ) = (cid:112) y ( s ) , s ∈ N , y ∈ Dens( N ) } , where X is the positive orthant of the Hilbert sphere S ∞ modeled in H = L ( N ). The squareroot density framework has been widely applied in modeling the orientation distributionfunctions in high angular resolution diffusion images (Cheng et al., 2009; Du et al., 2014)and time-warping functions (Tucker et al., 2013; Yu et al., 2017). • Plane and space curves such as shape contours and motion trajectories are oftencompared in a group action-invariant fashion for pattern recognition. To compare the shapeof curves with the translation and scaling effects removed, an observed smooth parametrizedcurve in R d , d = 2 or 3 is centered and scaled to have unit length, obtaining a centered-and-scaled curve in F = { f : [0 , → R d | f (0) = 0 , (cid:82) | f (cid:48) ( s ) | ds = 1 } . Each f ∈ F isrepresented by its square root velocity function (SRVF) (Joshi et al., 2007) β f : [0 , → R d , β f ( s ) = | f (cid:48) ( s ) | − / f (cid:48) ( s ), s ∈ [0 , β f lies on the Hilbert sphere S ∞ inthe Hilbert space H = L ([0 , , R d ) = { h : [0 , → R d | (cid:82) | h ( s ) | ds < ∞} , equipped withthe inner product (cid:104) h , h (cid:105) = (cid:82) h ( s ) T h ( s ) ds , h , h ∈ H . The spherical geometry on S ∞ for the SRVFs induces a special case of the elastic metric (Younes, 1998) on the space of2entered-and-scaled curves F , so that the distance between curves are given by the squareroot of the minimal energy to transform between them. This geometry has demonstratedattractive and interpretable performance in practical curve matching tasks (Su et al., 2014;Bauer et al., 2017; Xie et al., 2017; Strait et al., 2019).Applications of Hilbert sphere in computer vision, medical imaging, and biological pro-cesses necessitates hypothesis tests backed by solid theory, but there are so far no availableasymptotic results to support hypothesis tests under an intrinsic geometry on S ∞ ; see forexample Wu and Srivastava (2014); Henning and Srivastava (2016) who applied U-statisticsto perform two-sample comparisons. For random densities, F -tests has been consider byPetersen et al. (2019) under a Wasserstein geometry and Dubey and M¨uller (2019) undera more general object-oriented framework, where the latter relies on an entropy numbercondition that has not been verified on S ∞ . This provides motivation for our work, whichaims to derive asymptotic distributional results on S ∞ , devise valid statistical inference,and showcase applications where we performed hypothesis test for random densities.Let M be a metric space and ρ : M × M → R the associated distance on M . Consideran M -valued random element X : Ω → M measurable between a complete probabilityspace (Ω , A , P ) and σ -algebra B ( M ) induced on M by ρ . The Fr´echet mean (Fr´echet,1948) is a commonly used location descriptor for non-Euclidean data objects and is termedthe intrinsic mean if M is a Riemannian manifold since it only rely on the intrinsic geometryrather than an ambient space. If there exists a unique minimizer of the Fr´echet functional M ( · ) = Eρ ( X, · ) , (1)then the minimizer µ = arg min p ∈M M ( p ) is called the Fr´echet mean of X . Similarly, forindependent realizations X , . . . , X n of X , the sample Fr´echet functional is M n ( · ) = 1 n n (cid:88) i =1 ρ ( X i , · ) (2)and the sample Fr´echet mean is ˆ µ = arg min p ∈M M n ( p ) given existence and uniqueness.This work focuses on M = S ∞ with ρ being the geodesic distance.When the Riemannian manifold M is finite-dimensional, theory and methods for theintrinsic Fr´echet mean have been well investigated. Various authors have considered, for3xample, confidence intervals (Bhattacharya and Patrangenaru, 2005), hypothesis tests(Huckemann, 2012), regression (Zhu et al., 2009), and principal component analysis (Fletcheret al., 2004) for manifold-valued data. Extensive study has also characterized the exis-tence and uniqueness (Karcher, 1977; Ziezold, 1977; Le, 2001; Afsari, 2011), consistency(Bhattacharya and Patrangenaru, 2003), and central limit theorems (CLTs) (Bhattacharyaand Patrangenaru, 2005; Bhattacharya and Lin, 2017) of the Fr´echet mean on a generalfinite-dimensional M . A slower-than- n / CLT has been studied on the circle (Hotz andHuckemann, 2015) and, more recently, high-dimensional spheres and general Riemannianmanifolds (Eltzner and Huckemann, 2019).Also well-studied are random objects lying in an infinite-dimensional Hilbert space,which are termed functional data (Wang et al., 2016). Thanks to the flat geometry and vec-tor space structure, the definition of the mean element is straightforward, and the asymp-totic theory is obtained through extending the multivariate case (Hsing and Eubank, 2015).The Hilbert mean extends to a class of Hilbert manifolds through the extrinsic (Ellingsonet al., 2013) or the transformation method (Petersen and M¨uller, 2016) by mapping theoriginal data into a Hilbert space and perform analysis there. However, these approachesdepend on the embedding or transformation and does not preserve the intrinsic geometryon S ∞ , so the theory derived for these methods cannot be applied to obtain an intrinsicanalysis for objects lying on S ∞ .Little is known about the theory for the Fr´echet mean on curved infinite-dimensionalgeometries. In particular, on the Hilbert sphere S ∞ , the existence and uniqueness of theFr´echet mean has not been established, and no distributional results are available for theinference of the intrinsic mean. Difficulties in deriving these results on S ∞ include the lackof compactness and the positive curvature, which prevents proof techniques developed for afinite-dimensional manifold (Afsari, 2011; Bhattacharya and Patrangenaru, 2005) or Hilbertspace (Hsing and Eubank, 2015) to be applicable. General results for the convergence rate ofthe sample Fr´echet mean in abstract settings has been established by Gouic et al. (2019) andAhidar-Coutrix et al. (2019); Sch¨otz (2019), where the latter two works applied empiricalprocess theory under entropy conditions, which are challenging to verify for the infinite-4imensional positively curved manifold S ∞ ; no distributional results was established wereprovided there, and the uniqueness and existence were assumed.Our contributions include establishing the uniqueness and existence of the Fr´echet meanon S ∞ and its large sample properties, and applying these results to derive valid hypothesistests for data objects. We state in Section 3 theoretical properties of the intrinsic data anal-ysis on S ∞ . The existence and uniqueness of the intrinsic mean are shown in Theorem 3.1,and large sample properties of the sample intrinsic mean ˆ µ including a strong LLN and aCLT are stated in Proposition 3.1 and Theorem 3.2, respectively. Theoretical hurdles inverifying tightness and Lipschitz continuity are overcome with utilizing weak compactness,Hilbert differential geometry (Lang, 1999), and a careful analysis of the spherical geome-try. The asymptotic covariance of the limiting Gaussian element in the CLT of ˆ µ is shownto be consistently estimated by an empirical version in Proposition 3.2. To quantify thedeviation of data observations around the intrinsic mean, we formulate intrinsic CLTs forthe tangent vectors in Corollary 3.2 and the covariance operator in Theorem 3.3 basedon parallel transport. Unlike the case in a flat Hilbert space, additional terms manifestin the asymptotic covariance of the sample covariance operator due to the curvature of S ∞ . Results on the covariance will be useful to derive theoretical properties of principalcomponent analysis (e.g. Tucker et al., 2013) on S ∞ , though the latter exposition is out ofscope of this work.Asymptotic and bootstrap hypothesis tests for the population means are proposed inSection 4 as a result of the CLTs, with consistent properties of the test derived in Corol-lary 4.1–Corollary 4.5. Two test statistics are constructed using the norm and the projec-tions of the intrinsic sample means expressed on a chart, respectively, analogous to thosedeveloped for functional data (Horv´ath et al., 2013; Aue et al., 2018). Simulation studies inSection 5 demonstrate that the hypothesis tests based on the intrinsic mean have smallerbias than that based on the extrinsic mean when data are asymmetrically distributedaround the mean. The asymptotics is shown to kick in with a moderate sample size of 50despite the infinite-dimensional S ∞ . A real data study of daily taxi demands in Manhattanis presented in Section 6, where the taxi demands are modeled as densities. The spheri-5al geometry demonstrates advantage over alternative geometries in detecting changes inthe demand patterns. The proofs of the main results are deferred to the SupplementalMaterials. The infinite-dimensional Hilbert sphere S ∞ is the unit sphere in Hilbert space H withwell-known geometry and explicit expressions for its geometric quantities. In particular,the tangent space of S ∞ at p ∈ S ∞ is the subspace T p S ∞ = { v ∈ H | (cid:104) v, p (cid:105) = 0 } of H with codimension one. The metric tensor (cid:104) u, v (cid:105) p for tangent vectors u , v ∈ T p S ∞ at p ∈ S ∞ is induced from and equal to the H -inner product (cid:104) u, v (cid:105) ; we suppress the subscript p in the metric to lighten notation. The geodesic distance ρ : S ∞ × S ∞ → R is given by ρ ( f, g ) = arccos( (cid:104) f, g (cid:105) ). The Riemannian exponential map at p ∈ S ∞ is exp p : T p S ∞ →S ∞ , exp p v = cos( (cid:107) v (cid:107) ) p + sin( (cid:107) v (cid:107) ) (cid:107) v (cid:107) − v , which preserves the distance to the origin, i.e. (cid:107) v (cid:107) = ρ ( p, exp p v ). The inverse exponential map, or the logarithm map , at p ∈ S ∞ is definedas log p : S ∞ \{− p } → T p S ∞ , log p x = arccos( (cid:104) p, x (cid:105) ) (cid:107) u (cid:107) − u , where − p is the antipodal of p and u = x − (cid:104) p, x (cid:105) p . A chart τ : U ⊂ S ∞ → E is a homeomorphism that maps U ontoopen subset τ ( U ) in a Hilbert space E . We require τ to be smooth for computation, so that τ − : τ ( U ) ⊂ E → H is a smooth diffeomorphism. For example, τ p ( · ) = log p ( · ) is a smoothchart defined on U p = S ∞ \{− p } , for p ∈ S ∞ . For general Hilbert differential geometry werefer to Lang (1999).For concreteness, here we consider in our numerical illustrations the Hilbert space H = L ( I ) = { f : I → R | (cid:82) I f ( s ) ds < ∞} of (equivalent classes of) square-integrablefunctions on a compact Euclidean set I , while we note that the definition of a Hilbertsphere is fully intrinsic through isometry (Lang, 1999).6 Properties of the Fr´echet mean on S ∞ Unlike the case in a Euclidean space, the Fr´echet mean for data lying on a manifold maynot exist and may not be unique. A neighborhood condition (A1) is needed to guaranteethe existence and uniqueness of the intrinsic means µ defined in (1).(A1) The support U ⊂ S ∞ of X satisfies P ( X ∈ U ) = 1 and sup x,y ∈U ρ ( x, y ) ≤ π/ S d , d < ∞ is that P ( ρ ( q, X ) < r ) = 1 for some q ∈ S d and r < π/
2. A lack of compactnesson the infinite-dimensional S ∞ prevents the arguments of Afsari (2011) to apply, whichutilizes compactness to show both the existence and uniqueness, the latter relying on thePoincar´e–Hopf theorem for compact manifolds. We interpret the additional concentrationin (A1) as a compensation for a lack of compactness. The next theorem establishes theuniqueness and existence of the Fr´echet mean. Theorem 3.1.
If (A1) holds, then there exists a unique intrinsic mean µ of X on S ∞ .Furthermore, P ( ρ ( µ, X ) < π/
2) = 1 . The existence proof is based on the weak compactness of the unit ball in H by theBanach–Alaoglu theorem (Rudin, 1973) and an analysis of ρ ( x, · ), the uniqueness resultmakes use of the convexity of ρ , and the proximity of µ and X is obtained through areflection argument used in Afsari (2011). With the well-definedness of the population andsample intrinsic means µ and ˆ µ guaranteed by Theorem 3.1, we derive the consistency for ˆ µ based on M -estimation and empirical process (see, e.g., van der Vaart and Wellner, 1996). Proposition 3.1.
If (A1) holds, then ρ (ˆ µ, µ ) = o (1) a.s. Some notations are needed to state the CLT for ˆ µ . Let B ( E , E ) be the Banach spaceof bounded linear operators between Banach spaces E and E . An element f in the Hilbert7pace ( E , (cid:104)· , ·(cid:105) E ) is identified with f ∗ := (cid:104) f, ·(cid:105) in the dual space E ∗ := B ( E , R ) by the Rieszrepresentation theorem. Denote A ∗ as the adjoint of a linear operator A between Hilbertspaces. The tensor product ⊗ : E ∗ × E ∗ → B ( E ∗ , E ∗ ) in the dual space E ∗ is definedas ( f ∗ ⊗ g ∗ ) h ∗ = (cid:104) f, h (cid:105) E g ∗ for f ∗ , g ∗ , h ∗ ∈ E ∗ . In a neighborhood U of µ , we express µ ,ˆ µ , and ρ on a smooth chart τ : U ⊂ S ∞ → E and write µ τ = τ ( µ ), ˆ µ τ = τ (ˆ µ ), and ρ τ : S ∞ × E → R , ρ τ ( x, e ) = ρ ( x, τ − ( e )). Let D denote the partial (Fr´echet) derivativeof a multivariate function w.r.t. the second argument, where the definition of Fr´echetderivatives is reviewed in Appendix A. Set ψ τ : S ∞ × E → B ( E , R ) , ψ τ ( x, e ) = D ρ τ ( x, e ); F τ = E [ ψ τ ( X, µ τ ) ⊗ ψ τ ( X, µ τ )] ∈ B ( E ∗ , E ∗ ); and Λ τ = ED ρ τ ( X, µ τ ) ∈ B ( E , B ( E , R )), forwhich the explicit forms are obtained in the Supplemental Materials. Theorem 3.2.
Let ( U, τ ) be a chart of S ∞ in a neighborhood of µ . Under (A1), √ n (ˆ µ τ − µ τ ) L −→ Z as n → ∞ , where Z is a Gaussian random element in E with mean zero and covarianceoperator T : E → E , T = (Λ − τ ) F τ (Λ − τ ) ∗ , satisfying (cid:104) h , T h (cid:105) E = cov( (cid:104) Λ − τ ψ τ ( X, µ τ ) , h (cid:105) E , (cid:104) Λ − τ ψ τ ( X, µ τ ) , h (cid:105) E ) for h , h ∈ E . The operator Λ τ is continuously invertible. Theorem 3.2 follows from a linearization argument applied on the chart representation M n,τ : E → R , M n,τ ( e ) = M n ( τ − ( e )) of M n ( · ) in a neighborhood of µ τ , where the uniformconvergence of the residual is obtained due to the simple dependency of the geodesic dis-tance ρ on the inner product of its arguments. The key difficulty is verifying the Lipschitzcontinuity of the criterion function under the infinite-dimensional setup by analyzing theHilbert sphere geometry. The CLT is intrinsic in the sense that if η : V → E is anotherchart with µ ∈ V , then √ n [ η (ˆ µ ) − η ( µ )] converges in law to D ( η ◦ τ − )( µ τ ) Z where Z isthe limiting Gaussian element under chart τ .The asymptotic distribution in Theorem 3.2 needs to be estimated in practice for sta-tistical inference such as in hypothesis tests that will be detailed in Section 4. Define ˆ T =( ˆΛ − τ ) ˆ F τ ( ˆΛ − τ ) ∗ , ˆ F τ = n − (cid:80) ni =1 ψ τ ( X i , ˆ µ τ ) ⊗ ψ τ ( X i , ˆ µ τ ), and ˆΛ τ = n − (cid:80) ni =1 D ρ τ ( X i , ˆ µ τ ).8or an operator C : F → F defined on a Hilbert space F , let (cid:107)C(cid:107) = sup (cid:107) h (cid:107) F =1 (cid:107)C h (cid:107) F denotethe operator norm and (cid:107)C(cid:107) = (cid:80) ∞ j =1 (cid:104) e j , ( C ∗ C ) / e j (cid:105) F the trace norm, where { e j } ∞ j =1 is anarbitrary complete orthonormal basis of F . The next result states that the asymptoticdistribution of the limiting Gaussian element can be estimated consistently. Proposition 3.2.
Under the conditions of Theorem 3.2, as n → ∞ , (cid:13)(cid:13)(cid:13) ˆ T − T (cid:13)(cid:13)(cid:13) = o (1) a.s. Let Z n and Z be zero-mean Gaussian random elements in E with covariance ˆ T and T ,respectively. Then Z n L −→ Z as n → ∞ . On a nonlinear manifold, the deviations of data observations around the intrinsic meanare commonly characterized by the corresponding tangent vectors constructed from thelogarithm maps. The tangent vector of X and its empirical version are respectively definedas V = log µ X, ˆ V = log ˆ µ X. Similarly, let V i = log µ X i and ˆ V i = log ˆ µ X i denote the corresponding quantities for anobservation X i , i = 1 , . . . , n . Proposition 3.3. If µ is the intrinsic mean of an S ∞ -valued random variable X with P ( ρ ( µ, X ) = π ) = 0 , then EV = 0 . Proposition 3.3 states that the logarithm map centers the observations around the in-trinsic mean on S ∞ , a property that has been established on finite-dimensional Riemannianmanifolds (Karcher, 1977; Bhattacharya and Patrangenaru, 2003).The tangent space T µ S ∞ at µ is a Hilbert space with inner product given by the metrictensor. We obtain first a CLT of ˆ µ on the tangent space T µ S ∞ as a corollary of Theorem 3.29y considering τ = log µ ( · ), so that ψ τ ( X, µ τ ) = − (cid:104) V, ·(cid:105) as will be shown in the proof ofProposition 3.3. Here Λ τ becomes Λ = ED ρ ( X, µ ) ∈ B ( T µ S ∞ , B ( T µ S ∞ , R )), which isidentified with the T µ S ∞ -valued linear map Λ ∈ B ( T µ S ∞ , T µ S ∞ ) so that Λ v = (cid:104) Λ v, ·(cid:105) ,for v ∈ T µ S ∞ . Corollary 3.1.
Under the conditions of Theorem 3.2, √ n log µ ˆ µ converges in distributionto a Gaussian random element on T µ S ∞ with mean zero and covariance − E [ V ⊗ V ]Λ − . The variation of X around the intrinsic mean is summarized by the covariance operator,which is a linear characterization of the intrinsic variation and has been applied to generalizethe principal component analysis to Riemannian manifolds (Fletcher et al., 2004; Lazar andLin, 2017). The population and empirical covariances are G : T µ S ∞ → T µ S ∞ , G = E ( V ⊗ V )and ˆ G : T ˆ µ S ∞ → T ˆ µ S ∞ , ˆ G = n − (cid:80) ni =1 ˆ V i ⊗ ˆ V i , respectively, where ⊗ is the tensor producton the tangent spaces, so that G h = E ( (cid:104) V, h (cid:105) V ) , ˆ G g = 1 n n (cid:88) i =1 (cid:104) ˆ V i , g (cid:105) ˆ V i for h ∈ T µ S ∞ and g ∈ T ˆ µ S ∞ .A corollary for comparing tangent vectors is needed for assessing the estimation of thecovariance operator. Parallel transport under the Levi-Civita connection on S ∞ (Lang,1999) is applied to perform intrinsic comparisons of tangent vectors on different tangentspaces. Let P yx : T x S ∞ → T y S ∞ denote the parallel transport of tangent vectors on T x S ∞ along the geodesic leaving from x to y on S ∞ , where x and y are not antipodal so thatthe geodesic is unique. Write Λ x = D ρ ( x, µ ) ∈ B ( T µ S ∞ , B ( T µ S ∞ , R )) and we identify itwith Λ x ∈ B ( T µ S ∞ , T µ S ∞ ) such that Λ x v = (cid:104) Λ x v, ·(cid:105) for any v ∈ T µ S ∞ . Recall that Λ isdefined before Corollary 3.1. Corollary 3.2.
Under (A1), v = log µ x and ˆ v = log ˆ µ x are well defined for any x ∈ S ∞ with ρ ( x, µ ) < π , the latter almost surely as n → ∞ . Moreover, √ n ( P µ ˆ µ ˆ v − v ) L −→ Z v as n → ∞ , where Z v is a Gaussian random element in T µ S ∞ with mean 0 and covariance Λ x Λ − E [ V ⊗ V ]Λ − Λ x .
10o derive a central limit theorem for the covariance, operators G and ˆ G are analyzed asHilbert–Schmidt operators on the tangent spaces and are compared through the paralleltransport. For x ∈ S ∞ lying in a small neighborhood of µ , let P µx : B ( T x S ∞ , T x S ∞ ) →B ( T µ S ∞ , T µ S ∞ ) be the parallel transport of operators from x to µ such that for any operator A ∈ B ( T x S ∞ , T x S ∞ ), P µx ( A ) v = P µx A P xµ v , v ∈ T µ S ∞ . For a Hilbert space F , let B HS ( F , F )denote the Hilbert space of Hilbert–Schmidt operators on F equipped with the inner product (cid:104)F , F (cid:105) HS = ∞ (cid:88) j =1 (cid:104)F e j , F e j (cid:105) F , where F , F are operators on F with finite Hilbert–Schmidt norm induced by this innerproduct, and { e j } ∞ j =1 is a complete orthonormal basis of F . Also let ⊗ HS denote the tensorproduct in B HS ( T µ S ∞ , T µ S ∞ ). Define H : T µ S ∞ → B HS ( T µ S ∞ , T µ S ∞ ), H ( · ) = E [Λ X ( · ) ⊗ V ] where Λ X is Λ x at a random x = X . Theorem 3.3.
Under (A1), √ n ( P µ ˆ µ ˆ G − G ) L −→ Z G in B HS ( T µ S ∞ , T µ S ∞ ) , where Z G is a Gaussian random element with mean 0 and covariance E [ G ⊗ HS G ] , where G = V ⊗ V − G + H Λ − V + ( H Λ − V ) ∗ . Theorem 3.3 is an extension of the central limit theorem for the covariance operatorin H (e.g. Theorem 8.1.2 in Hsing and Eubank, 2015) to the Hilbert sphere S ∞ , wherein the former Hilbert space the parallel transport is simply the identity. Additional termsinvolving H appear in the random Hilbert–Schmidt operator G that generates the asymp-totic covariance due to the positive curvature of S ∞ . Theorem 3.3 can be applied to deriveasymptotic theory for the principal component analysis (Tucker et al., 2013; Dai and M¨uller,2018; Lin and Yao, 2019) based on ˆ G , an exposition that is beyond the scope of this work.11 Hypothesis Tests for the Intrinsic Mean
We obtain here one- and two-sample hypothesis tests of the intrinsic mean as a resultof the CLT in Theorem 3.2; for completeness, a paired-sample test is formulated in theSupplemental materials. To handle the manifold constraint, these tests make use of a chart(
U, τ ) so that test statistics based on the intrinsic mean can be constructed analogously tothose in a Hilbert space (Berkes et al., 2009; Aue et al., 2018).
One-sample test
Given a sample X , . . . , X n on S ∞ with unknown mean µ , consider the one-sample hy-pothesis H : µ = µ , H : µ (cid:54) = µ , for a pre-specified µ ∈ S ∞ . We propose a norm-based and a projection-based test statisticdefined as, respectively, T = n (cid:107) τ (ˆ µ ) − τ ( µ ) (cid:107) E ,S = n K (cid:88) k =1 (cid:104) τ (ˆ µ ) − τ ( µ ) , ˆ φ k (cid:105) E ˆ λ k , where the projection-based test utilizes K < ∞ projections. Here S is the empiricalestimate of ˜ S = n (cid:80) Kk =1 λ − k (cid:104) τ (ˆ µ ) − τ ( µ ) , φ k (cid:105) E , and the ( λ k , φ k ) and (ˆ λ k , ˆ φ k ) are theeigenvalue–eigenfunction pairs of the true and estimated covariance operator T or ˆ T of the limiting element Z in Theorem 3.2, respectively. Note that one cannot obtain aHotelling’s T -like test statistic by normalizing the mean with the inverse of its covarianceoperator since the resulting test statistic diverges (H´ajek, 1962). A natural choice of chartis τ ( · ) = log µ ( · ), under which the norm-based test statistic reduces to T = nρ (ˆ µ, µ ).By Slutsky’s theorem and Theorem 3.2, the limiting distribution for T is (cid:107) Z (cid:107) E andthat for S is χ K . Summarizing, 12 orollary 4.1. Assume the conditions of Theorem 3.2 hold. Then T L −→ ∞ (cid:88) k =1 λ k W k ,S L −→ χ K , where W k are i.i.d. χ random variables. Two-sample test
The two-sample setting is that we have i.i.d. observations X [ g ]1 , . . . , X [ g ] n g on S ∞ fromPopulation g , g = 1 ,
2, where n g is the sample size and n = n + n is the total sample size.The unknown intrinsic mean in Population g is µ g and is estimated by the empirical meanˆ µ g . The two-sample hypothesis is H : µ = µ , H : µ (cid:54) = µ . Under chart τ , let T g = (Λ − gτ ) F gτ (Λ − gτ ) ∗ be the asymptotic covariance of τ (ˆ µ g ) given byTheorem 3.2, where F gτ = E [ ψ τ ( X [ g ] i , τ ( µ g )) ⊗ ψ τ ( X [ g ] i , τ ( µ g ))], Λ gτ = ED ρ τ ( X [ g ] i , τ (ˆ µ g )), g = 1 ,
2. The pooled covariance operator is written as T pool = ( n/n ) T + ( n/n ) T . Thefollowing corollary of Theorem 3.2 provides a basis for the two-sample tests. Corollary 4.2.
Assume n , n → ∞ and n /n → q for some q ∈ (0 , . If H is true andthe conditions of Theorem 3.2 hold for both populations, then √ n [ τ (ˆ µ ) − τ (ˆ µ )] L −→ Z , where Z is a Gaussian random element with mean and covariance T pool . The norm-based and projection-based two-sample test statistics are T = n (cid:107) τ (ˆ µ ) − τ (ˆ µ ) (cid:107) E ,S = n K (cid:88) k =1 (cid:104) τ (ˆ µ ) − τ (ˆ µ ) , ˆ φ k, pool (cid:105) E ˆ λ k, pool . Here (ˆ λ k, pool , ˆ φ k, pool ), k = 1 , , . . . are the eigenvalue–eigenfunction pairs of the estimatedpooled covariance ˆ T pool = ( n/n ) ˆ T + ( n/n ) ˆ T , where ˆ T g = ( ˆΛ − gτ ) ˆ F gτ ( ˆΛ − gτ ) ∗ , ˆ F gτ = n − g (cid:80) n g i =1 ψ τ ( X [ g ] i , τ (ˆ µ g )) ⊗ ψ τ ( X [ g ] i , τ (ˆ µ g )), and ˆΛ gτ = n − g (cid:80) n g i =1 D ρ τ ( X [ g ] i , τ (ˆ µ g )), g = 1 , orollary 4.3. Assume the conditions of Corollary 4.2 hold. Let W k be i.i.d. χ randomvariables and λ k, pool , k = 1 , , . . . be the eigenvalues of T pool . Then T L −→ ∞ (cid:88) k =1 λ k, pool W k ,S L −→ χ K . In performing asymptotic tests, the asymptotic null distributions of the norm-based teststatistics involve unknown eigenvalues, which need to be estimated. The infinite sumin the limiting distribution poses the question of whether the asymptotic p -values canbe estimated consistently. The answer is affirmative: In the one-sample scenario, theasymptotic null distribution of T is (cid:107) Z (cid:107) , which is estimated by the distribution of ˆ T := (cid:107) Z n (cid:107) = (cid:80) ∞ k =1 ˆ λ k W k . As a consequence of Proposition 3.2 and the continuous mappingtheorem, the limiting distributions of T and ˆ T are the same, so the quantiles of T can beestimated consistently by those of ˆ T in the large sample limit. The asymptotic p -valuescan thus be consistently estimated by the quantiles of ˆ T , defining a valid asymptotictest. In practice, quantiles of ˆ T are obtained by Monte Carlo. Although the asymptoticdistribution is non-pivotal, the Theorem 3.2 leads to the explicit form of the asymptoticdistribution of the test statistics, enabling efficient implementation.For the projection-based tests, the tuning parameter K needs to be chosen in order tomaximally capture the difference in the means. While an optimal choice may depend on thesample size and the stochastic structure of the data, we find in our numerical studies thatthe Fraction of Variance Explained (FVE) criterion leads to reasonable test performance,by setting K = K ∗ , K ∗ = min { K ≥ | (cid:80) Kj =1 ˆ λ j (cid:80) ∞ k =1 ˆ λ k ≥ r } , (3)with a threshold r ∈ (0 , K eigenfunctions. Our experience, in agreement with Berkes et al. (2009); Horv´ath et al.142013), is that in practice the mean difference is usually well captured by the first fewprojections. Even though the norm-based tests always capture the mean difference, theprojection-based tests are often more powerful in our numerical studies since they utilize apivotal test statistic and focus on the leading components with higher signal-to-noise ratio.The asymptotic tests in the two- and paired-sample scenarios follow analogous development. Alternative bootstrap tests are proposed to improve finite sample performance. The boot-strap tests do not require the computation of the covariance components of the norm-basedtest statistics, and enjoy a second order accuracy using the pivotal projection-based teststatistics (Hall, 1992).In the one-sample scenario, let X ∗ , . . . , X ∗ n be a nonparametric bootstrap sample drawnfrom X , . . . , X n with replacement and ˆ µ ∗ be the bootstrap sample intrinsic mean. Thebootstrap distributions are derived from T ∗ = n (cid:107) τ (ˆ µ ∗ ) − τ (ˆ µ ) (cid:107) E ,S ∗ = n K (cid:88) k =1 (cid:104) τ (ˆ µ ∗ ) − τ (ˆ µ ) , ˆ φ ∗ k (cid:105) E ˆ λ ∗ k , where the (ˆ λ ∗ k , ˆ φ ∗ k ) are the eigenpairs of ˆ T ∗ τ constructed analogously to the sample covari-ance ˆ T τ but with the bootstrap sample. The validity of the bootstrap test is guaranteed bythe following corollary of Theorem 3.2 and the bootstrap theorem for infinite-dimensional Z -estimators in Wellner and Zhan (1996). Corollary 4.4.
Under the conditions of Theorem 3.2, the bootstrap intrinsic mean ˆ µ ∗ isconsistent for µ . Furthermore, as n → ∞ , √ n [ τ (ˆ µ ∗ ) − τ (ˆ µ )] L −→ Z ∗ , where Z ∗ is a Gaussian random element sharing the same distribution with Z in Theo-rem 3.2. Hence, the asymptotic distributions are the same for T ∗ and T , as well as for S ∗ and S . X [ g ] ∗ , . . . , X [ g ] ∗ n g be a nonparametric bootstrap sample of size n g from Population g , g = 1 ,
2. Let ˆ µ ∗ g be the intrinsic mean and ˆ T ∗ g the sample covarianceoperator of the bootstrap sample in Population g , ˆ T ∗ pool = ( n/n ) ˆ T ∗ + ( n/n ) ˆ T ∗ thebootstrap pooled covariance, and (ˆ λ ∗ k, pool , ˆ φ ∗ k, pool ), k = 1 , , . . . the eigenpairs of ˆ T ∗ pool . Thebootstrap versions of for T and S are, respectively, T ∗ = n (cid:107) [ τ (ˆ µ ∗ ) − τ (ˆ µ )] − [ τ (ˆ µ ∗ ) − τ (ˆ µ )] (cid:107) E ,S ∗ = n K (cid:88) k =1 (cid:104) [ τ (ˆ µ ∗ ) − τ (ˆ µ )] − [ τ (ˆ µ ∗ ) − τ (ˆ µ )] , ˆ φ ∗ k, pool (cid:105) E ˆ λ ∗ k, pool , where the bootstrap statistics are constructed as such to increase power. Corollary 4.5.
Under the conditions of Corollary 4.2, the bootstrap intrinsic mean ˆ µ ∗ g isconsistent for µ g , g = 1 , . Moreover, √ n { [ τ (ˆ µ ∗ ) − τ (ˆ µ )] − [ τ (ˆ µ ∗ ) − τ (ˆ µ )] } L −→ Z ∗ , where Z ∗ is a Gaussian random element sharing the same distribution as Z in Corol-lary 4.2. Hence, the asymptotic distributions are the same for T ∗ and T , as well as for S ∗ and S . Simulation studies were performed on S ∞ modeled in H = L ([0 , µ to thesquare root of the Beta(2 ,
1) density function and µ = exp µ δv , where δ ∈ [ − . , .
4] isthe effect size, v = K − / µ (cid:80) K µ k =1 φ k , K µ ∈ { , , } is the number of mean components,and the φ k are orthonormal functions to be detailed shortly. Independent observations inPopulation g = 1 , X [ g ] i ( s ) = exp µ g { ( − g − (cid:80) K X k =1 ξ gik φ gk ( s ) } , s ∈ [0 ,
1] with K X = 50 components. The k th scores ξ gik were i.i.d. real-valued random16ariables with mean 0 and variance θ k = 3 − k , generated from either the normal distri-bution or the centered exponential distribution, i.e. ξ gik = η gik − Eη gik and η gik followsExponential( θ k ), for i = 1 , . . . , n g , g = 1 ,
2. The orthonormal basis functions φ gk weredefined as φ gk = R µ g ( ψ k +1 ), k = 1 , . . . , K X , where { ψ j } ∞ j =1 = { ψ ( s ) = 1 , ψ k ( s ) =2 / sin(2 kπs ) , ψ k − ( s ) = 2 / cos(2( k − πs ) , s ∈ [0 , , for k ∈ N } is the trigonometricbasis on [0 , R q : H → H is the rotation operator from ψ to q (cid:54) = − ψ along theshortest geodesic, defined by R q ( p ) = p + sin( ρ q )( (cid:104) u, p (cid:105) q − (cid:104) q, p (cid:105) u ) + (cos( ρ q ) − (cid:104) q, p (cid:105) q + (cid:104) u, p (cid:105) u ) , where ρ q = ρ ( ψ , q ) and u = ( ψ − (cid:104) ψ , q (cid:105) q ) / (1 − (cid:104) ψ , q (cid:105) ) / . The mean squared geodesicdistance from the observations to the intrinsic mean was (cid:80) K X k =1 θ k = 0 .
5, and the cumulativeFVE by the first J = 1 , . . . , J ) = (cid:80) Jj =1 θ j / (cid:80) K X k =1 θ k , were66.7%, 88.9%, 96.3%, 98.8%, and 99.6%, respectively.The simulation settings consisted of all combinations of • sample size n g ∈ { , , } ; • number of components K µ ∈ { , , } in the mean difference; and • either symmetric or asymmetric data generated around the mean, corresponding tothe normally (norm) and exponentially (exp) distributed ξ gik , respectively.We compared the proposed asymptotic and bootstrap tests which are intrinsic to S ∞ , aswell as a norm-based bootstrap test of the extrinsic means (Ellingson et al., 2013) in theambient space H projected back onto S ∞ . The number of components K for our projection-based tests were chosen according to the FVE criterion (3) with threshold r = 0 . , .
95, or0 .
99, which roughly correspond to K = 2 ,
3, and 4 projections in our settings, respectively.Calculated with 1000 Monte Carlo iterations each with 499 bootstrap samples, theempirical power curves for the two-sample tests over effect size δ ∈ [ − . , .
4] are displayedin Figure 1, noting that H holds if and only if δ = 0. As a visual aid, dark and lightpaired colors were used to denote the proposed asymptotic and bootstrap tests, respectively.Bootstrap tests were overall more conservative and had better control of the type I error17ate (size) than the asymptotic tests, which is most apparent for n g = 10 or 25. When n g = 10, the asymptotic projection-based tests were over-liberal, while the correspondingbootstrap tests properly controlled the size to be around the nominal level α = 0 .
05. Thiscan be attributed to the second-order correctness for the bootstrap tests when the teststatistic is pivotal (Hall, 1992), of which the effect is prominent in small samples. Theasymptotic and bootstrap tests had almost identical performance under n g = 50 (3rd and6th columns, Figure 1), showing that the asymptotics comes into force under this moderatesample size even if the data lie on an infinite-dimensional curved manifold S ∞ .When n g = 10, the norm-based tests had higher power than the projection-based teststhat respect the nominal level. In this small-sample scenario, the norm-based tests avoidestimating the projection directions and gain stability as compared to the projection-basedtests. With n g = 25 or 50, the projection-based tests were more powerful than the norm-based test when larger FVE thresholds were chosen to capture mean differences in multipleprojections. Specifically, when K µ = 3 and 5, the projection-based tests with FVE = 0 . .
99 were the most powerful, respectively. This is due to the fact that the projection-based tests focus on the mean differences only in the directions with high signal-to-noiseratios. When K µ = 1, the norm-based tests and the projection-based tests with FVE = 0 . δ was slightly below0. The bias for the proposed intrinsic methods reduced as n g increased in the asymmet-ric scenarios, reaching near-unbiasedness when n g = 50, but that for the extrinsic testsremained even with n g = 50. This underlines the importance of respecting the intrinsicgeometry when data is asymmetrically generated on the manifold.18 o r m , n g = r m , n g = r m , n g = x p , n g = x p , n g = x p , n g = K m = m = m = − . − . . . . − . − . . . . − . − . . . . − . − . . . . − . − . . . . − . − . . . . . . . . .
00 0 . . . . .
00 0 . . . . . d Empirical power M e t hod P B , . P A , . P B , . P A , . P B , . P A , . L B L A E x t F i g u r e : E m p i r i c a l p o w e r c u r v e s f o rt h e t w o - s a m p l e t e s t s . C o l u m n s c o rr e s p o nd t o d i ff e r e n t g e n e r a t i n g d i s tr i bu t i o n s f o r t h e p r i n c i p a l c o m p o n e n t s ξ g i k a nd s a m p l e s i ze s n g , a nd r o w s c o rr e s p o nd t o d i ff e r e n t nu m b e r s o f c o m p o n e n t s K µ i n t h e m e a nd i ff e r e n ce . T h e h o r i z o n t a l g r a y li n e s i nd i c a t e t h e n o m i n a ll e v e l α = . . Π A , r a nd Π B , r , p r o j ec t i o n - b a s e d t e s t s w i t h F V E t h r e s h o l d r ; L A a nd L B , n o r m - b a s e d t e s t s ; E x t , t h ee x tr i n s i c b oo t s tr a p t e s t . Sub s c r i p t s A a nd B s t a nd f o rt h e p r o p o s e d t e s t s i n t h e a s y m p t o t i c a ndb oo t s tr a p v e r s i o n s , r e s p ec t i v e l y . Data Application: Taxi Demands
Better understanding of the taxi demands will provide key insight into a more reliable andeconomic public transportation infrastructure. This subject has been of increasing modelinginterest (Chu and Chen, 2019; Dubey and M¨uller, 2020) as the popularity of app-based for-hire vehicle services such as Uber and Lyft rises and data become available. We analyzedthe demand patterns of taxi and other for-hire vehicles in New York City, which wereextracted from a total of 1.1 billion trips records of for-hire vehicles including the yellowand green cabs, Uber, Lyft, etc. The trip record data were made public following Freedomof Information Law (FOIL) requests, and our analysis was built upon a database compiledby Todd Schneider available on https://github.com/toddwschneider/nyc-taxi-data .The daily taxi demand is modeled as a spatial density of the passenger pick-up locations.For the interest of monitoring evolving demand patterns, our goal here is to compare thetaxi demands in the year of 2016 and 2017. We focused on the Manhattan taxi zones S and measured the daily demand by the probability density function Y gi ( s ) of pick-uplocations in the i th day of year g ∈ { , } , where s ∈ S stands for the taxi zone.Each spatial demand density is then transformed into a square root density X gi accordingto X gi ( s ) = (cid:112) Y gi ( s ), s ∈ S ∞ , so that (cid:107) X gi (cid:107) = (cid:82) S X gi ( s ) ds = (cid:82) S Y gi ( s ) ds = 1 andthus X gi lies on the unit Hilbert sphere S ∞ . Though the average demands in 2016 and2017 as measured by the intrinsic means of the square root densities X gi within each yearappear overall similar (left two panels, Figure 2), there is a substantial decrease in demandconcentrated near the Upper East Side of Manhattan (right panel, Figure 2), which isprobably due to the opening of three Second Avenue subway stations on January 1, 2017.20
016 2017 204060 2017 − 2016 −3−2−1012
Figure 2: The sample intrinsic means and their (Euclidean) difference.We investigated the effect of geometry for detecting changes in taxi demands by com-paring the size and power of two-sample hypothesis tests. The proposed intrinsic tests andthe extrinsic test were applied on the square root densities X gi utilizing the spherical geom-etry, and an alternative norm-based bootstrap test was performed for the original densities Y gi as elements in L ( S ), assuming a flat geometry. Two scenarios were considered, namelyan Equal Mean scenario where the observations for both populations were randomly sam-pled without replacement from year 2017, representing that H holds true, and an UnequalMean scenario where the observations for the two populations were sampled from 2016and 2017, respectively, representing H . The number of samples n g from each populationvaried among 10 , ,
20 and 25. The number of projections for the projection-based testswere selected according to the FVE criterion with threshold r = 0 . , .
9, and 0.95, whichcorresponds to K = 3, 5, and 9 components, respectively. The empirical sizes and powersare reported in Table 1 for the nominal level α = 0 .
05, calculated from 2000 Monte Carloiterations and 999 bootstrap samples. 21able 1: Proportions of rejected H for the taxi demands at the nominal level α = 0 . A,r and Π
B,r , projection-based tests with FVE threshold r ; L A and L B , norm-basedtests; Ext, the extrinsic bootstrap test on S ∞ ; Dens, the bootstrap test using the originaldensities. Subscripts A and B stand for the proposed asymptotic and bootstrap tests,respectively. Equal mean ( H ) n g L A Π A, . Π A, . Π A, . L B Π B, . Π B, . Π B, . Ext Dens10 0.073 0.046 0.052 0.063 0.088 0.043 0.042 0.028 0.088 0.0915 0.062 0.043 0.04 0.05 0.078 0.044 0.038 0.026 0.076 0.07620 0.057 0.039 0.046 0.047 0.064 0.044 0.044 0.031 0.064 0.06425 0.044 0.035 0.041 0.04 0.048 0.038 0.034 0.026 0.05 0.052
Unequal mean ( H ) n g L A Π A, . Π A, . Π A, . L B Π B, . Π B, . Π B, . Ext Dens10 0.203 0.215 0.596 0.846 0.259 0.215 0.572 0.776 0.259 0.12415 0.336 0.352 0.868 0.989 0.388 0.36 0.863 0.976 0.394 0.13420 0.532 0.484 0.975 0.999 0.591 0.5 0.972 0.999 0.59 0.14825 0.766 0.607 0.993 1 0.805 0.618 0.993 1 0.794 0.173Under the Equal Mean scenario ( H ), the proportion of rejection for all methods werebelow 0.1 and approached the nominal level α = 0 .
05 as n g increased, indicating thatthe tests have approximately the correct size. The proposed norm-based bootstrap tests L B was slightly more liberal than its asymptotic version L A , while the projection-basedΠ B, . was slightly more conservative than Π A, . . In the Unequal Mean scenario ( H ), alltests based on the square root densities, i.e. our proposed intrinsic tests and the extrinsictest, outperformed the bootstrap test based on the original density (last column, Table 1).This highlights the Hilbert sphere S ∞ as a more appropriate geometry for detecting smallchanges in the demand patterns than a flat space of original densities. The proposedprojection-based tests with FVE threshold r = 0 . .
95 had the highest power for allsample sizes, outperforming the intrinsic and extrinsic norm-based bootstrap tests.22 upplemental Materials . This manuscript is accompanied by Supplemental Materialsincluding proofs and additional simulations, which will be made available upon request.
AppendixA Fr´echet Derivatives
The Fr´echet derivative is reviewed here, following the definitions in Chapter I of Lang(1999). In this section, let E , E j , F be Banach spaces with norms (cid:107)·(cid:107) E , (cid:107)·(cid:107) E j , and (cid:107)·(cid:107) F ,respectively, for j = 1 , . . . , p . Denote B ( E, F ) as the space of continuous linear maps from E into F , which is a Banach space equipped with the operator norm (cid:107) g (cid:107) = sup (cid:107) e (cid:107) E =1 (cid:107) g ( e ) (cid:107) F for g ∈ B ( E, F ). Also let B ( E , . . . , E p ; F ) denote the Banach space of multilinear mapsequipped with the operator norm (cid:107) h (cid:107) = sup (cid:107) e (cid:107) E = ··· = (cid:107) e p (cid:107) Ep =1 (cid:107) h ( e , . . . , e p ) (cid:107) F , and write for short B ( E p , F ) = B ( E, . . . , E ; F ). Repeated linear operator g rep ∈ B ( E, B ( E, . . . , B ( E (cid:124) (cid:123)(cid:122) (cid:125) p times , F ) . . . ))is isometrically identified by a multilinear map g mult ∈ B ( E p , F ), as g mult ( e , . . . , e p ) = g rep ( e ) . . . ( e p ) . This identification gives rise to a Banach space isomorphism (Proposition 2.4, p7, Lang,1999); we use the same notation to denote both maps.Let f : U ⊂ E → F be a continuous map. Definition A.1.
Function f : U ⊂ E → F is said to be (Fr´echet) differentiable at a point x ∈ U if there exists a continuous linear map l of E into F such that for y ∈ E,f ( x + y ) = f ( x ) + l ( y ) + (cid:15) ( y ) , (cid:107) (cid:15) ( y ) (cid:107) F → (cid:107) y (cid:107) E →
0. The linear map l is called the (Fr´echet) derivative of f at x , denoted as Df ( x ). If f is differentiable at every point in U , then the derivative Df is a map Df : U → B ( E, F ) . Definition A.2.
Map f : U ⊂ E → F is said to be directional differentiable at a point x ∈ U if there exists a function l : E → F such that l ( y ) = lim t → f ( x + ty ) − f ( x ) t exists for all y ∈ E . The linear map l is called the directional derivative of f at x .If a map f is Fr´echet differentiable, then it is also directional differentiable and the twoderivatives match. In what follows, “differentiability” refers to Fr´echet differentiability un-less otherwise noted, and the directional differentiation is used for calculation. Higher orderderivatives and partial derivatives are defined in a recursive fashion. Since the derivative Df ( x ) is in B ( E, F ), a Banach space, the p th order derivative D p f is defined as D ( D p − f ),a map of U into B ( E, B ( E, . . . , B ( E, F ) . . . )) (cid:39) B ( E p , F ). A map is said to be smoothif the derivatives of all orders exist. For a bivariate map h : E × E → F , the partialderivative with respect to the first argument at ( x , y ) ∈ U × V ⊂ E × E is denoted as D h ( x , y ) , where D h : U × V → B ( E , F ) , D h ( x , y ) = Df y ( x ) , for f y ( x ) = h ( x, y ). The partial derivative D h w.r.t. the second argument is similarlydefined. Proposition A.1 (Chain rule, Lang (1999)) . If f : U → V is differentiable at x , and g : V → W is differentiable at f ( x ) , then g ◦ f is differentiable at x , and D ( g ◦ f )( x ) = Dg ◦ f ( x )( Df ( x )) . For f : U → B ( E, F ) and g : U → B ( F, G ) defined on an open set U ⊂ E , denote f · g as the function u (cid:55)→ f ( u ) ◦ g ( u ) . The chain rule can be compactly written as D ( g ◦ f ) = Dg ◦ f · Df. eferences Afsari, B. (2011), “Riemannian L p Center of Mass: Existence, Uniqueness, and Convexity,”
Proceedings of the American Mathematical Society , 139, 655–673.Ahidar-Coutrix, A., Le Gouic, T., and Paris, Q. (2019), “Convergence Rates for Empir-ical Barycenters in Metric Spaces: Curvature, Convexity and Extendable Geodesics,”
Probability Theory and Related Fields .Aue, A., Rice, G., and S¨onmez, O. (2018), “Detecting and Dating Structural Breaks inFunctional Data without Dimension Reduction,”
Journal of the Royal Statistical Society:Series B (Statistical Methodology) , 80, 509–529.Bauer, M., Eslitzbichler, M., and Grasmair, M. (2017), “Landmark-Guided Elastic ShapeAnalysis of Human Character Motions,”
Inverse Problems & Imaging , 11, 601–621.Berkes, I., Gabrys, R., Horv´ath, L., and Kokoszka, P. (2009), “Detecting Changes in theMean of Functional Observations,”
Journal of the Royal Statistical Society: Series B(Statistical Methodology) , 71, 927–946.Bhattacharya, R. and Lin, L. (2017), “Omnibus CLTs for Fr´echet Means and Nonpara-metric Inference on Non-Euclidean Spaces,”
Proceedings of the American MathematicalSociety , 145, 413–428.Bhattacharya, R. and Patrangenaru, V. (2003), “Large Sample Theory of Intrinsic andExtrinsic Sample Means on Manifolds - I,”
Annals of Statistics , 31, 1–29.— (2005), “Large Sample Theory of Intrinsic and Extrinsic Sample Means on Manifolds -II,”
Annals of statistics , 33, 1225–1259.Cheng, J., Ghosh, A., Jiang, T., and Deriche, R. (2009), “A Riemannian Frameworkfor Orientation Distribution Function Computing,” in
Medical Image Computing andComputer-Assisted Intervention – MICCAI 2009 , eds. Yang, G.-Z., Hawkes, D., Rueck-ert, D., Noble, A., and Taylor, C., Berlin, Heidelberg: Springer Berlin Heidelberg, vol.5761, pp. 911–918. 25hu, L. and Chen, H. (2019), “Asymptotic Distribution-Free Change-Point Detection forMultivariate and Non-Euclidean Data,”
The Annals of Statistics , 47, 382–414.Dai, X. and M¨uller, H.-G. (2018), “Principal Component Analysis for Functional Data onRiemannian Manifolds and Spheres,”
Annals of Statistics , 46, 3334–3361.Du, J., Goh, A., Kushnarev, S., and Qiu, A. (2014), “Geodesic Regression on OrientationDistribution Functions with Its Application to an Aging Study,”
NeuroImage , 87, 416–426.Dubey, P. and M¨uller, H.-G. (2019), “Fr´echet Analysis of Variance for Random Objects,”
Biometrika , 106, 803–821.— (2020), “Functional Models for Time-Varying Random Objects,”
Journal of the RoyalStatistical Society: Series B (Statistical Methodology) , 82, 275–327.Ellingson, L., Patrangenaru, V., and Ruymgaart, F. (2013), “Nonparametric Estimationof Means on Hilbert Manifolds and Extrinsic Analysis of Mean Shapes of Contours,”
Journal of Multivariate Analysis , 122, 317–333.Eltzner, B. and Huckemann, S. F. (2019), “A Smeary Central Limit Theorem for Manifoldswith Application to High-Dimensional Spheres,”
The Annals of Statistics , 47, 3360–3381.Fletcher, P. T., Lu, C., Pizer, S. M., and Joshi, S. (2004), “Principal Geodesic Analysis forthe Study of Nonlinear Statistics of Shape,”
IEEE Transactions on Medical Imaging , 23,995–1005.Fr´echet, M. (1948), “Les ´El´ements Al´eatoires de Nature Quelconque Dans Un Espace Dis-tanci´e,”
Annales de l’Institut Henri Poincar´e , 10, 215–310.Gouic, T. L., Paris, Q., Rigollet, P., and Stromme, A. J. (2019), “Fast Convergence of Em-pirical Barycenters in Alexandrov Spaces and the Wasserstein Space,” arXiv:1908.00828[math, stat] . 26´ajek, J. (1962), “On Linear Statistical Problems in Stochastic Processes,”
CzechoslovakMathematical Journal , 12, 404–444.Hall, P. (1992),
The Bootstrap and Edgeworth Expansion , New York: Springer.Henning, W. and Srivastava, A. (2016), “A Two-Sample Test for Statistical Comparisonsof Shape Populations,” in , IEEE, pp. 1–9.Horv´ath, L., Kokoszka, P., and Reeder, R. (2013), “Estimation of the Mean of FunctionalTime Series and a Two-Sample Problem,”
Journal of the Royal Statistical Society: SeriesB (Statistical Methodology) , 75, 103–122.Hotz, T. and Huckemann, S. (2015), “Intrinsic Means on the Circle: Uniqueness, Locusand Asymptotics,”
Annals of the Institute of Statistical Mathematics , 67, 177–193.Hsing, T. and Eubank, R. (2015),
Theoretical Foundations of Functional Data Analysis,with an Introduction to Linear Operators , Hoboken: Wiley.Huckemann, S. F. (2012), “On the Meaning of Mean Shape: Manifold Stability, Locus andthe Two Sample Test,”
Annals of the Institute of Statistical Mathematics , 64, 1227–1259.Joshi, S. H., Klassen, E., Srivastava, A., and Jermyn, I. (2007), “A Novel Representation forRiemannian Analysis of Elastic Curves in R n ,” in , IEEE, pp. 1–7.Karcher, H. (1977), “Riemannian Center of Mass and Mollifier Smoothing,” Communica-tions on Pure and Applied Mathematics , 30, 509–541.Lang, S. (1999),
Fundamentals of Differential Geometry , New York: Springer.Lazar, D. and Lin, L. (2017), “Scale and Curvature Effects in Principal Geodesic Analysis,”
Journal of Multivariate Analysis , 153, 64–82.Le, H. (2001), “Locating Fr´echet Means with Application to Shape Spaces,”
Advances inApplied Probability , 33, 324–338. 27in, Z. and Yao, F. (2019), “Intrinsic Riemannian Functional Data Analysis,”
The Annalsof Statistics , 47, 3533–3577.Petersen, A., Liu, X., and Divani, A. A. (2019), “Wasserstein F -Tests and ConfidenceBands for the Fr´echet Regression of Density Response Curves,” arXiv .Petersen, A. and M¨uller, H.-G. (2016), “Functional Data Analysis for Density Functionsby Transformation to a Hilbert Space,” The Annals of Statistics , 44, 183–218.Rao, C. R. (1945), “Information and the Accuracy Attainable in the Estimation of Statis-tical Parameters,”
Bulletin of Calcutta Mathematical Society , 81–91.Rudin, W. (1973),
Functional Analysis , New York: McGraw-Hill.Sch¨otz, C. (2019), “Convergence Rates for the Generalized Fr´echet Mean via the QuadrupleInequality,”
Electronic Journal of Statistics , 13, 4280–4345.Srivastava, A., Jermyn, I., and Joshi, S. (2007), “Riemannian Analysis of Probability Den-sity Functions with Applications in Vision,” in , pp. 1–8.Strait, J., Chkrebtii, O., and Kurtek, S. (2019), “Automatic Detection and UncertaintyQuantification of Landmarks on Elastic Curves,”
Journal of the American StatisticalAssociation , 1–23.Su, J., Kurtek, S., Klassen, E., and Srivastava, A. (2014), “Statistical Analysis of Trajec-tories on Riemannian Manifolds: Bird Migration, Hurricane Tracking and Video Surveil-lance,”
The Annals of Applied Statistics , 8, 530–552.Tucker, J. D., Wu, W., and Srivastava, A. (2013), “Generative Models for Functional DataUsing Phase and Amplitude Separation,”
Computational Statistics & Data Analysis , 61,50–66.van der Vaart, A. and Wellner, J. (1996),
Weak Convergence and Empirical Processes:With Applications to Statistics , New York: Springer.28ang, J.-L., Chiou, J.-M., and M¨uller, H.-G. (2016), “Functional Data Analysis,”
AnnualReview of Statistics and its Application , 3, 257–295.Wellner, J. A. and Zhan, Y. (1996), “Bootstrapping Z -Estimators,” Tech. rep.Wu, W. and Srivastava, A. (2014), “Analysis of Spike Train Data: Alignment and Com-parisons Using the Extended Fisher-Rao Metric,” Electronic Journal of Statistics , 8,1776–1785.Xie, W., Kurtek, S., Bharath, K., and Sun, Y. (2017), “A Geometric Approach to Visualiza-tion of Variability in Functional Data,”
Journal of the American Statistical Association ,112, 979–993.Younes, L. (1998), “Computable Elastic Distances between Shapes,”
SIAM Journal onApplied Mathematics , 58, 565–586.Yu, Q., Lu, X., and Marron, J. (2017), “Principal Nested Spheres for Time-Warped Func-tional Data Analysis,”
Journal of Computational and Graphical Statistics , 26, 144–151.Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009), “Intrinsic RegressionModels for Positive-Definite Matrices with Applications to Diffusion Tensor Imaging,”
Journal of the American Statistical Association , 104, 1203–1212.Ziezold, H. (1977), “On Expected Figures and a Strong Law of Large Numbers for RandomElements in Quasi-Metric Spaces,” in