Robust Extrinsic Regression Analysis for Manifold-Valued Data
Hwiyoung Lee [email protected]
Department of Statistics, Florida State University
January 28, 2021
Abstract
Recently, there has been a growing need for analyzing data on manifolds owing to their important role in diverse fields of science and engineering. In the literature on manifold-valued data analysis up till now, however, only a few works have been carried out concerning the robustness of estimation against noise, outliers, and other sources of perturbation. In this regard, we introduce a novel extrinsic framework for analyzing manifold-valued data in a robust manner. First, by extending the notion of the geometric median, we propose a new robust location parameter on manifolds, called the extrinsic median. A robust extrinsic regression method is also developed by incorporating the conditional extrinsic median into the classical local polynomial regression method. We present Weiszfeld's algorithm for implementing the proposed methods. The promising performance of our approach against existing methods is illustrated through simulation studies.
Key words: Robust statistics, Extrinsic median, Nonparametric regression, Riemannian manifolds
1 Introduction

Over the past few decades, analyzing data taking values in non-Euclidean spaces, mostly nonlinear manifolds, has attracted increased attention in a wide range of applications, because it allows a richer and more accurate statistical inference based on the usage of the geometrical properties of the underlying data space. Examples of such data types, which especially lie on Riemannian manifolds, include directions of points on a sphere (Fisher et al., 1987; Mardia and Jupp, 1999), shapes of configurations extracted from images (Bhattacharya and Bhattacharya, 2012), data sitting on Stiefel and Grassmann manifolds (Chikuse, 2003), symmetric positive definite matrices arising as observations in diffusion tensor magnetic resonance imaging (DT-MRI) (Zhu et al., 2009; Yuan et al., 2012), and other types of medical images.

In common with the traditional Euclidean case, statistical inference on the aforementioned manifolds begins by defining the notion of the mean on the metric space where the data reside. Suppose a random object X is defined on a metric space (M, ρ), and let Q(·) be the probability measure of X. One may then consider adapting the traditional definition of the mean to an arbitrary metric space, i.e., E(X) = ∫_M x Q(dx). Unfortunately, however, this attempt at generalization is not directly applicable to the non-Euclidean setting, because it contains a non-vector-valued integral, which appears to be analytically intractable. Therefore, the conventional definition of the mean needs to be adapted so that it can go beyond Euclidean spaces. The most commonly used one in the literature is the Fréchet mean (Fréchet, 1948), in which the mean is defined as a minimizer of a real-valued function defined on the metric space.
To be more specific, for any q ∈ M, consider the Fréchet function of the following form:

    F : M → ℝ,   q ↦ F(q) = E[ρ²(X, q)] = ∫_M ρ²(x, q) Q(dx),   (1)

where ρ denotes a generic metric on M. Then the Fréchet mean is defined as the minimizer of the Fréchet function above, i.e.,

    µ_F = argmin_{q ∈ M} ∫_M ρ²(x, q) Q(dx).   (2)

Also, for given observations x_1, …, x_n ∈ M, consisting of n independent realizations of X, the sample Fréchet mean is defined as X̄ = argmin_{q ∈ M} Σ_{i=1}^n ρ²(x_i, q). Given the relation between the mean and the variance, the above generalization of the mean makes intuitive sense, because it gives a definition analogous to that of the Euclidean mean, which is characterized as the minimizer of the variance function, the sum of squared deviations. In this regard, the Fréchet function itself is commonly referred to as the Fréchet variance.

But, first and foremost, what needs to be emphasized is that a metric ρ is not unique for any particular manifold, and there are many possible choices. Regarding this issue, two different types of distance functions have typically been considered in the literature on manifold-valued data analysis. The first possible choice is the intrinsic distance, that is, the geodesic distance associated with the Riemannian structure g on M. The other type of distance is the Euclidean distance induced by an embedding J : M → ℝ^D, which is also referred to as the extrinsic distance. The former and the latter distances lead to intrinsic and extrinsic data analysis, respectively.

Most of the previous works on analyzing manifold-valued data have mainly focused on developing statistical methods based on variants of the Fréchet mean. For instance, by introducing the conditional Fréchet mean, Petersen and Müller (2019) developed a regression model having a random object in a metric space as a response variable.
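To make (1)–(2) concrete, the sample Fréchet mean on the unit circle equipped with the arc-length metric can be approximated by brute-force minimization of the sample Fréchet function over a grid of candidate angles. The following sketch is ours, not from the paper, and the grid-search approach is purely illustrative:

```python
import numpy as np

def circle_dist(a, b):
    # Geodesic (arc-length) metric rho on the unit circle S^1.
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def sample_frechet_mean(thetas, grid_size=10000):
    # Brute-force minimizer of the sample Frechet function
    # F_n(q) = (1/n) * sum_i rho^2(theta_i, q) over a grid of candidates.
    grid = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    F = np.array([np.mean(circle_dist(thetas, q) ** 2) for q in grid])
    return grid[int(np.argmin(F))]

thetas = np.array([0.1, 0.2, 6.2])   # angles clustered around 0 (6.2 is about -0.08)
mu_F = sample_frechet_mean(thetas)
```

Because the three angles fall within a common half-circle, the minimizer coincides (up to grid resolution) with their "unwrapped" arithmetic mean, illustrating how the Fréchet mean reduces to the ordinary mean locally.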
However, it is well known that least-squares-based methods are severely degraded when there exist outliers in the data or the underlying data distribution is heavy tailed. Thus, the lack of statistical robustness, incurred by the squared distance involved in (2), becomes apparent in the Fréchet mean as well. While, in the Euclidean setting, considerable efforts have been devoted to improving the robustness of estimators (see Huber, 1964; Huber and Ronchetti, 2009; Hampel et al., 1986, and references therein for a review), far less attention has been paid to manifolds. Indeed, one simple way to enhance the robustness of estimators is to replace the squared distance with the unsquared distance. In the case of Euclidean space, where M = ℝ^d and ρ(x, x′) = ‖x − x′‖, this approach has the geometric median (Haldane, 1948) as a special case. Along the same line, the intrinsic geometric median on Riemannian manifolds, obtained by minimizing the Fréchet function associated with the unsquared geodesic distance, has been proposed by Fletcher et al. (2009). Motivated by the success of the intrinsic geometric median, the primary contribution of this paper is to develop a novel robust location parameter within an extrinsic framework, which entails a computationally efficient algorithm. Moreover, adopting the concept of classical local polynomial modeling, we implement a robust extrinsic local regression model for a manifold-valued response and Euclidean predictors. This is accomplished by extending the concept of the proposed extrinsic median to the notion of the conditional extrinsic median.

The rest of this paper is organized as follows. The proposed extrinsic median is introduced in Section 2, along with a brief review of the extrinsic framework for manifold data analysis. Application of the extrinsic median to two different manifolds is demonstrated in Section 3.
In Section 4, we develop the robust extrinsic local regression (RELR) model, with algorithmic details. In Section 5, the proposed RELR is implemented in Kendall's planar shape space and its promising properties are illustrated through simulation studies. Finally, we conclude the paper in Section 6 with a short discussion and possible directions for future study.

2 Extrinsic Median

In this section, we develop the extrinsic median, which provides a statistically robust and computationally efficient way of estimating the center point of data residing on manifolds. Before describing our method, it is useful to begin with a brief review of the extrinsic framework for manifold-valued data analysis, on which the proposed scheme is based, and the motivation that initiated this study.

2.1 Extrinsic framework for manifold-valued data analysis
The essential idea behind extrinsic analysis is that any d-dimensional manifold M can be embedded in a higher-dimensional Euclidean space ℝ^D, where d < D (Whitney, 1944), via an embedding J. Thus, to understand the extrinsic approach, it is necessary to recall the definition of an embedding J. First consider a differentiable map J : M → ℝ^D whose differential d_p J : T_p M → T_{J(p)} ℝ^D is one-to-one, where T_p M and T_{J(p)} ℝ^D denote the tangent space at p ∈ M and the tangent space at J(p) on ℝ^D, respectively. The class of differentiable maps specified above is called an immersion. A one-to-one immersion is then called an embedding if it is a homeomorphism from M to J(M) with the induced topology. Also of note is that the embedding is unfortunately not unique in general, and not all choices of embedding lead to a good estimation result. In this context, the extrinsic approach has been carried out under the premise that the selected embedding preserves the intrinsic geometry of the original manifold. Therefore, an embedding satisfying the following condition is typically preferred within the extrinsic framework. For a Lie group G acting on M, the embedding J : M → ℝ^D is referred to as a G-equivariant embedding if there exists a group homomorphism φ : G → GL_D(ℝ or ℂ) satisfying J(gp) = φ(g)J(p) for all p ∈ M and g ∈ G, where GL_D(ℝ or ℂ) denotes the general linear group, i.e., the group of D × D invertible real or complex matrices. This definition indicates that the group action of G can be recovered in the embedded space J(M) through φ. Therefore, in light of the above, a great amount of the geometric features of the manifold is preserved in the embedded Euclidean space via an equivariant embedding.
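A toy numerical check of equivariance (our illustration, not an example from the paper): embed the sphere S² into the symmetric 3 × 3 matrices by J(p) = pp^T, a Veronese-type embedding analogous to the one used for shape spaces later in the paper. With G = SO(3) acting by rotation and φ(A) acting by conjugation B ↦ ABA^T, the identity J(Ap) = φ(A)J(p) holds exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy equivariant embedding of S^2 into symmetric 3x3 matrices: J(p) = p p^T.
def J(p):
    return np.outer(p, p)

def phi(A):
    # Group homomorphism: phi(A) acts on matrices by conjugation.
    return lambda B: A @ B @ A.T

p = rng.normal(size=3)
p /= np.linalg.norm(p)                    # a point on the sphere S^2
Q_, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A = Q_ if np.linalg.det(Q_) > 0 else -Q_  # a rotation in SO(3)

# Equivariance J(Ap) = phi(A) J(p):
lhs = J(A @ p)
rhs = phi(A)(J(p))
print(np.allclose(lhs, rhs))  # True
```

The identity follows algebraically, since (Ap)(Ap)^T = A(pp^T)A^T, which is exactly what "the group action can be recovered in the embedded space through φ" means.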
The extrinsic distance between two points on a manifold can then be straightforwardly computed in the embedded space via the Euclidean norm.

Considering all the notions described above, the extrinsic mean µ_E on a manifold is defined as the minimizer of the Fréchet function associated with the extrinsic distance via an embedding J : M → ℝ^D:

    µ_E = argmin_{q ∈ M} ∫_M ‖J(x) − J(q)‖² Q(dx).   (3)

As compared to the intrinsic mean, the use of the extrinsic approach has several advantages, including (1) computational efficiency (Bhattacharya et al., 2011), and (2) milder conditions for existence and uniqueness of the solution. Moreover, the sample extrinsic mean often has a closed-form solution. Thus, we here derive the extrinsic mean in an explicit form. To do this, the following definition, which gives the uniqueness condition of the extrinsic mean, should be noted first. A point y ∈ ℝ^D is said to be J-nonfocal if there exists a unique point p ∈ M satisfying inf_{x ∈ M} ‖y − J(x)‖ = ‖y − J(p)‖. We then let µ = ∫_{ℝ^D} u Q̃(du) be the mean vector of the induced probability measure Q̃ = Q ∘ J⁻¹, which is the image of Q in ℝ^D. The Fréchet function associated with the extrinsic distance, the right-hand side of (3), can then also be written as

    F(q) = ‖J(q) − µ‖² + ∫_{ℝ^D} ‖x − µ‖² Q̃(dx).   (4)

Hence, we have inf_{q ∈ M} F(q) = inf_{J(q) ∈ M̃} ‖J(q) − µ‖² + ∫_{ℝ^D} ‖x − µ‖² Q̃(dx), where M̃ = J(M) denotes the image of the embedding. This indicates that the set of points x ∈ M satisfying inf_{J(q) ∈ M̃} ‖J(q) − µ‖ = ‖J(x) − µ‖ constitutes the extrinsic mean set. And since (4) is minimized on M̃ by J(q) = P(µ), where P : ℝ^D → M̃ is the projection map defined by P(y) = {u ∈ M̃ : ‖u − y‖ ≤ ‖u′ − y‖ for all u′ ∈ M̃}, the extrinsic mean exists uniquely if and only if the mean vector µ is a J-nonfocal point. In that case, the extrinsic mean is obtained by taking the inverse of the embedding, i.e., µ_E = J⁻¹(P(µ)).
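As a toy illustration (ours, not the paper's code): on the unit sphere embedded by inclusion, the projection is P(µ) = µ/‖µ‖, so the extrinsic mean of an empirical measure is simply the normalized ambient average of the data points:

```python
import numpy as np

def sample_extrinsic_mean_sphere(X):
    # Sample extrinsic mean on S^d with the inclusion embedding J(x) = x:
    # average the embedded points in R^{d+1}, then project back onto the sphere.
    mu = X.mean(axis=0)             # ambient (Euclidean) mean
    nrm = np.linalg.norm(mu)
    if nrm == 0.0:                  # mu = 0 is the focal point of the sphere
        raise ValueError("ambient mean is focal; extrinsic mean is not unique")
    return mu / nrm                 # P(mu) = mu / ||mu||

# Points clustered around the north pole of S^2.
X = np.array([[0.0, 0.1, 0.99], [0.1, 0.0, 0.99], [-0.1, 0.0, 0.99]])
X /= np.linalg.norm(X, axis=1, keepdims=True)
m_hat = sample_extrinsic_mean_sphere(X)
```

Here the check for a zero ambient mean mirrors the J-nonfocality condition: for the sphere, the origin is the only focal point of the inclusion embedding.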
Following from the above, the sample extrinsic mean is obtained in a straightforward manner. Suppose we observe x_1, …, x_n ∈ M, consisting of independent and identically distributed copies of X; then the sample extrinsic mean is given by X̄_E = J⁻¹{P(J̄(X))}, where J̄(X) = Σ_{i=1}^n J(x_i)/n. Theoretical properties of the sample extrinsic mean, including its asymptotic distribution, consistency, and uniqueness conditions, are well established in Bhattacharya and Patrangenaru (2003, 2005).

2.2 Extrinsic Median

Before proceeding to present our proposed method, we begin by giving a quick overview of the existing Euclidean geometric median. In the Euclidean multivariate setting, a large body of research has been devoted to developing robust estimation of the central point (see Small, 1990, for a review), among which the geometric median, initially proposed by Haldane (1948), has received the greatest attention over the last decades due both to its nice robustness properties and its computational efficiency (Cardot et al., 2013, 2017). The geometric median of a random variable X ∈ ℝ^k is defined by m = argmin_{q ∈ ℝ^k} E‖X − q‖, or, alternatively but equivalently, is obtained by minimizing ∫_{ℝ^k} (‖x − q‖ − ‖x‖) Q(dx). Note that the latter expression has been more commonly adopted in practice, since no assumption regarding the first-order moment of X needs to be imposed. Moreover, when k = 1 the above definition coincides with the classical notion of the median, which is defined in terms of the cumulative distribution function. In this sense, the geometric median plays the role of a multivariate generalization of the univariate median.
Now suppose that we observe X = {x_1, …, x_n}, consisting of n independent and identically distributed realizations of X; then the sample geometric median m̂, which provides the natural estimator of m, is obtained by finding the value that minimizes the sum of Euclidean distances to the given data points, i.e.,

    m̂ = argmin_{q ∈ ℝ^k} Σ_{i=1}^n (‖x_i − q‖ − ‖x_i‖).

The above optimization problem is also known as the Fermat–Weber problem (Weber, 1929), and a numerical algorithm for solving the geometric median problem was first introduced by Weiszfeld (1937). It is shown in Kemperman (1987) that the sample geometric median is uniquely determined unless all the given observations lie on the same line.

Many nice properties of the geometric median have been investigated, including invariance under rotation and translation, asymptotic behavior (Möttönen et al., 2010), and concentration (Minsker, 2015). However, the most notable advantage of the geometric median over the mean is that it provides a robust estimate of centrality in the presence of noise in the data. The robustness of an estimator is usually measured by the breakdown point. For the given data X above, we further consider the outlier-contaminated data X*_m = {x*_1, …, x*_m, x_{m+1}, …, x_n}, where the first m elements are replaced by extreme noises; then the breakdown point of an estimator T_n is defined by

    B(T_n) = min_{1 ≤ m ≤ n} { m/n : sup_{X*_m} ‖T_n(X*_m) − T_n(X)‖ = ∞ }.

In an intuitive sense, the breakdown point can be interpreted as the highest proportion of contamination that the estimator can tolerate before the difference between the estimate obtained from the contaminated data and the initial estimate goes to infinity. Being less affected by outliers, the geometric median achieves the asymptotic breakdown point of 0.5, whereas the mean has a breakdown point of only 1/n, meaning that a single extreme value can change the estimation result arbitrarily.

We now turn our attention to manifolds. Much of the research regarding estimation of the central location of data on manifolds has focused on variants of the Fréchet mean, including the intrinsic and extrinsic means. However, the common drawback of least-squares-based methods is their lack of robustness to extreme values, which makes the Fréchet mean inevitably sensitive to heavy-tailed distributions and outlying values. Nevertheless, even though there has been a considerable increase in the need for robust statistical methods on manifolds, less has been done on this issue. The pioneering attempt to address this is seen in the work of Fletcher et al. (2009), who proposed the intrinsic median by substituting the squared geodesic distance employed in the Fréchet function with the unsquared one, i.e., argmin_{q ∈ M} ∫_M ρ(x, q) Q(dx). Although their approach has had a great deal of success in generalizing the notion of the median to Riemannian manifolds, attaining the same breakdown point as in the Euclidean case, it has some inherent drawbacks that may limit its application. (1) It is often difficult to derive conditions for the uniqueness and the existence of the intrinsic median without restrictions on its support. (2) Moreover, even when the intrinsic median exists, it requires iterative algorithms on manifolds, which may incur a large amount of computational overhead. These drawbacks highlight the need for novel approaches that give a more computationally efficient method in which the existence and uniqueness conditions are well established and easy to understand.

In an attempt to address the methodological shortcomings of the intrinsic median described above, we propose the following new robust location parameter, making use of the unsquared extrinsic distance:

    m_E = argmin_{q ∈ M} ∫_M ‖J(x) − J(q)‖ Q(dx).   (5)
Given observations x_1, …, x_n consisting of independent and identically distributed copies of a manifold-valued random variable X, the above location parameter, which we call the population extrinsic median, can be estimated by replacing Q with the empirical measure Q̂ = (1/n) Σ_{i=1}^n δ_{x_i}, i.e.,

    m̂_E = argmin_{q ∈ M} Σ_{i=1}^n ‖J(x_i) − J(q)‖ = J⁻¹( P( argmin_{m ∈ ℝ^D} Σ_{i=1}^n ‖J(x_i) − m‖ ) ).   (6)

Unlike the sample extrinsic mean, which has a closed-form expression depending on the projection map, the sample extrinsic median requires an iterative algorithm, namely the Weiszfeld algorithm, for solving the inner minimization problem. But taking advantage of Euclidean geometry, the proposed extrinsic approach allows us to exploit the original form of the Weiszfeld algorithm without requiring any further modifications. Indeed, the following Algorithm 1 for solving (6) can be easily derived due to the convexity of the objective function associated with the Euclidean norm, f(m) = Σ_{i=1}^n ‖J(x_i) − m‖.

Algorithm 1
Extrinsic Median
Input: n observations X = {x_1, …, x_n}
Initialize: t = 0, an initial point m_0 ∈ ℝ^D, and a tolerance ε > 0
while not converged (‖m_{t+1} − m_t‖ > ε) do
    Compute the gradient direction:  ∇f(m_t) = Σ_{i=1}^n (m_t − J(x_i)) / ‖m_t − J(x_i)‖
    Compute the step size:          s_t = ( Σ_{i=1}^n 1/‖m_t − J(x_i)‖ )⁻¹
    Update:                         m_{t+1} = m_t − s_t · ∇f(m_t)
    t ← t + 1
end while
Output: Estimated robust estimator m̂_E = J⁻¹(P(m*)), where m* denotes the optimal value.

In fact, when incorporated into the extrinsic framework, our approach possesses a practical advantage over the intrinsic geometric median algorithm, since it avoids the computational overhead encountered in optimization on Riemannian manifolds. Specifically, the intrinsic geometric median algorithm (Fletcher et al., 2009) iterates

    m_{t+1} = Exp_{m_t}(α v_t),   v_t = ( Σ_{i=1}^n Log_{m_t}(x_i) / ρ(m_t, x_i) ) · ( Σ_{i=1}^n 1/ρ(m_t, x_i) )⁻¹,

in which Exp : T_{m_t}M → M and Log : M → T_{m_t}M have to be repeatedly evaluated at each iteration, whereas our method avoids the additional computational cost caused by the exponential and logarithm mappings. As indicated above, it should be emphasized that although the data lie on a manifold, the proposed algorithm itself operates in Euclidean space, without suffering from the geometrical restrictions and constraints posed by non-Euclidean data domains.

Also of importance is that, when the given data points are not collinear (i.e., there do not exist y, z ∈ ℝ^D and α_1, …, α_n ∈ ℝ such that x_i = y + α_i z for all i = 1, …, n), the Weiszfeld algorithm always converges to the unique optimal solution (Kuhn, 1973). For each iteration step, m_t
∉ {x_1, …, x_n} is typically assumed in order to ensure that the proposed algorithm converges to the global optimal solution. Details of the algorithm, including its derivation and convergence analysis, are deferred to Section 4, in which we discuss the algorithm for solving the robust extrinsic local regression (Algorithm 2), of which Algorithm 1 is a special case.

3 Applications of the extrinsic median

In this section, the practical applicability and the robustness of the extrinsic median are examined through simulation studies on two important manifolds. To gain further insight into the extrinsic median, the experiments were carried out under different conditions that may be encountered in practice. Results are compared with competing methods, including the extrinsic mean.
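Algorithm 1 can be sketched in a few lines of code. The illustration below is ours, not the paper's implementation; it uses the simplest setting of the first application that follows, the unit sphere with the inclusion embedding, for which the projection is P(m) = m/‖m‖:

```python
import numpy as np

def extrinsic_median_sphere(X, eps=1e-10, max_iter=1000):
    # Algorithm 1 with J = inclusion of S^d into R^{d+1}: run the Weiszfeld
    # iteration in the ambient Euclidean space, then project the optimum back
    # onto the sphere via P(m) = m / ||m||.  The update
    # m_{t+1} = sum_i w_i J(x_i) / sum_i w_i with w_i = 1/||m_t - J(x_i)||
    # is algebraically identical to m_t - s_t * grad f(m_t) in Algorithm 1.
    m = X.mean(axis=0)                       # initial value m_0
    for _ in range(max_iter):
        dist = np.linalg.norm(X - m, axis=1)
        dist = np.maximum(dist, 1e-12)       # guard: m_t must avoid data points
        w = 1.0 / dist
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        done = np.linalg.norm(m_new - m) < eps
        m = m_new
        if done:
            break
    return m / np.linalg.norm(m)             # m_hat_E = P(m*)

# Nine points near e1 on S^2 plus one gross outlier at -e1.
rng = np.random.default_rng(0)
X = np.tile([1.0, 0.0, 0.0], (9, 1)) + 0.05 * rng.normal(size=(9, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X = np.vstack([X, [-1.0, 0.0, 0.0]])
med = extrinsic_median_sphere(X)
```

With one point in ten replaced by an antipodal outlier, the estimate stays close to the uncontaminated cluster, in line with the breakdown behavior discussed above.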
3.1 Unit sphere

The first and simplest application is the d-dimensional unit sphere, S^d = {x ∈ ℝ^{d+1} : ‖x‖ = 1}, which is a d-dimensional submanifold of ℝ^{d+1}. It can be embedded into ℝ^{d+1} through the inclusion map ι : S^d → ℝ^{d+1}, ι(x) = x. The projection map P : ℝ^{d+1} → S^d is defined by P(µ) = µ/‖µ‖, where µ = ∫_{ℝ^{d+1}} x Q̃(dx) is the mean vector calculated in the ambient space ℝ^{d+1} and Q̃ = Q ∘ ι⁻¹ denotes the induced probability measure. Note that µ is ι-nonfocal unless µ = 0. For further details about statistical analysis on S^d, we refer to Fisher et al. (1987), Mardia and Jupp (1999), and references therein.

In the following, the performance of the extrinsic median on S^d is illustrated by simulation studies. To ease visualization of what the generated data look like and of how the extrinsic median is capable of providing a more robust estimate than the extrinsic mean, we consider the simplest case, S^1, for which data are observed in the form of directions on the unit circle in the 2-dimensional Euclidean plane ℝ². Note that data on S^1 are typically represented by an angle measured in radians, θ ∈ [0, 2π), or by the unit vector x = (cos θ, sin θ)ᵀ from the origin. The performance of the extrinsic median is compared under two different simulation scenarios. In the first scenario, outliers are artificially added to von Mises (VM) distributed data, whereas in the second scenario, heavy-tailed random observations are generated from the general wrapped α-stable (WS) distribution. The detailed description of each scenario is given in the following.

Scenario 1): The von Mises distribution with Normal outliers.
We suppose θ follows the von Mises distribution VM(µ, κ) with density function

    f(θ) = e^{κ cos(θ − µ)} / (2π I_0(κ)),

where µ and κ denote the mean direction and concentration parameter, respectively, and I_0(κ) is the modified Bessel function of order 0. Note that a larger value of κ means higher concentration towards µ. We first generate random data {θ_i}_{i=1}^n consisting of independent and identically distributed copies of θ ∼ VM(µ, κ); then n_cont outliers o′_j, drawn i.i.d. from Normal(µ_out, σ) with the outlier center µ_out set apart from µ, are added to the initial data set so that the contamination level satisfies the prespecified value r = n_cont/n. Additionally, normalization of the generated outliers, o_j = o′_j (mod 2π), is required to ensure 0 ≤ o_j < 2π.

Scenario 2): The wrapped α-stable random variable.

The density function of a wrapped α-stable random variable θ is given by

    f(θ) = 1/(2π) + (1/π) Σ_{k=1}^∞ exp(−τ^α k^α) cos( k(θ − µ) − τ^α k^α β tan(απ/2) ),   (7)

where 0 < α ≤ 2, τ > 0, and |β| ≤ 1. Smaller α values yield heavier-tailed distributions, while larger τ values yield more highly dispersed distributions. The benefit of using the wrapped α-stable distribution is that it provides a high degree of flexibility in modeling directional data, in the sense that it contains many popular circular distributions as special cases, including the wrapped normal distribution (α = 2) and the wrapped Cauchy distribution (α = 1, β = 0); see Jammalamadaka and SenGupta (2001) for further details.

Representative illustrations of simulation Scenario 1 and Scenario 2 are displayed in Figure 1 (left and right panels, respectively), together with the estimated values. In both scenarios, we observed that the extrinsic mean estimates were forced by outliers to be pulled far away from the true mean direction (µ = 0, i.e., x = (1, 0) in the Cartesian coordinate system).

Figure 1: Left: an example of Scenario 1), consisting of observations drawn from the VM distribution and outliers, displayed as light grey circles and asterisks, respectively. Right: Scenario 2). In both settings, the fits of the extrinsic mean and median are displayed together with the true mean value.

In Table 1, the extrinsic median is compared with the extrinsic mean in terms of the norm of the difference between the true mean direction and the estimated direction. The results are averaged over 20 replications. In the first scenario, four different settings are considered according to the level of contamination, r ∈ {0, 0.1, 0.2, 0.4}, where 0 represents the case in which no outlier exists. As would be expected, the results obtained from the first scenario indicate that, as the contamination level becomes higher, the extrinsic mean is far more vulnerable to the presence of outliers than the extrinsic median. The bottom panel of the table shows the results for Scenario 2, in which we fix β = 0 for symmetry of the distribution and vary the tail-heaviness level by adjusting α from 0.2 to 2, with four increasing settings of τ for each α. It is observed that the extrinsic median not only has better predictive ability in the case of heavy-tailed data, which corresponds to small values of α, but also achieves comparable performance even for non-heavy-tailed data generated from the wrapped normal distribution (α = 2).

Table 1: Results of the experiment described in Section 3.1. The results of Scenarios 1 and 2 are presented in the top and bottom panels of the table, respectively.

Scenario 1: The outlier-contaminated case (ratio of outliers r)

            No outlier           r = 0.1              r = 0.2              r = 0.4
  N      E. Mean   E. Med    E. Mean   E. Med     E. Mean   E. Med     E. Mean   E. Med
  10     0.0100    0.0090    0.0550    0.0129     0.1376    0.0132     0.2778    0.0169
  50     0.0064    0.0066    0.0663    0.0106     0.1311    0.0115     0.2702    0.0122
  100    0.0032    0.0033    0.0624    0.0101     0.1315    0.0087     0.2732    0.0118
  200    0.0025    0.0028    0.0655    0.0099     0.1331    0.0101     0.2730    0.0136

Scenario 2: The heavy-tailed distribution (four increasing values of τ per α)

  α     N      E. Mean   E. Med    E. Mean   E. Med     E. Mean   E. Med     E. Mean   E. Med
  0.2   10     0.3411    0.1658    0.1971    0.0540     0.1044    0.0505     0.0376    0.0393
        50     0.1053    0.0011    0.0709    0.0135     0.0454    0.0199     0.0188    0.0197
        100    0.0653    0.0003    0.0432    0.0095     0.0364    0.0141     0.0123    0.0194
        200    0.0582    0.0003    0.0211    0.0062     0.0198    0.0091     0.0086    0.0095
  2     10     0.3105    0.1294    0.4291    0.4054     0.4446    0.3963     0.3876    0.4501
        50     0.1471    0.0014    0.2221    0.1112     0.1619    0.1550     0.1824    0.2097
        100    0.1312    0.0023    0.1434    0.0858     0.1415    0.1339     0.2075    0.2641
        200    0.0867    0.0005    0.0894    0.0567     0.0967    0.0845     0.1161    0.1421

3.2 Kendall's planar shape space

For the second application of the extrinsic median, we consider the Kendall planar shape space of k-ads, denoted by Σ_2^k (Kendall, 1984), which is the most popular manifold in the landmark-based shape analysis literature. Before proceeding to present the simulation study on the planar shape space, we give the necessary preliminaries about this space.

The planar shape can be defined as a random object that is invariant under Euclidean similarity transformations. Therefore, the planar shape is identified as the geometric information remaining after filtering out the effects of translation, scaling, and rotation. To ease understanding of this nonlinear manifold, let us begin by describing the geometry of the planar shape space. First, the unregistered k-ad, which is a landmark configuration describing the shape of an object, can be conveniently placed on the complex plane as a set of k complex numbers, i.e., z = (z_1, …, z_k), where z_j = x_j + i y_j ∈ ℂ. One can then obtain the preshape of z by quotienting out the effects of translation and scale:

    u = (z − ⟨z⟩) / ‖z − ⟨z⟩‖,

where ⟨z⟩ = (z̄, …, z̄) and z̄ = (1/k) Σ_{j=1}^k z_j. This indicates that the preshape space is equivalent to a complex hypersphere, ℂS^{k−2} = { u ∈ ℂ^k : Σ_{j=1}^k u_j = 0, ‖u‖ = 1 }. The shape [z] of z, which is the geometric object invariant under the rotation effect, is then obtained by considering all rotated versions of u, i.e., [z] = { e^{iθ} u : 0 ≤ θ < 2π }. As the shape is defined as the orbit of u ∈ ℂS^{k−2}, Kendall's planar shape space Σ_2^k = ℂS^{k−2}/SO(2) is the quotient space of the preshape space under the action of the special orthogonal group of dimension 2, SO(2) = { A ∈ GL_2 : A⁻¹ = Aᵀ, det(A) = 1 }. Alternatively, the effects of scaling by a scalar r > 0 and rotation by an angle 0 ≤ θ < 2π can be simultaneously filtered out by multiplying the centralized k-ad configuration z − ⟨z⟩ by a complex number λ = re^{iθ}, i.e., [z] = { λ(z − ⟨z⟩) : λ ∈ ℂ \ {0} }. Due to this algebraically simpler characterization, the planar shape space is equivalently identified with the complex projective space Σ_2^k ≃ ℂP^{k−2}, the space of all complex lines through the origin in ℂ^{k−1}. More detailed explanation of the geometrical structure of the shape manifold is provided in Dryden and Mardia (1998) and Bhattacharya and Bhattacharya (2012). We now describe the extrinsic approach on Σ_2^k.
Due to Kent (1992), in Kendall's planar shape space the Veronese–Whitney embedding is typically used; it maps Σ_2^k into the space of k × k complex Hermitian matrices S(k, ℂ) by

    J : Σ_2^k → S(k, ℂ),   [z] ↦ J([z]) = uu*,   (8)

where u* denotes the complex conjugate transpose of u. Furthermore, since J(A[z]) = Auu*A* holds for any A ∈ SU(k), where SU(k) = { A ∈ GL_k(ℂ) : AA* = I, det(A) = 1 } denotes the special unitary group, the Veronese–Whitney embedding is an SU(k)-equivariant embedding, i.e., J(A[z]) = φ(A)J([z]). This follows directly by taking the Lie group homomorphism φ : SU(k) → GL_k(ℂ) such that φ(A)B = ABA*, where B ∈ S(k, ℂ). It should also be noted that the squared extrinsic distance between two planar shapes is defined in terms of the Frobenius norm of a complex matrix:

    ρ_E²([z_1], [z_2]) = ‖J([z_1]) − J([z_2])‖_F²
                       = Trace( {J([z_1]) − J([z_2])}{J([z_1]) − J([z_2])}* )
                       = Σ_{j=1}^k Σ_{i=1}^k |{J([z_1]) − J([z_2])}_{i,j}|².   (9)

Since the above extrinsic distance takes into account all k² elements of J([z_1]) − J([z_2]), it can be viewed as the natural Euclidean distance between the two embedded shapes J([z_1]) and J([z_2]). Lastly, the inverse and projection maps of the embedding, J⁻¹(P(·)) in (6), remain to be identified.
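The preshape construction, the Veronese–Whitney embedding (8), and the extrinsic distance (9) can be sketched compactly in code. The sketch and the toy landmark configuration below are ours; they simply verify that the embedded distance is invariant under a similarity transformation, as the definition of shape requires:

```python
import numpy as np

def preshape(z):
    # Remove translation (center the k-ad) and scale (normalize) from z in C^k.
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def vw_embed(u):
    # Veronese-Whitney embedding J([z]) = u u*, a k x k Hermitian matrix.
    return np.outer(u, u.conj())

def extrinsic_dist(z1, z2):
    # Extrinsic distance between two shapes: Frobenius norm of the
    # difference of the embedded Hermitian matrices, as in eq. (9).
    return np.linalg.norm(vw_embed(preshape(z1)) - vw_embed(preshape(z2)), 'fro')

# A triangle and a rotated/scaled/translated copy: identical shape.
tri = np.array([0 + 0j, 1 + 0j, 0.5 + 1j])
lam = 2.0 * np.exp(1j * 0.7)               # scale-rotation lambda = r e^{i theta}
tri2 = lam * tri + (3 - 2j)                # similarity transform of tri
print(extrinsic_dist(tri, tri2))           # close to 0
```

Translation and scale are removed by the preshape map, while the residual rotation phase e^{iθ} cancels in uu*, so the two configurations embed to the same Hermitian matrix.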
Let e X be the arbitrary point on the ambient Euclidean space R D , then the projection mapping of e X onto the image of the embedding f M = J (Σ k ) isgiven by γγ ∗ , where γ is the unit eigenvector of e X corresponding to the largest eigenvalue.Subsequently, the inverse map of J − ( γγ ∗ ) = [ γ ] can be obtained directly from (8) without16xtra operations.Now, in order to gauge the performance of the proposed method, we perform simulationexperiments on the planar shape space by investigating the corpus callosum (CC) data ex-tracted from the subset of ADHD-200 dataset ( http://fcon_1000.projects.nitrc.org/indi/adhd200/ ).The original dataset includes functional magnetic resonance imaging (fMRI) scans of sub-jects categorized into four different groups based on their symptoms and conditions; (1) Typ-ically developing children, (2) ADHD-Hyperactive, (3) ADHD-Inattentive, and (4) ADHD-Combined. The CC shapes of 647 subjects which consist of 50 landmarks were preprocessedand analyzed by Huang et al. (2015) to illustrate their clustering method. In this exper-iment, however, only a subset of the data (the CC shapes extracted from 404 typicallydeveloping children) was utilized. Since the main aim of this simulation study is to see howthe extrinsic median behaves robustly in a noisy environment, where a number of landmarksare contaminated by outliers, we further manipulated the data by assigning random noisesgenerated from Normal( µ = 1000 , σ = 5) to the real parts (the x cooridinates) of the 10th ∼ r rangingfrom 0 to 0 .
4. Additionally, the extrinsic median was compared to several competing meth-ods including the maximum likelihood estimator of the isotropic offset Gaussian distribution(Mardia and Dryden, 1989; Dryden and Mardia, 1991) and different variants of the Fr´echetmeans such as intrinsic mean, the Fr´echet mean associated with the partial Procrustes dis-tance and the extrinsic mean.Figure 2 shows the CC shapes obtained from several simulated data with different noiselevel r = { , . , . } . As shown in the left panel ( r = 0), no remarkable difference wasobserved in estimated shapes between methods. On the other hand, however, the middleand the right panels present that with the exception of the extrinsic median, other methodsappeared to be affected by outliers and led to the distortion in the estimated shapes. Impor-tantly, although the deformation of the estimated shape is occurred as well in the extrinsicmedian at the highest noise level tested, we have seen that it stays much closer to its initial17 o outlier Ratio = 0.2 Ratio = 0.4 -0.2 0.0 0.2 0.4 -0.2 0.0 0.2 0.4 -0.2 0.0 0.2 0.4-0.20.00.2 Extrinsic Mean Extrinsic Median Intrinsic Mean MLE Partial Proc.
Figure 2: Examples of normal CC data and estimated shapes. Each individual shape is displayed as a light grey solid line, and the results of the different methods are represented by different colors and line styles.

result than those of the other methods compared.

We now introduce a measure that quantifies the robustness of estimators on the planar shape space. To do this, let x̂ and x̂* denote the estimated shapes obtained from the uncontaminated and the contaminated data, respectively. Then the full Procrustes distance between x̂ and x̂*, i.e., ρ_FP(x̂*, x̂) = sqrt(1 − |⟨x̂*, x̂⟩|²), is used to assess whether a method provides robust estimation without being influenced by outlying values. This is analogous to the breakdown point, in which the Euclidean version of the foregoing quantity, ‖T_n(X*_m) − T_n(X)‖, is exploited. However, unlike the breakdown point, which gives the highest fraction of gross outliers in the data that an estimator can handle, a smaller value of ρ_FP(x̂*, x̂) implies that the method has more resistance to outliers. Results of this simulation, averaged over 20 replications for the different contamination levels, are presented in Figure 3, which illustrates that the proposed extrinsic median has a remarkable ability to resist outliers in the case where the data are contaminated
Figure 3: Graphical result of the simulation study. Line plots give the full Procrustes distances ρ_FP(x̂*, x̂) for the different methods as a function of the contamination level r.

with significant levels of noise. All the methods, however, show a deterioration in performance, which is mainly caused by the squared distance term employed in the models.

In this section, we present the robust extrinsic local regression. To do this, we first consider a nonparametric regression model Y = f(X) + ε with a response Y taking values in M, a Euclidean predictor X ∈ R^p, and f an unknown regression function of interest. Suppose we observe D = {(x₁, y₁), ..., (x_n, y_n)}, consisting of independent and identically distributed copies of (X, Y). One of the major challenges in developing a regression model with a manifold-valued response lies in the lack of a vector space structure on M, which makes the traditional Euclidean approaches, including the least squares method, not obviously applicable. For example, since linear operations are limited on (M, ρ), evaluating the difference between the estimated value and the observed value, i.e., y_i − f̂(x_i), is not practical. Moreover, the geometric feasibility of the estimate, i.e., f̂(x_i) ∈ M, cannot be guaranteed unless additional restrictions are imposed on the typical regression models. For the reasons outlined above, there has been great demand for the development of regression models with manifold-valued responses, and a large body of literature addressing this problem has accumulated over the past two decades (Shi et al., 2009; Yuan et al., 2012; Cornea et al., 2017). In particular, the extrinsic local regression (ELR) method was initially established by Lin et al. (2017).
More recently, Petersen and Müller (2019) proposed the Fréchet regression on general metric spaces by considering the following conditional Fréchet mean:

F(x) = argmin_{q∈M} ∫_M ρ²(q, y) Q(dy | x),

where Q(· | x) denotes the conditional distribution of Y given X = x. Applications of the above framework are very broad, as its usage is not limited to manifolds. However, despite the promising progress in developing regression models for non-Euclidean-valued responses, all the aforementioned methods commonly suffer from a lack of robustness caused by the squared distances. To remedy this problem, we propose the robust extrinsic local regression (RELR), which can be accomplished easily by linking the extrinsic median to classical nonparametric local kernel regression. The remainder of this section is dedicated to presenting the details of RELR, together with the proposed numerical algorithm.

We begin by introducing the following population robust extrinsic regression function, which extends the notion of the conditional median to manifolds:

F_RE(x) = argmin_{q∈M} ∫_M ‖J(q) − J(y)‖ Q(dy | x)   (10)
        = argmin_{q∈M} ∫_{J(M)} ‖J(q) − z‖ Q̃(dz | x),

where Q̃(· | x) = Q(· | x) ∘ J^{-1} is the induced conditional probability measure of Y given X = x, defined on J(M). While the proposed extrinsic approach is similar in spirit to those developed in Lin et al. (2017), our work differs in that it makes use of the unsquared extrinsic distance rather than the squared one. The unknown regression function F_RE(·) can be estimated at the evaluation point x by classical local polynomial fitting (Fan and Gijbels, 1996):

F̂_RE(x) = J^{-1}( P( argmin_{y∈R^D} Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] ‖y − J(y_i)‖ ) ).
(11)

In the above notation, K_H : R^p → R denotes the multivariate kernel function, defined as K_H(u) = |H|^{-1} K(H^{-1} u), where u = (u₁, ..., u_p)^⊤ ∈ R^p, H is a p × p symmetric and positive definite smoothing matrix, and K : R^p → R satisfies ∫_{R^p} K(u) du = 1, ∫_{R^p} u K(u) du = 0, and ∫_{R^p} ‖u‖² K(u) du < ∞. Note that the case H = Diag(h₁, ..., h_p) corresponds to using a product kernel obtained by multiplying p univariate kernels with different bandwidths, i.e., K_H(u) = Π_{i=1}^p (1/h_i) k_i(u_i/h_i). Regarding the inner optimization problem in (11), note that since it takes the form of the weighted Fermat–Weber problem, where the weight imposed on the i-th observation is formulated in terms of the kernel function, w_i = K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x), the robust extrinsic local regression can be readily solved by the generalized Weiszfeld algorithm.

We now describe the numerical algorithm for obtaining the solution of the localized robust regression estimator. As was assumed in the development of the extrinsic median, the non-collinearity of the embedded responses J(y₁), ..., J(y_n) is required in order to ensure the convergence of the algorithm. We also let f(y) = Σ_{i=1}^n w_i ‖y − J(y_i)‖ be the objective function; then, by the strict convexity of f, the optimal solution is attained at the stationary point ∇f(y) = Σ_{i=1}^n w_i (y − J(y_i)) / ‖y − J(y_i)‖ ≡
0. Then, since the optimal y* satisfies the equation (Σ_{i=1}^n w_i / ‖y* − J(y_i)‖) y* = Σ_{i=1}^n w_i J(y_i) / ‖y* − J(y_i)‖, the iterative algorithm for updating y on the embedded space has the following form:

y_{t+1} = (Σ_{i=1}^n w_i / ‖y_t − J(y_i)‖)^{-1} Σ_{i=1}^n w_i J(y_i) / ‖y_t − J(y_i)‖.   (12)

In Algorithm 2, after a simple algebraic calculation, we reformulate the update rule in (12) into the form of gradient descent with step size s_t = (Σ_{i=1}^n w_i / ‖y_t − J(y_i)‖)^{-1}. Once the optimal solution y* of the inner minimization problem on the ambient space R^D is found, the final regression estimator F̂_RE(x) on M can be obtained straightforwardly by evaluating the projection map of y* onto the image J(M) and taking the inverse map of the embedding. Note that, in order to prevent the algorithm from getting stuck at non-optimal embedded points, {y_t}_{t≥0}
∉ {J(y₁), ..., J(y_n)} needs to be assumed.

Algorithm 2
Robust extrinsic local regression (RELR)
Input: n observations D = {(x₁, y₁), ..., (x_n, y_n)}, evaluation point x
Initialize: t = 0, y₀, and ε
while ‖y_{t+1} − y_t‖ > ε do
    Compute the gradient direction ∇f(y_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] (y_t − J(y_i)) / ‖y_t − J(y_i)‖
    Compute the step size s_t = ( Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y_t − J(y_i)‖ )^{-1}
    Update y_{t+1} = y_t − s_t · ∇f(y_t)
    t ← t + 1
end while
Output: Estimated robust estimator F̂_RE(x) = J^{-1}(P(y*))   ▷ y* the optimal value

We are now in a position to present the convergence analysis of the proposed algorithm. Before proceeding, the following results on the convergence of the classical Weiszfeld algorithm should be noted. For nonsmooth convex optimization problems, most existing gradient-descent-type algorithms are known to converge to the optimal solution at a rate of O(1/√t); however, with the initial value proposed by Vardi and Zhang (2001), one can show that the Weiszfeld algorithm attains a sublinear convergence rate of O(1/t) (see Beck and Sabach, 2015). More surprisingly, using a smooth approximation of the objective function enables the algorithm to achieve an accelerated convergence rate of O(1/t²). Additionally, since the proposed algorithm for solving RELR is performed in Euclidean space, the algorithmic techniques and convergence properties previously established for the classical Weiszfeld algorithm can be utilized immediately.

First, we will henceforth make use of the following initial point for the RELR algorithm, which is an extension of the scheme in Vardi and Zhang (2001) to the nonparametric regression setting.
For p ∈ argmin_{i∈{1,...,n}} f(J(y_i)), we set the initial value for Algorithm 2 as y₀ = J(y_p) + t_p d_p, where

R_p = Σ_{i≠p} [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] (J(y_p) − J(y_i)) / ‖J(y_i) − J(y_p)‖,
d_p = −R_p / ‖R_p‖,
t_p = ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) ) / L(J(y_p)),

and the operator L is given by

L(y) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y − J(y_i)‖,  for y
∉ {J(y₁), ..., J(y_n)}, and by

L(y) = Σ_{i≠p} [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖J(y_p) − J(y_i)‖,  for y = J(y_p) (1 ≤ p ≤ n).

Using the above initial point, we obtain the following result. Proposition 1 is of interest in its own right: without any further manipulation of the algorithm, the carefully chosen starting value enables Algorithm 2 to achieve the sublinear convergence rate O(1/t).

Proposition 1.
Suppose that all the embedded response values are not collinear. Then, for any t ≥
0, we have

f(y_t) − f* ≤ L(J(y_p)) ‖y₀ − y*‖² / [ t ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² ],   (13)

where f* is the minimum of f.

Proof.
To prove the sublinear convergence rate of Algorithm 2, we use a collection of results from Beck and Sabach (2015). We first derive an upper bound for the sequence {L(y_t)}_{t≥0}, where {y_t}_{t≥0} is the sequence generated by the algorithm. For any i = 1, ..., n and any y satisfying f(y) ≤ f(y₀), the inequality ‖y − J(y_i)‖ ≥ f(J(y_i)) − f(y₀) holds (Lemma 8.1 in Beck and Sabach, 2015). Combining this with the monotonicity f(y_t) ≤ f(y₀) (Corollary 3.1 in Beck and Sabach, 2015) and f(J(y_p)) ≤ f(J(y_i)), we have ‖y_t − J(y_i)‖ ≥ f(J(y_p)) − f(y₀). Then, by the definition of the operator L, we obtain

L(y_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y_t − J(y_i)‖ ≤ 1 / (f(J(y_p)) − f(y₀)) ≤ 2 L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )².   (14)

In the last inequality we have used Lemma 7.1 in Beck and Sabach (2015), which states that for some j ∈ {1, ..., n}, f(J(y_j)) − f(J(y_j) + t_j d_j) ≥ ( ‖R_j‖ − K_H(x_j − x)/Σ_{i=1}^n K_H(x_i − x) )² / (2 L(J(y_j))). Finally, from Lemma 5.2 in Beck and Sabach (2015), which states that f(y_{n+1}) − f* ≤ (L(y_n)/2)( ‖y_n − y*‖² − ‖y_{n+1} − y*‖² ) and ‖y_{t+1} − y*‖ ≤ ‖y_t − y*‖, the upper bound on f(y_t) − f* can be derived in the following manner:

Σ_{n=0}^{t−1} ( f(y_{n+1}) − f* ) ≤ Σ_{n=0}^{t−1} (L(y_n)/2)( ‖y_n − y*‖² − ‖y_{n+1} − y*‖² )
  ≤ [ L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² ] ( ‖y₀ − y*‖² − ‖y_t − y*‖² )
  ≤ L(J(y_p)) ‖y₀ − y*‖² / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )².   (15)

Since the sequence {f(y_t)}_{t≥0} is non-increasing, t( f(y_t) − f* ) ≤ Σ_{n=0}^{t−1} ( f(y_{n+1}) − f* ), which completes the proof.
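To make the above concrete, the kernel-weighted Weiszfeld iteration used by Algorithm 2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the Gaussian product kernel, the function names, and the stopping safeguards are our own choices, and the embedded responses J(y_i) are assumed to be given as the rows of a matrix.

```python
import numpy as np

def kernel_weights(X, x0, h):
    """Normalized Gaussian product-kernel weights w_i = K_H(x_i - x0) / sum_j K_H(x_j - x0)."""
    u = (X - x0) / h
    k = np.exp(-0.5 * np.sum(u**2, axis=1))
    return k / k.sum()

def weiszfeld_relr(Z, w, tol=1e-10, max_iter=1000):
    """Kernel-weighted Weiszfeld iteration (update rule (12)) in the ambient space.

    Z : (n, D) array of embedded responses J(y_i); w : (n,) normalized weights.
    Returns the weighted geometric median y* of the embedded points.
    """
    y = np.average(Z, axis=0, weights=w)        # start from the weighted mean
    for _ in range(max_iter):
        d = np.linalg.norm(Z - y, axis=1)
        if np.any(d < 1e-12):                   # iterate landed on a data point
            return Z[np.argmin(d)]
        inv = w / d
        y_new = (inv[:, None] * Z).sum(axis=0) / inv.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

The manifold-valued estimate is then recovered by projecting the returned point onto J(M) and applying J^{-1}, as in the output step of Algorithm 2. A quick check of the robustness claim: for three nearly coincident points and one gross outlier with equal weights, the returned median stays near the cluster, whereas the weighted mean is dragged far away.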
Even though we have achieved the improved convergence rate O(1/t), there still exists a gap in the order of the convergence rate compared to the O(1/t²) rate commonly attained by accelerated gradient-based methods for solving smooth convex optimization problems. To further enhance the convergence rate, we here introduce a modified version of the algorithm for RELR. Since the slow convergence rate is essentially caused by the inherent nonsmoothness of the objective function f, it can be resolved by using a smooth alternative to f that yields exactly the same optimal solution. Besides the improved convergence rate, what makes this approach even more appealing is that the assumption made on the sequence generated by Algorithm 2, namely {y_t}_{t≥0} ∉ {J(y₁), ..., J(y_n)}, is no longer required. Given the lack of knowledge of conditions under which that assumption can be guaranteed, the smooth approximation method has a practical advantage.

We finish this section by describing the derivation of the modified Weiszfeld algorithm for RELR, along with its convergence analysis. To do this, we begin by considering the following smooth function f̃_s : R^D → R,

f̃_s(y) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] g_{b_i}(y − J(y_i)),   (16)

where

g_{b_i}(y − J(y_i)) = ‖y − J(y_i)‖ if ‖y − J(y_i)‖ ≥ b_i,  and  g_{b_i}(y − J(y_i)) = ‖y − J(y_i)‖²/(2b_i) + b_i/2 if ‖y − J(y_i)‖ < b_i,   (17)

and b_i = f(J(y_i)) − f(y₀). Note first that the function f̃_s is convex and continuously differentiable over R^D, and its gradient is Lipschitz continuous, i.e., ‖∇f̃_s(y) − ∇f̃_s(z)‖ ≤ L_s ‖y − z‖ for all y, z ∈ R^D, with Lipschitz constant

L_s = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_i)) − f(y₀) ).
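The smoothing g_{b_i} in (17) is a Huber-type modification of the norm: linear outside a ball of radius b_i and quadratic inside it, glued so that both the function and its gradient are continuous. A minimal sketch (the function names are ours):

```python
import numpy as np

def g_b(r, b):
    """Huber-type smoothing of ||r|| with threshold b > 0, as in (17):
    equals ||r|| when ||r|| >= b, and ||r||^2/(2b) + b/2 otherwise."""
    nr = np.linalg.norm(r)
    return nr if nr >= b else nr**2 / (2.0 * b) + b / 2.0

def grad_g_b(r, b):
    """Gradient of g_b: r/||r|| outside the ball, r/b inside it,
    so the gradient is Lipschitz continuous with constant 1/b."""
    nr = np.linalg.norm(r)
    return r / nr if nr >= b else r / b
```

At the boundary ‖r‖ = b the two branches agree, since b²/(2b) + b/2 = b, and both gradient branches equal r/b there; the weighted sum Σ_i w_i g_{b_i} therefore has a gradient-Lipschitz constant Σ_i w_i/b_i, in line with L_s above.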
Since g_{b_i}(y − J(y_i)) ≥ ‖y − J(y_i)‖ for all y ∈ R^D, it follows that f̃_s(y) ≥ f(y), which indicates that f̃_s plays the role of an upper bound of f. Moreover, following from Lemma 8.1 in Beck and Sabach (2015), ‖y* − J(y_i)‖ ≥ f(J(y_i)) − f(y₀) = b_i holds for i = 1, ..., n, where y* is the strict global minimizer of the original objective function f. Then, according to the construction in (17), g_{b_i}(y* − J(y_i)) = ‖y* − J(y_i)‖ is always satisfied, which indicates f̃_s(y*) = f(y*) < f(y) ≤ f̃_s(y) for any y ≠ y*. Thus, it is clear that the minimizer of the inner optimization problem in (11) must also be a global minimizer of (16), i.e., y* = argmin_y f̃_s(y) = argmin_y f(y). In this sense, f̃_s allows us to smoothly approximate the original objective function f. Therefore, rather than working directly with f, we aim to minimize f̃_s, which leads to Algorithm 3.

Algorithm 3
Fast Weiszfeld algorithm for RELR
Input: n observations D = {(x₁, y₁), ..., (x_n, y_n)}, evaluation point x
Initialize: s₁ = 1, u₁ = y₀ ∈ R^D, and ε
while ‖y_t − y_{t−1}‖ > ε do (for t = 1, 2, ...)
    Update y_t = u_t − (1/L_s) ∇f̃_s(u_t), where ∇f̃_s(u_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] × (u_t − J(y_i)) / ‖u_t − J(y_i)‖ if ‖u_t − J(y_i)‖ ≥ b_i, and × (u_t − J(y_i)) / b_i if ‖u_t − J(y_i)‖ < b_i
    Update s_{t+1} = ( 1 + sqrt(1 + 4 s_t²) ) / 2
    Update u_{t+1} = y_t + ( (s_t − 1)/s_{t+1} )( y_t − y_{t−1} )
end while
Output: Estimated robust estimator F̂_RE(x) = J^{-1}(P(y*))   ▷ y* the optimal value

Now let {y_t}_{t≥0} be the sequence generated by Algorithm 3; then we obtain

f̃_s(y_t) − f* ≤ 2 L_s ‖y₀ − y*‖² / (t + 1)².   (18)

The above convergence result follows immediately from Theorem 9.1 in Beck and Sabach (2015); see Beck and Teboulle (2009) for a proof. Furthermore, in our RELR setting, L_s is bounded from above by

L_s = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_i)) − f(y₀) )
    ≤ Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_p)) − f(y₀) )
    = 1 / ( f(J(y_p)) − f(y₀) )
    ≤ 2 L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )²,

where the last inequality uses (14). Putting all the pieces together and using f̃_s(y) − f* ≥ f(y) − f*, we see that the fast Weiszfeld algorithm for RELR attains the following O(1/t²) convergence rate:

f(y_t) − f* ≤ 4 ‖y₀ − y*‖² L(J(y_p)) / { (t + 1)² ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² },

which is a substantial improvement on the convergence rate in (13).

In this section, the benefit of the robust extrinsic local regression over the extrinsic regression is demonstrated on the basis of simulation studies.
While the simulation conducted in this paper focuses only on the case of planar shapes, the proposed method can be applied to other manifolds in a straightforward manner. Recalling from Section 3.2, we let Y = (z₁, ..., z_k) be the response variable, a planar shape of k-ads defined on Σ₂^k, and let X ∈ R^p be a Euclidean predictor. By slightly modifying the polar-coordinate-based scheme in Lin et al. (2017), we generate synthetic planar shape data in the following manner:

Generate covariate: X = (X₁, ..., X_p), where X_i ∼ Uniform(a, b)
Coefficients: β = (β₁, ..., β_k) = (1/k, ..., k/k) ∈ R^k
Intercept angles: φ₀ = (φ₀₁, ..., φ₀ₖ) ∈ R^k
Intercept radii: γ₀ = (γ₀₁, ..., γ₀ₖ) ∈ R^k
Generate shape angles: φ'_j ∼ Normal(φ₀ⱼ + β_j Σ_{i=1}^p X_i, σ²_φ)
Standardize angles: φ = (φ₁, ..., φ_k), where φ_j = φ'_j (mod 2π)
Generate shape radii: γ = (γ₁, ..., γ_k), where γ_j ∼ Normal(γ₀ⱼ + β_j Σ_{i=1}^p X_i, σ²_γ)
Convert to the complex form for the landmarks: z_j = γ_j (cos(φ_j) + i sin(φ_j)).

Further, in order to conduct simulation studies under the outlier-contaminated setting, we randomly add a fixed number of outliers to the response variable, i.e., Y* = Y + Ψ = (z*₁, ..., z*_k). The extreme-valued outlier Ψ ∈ C^k was generated from the k-dimensional complex normal distribution CN(μ, Γ), where μ = E(Ψ) and Γ = E((Ψ − μ)(Ψ − μ)^H). We note that since the contaminated response Y* is no longer an element of Σ₂^k, both translation and scale effects have to be filtered out. To help understand the process by which the data are generated, a representative illustration of the simulated data is presented in the left panel of Figure 4. For illustrative purposes, the planar shape data are generated under the univariate setting, and only the first landmark is contaminated.
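As an illustration, the generation scheme above can be sketched as follows. The concrete constants (k, the intercept angles and radii, and the noise scales) are assumptions of ours, since the paper's exact values are not reproduced here; the returned landmark vector is centered and scaled to the preshape sphere so that translation and scale are filtered out.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_planar_shape(p=1, k=10, a=0.0, b=1.0,
                          sigma_phi=0.1, sigma_gamma=0.05):
    """One draw from the polar-coordinate scheme described above.

    The intercept angles/radii and noise scales are illustrative choices.
    Returns the covariate X and a centered, unit-norm complex landmark vector.
    """
    X = rng.uniform(a, b, size=p)                   # Euclidean covariate
    j = np.arange(1, k + 1)
    beta = j / k                                    # coefficients (1/k, ..., k/k)
    phi0 = 2 * np.pi * j / k                        # intercept angles (assumed)
    gamma0 = np.full(k, 0.5)                        # intercept radii (assumed)
    phi = rng.normal(phi0 + beta * X.sum(), sigma_phi) % (2 * np.pi)
    gamma = rng.normal(gamma0 + beta * X.sum(), sigma_gamma)
    z = gamma * (np.cos(phi) + 1j * np.sin(phi))    # complex landmarks
    z = z - z.mean()                                # remove translation
    z = z / np.linalg.norm(z)                       # remove scale (preshape)
    return X, z
```

Contamination as in Y* = Y + Ψ can then be added to selected landmarks, after which translation and scale are filtered out again.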
In the right panel of the same figure, the estimated curves are given, along with the true underlying function. As shown in the figure, the curve obtained from the usual ELR (red) deviates substantially from the true curve (blue), while the curve corresponding to RELR (green) almost overlaps with the true values.

Figure 4: Left: an example of the simulated data (univariate case) with r = 0.2; each point is colored according to the value of the predictor X. Right: the estimation results. The optimal bandwidths of RELR and ELR, selected by 5-fold cross validation, are h_Med = 1.37 and h_Mean = 2.27, respectively.

One important remaining issue of the proposed RELR model, which has not been highlighted in the previous section, is bandwidth selection. It is well known that the performance of local-polynomial-type methods relies significantly on a tuning parameter h, called the bandwidth, which plays a crucial role in controlling the degree of smoothing. To be specific, a large value of h leads to a smooth estimate but, by failing to account for local variation, may introduce significant estimation bias, whereas a small h produces a jagged estimate, leading to a large variance. Thus h should be selected properly to balance the trade-off between variance and squared bias. Throughout the simulation, we consider the smoothing matrix that gives the same bandwidth in all p dimensions, i.e., H = h I_p. Though bandwidth selection methods have been studied extensively since the early days of nonparametric regression modeling, in this paper we adopt 5-fold cross validation for the sake of simplicity.

The performance of RELR is evaluated by comparing its results with those achieved by ELR in terms of two measures associated with the full Procrustes distance ρ_FP(z₁, z₂) = sqrt(1 − |⟨z₁, z₂⟩|²), where z₁, z₂ ∈ Σ₂^k. First, we consider MD_obs = Σ_{i=1}^n ρ_FP(y_i, f̂(x_i)) / n, which measures the difference between the estimated value f̂(x_i) and the observed value y_i. Moreover, to assess whether an estimator has the benefit of capturing the true signal, it is more appropriate to examine the following root-mean-squared-error-like measure, RMSE_true = sqrt( Σ_{i=1}^n ρ²_FP(f(x_i), f̂(x_i)) / n ), which quantifies the difference between the predictor and the true value f(x_i). To investigate how the methods are affected by outliers, the contamination level r was varied on an evenly spaced grid over [0, 0.3]. The resulting values of MD_obs and RMSE_true, with n = 200 and p = 1, are presented in Figure 5 (left and right panels, respectively).

Figure 5: Results of the univariate case as a function of the contamination level r ∈ [0, 0.3]. Left: averaged value of MD_obs. Right: averaged value of RMSE_true against the underlying true regression function f(x_i). To assist visualization, we also give a loess smooth with Monte Carlo standard error.

Overall, the performance curves obtained across the experimental conditions are roughly linear and slope upward from left to right as the contamination rate runs from 0 to 0.3, which indicates that the performance of both RELR and ELR degrades as r increases. We also found that the values of both MD_obs and RMSE_true corresponding to RELR are consistently lower than those of ELR, except at r = 0. The result in the left panel makes intuitive sense, as RELR minimizes the empirical risk function associated with the unsquared extrinsic distance, which differs from the full Procrustes distance only by a multiplicative constant. In the right panel, however, the slope of the ELR curve, inclined toward its upper-right endpoint starting from the origin, is 1.67, which is approximately four times greater than that of RELR. This clearly suggests that RELR outperforms ELR by a wide margin in identifying true patterns of shape changes that are masked by noise.

Table 2: Comparisons of RELR and ELR in terms of MD_obs and RMSE_true.

                                 Ratio of outliers
  N    Measure      Method    0        0.05     0.1      0.2      0.3
  50   MD_obs       ELR       0.0323   0.0716   0.1389   0.2481   0.3505
                    RELR      0.0320   0.0662   0.1192   0.2052   0.2806
       RMSE_true    ELR       0.0233   0.0348   0.0692   0.1446   0.2326
                    RELR      0.0242   0.0257   0.0400   0.0669   0.0781
  100  MD_obs       ELR       0.0333   0.0810   0.1306   0.2441   0.3524
                    RELR      0.0338   0.0758   0.1200   0.2098   0.2884
       RMSE_true    ELR       0.0242   0.0358   0.0550   0.1279   0.2219
                    RELR      0.0255   0.0274   0.0414   0.0617   0.0698
  200  MD_obs       ELR       0.0336   0.0802   0.1351   0.2486   0.3516
                    RELR      0.0341   0.0768   0.1235   0.2126   0.2880
       RMSE_true    ELR       0.0237   0.0327   0.0566   0.1226   0.2093
                    RELR      0.0248   0.0277   0.0386   0.0586   0.0720

As shown in Table 2, similar results were obtained from further experiments in a multivariate setting (p = 3) with different sample sizes. In this simulation study, the proposed RELR is seen to perform well universally across all examined simulation scenarios. Contrary to RELR, however, the performance of ELR appears prone to being adversely affected by the presence of outliers and noise, which is consistent with what would be expected from the location estimation performed in Section 3.2.

In this paper, we have proposed robust statistical methods on manifolds and demonstrated that, when outliers exist in the dataset, they are capable of achieving considerable improvements over existing methods. Building upon the idea of the geometric median and the extrinsic framework, our method takes full advantage of what the two approaches can provide: (i) the robustness property is attained by employing the unsquared extrinsic distance induced by the Euclidean embedding, which prevents the estimator from amplifying the effects of noise and outliers; (ii) our approach can also be adapted universally to any manifold on which a proper Euclidean embedding is available. For example, RELR can be implemented straightforwardly on the space of p × p symmetric positive definite (SPD) matrices, which has especially emerged as a form of data in neuroimaging.
To be specific, for 3 × 3 SPD matrices one can use the matrix logarithm log : SPD(3) → Sym(3); for X = UΛU^{-1} ∈ SPD(3), log(X) = U log(Λ) U^{-1}, and the extrinsic distance is defined via the Frobenius norm, i.e., ρ_E(X₁, X₂) = ‖log(X₁) − log(X₂)‖_F.

While our main focus was on developing robust statistical methods for the central location and regression problems on manifolds, some important problems remain to be investigated further as part of future work. We end the paper by outlining some promising directions for research that may benefit immediately from our proposed framework. First, borrowing ideas from K-medians algorithms (Cardot et al., 2012), clustering on manifolds can be carried out in a more robust manner. Second, an intriguing research question raised by our study is the possible extension of the notion of the quantile to manifolds. Unlike measures of central tendency (the mean and the median), the generalization of the quantile to non-Euclidean spaces is nontrivial, because it has to take into account the direction and magnitude of changes from the central location. To the best of our knowledge, no research has been reported in this context yet, which may be due in part to the difficulty of defining directions on the surface of a manifold. This challenge may be tackled by combining the geometric quantile (Chaudhuri, 1996), which has the geometric median as a special case, with our extrinsic framework. We expect that it will be especially useful in medical imaging analysis: because quantile information enables us to quantify the pathological state or the abnormality of a certain organ shape, it will be a useful tool for diagnosing and prognosticating critically ill patients.

References

Beck, A. and Sabach, S. (2015). Weiszfeld's method: Old and new results.
Journal of Optimization Theory and Applications, 164:1–40.

Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183–202.

Bhattacharya, A. and Bhattacharya, R. (2012). Nonparametric Inference on Manifolds: With Applications to Shape Spaces. IMS Monograph. Cambridge University Press.

Bhattacharya, R. N., Ellingson, L., Liu, X., Patrangenaru, V., and Crane, M. (2011). Extrinsic analysis on manifolds is computationally faster than intrinsic analysis with applications to quality control by machine vision. Applied Stochastic Models in Business and Industry, 28:222–235.

Bhattacharya, R. N. and Patrangenaru, V. (2003). Large sample theory of intrinsic and extrinsic sample means on manifolds—part I. Annals of Statistics, 31:1–29.

Bhattacharya, R. N. and Patrangenaru, V. (2005). Large sample theory of intrinsic and extrinsic sample means on manifolds—part II. Annals of Statistics, 33:1211–1245.

Cardot, H., Cénac, P., and Godichon-Baggioni, A. (2017). Online estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence balls. Annals of Statistics, 45(2):591–614.

Cardot, H., Cénac, P., and Monnez, J. M. (2012). A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics & Data Analysis, 56(6):1434–1449.

Cardot, H., Cénac, P., and Zitt, P.-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19(1):18–43.

Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. Journal of the American Statistical Association, 91(434):862–872.

Chikuse, Y. (2003). Statistics on Special Manifolds. Springer-Verlag New York.

Cornea, E., Zhu, H., Kim, P., Ibrahim, J. G., and the Alzheimer's Disease Neuroimaging Initiative (2017). Regression models on Riemannian symmetric spaces. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2):463–482.

Dryden, I. L. and Mardia, K. V. (1991). General shape distributions in a plane. Advances in Applied Probability, 23:259–276.

Dryden, I. L. and Mardia, K. V. (1998). Statistical Shape Analysis. Wiley.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, London.

Fisher, N. I., Lewis, T., and Embleton, B. J. J. (1987). Statistical Analysis of Spherical Data. Cambridge University Press.

Fletcher, P. T., Venkatasubramanian, S., and Joshi, S. (2009). The geometric median on Riemannian manifolds with application to robust atlas estimation. NeuroImage, 45:S143–S152.

Fréchet, M. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'Institut Henri Poincaré, 10:215–310.

Haldane, J. B. S. (1948). Note on the median of a multivariate distribution. Biometrika, 35(3-4):414–417.

Hampel, F., Ronchetti, E., Rousseeuw, P., and Stahel, W. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley.

Huang, C., Styner, M., and Zhu, H. (2015). Clustering high-dimensional landmark-based two-dimensional shape data. Journal of the American Statistical Association, 110(511):946–961.

Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101.

Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Wiley, 2nd edition.

Jammalamadaka, S. R. and SenGupta, A. (2001). Topics in Circular Statistics. Series on Multivariate Analysis, Vol. 5. World Scientific Press, Singapore.

Kemperman, J. (1987). The median of a finite measure on a Banach space. In Statistical Data Analysis Based on the L1-norm and Related Methods. North-Holland, Amsterdam.

Kendall, D. G. (1984). Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16:81–121.

Kent, J. T. (1992). New directions in shape analysis. In: The Art of Statistical Science. John Wiley & Sons, Ltd, Chichester.

Kuhn, H. W. (1973). A note on Fermat's problem. Mathematical Programming, 4:98–107.

Lin, L., Thomas, B. S., Zhu, H., and Dunson, D. B. (2017). Extrinsic local regression on manifold-valued data. Journal of the American Statistical Association, 112(519):1261–1273.

Lopuhaä, H. P. and Rousseeuw, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Annals of Statistics, 19(1):229–248.

Mardia, K. V. and Dryden, I. L. (1989). The statistical analysis of shape data. Biometrika, 76:271–281.

Mardia, K. V. and Jupp, P. E. (1999). Directional Statistics. Wiley.

Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4):2308–2335.

Möttönen, J., Nordhausen, K., and Oja, H. (2010). Asymptotic theory of the spatial median. In Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in Honor of Professor Jana Jurečková, volume 7 of IMS Collections, pages 182–193. Institute of Mathematical Statistics, Beachwood, Ohio, USA.

Petersen, A. and Müller, H.-G. (2019). Fréchet regression for random objects with Euclidean predictors. Annals of Statistics, 47(2):691–719.

Shi, X., Styner, M., Lieberman, J., Ibrahim, J. G., Lin, W., and Zhu, H. (2009). Intrinsic regression models for manifold-valued data. In Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., and Taylor, C., editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2009, pages 192–199, Berlin, Heidelberg. Springer.

Small, C. G. (1990). A survey of multidimensional medians. International Statistical Review, 58(3):263–277.

Vardi, Y. and Zhang, C. H. (2001). A modified Weiszfeld algorithm for the Fermat–Weber location problem. Mathematical Programming, 90:559–566.

Weber, A. (1929). Über den Standort der Industrien (Alfred Weber's Theory of the Location of Industries). University of Chicago Press.

Weiszfeld, E. (1937). Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Mathematical Journal, 43:355–386.

Whitney, H. (1944). The self-intersections of a smooth n-manifold in 2n-space. The Annals of Mathematics, 45:220–246.

Yuan, Y., Zhu, H., Lin, W., and Marron, J. S. (2012). Local polynomial regression for symmetric positive definite matrices. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(4):697–719.

Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009). Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging.