Robust Extrinsic Regression Analysis for Manifold-Valued Data
Hwiyoung Lee [email protected]
Department of Statistics, Florida State University
January 28, 2021
Abstract
Recently, there has been a growing need for analyzing data on manifolds owing to their important role in diverse fields of science and engineering. In the literature on manifold-valued data analysis up till now, however, only a few works have been carried out concerning the robustness of estimation against noise, outliers, and other sources of perturbation. In this regard, we introduce a novel extrinsic framework for analyzing manifold-valued data in a robust manner. First, by extending the notion of the geometric median, we propose a new robust location parameter on manifolds, called the extrinsic median. A robust extrinsic regression method is also developed by incorporating the conditional extrinsic median into the classical local polynomial regression method. We present Weiszfeld's algorithm for implementing the proposed methods. The promising performance of our approach against existing methods is illustrated through simulation studies.
Key words: Robust statistics, Extrinsic median, Nonparametric regression, Riemannian manifolds
1 Introduction

Over the past few decades, analyzing data taking values in non-Euclidean spaces, mostly nonlinear manifolds, has attracted increased attention in a wide range of applications, because it allows a richer and more accurate statistical inference based on the usage of the geometrical properties of the underlying data space. Examples of such data types, which especially lie on Riemannian manifolds, include directions of points on a sphere (Fisher et al., 1987; Mardia and Jupp, 1999), shapes of configurations extracted from images (Bhattacharya and Bhattacharya, 2012), data sitting on Stiefel and Grassmann manifolds (Chikuse, 2003), symmetric positive definite matrices arising as observations in diffusion tensor magnetic resonance imaging (DT-MRI) (Zhu et al., 2009; Yuan et al., 2012), and other types of medical images.

In common with the traditional Euclidean case, statistical inference on the aforementioned manifolds begins by defining the notion of the mean on the metric space where the data reside. Suppose a random object X is defined on a metric space (M, ρ), and let Q(·) be the probability measure of X. One may then consider adapting the traditional definition of the mean to an arbitrary metric space, i.e., E(X) = ∫_M x Q(dx). Unfortunately, however, this attempt at generalization is not directly applicable to the non-Euclidean setting, because it contains a non-vector-valued integral, which appears to be analytically intractable. Therefore, the conventional definition of the mean needs to be adapted so that it can go beyond Euclidean spaces. The most commonly used one in the literature is the Fréchet mean (Fréchet, 1948), in which the mean is defined as a minimizer of a real-valued function defined on the metric space.
To be more specific, for any q ∈ M, consider the Fréchet function of the following form:

    F : M → ℝ,   q ↦ F(q) = E[ρ²(X, q)] = ∫_M ρ²(x, q) Q(dx),   (1)

where ρ denotes a generic metric on M. Then the Fréchet mean is defined as the minimizer of the Fréchet function above, i.e.,

    µ_F = argmin_{q ∈ M} ∫_M ρ²(x, q) Q(dx).   (2)

Also, for given observations x_1, …, x_n ∈ M, consisting of n independent realizations of X, the sample Fréchet mean is defined as X̄ = argmin_{q ∈ M} Σ_{i=1}^n ρ²(x_i, q). Given the relation between the mean and the variance, the above generalization of the mean makes intuitive sense, because it gives a definition analogous to that of the Euclidean mean, which is characterized as the minimizer of the variance function, the sum of squared deviations. In this regard, the Fréchet function itself is commonly referred to as the Fréchet variance.

But, first and foremost, what needs to be emphasized is that a metric ρ is not unique for any particular manifold, and there are many possible choices. Regarding this issue, two different types of distance functions have typically been considered in the literature on manifold-valued data analysis. The first possible choice is the intrinsic distance, that is, the geodesic distance associated with the Riemannian structure g on M. The other type of distance is the Euclidean distance induced by an embedding J : M → ℝ^D, which is also referred to as the extrinsic distance. The former and the latter distances lead to intrinsic and extrinsic data analysis, respectively.

Most of the previous works on analyzing manifold-valued data have mainly focused on developing statistical methods based on variants of the Fréchet mean. For instance, by introducing the conditional Fréchet mean, Petersen and Müller (2019) developed a regression model having a random object in a metric space as a response variable.
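To make (1)–(2) concrete, the sample Fréchet mean on the unit circle equipped with the arc-length metric can be approximated by brute-force minimization of the sample Fréchet function over a grid of candidate angles. The following sketch is ours, not from the paper, and the grid-search approach is purely illustrative:

```python
import numpy as np

def circle_dist(a, b):
    # Geodesic (arc-length) metric rho on the unit circle S^1.
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def sample_frechet_mean(thetas, grid_size=10000):
    # Brute-force minimizer of the sample Frechet function
    # F_n(q) = (1/n) * sum_i rho^2(theta_i, q) over a grid of candidates.
    grid = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    F = np.array([np.mean(circle_dist(thetas, q) ** 2) for q in grid])
    return grid[int(np.argmin(F))]

thetas = np.array([0.1, 0.2, 6.2])   # angles clustered around 0 (6.2 is about -0.08)
mu_F = sample_frechet_mean(thetas)
```

Because the three angles fall within a common half-circle, the minimizer coincides (up to grid resolution) with their "unwrapped" arithmetic mean, illustrating how the Fréchet mean reduces to the ordinary mean locally.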
However, it is well known that least-squares-based methods are severely degraded when there exist outliers in the data or the underlying data distribution is heavy tailed. Thus, the lack of statistical robustness, incurred by the squared distance involved in (2), becomes apparent in the Fréchet mean as well. While, in the Euclidean setting, considerable efforts have been devoted to improving the robustness of estimators (see Huber, 1964; Huber and Ronchetti, 2009; Hampel et al., 1986, and references therein for a review), far less attention has been paid to manifolds. Indeed, one simple way to enhance the robustness of estimators is to replace the squared distance with the unsquared distance. In the case of Euclidean space, where M = ℝ^d and ρ(x, x′) = ‖x − x′‖, this approach has the geometric median (Haldane, 1948) as a special case. Along the same line, the intrinsic geometric median on Riemannian manifolds, obtained by minimizing the Fréchet function associated with the unsquared geodesic distance, has been proposed by Fletcher et al. (2009). Motivated by the success of the intrinsic geometric median, the primary contribution of this paper is to develop a novel robust location parameter within an extrinsic framework, which entails a computationally efficient algorithm. Moreover, adopting the concept of classical local polynomial modeling, we implement a robust extrinsic local regression model for a manifold-valued response and Euclidean predictors. This is accomplished by extending the concept of the proposed extrinsic median to the notion of the conditional extrinsic median.

The rest of this paper is organized as follows. The proposed extrinsic median is introduced in Section 2, along with a brief review of the extrinsic framework for manifold data analysis. Application of the extrinsic median to two different manifolds is demonstrated in Section 3.
In Section 4, we develop the robust extrinsic local regression (RELR) model, with algorithmic details. In Section 5, the proposed RELR is implemented in Kendall's planar shape space and its promising properties are illustrated through simulation studies. Finally, we conclude the paper in Section 6 with a short discussion and possible directions for future study.

2 Extrinsic Median

In this section, we develop the extrinsic median, which provides a statistically robust and computationally efficient way of estimating the center point of data residing on manifolds. Before describing our method, it is useful to begin with a brief review of the extrinsic framework for manifold-valued data analysis, on which the proposed scheme is based, and the motivation that initiated this study.

2.1 Extrinsic framework for manifold-valued data analysis
The essential idea behind extrinsic analysis is that any d-dimensional manifold M can be embedded in a higher-dimensional Euclidean space ℝ^D, where d < D (Whitney, 1944), via an embedding J. Thus, to understand the extrinsic approach, it is necessary to recall the definition of an embedding J. First consider a differentiable map J : M → ℝ^D whose differential d_p J : T_p M → T_{J(p)} ℝ^D is one-to-one, where T_p M and T_{J(p)} ℝ^D denote the tangent space at p ∈ M and the tangent space at J(p) on ℝ^D, respectively. The class of differentiable maps specified above is called an immersion. A one-to-one immersion is then called an embedding if it is a homeomorphism from M to J(M) with the induced topology. Also of note is that the embedding is unfortunately not unique in general, and not all choices of embedding lead to a good estimation result. In this context, the extrinsic approach has been carried out under the premise that the selected embedding preserves the intrinsic geometry of the original manifold. Therefore, an embedding satisfying the following condition is typically preferred within the extrinsic framework. For a Lie group G acting on M, the embedding J : M → ℝ^D is referred to as a G-equivariant embedding if there exists a group homomorphism φ : G → GL_D(ℝ or ℂ) satisfying J(gp) = φ(g)J(p) for all p ∈ M and g ∈ G, where GL_D(ℝ or ℂ) denotes the general linear group, i.e., the group of D × D invertible real or complex matrices. This definition indicates that the group action of G can be recovered in the embedded space J(M) through φ. Therefore, in light of the above, a great amount of the geometric features of the manifold is preserved in the embedded Euclidean space via an equivariant embedding.
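A toy numerical check of equivariance (our illustration, not an example from the paper): embed the sphere S² into the symmetric 3 × 3 matrices by J(p) = pp^T, a Veronese-type embedding analogous to the one used for shape spaces later in the paper. With G = SO(3) acting by rotation and φ(A) acting by conjugation B ↦ ABA^T, the identity J(Ap) = φ(A)J(p) holds exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy equivariant embedding of S^2 into symmetric 3x3 matrices: J(p) = p p^T.
def J(p):
    return np.outer(p, p)

def phi(A):
    # Group homomorphism: phi(A) acts on matrices by conjugation.
    return lambda B: A @ B @ A.T

p = rng.normal(size=3)
p /= np.linalg.norm(p)                    # a point on the sphere S^2
Q_, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A = Q_ if np.linalg.det(Q_) > 0 else -Q_  # a rotation in SO(3)

# Equivariance J(Ap) = phi(A) J(p):
lhs = J(A @ p)
rhs = phi(A)(J(p))
print(np.allclose(lhs, rhs))  # True
```

The identity follows algebraically, since (Ap)(Ap)^T = A(pp^T)A^T, which is exactly what "the group action can be recovered in the embedded space through φ" means.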
The extrinsic distance between two points on a manifold can then be straightforwardly computed in the embedded space via the Euclidean norm.

Considering all the notions described above, the extrinsic mean µ_E on a manifold is defined as the minimizer of the Fréchet function associated with the extrinsic distance via an embedding J : M → ℝ^D:

    µ_E = argmin_{q ∈ M} ∫_M ‖J(x) − J(q)‖² Q(dx).   (3)

As compared to the intrinsic mean, the use of the extrinsic approach has several advantages, including (1) computational efficiency (Bhattacharya et al., 2011), and (2) milder conditions for existence and uniqueness of the solution. Moreover, the sample extrinsic mean often has a closed-form solution. Thus, we here derive the extrinsic mean in an explicit form. To do this, the following definition, which gives the uniqueness condition of the extrinsic mean, should be noted first. A point y ∈ ℝ^D is said to be J-nonfocal if there exists a unique point p ∈ M satisfying inf_{x ∈ M} ‖y − J(x)‖ = ‖y − J(p)‖. We then let µ = ∫_{ℝ^D} u Q̃(du) be the mean vector of the induced probability measure Q̃ = Q ∘ J⁻¹, which is the image of Q in ℝ^D. The Fréchet function associated with the extrinsic distance, the right-hand side of (3), can then also be written as

    F(q) = ‖J(q) − µ‖² + ∫_{ℝ^D} ‖x − µ‖² Q̃(dx).   (4)

Hence, we have inf_{q ∈ M} F(q) = inf_{J(q) ∈ M̃} ‖J(q) − µ‖² + ∫_{ℝ^D} ‖x − µ‖² Q̃(dx), where M̃ = J(M) denotes the image of the embedding. This indicates that the set of points x ∈ M satisfying inf_{J(q) ∈ M̃} ‖J(q) − µ‖ = ‖J(x) − µ‖ constitutes the extrinsic mean set. And since (4) is minimized on M̃ by J(q) = P(µ), where P : ℝ^D → M̃ is the projection map defined by P(y) = {u ∈ M̃ : ‖u − y‖ ≤ ‖u′ − y‖ for all u′ ∈ M̃}, the extrinsic mean exists uniquely if and only if the mean vector µ is a J-nonfocal point. In that case, the extrinsic mean is obtained by taking the inverse of the embedding, i.e., µ_E = J⁻¹(P(µ)).
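As a toy illustration (ours, not the paper's code): on the unit sphere embedded by inclusion, the projection is P(µ) = µ/‖µ‖, so the extrinsic mean of an empirical measure is simply the normalized ambient average of the data points:

```python
import numpy as np

def sample_extrinsic_mean_sphere(X):
    # Sample extrinsic mean on S^d with the inclusion embedding J(x) = x:
    # average the embedded points in R^{d+1}, then project back onto the sphere.
    mu = X.mean(axis=0)             # ambient (Euclidean) mean
    nrm = np.linalg.norm(mu)
    if nrm == 0.0:                  # mu = 0 is the focal point of the sphere
        raise ValueError("ambient mean is focal; extrinsic mean is not unique")
    return mu / nrm                 # P(mu) = mu / ||mu||

# Points clustered around the north pole of S^2.
X = np.array([[0.0, 0.1, 0.99], [0.1, 0.0, 0.99], [-0.1, 0.0, 0.99]])
X /= np.linalg.norm(X, axis=1, keepdims=True)
m_hat = sample_extrinsic_mean_sphere(X)
```

Here the check for a zero ambient mean mirrors the J-nonfocality condition: for the sphere, the origin is the only focal point of the inclusion embedding.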
Following from the above, the sample extrinsic mean is obtained in a straightforward manner. Suppose we observe x_1, …, x_n ∈ M, consisting of independent and identically distributed copies of X; then the sample extrinsic mean is given by X̄_E = J⁻¹{P(J̄(X))}, where J̄(X) = Σ_{i=1}^n J(x_i)/n. Theoretical properties of the sample extrinsic mean, including its asymptotic distribution, consistency, and uniqueness conditions, are well established in Bhattacharya and Patrangenaru (2003, 2005).

2.2 Extrinsic Median

Before proceeding to present our proposed method, we begin by giving a quick overview of the existing Euclidean geometric median. In the Euclidean multivariate setting, a large body of research has been devoted to developing robust estimation of the central point (see Small, 1990, for a review), among which the geometric median, initially proposed by Haldane (1948), has received the greatest attention over the last decades due both to its nice robustness properties and its computational efficiency (Cardot et al., 2013, 2017). The geometric median of a random variable X ∈ ℝ^k is defined by m = argmin_{q ∈ ℝ^k} E‖X − q‖, or, alternatively but equivalently, is obtained by minimizing ∫_{ℝ^k} (‖x − q‖ − ‖x‖) Q(dx). Note that the latter expression has been more commonly adopted in practice, since no assumption regarding the first-order moment of X needs to be imposed. Moreover, when k = 1 the above definition coincides with the classical notion of the median, which is defined in terms of the cumulative distribution function. In this sense, the geometric median plays the role of a multivariate generalization of the univariate median.
Now suppose that we observe X = {x_1, …, x_n}, consisting of n independent and identically distributed realizations of X; then the sample geometric median m̂, which provides the natural estimator of m, is obtained by finding the value that minimizes the sum of Euclidean distances to the given data points, i.e.,

    m̂ = argmin_{q ∈ ℝ^k} Σ_{i=1}^n (‖x_i − q‖ − ‖x_i‖).

The above optimization problem is also known as the Fermat–Weber problem (Weber, 1929), and a numerical algorithm for solving the geometric median problem was first introduced by Weiszfeld (1937). It is shown in Kemperman (1987) that the sample geometric median is uniquely determined unless all the given observations lie on the same line.

Many nice properties of the geometric median have been investigated, including invariance under rotation and translation, asymptotic behavior (Möttönen et al., 2010), and concentration (Minsker, 2015). However, the most notable advantage of the geometric median over the mean is that it provides a robust estimate of centrality in the presence of noise in the data. The robustness of an estimator is usually measured by the breakdown point. For the given data X above, we further consider the outlier-contaminated data X*_m = {x*_1, …, x*_m, x_{m+1}, …, x_n}, where the first m elements are replaced by extreme noises; then the breakdown point of an estimator T_n is defined by

    B(T_n) = min_{1 ≤ m ≤ n} { m/n : sup_{X*_m} ‖T_n(X*_m) − T_n(X)‖ = ∞ }.

In an intuitive sense, the breakdown point can be interpreted as the highest proportion of contamination that the estimator can tolerate before the difference between the estimate obtained from the contaminated data and the initial estimate goes to infinity. Being less affected by outliers, the geometric median achieves the asymptotic breakdown point of 0.5, whereas the mean has a breakdown point of only 1/n, meaning that a single extreme value can change the estimation result arbitrarily.

We now turn our attention to manifolds. Much of the research regarding estimation of the central location of data on manifolds has focused on variants of the Fréchet mean, including the intrinsic and extrinsic means. However, the common drawback of least-squares-based methods is their lack of robustness to extreme values, which makes the Fréchet mean inevitably sensitive to heavy-tailed distributions and outlying values. Nevertheless, even though there has been a considerable increase in the need for robust statistical methods on manifolds, less has been done on this issue. The pioneering attempt to address this is seen in the work of Fletcher et al. (2009), who proposed the intrinsic median by substituting the squared geodesic distance employed in the Fréchet function with the unsquared one, i.e., argmin_{q ∈ M} ∫_M ρ(x, q) Q(dx). Although their approach has had a great deal of success in generalizing the notion of the median to Riemannian manifolds, attaining the same breakdown point as in the Euclidean case, it has some inherent drawbacks that may limit its application. (1) It is often difficult to derive conditions for the uniqueness and the existence of the intrinsic median without restrictions on its support. (2) Moreover, even when the intrinsic median exists, it requires iterative algorithms on manifolds, which may incur a large amount of computational overhead. These drawbacks highlight the need for novel approaches that give a more computationally efficient method in which the existence and uniqueness conditions are well established and easy to understand.

In an attempt to address the methodological shortcomings of the intrinsic median described above, we propose the following new robust location parameter, making use of the unsquared extrinsic distance:

    m_E = argmin_{q ∈ M} ∫_M ‖J(x) − J(q)‖ Q(dx).   (5)
Given observations x_1, …, x_n consisting of independent and identically distributed copies of a manifold-valued random variable X, the above location parameter, which we call the population extrinsic median, can be estimated by replacing Q with the empirical measure Q̂ = (1/n) Σ_{i=1}^n δ_{x_i}, i.e.,

    m̂_E = argmin_{q ∈ M} Σ_{i=1}^n ‖J(x_i) − J(q)‖ = J⁻¹( P( argmin_{m ∈ ℝ^D} Σ_{i=1}^n ‖J(x_i) − m‖ ) ).   (6)

Unlike the sample extrinsic mean, which has a closed-form expression depending on the projection map, the sample extrinsic median requires an iterative algorithm, namely the Weiszfeld algorithm, for solving the inner minimization problem. But taking advantage of Euclidean geometry, the proposed extrinsic approach allows us to exploit the original form of the Weiszfeld algorithm without requiring any further modifications. Indeed, the following Algorithm 1 for solving (6) can be easily derived due to the convexity of the objective function associated with the Euclidean norm, f(m) = Σ_{i=1}^n ‖J(x_i) − m‖.

Algorithm 1
Extrinsic Median
Input: n observations X = {x_1, …, x_n}
Initialize: t = 0, an initial point m_0 ∈ ℝ^D, and a tolerance ε > 0
while not converged (‖m_{t+1} − m_t‖ > ε) do
    Compute the gradient direction:  ∇f(m_t) = Σ_{i=1}^n (m_t − J(x_i)) / ‖m_t − J(x_i)‖
    Compute the step size:          s_t = ( Σ_{i=1}^n 1/‖m_t − J(x_i)‖ )⁻¹
    Update:                         m_{t+1} = m_t − s_t · ∇f(m_t)
    t ← t + 1
end while
Output: Estimated robust estimator m̂_E = J⁻¹(P(m*)), where m* denotes the optimal value.

In fact, when incorporated into the extrinsic framework, our approach possesses a practical advantage over the intrinsic geometric median algorithm, since it avoids the computational overhead encountered in optimization on Riemannian manifolds. Specifically, the intrinsic geometric median algorithm (Fletcher et al., 2009) iterates

    m_{t+1} = Exp_{m_t}(α v_t),   v_t = ( Σ_{i=1}^n Log_{m_t}(x_i) / ρ(m_t, x_i) ) · ( Σ_{i=1}^n 1/ρ(m_t, x_i) )⁻¹,

in which Exp : T_{m_t}M → M and Log : M → T_{m_t}M have to be repeatedly evaluated at each iteration, whereas our method avoids the additional computational cost caused by the exponential and logarithm mappings. As indicated above, it should be emphasized that although the data lie on a manifold, the proposed algorithm itself operates in Euclidean space, without suffering from the geometrical restrictions and constraints posed by non-Euclidean data domains.

Also of importance is that, when the given data points are not collinear (i.e., there do not exist y, z ∈ ℝ^D and α_1, …, α_n ∈ ℝ such that x_i = y + α_i z for all i = 1, …, n), the Weiszfeld algorithm always converges to the unique optimal solution (Kuhn, 1973). For each iteration step, m_t
∉ {x_1, …, x_n} is typically assumed in order to ensure that the proposed algorithm converges to the global optimal solution. Details of the algorithm, including its derivation and convergence analysis, are deferred to Section 4, in which we discuss the algorithm for solving the robust extrinsic local regression (Algorithm 2), of which Algorithm 1 is a special case.

3 Applications of the extrinsic median

In this section, the practical applicability and the robustness of the extrinsic median are examined through simulation studies on two important manifolds. To gain further insight into the extrinsic median, the experiments were carried out under different conditions that may be encountered in practice. Results are compared with competing methods, including the extrinsic mean.
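Algorithm 1 can be sketched in a few lines of code. The illustration below is ours, not the paper's implementation; it uses the simplest setting of the first application that follows, the unit sphere with the inclusion embedding, for which the projection is P(m) = m/‖m‖:

```python
import numpy as np

def extrinsic_median_sphere(X, eps=1e-10, max_iter=1000):
    # Algorithm 1 with J = inclusion of S^d into R^{d+1}: run the Weiszfeld
    # iteration in the ambient Euclidean space, then project the optimum back
    # onto the sphere via P(m) = m / ||m||.  The update
    # m_{t+1} = sum_i w_i J(x_i) / sum_i w_i with w_i = 1/||m_t - J(x_i)||
    # is algebraically identical to m_t - s_t * grad f(m_t) in Algorithm 1.
    m = X.mean(axis=0)                       # initial value m_0
    for _ in range(max_iter):
        dist = np.linalg.norm(X - m, axis=1)
        dist = np.maximum(dist, 1e-12)       # guard: m_t must avoid data points
        w = 1.0 / dist
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        done = np.linalg.norm(m_new - m) < eps
        m = m_new
        if done:
            break
    return m / np.linalg.norm(m)             # m_hat_E = P(m*)

# Nine points near e1 on S^2 plus one gross outlier at -e1.
rng = np.random.default_rng(0)
X = np.tile([1.0, 0.0, 0.0], (9, 1)) + 0.05 * rng.normal(size=(9, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X = np.vstack([X, [-1.0, 0.0, 0.0]])
med = extrinsic_median_sphere(X)
```

With one point in ten replaced by an antipodal outlier, the estimate stays close to the uncontaminated cluster, in line with the breakdown behavior discussed above.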
3.1 Unit sphere

The first and simplest application is the d-dimensional unit sphere, S^d = {x ∈ ℝ^{d+1} : ‖x‖ = 1}, which is a d-dimensional submanifold of ℝ^{d+1}. It can be embedded into ℝ^{d+1} through the inclusion map ι : S^d → ℝ^{d+1}, ι(x) = x. The projection map P : ℝ^{d+1} → S^d is defined by P(µ) = µ/‖µ‖, where µ = ∫_{ℝ^{d+1}} x Q̃(dx) is the mean vector calculated in the ambient space ℝ^{d+1} and Q̃ = Q ∘ ι⁻¹ denotes the induced probability measure. Note that µ is ι-nonfocal unless µ = 0. For further details about statistical analysis on S^d, we refer to Fisher et al. (1987), Mardia and Jupp (1999), and references therein.

In the following, the performance of the extrinsic median on S^d is illustrated by simulation studies. To ease visualization of what the generated data look like and of how the extrinsic median is capable of providing a more robust estimate than the extrinsic mean, we consider the simplest case, S^1, for which data are observed in the form of directions on the unit circle in the 2-dimensional Euclidean plane ℝ². Note that data on S^1 are typically represented by an angle measured in radians, θ ∈ [0, 2π), or by the unit vector x = (cos θ, sin θ)ᵀ from the origin. The performance of the extrinsic median is compared under two different simulation scenarios. In the first scenario, outliers are artificially added to von Mises (VM) distributed data, whereas in the second scenario, heavy-tailed random observations are generated from the general wrapped α-stable (WS) distribution. The detailed description of each scenario is given in the following.

Scenario 1): The von Mises distribution with Normal outliers.
We suppose θ follows the von Mises distribution VM(µ, κ) with density function

    f(θ) = e^{κ cos(θ − µ)} / (2π I_0(κ)),

where µ and κ denote the mean direction and concentration parameter, respectively, and I_0(κ) is the modified Bessel function of order 0. Note that a larger value of κ means higher concentration towards µ. We first generate random data {θ_i}_{i=1}^n consisting of independent and identically distributed copies of θ ∼ VM(µ, κ); then n_cont outliers o′_j, drawn i.i.d. from Normal(µ_out, σ) with the outlier center µ_out set apart from µ, are added to the initial data set so that the contamination level satisfies the prespecified value r = n_cont/n. Additionally, normalization of the generated outliers, o_j = o′_j (mod 2π), is required to ensure 0 ≤ o_j < 2π.

Scenario 2): The wrapped α-stable random variable.

The density function of a wrapped α-stable random variable θ is given by

    f(θ) = 1/(2π) + (1/π) Σ_{k=1}^∞ exp(−τ^α k^α) cos( k(θ − µ) − τ^α k^α β tan(απ/2) ),   (7)

where 0 < α ≤ 2, τ > 0, and |β| ≤ 1. Smaller α values yield heavier-tailed distributions, while larger τ values yield more highly dispersed distributions. The benefit of using the wrapped α-stable distribution is that it provides a high degree of flexibility in modeling directional data, in the sense that it contains many popular circular distributions as special cases, including the wrapped normal distribution (α = 2) and the wrapped Cauchy distribution (α = 1, β = 0); see Jammalamadaka and SenGupta (2001) for further details.

Representative illustrations of simulation Scenario 1 and Scenario 2 are displayed in Figure 1 (left and right panels, respectively), together with the estimated values. In both scenarios, we observed that the extrinsic mean estimates were forced by outliers to be pulled far away from the true mean direction (µ = 0, i.e., x = (1, 0) in the Cartesian coordinate system).

Figure 1: Left: an example of Scenario 1), consisting of observations drawn from the VM distribution and outliers, displayed as light grey circles and asterisks, respectively. Right: Scenario 2). In both settings, the fits of the extrinsic mean and median are displayed together with the true mean value.

In Table 1, the extrinsic median is compared with the extrinsic mean in terms of the norm of the difference between the true mean direction and the estimated direction. The results are averaged over 20 replications. In the first scenario, four different settings are considered according to the level of contamination, r ∈ {0, 0.1, 0.2, 0.4}, where 0 represents the case in which no outlier exists. As would be expected, the results obtained from the first scenario indicate that, as the contamination level becomes higher, the extrinsic mean is far more vulnerable to the presence of outliers than the extrinsic median. The bottom panel of the table shows the results for Scenario 2, in which we fix β = 0 for symmetry of the distribution and vary the tail-heaviness level by adjusting α from 0.2 to 2, with four increasing settings of τ for each α. It is observed that the extrinsic median not only has better predictive ability in the case of heavy-tailed data, which corresponds to small values of α, but also achieves comparable performance even for non-heavy-tailed data generated from the wrapped normal distribution (α = 2).

Table 1: Results of the experiment described in Section 3.1. The results of Scenarios 1 and 2 are presented in the top and bottom panels of the table, respectively.

Scenario 1: The outlier-contaminated case (ratio of outliers r)

            No outlier           r = 0.1              r = 0.2              r = 0.4
  N      E. Mean   E. Med    E. Mean   E. Med     E. Mean   E. Med     E. Mean   E. Med
  10     0.0100    0.0090    0.0550    0.0129     0.1376    0.0132     0.2778    0.0169
  50     0.0064    0.0066    0.0663    0.0106     0.1311    0.0115     0.2702    0.0122
  100    0.0032    0.0033    0.0624    0.0101     0.1315    0.0087     0.2732    0.0118
  200    0.0025    0.0028    0.0655    0.0099     0.1331    0.0101     0.2730    0.0136

Scenario 2: The heavy-tailed distribution (four increasing values of τ per α)

  α     N      E. Mean   E. Med    E. Mean   E. Med     E. Mean   E. Med     E. Mean   E. Med
  0.2   10     0.3411    0.1658    0.1971    0.0540     0.1044    0.0505     0.0376    0.0393
        50     0.1053    0.0011    0.0709    0.0135     0.0454    0.0199     0.0188    0.0197
        100    0.0653    0.0003    0.0432    0.0095     0.0364    0.0141     0.0123    0.0194
        200    0.0582    0.0003    0.0211    0.0062     0.0198    0.0091     0.0086    0.0095
  2     10     0.3105    0.1294    0.4291    0.4054     0.4446    0.3963     0.3876    0.4501
        50     0.1471    0.0014    0.2221    0.1112     0.1619    0.1550     0.1824    0.2097
        100    0.1312    0.0023    0.1434    0.0858     0.1415    0.1339     0.2075    0.2641
        200    0.0867    0.0005    0.0894    0.0567     0.0967    0.0845     0.1161    0.1421

3.2 Kendall's planar shape space

For the second application of the extrinsic median, we consider the Kendall planar shape space of k-ads, denoted by Σ_2^k (Kendall, 1984), which is the most popular manifold in the landmark-based shape analysis literature. Before proceeding to present the simulation study on the planar shape space, we give the necessary preliminaries about this space.

The planar shape can be defined as a random object that is invariant under Euclidean similarity transformations. Therefore, the planar shape is identified as the geometric information remaining after filtering out the effects of translation, scaling, and rotation. To ease understanding of this nonlinear manifold, let us begin by describing the geometry of the planar shape space. First, the unregistered k-ad, which is a landmark configuration describing the shape of an object, can be conveniently placed on the complex plane as a set of k complex numbers, i.e., z = (z_1, …, z_k), where z_j = x_j + i y_j ∈ ℂ. One can then obtain the preshape of z by quotienting out the effects of translation and scale:

    u = (z − ⟨z⟩) / ‖z − ⟨z⟩‖,

where ⟨z⟩ = (z̄, …, z̄) and z̄ = (1/k) Σ_{j=1}^k z_j. This indicates that the preshape space is equivalent to a complex hypersphere, ℂS^{k−2} = { u ∈ ℂ^k : Σ_{j=1}^k u_j = 0, ‖u‖ = 1 }. The shape [z] of z, which is the geometric object invariant under the rotation effect, is then obtained by considering all rotated versions of u, i.e., [z] = { e^{iθ} u : 0 ≤ θ < 2π }. As the shape is defined as the orbit of u ∈ ℂS^{k−2}, Kendall's planar shape space Σ_2^k = ℂS^{k−2}/SO(2) is the quotient space of the preshape space under the action of the special orthogonal group of dimension 2, SO(2) = { A ∈ GL_2 : A⁻¹ = Aᵀ, det(A) = 1 }. Alternatively, the effects of scaling by a scalar r > 0 and rotation by an angle 0 ≤ θ < 2π can be simultaneously filtered out by multiplying the centralized k-ad configuration z − ⟨z⟩ by a complex number λ = re^{iθ}, i.e., [z] = { λ(z − ⟨z⟩) : λ ∈ ℂ \ {0} }. Due to this algebraically simpler characterization, the planar shape space is equivalently identified with the complex projective space Σ_2^k ≃ ℂP^{k−2}, the space of all complex lines through the origin in ℂ^{k−1}. More detailed explanation of the geometrical structure of the shape manifold is provided in Dryden and Mardia (1998) and Bhattacharya and Bhattacharya (2012). We now describe the extrinsic approach on Σ_2^k.
Due to Kent (1992), in Kendall's planar shape space the Veronese–Whitney embedding is typically used; it maps Σ_2^k into the space of k × k complex Hermitian matrices S(k, ℂ) by

    J : Σ_2^k → S(k, ℂ),   [z] ↦ J([z]) = uu*,   (8)

where u* denotes the complex conjugate transpose of u. Furthermore, since J(A[z]) = Auu*A* holds for any A ∈ SU(k), where SU(k) = { A ∈ GL_k(ℂ) : AA* = I, det(A) = 1 } denotes the special unitary group, the Veronese–Whitney embedding is an SU(k)-equivariant embedding, i.e., J(A[z]) = φ(A)J([z]). This follows directly by taking the Lie group homomorphism φ : SU(k) → GL_k(ℂ) such that φ(A)B = ABA*, where B ∈ S(k, ℂ). It should also be noted that the squared extrinsic distance between two planar shapes is defined in terms of the Frobenius norm of a complex matrix:

    ρ_E²([z_1], [z_2]) = ‖J([z_1]) − J([z_2])‖_F²
                       = Trace( {J([z_1]) − J([z_2])}{J([z_1]) − J([z_2])}* )
                       = Σ_{j=1}^k Σ_{i=1}^k |{J([z_1]) − J([z_2])}_{i,j}|².   (9)

Since the above extrinsic distance takes into account all k² elements of J([z_1]) − J([z_2]), it can be viewed as the natural Euclidean distance between the two embedded shapes J([z_1]) and J([z_2]). Lastly, the inverse and projection maps of the embedding, J⁻¹(P(·)) in (6), remain to be identified.
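The preshape construction, the Veronese–Whitney embedding (8), and the extrinsic distance (9) can be sketched compactly in code. The sketch and the toy landmark configuration below are ours; they simply verify that the embedded distance is invariant under a similarity transformation, as the definition of shape requires:

```python
import numpy as np

def preshape(z):
    # Remove translation (center the k-ad) and scale (normalize) from z in C^k.
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def vw_embed(u):
    # Veronese-Whitney embedding J([z]) = u u*, a k x k Hermitian matrix.
    return np.outer(u, u.conj())

def extrinsic_dist(z1, z2):
    # Extrinsic distance between two shapes: Frobenius norm of the
    # difference of the embedded Hermitian matrices, as in eq. (9).
    return np.linalg.norm(vw_embed(preshape(z1)) - vw_embed(preshape(z2)), 'fro')

# A triangle and a rotated/scaled/translated copy: identical shape.
tri = np.array([0 + 0j, 1 + 0j, 0.5 + 1j])
lam = 2.0 * np.exp(1j * 0.7)               # scale-rotation lambda = r e^{i theta}
tri2 = lam * tri + (3 - 2j)                # similarity transform of tri
print(extrinsic_dist(tri, tri2))           # close to 0
```

Translation and scale are removed by the preshape map, while the residual rotation phase e^{iθ} cancels in uu*, so the two configurations embed to the same Hermitian matrix.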
Let e X be the arbitrary point on the ambient Euclidean space R D , then the projection mapping of e X onto the image of the embedding f M = J (Σ k ) isgiven by γγ ∗ , where γ is the unit eigenvector of e X corresponding to the largest eigenvalue.Subsequently, the inverse map of J − ( γγ ∗ ) = [ γ ] can be obtained directly from (8) without16xtra operations.Now, in order to gauge the performance of the proposed method, we perform simulationexperiments on the planar shape space by investigating the corpus callosum (CC) data ex-tracted from the subset of ADHD-200 dataset ( http://fcon_1000.projects.nitrc.org/indi/adhd200/ ).The original dataset includes functional magnetic resonance imaging (fMRI) scans of sub-jects categorized into four different groups based on their symptoms and conditions; (1) Typ-ically developing children, (2) ADHD-Hyperactive, (3) ADHD-Inattentive, and (4) ADHD-Combined. The CC shapes of 647 subjects which consist of 50 landmarks were preprocessedand analyzed by Huang et al. (2015) to illustrate their clustering method. In this exper-iment, however, only a subset of the data (the CC shapes extracted from 404 typicallydeveloping children) was utilized. Since the main aim of this simulation study is to see howthe extrinsic median behaves robustly in a noisy environment, where a number of landmarksare contaminated by outliers, we further manipulated the data by assigning random noisesgenerated from Normal( µ = 1000 , σ = 5) to the real parts (the x cooridinates) of the 10th ∼ r rangingfrom 0 to 0 .
4. Additionally, the extrinsic median was compared to several competing meth-ods including the maximum likelihood estimator of the isotropic offset Gaussian distribution(Mardia and Dryden, 1989; Dryden and Mardia, 1991) and different variants of the Fr´echetmeans such as intrinsic mean, the Fr´echet mean associated with the partial Procrustes dis-tance and the extrinsic mean.Figure 2 shows the CC shapes obtained from several simulated data with different noiselevel r = { , . , . } . As shown in the left panel ( r = 0), no remarkable difference wasobserved in estimated shapes between methods. On the other hand, however, the middleand the right panels present that with the exception of the extrinsic median, other methodsappeared to be affected by outliers and led to the distortion in the estimated shapes. Impor-tantly, although the deformation of the estimated shape is occurred as well in the extrinsicmedian at the highest noise level tested, we have seen that it stays much closer to its initial17 o outlier Ratio = 0.2 Ratio = 0.4 -0.2 0.0 0.2 0.4 -0.2 0.0 0.2 0.4 -0.2 0.0 0.2 0.4-0.20.00.2 Extrinsic Mean Extrinsic Median Intrinsic Mean MLE Partial Proc.
Figure 2: Examples of normal CC data and estimated shapes. Each individual shape is displayed as a light grey solid line, and the results of the different methods are represented by different colors and line styles.

result than those of the other methods compared.

We now introduce a measure that quantifies the robustness of estimators on the planar shape space. To do this, let x̂ and x̂* denote the estimated shapes obtained from the uncontaminated and the contaminated data, respectively. Then the full Procrustes distance between x̂ and x̂*, i.e., ρ_FP(x̂*, x̂) = sqrt(1 − |⟨x̂*, x̂⟩|²), is used to assess whether a method provides robust estimation without being influenced by outlying values. This is analogous to the breakdown point, in which the Euclidean version of the foregoing quantity, ‖T_n(X*_m) − T_n(X)‖, is exploited. However, unlike the breakdown point, which gives the highest fraction of gross outliers in the data that an estimator can handle, a smaller value of ρ_FP(x̂*, x̂) implies that the method has more resistance to outliers. Results of this simulation, averaged over 20 replications for the different contamination levels, are presented in Figure 3, which illustrates that the proposed extrinsic median has a remarkable ability to resist outliers in the case where the data are contaminated
Figure 3: Graphical result of the simulation study. Line plots give the full Procrustes distances ρ_FP(x̂*, x̂) for the different methods as a function of the contamination level r.

with significant levels of noise. All the methods, however, show a deterioration in performance, which is mainly caused by the squared distance term employed in the models.

In this section, we present the robust extrinsic local regression. To do this, we first consider a nonparametric regression model Y = f(X) + ε with a response Y taking values in M, a Euclidean predictor X ∈ R^p, and f an unknown regression function of interest. Suppose we observe D = {(x₁, y₁), ..., (x_n, y_n)}, consisting of independent and identically distributed copies of (X, Y). One of the major challenges in developing a regression model with a manifold-valued response lies in the lack of a vector space structure on M, which makes the traditional Euclidean approaches, including the least squares method, not obviously applicable. For example, since linear operations are limited on (M, ρ), evaluating the difference between the estimated value and the observed value, i.e., y_i − f̂(x_i), is not practical. Moreover, the geometric feasibility of the estimate, i.e., f̂(x_i) ∈ M, cannot be guaranteed unless additional restrictions are imposed on the typical regression models. For the reasons outlined above, there has been great demand for the development of regression models with manifold-valued responses, and a large body of literature addressing this problem has accumulated over the past two decades (Shi et al., 2009; Yuan et al., 2012; Cornea et al., 2017). In particular, the extrinsic local regression (ELR) method was initially established by Lin et al. (2017).
More recently, Petersen and Müller (2019) proposed the Fréchet regression on general metric spaces by considering the following conditional Fréchet mean:

F(x) = argmin_{q∈M} ∫_M ρ²(q, y) Q(dy | x),

where Q(· | x) denotes the conditional distribution of Y given X = x. Applications of the above framework are very broad, as its usage is not limited to manifolds. However, despite the promising progress in developing regression models for non-Euclidean-valued responses, all the aforementioned methods commonly suffer from a lack of robustness caused by the squared distances. To remedy this problem, we propose the robust extrinsic local regression (RELR), which can be accomplished easily by linking the extrinsic median to classical nonparametric local kernel regression. The remainder of this section is dedicated to presenting the details of RELR, together with the proposed numerical algorithm.

We begin by introducing the following population robust extrinsic regression function, which extends the notion of the conditional median to manifolds:

F_RE(x) = argmin_{q∈M} ∫_M ‖J(q) − J(y)‖ Q(dy | x)   (10)
        = argmin_{q∈M} ∫_{J(M)} ‖J(q) − z‖ Q̃(dz | x),

where Q̃(· | x) = Q(· | x) ∘ J^{-1} is the induced conditional probability measure of Y given X = x, defined on J(M). While the proposed extrinsic approach is similar in spirit to those developed in Lin et al. (2017), our work differs in that it makes use of the unsquared extrinsic distance rather than the squared one. The unknown regression function F_RE(·) can be estimated at the evaluation point x by classical local polynomial fitting (Fan and Gijbels, 1996):

F̂_RE(x) = J^{-1}( P( argmin_{y∈R^D} Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] ‖y − J(y_i)‖ ) ).
(11)

In the above notation, K_H : R^p → R denotes the multivariate kernel function, defined as K_H(u) = |H|^{-1} K(H^{-1} u), where u = (u₁, ..., u_p)^⊤ ∈ R^p, H is a p × p symmetric and positive definite smoothing matrix, and K : R^p → R satisfies ∫_{R^p} K(u) du = 1, ∫_{R^p} u K(u) du = 0, and ∫_{R^p} ‖u‖² K(u) du < ∞. Note that the case H = Diag(h₁, ..., h_p) corresponds to using a product kernel obtained by multiplying p univariate kernels with different bandwidths, i.e., K_H(u) = Π_{i=1}^p (1/h_i) k_i(u_i/h_i). Regarding the inner optimization problem in (11), note that since it takes the form of the weighted Fermat–Weber problem, where the weight imposed on the i-th observation is formulated in terms of the kernel function, w_i = K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x), the robust extrinsic local regression can be readily solved by the generalized Weiszfeld algorithm.

We now describe the numerical algorithm for obtaining the solution of the localized robust regression estimator. As was assumed in the development of the extrinsic median, the non-collinearity of the embedded responses J(y₁), ..., J(y_n) is required in order to ensure the convergence of the algorithm. We also let f(y) = Σ_{i=1}^n w_i ‖y − J(y_i)‖ be the objective function; then, by the strict convexity of f, the optimal solution is attained at the stationary point ∇f(y) = Σ_{i=1}^n w_i (y − J(y_i)) / ‖y − J(y_i)‖ ≡
0. Then, since the optimal y* satisfies the equation (Σ_{i=1}^n w_i / ‖y* − J(y_i)‖) y* = Σ_{i=1}^n w_i J(y_i) / ‖y* − J(y_i)‖, the iterative algorithm for updating y on the embedded space has the following form:

y_{t+1} = (Σ_{i=1}^n w_i / ‖y_t − J(y_i)‖)^{-1} Σ_{i=1}^n w_i J(y_i) / ‖y_t − J(y_i)‖.   (12)

In Algorithm 2, after a simple algebraic calculation, we reformulate the update rule in (12) into the form of gradient descent with step size s_t = (Σ_{i=1}^n w_i / ‖y_t − J(y_i)‖)^{-1}. Once the optimal solution y* of the inner minimization problem on the ambient space R^D is found, the final regression estimator F̂_RE(x) on M can be obtained straightforwardly by evaluating the projection map of y* onto the image J(M) and taking the inverse map of the embedding. Note that, in order to prevent the algorithm from getting stuck at non-optimal embedded points, {y_t}_{t≥0}
∉ {J(y₁), ..., J(y_n)} needs to be assumed.

Algorithm 2
Robust extrinsic local regression (RELR)
Input: n observations D = {(x₁, y₁), ..., (x_n, y_n)}, evaluation point x
Initialize: t = 0, y₀, and ε
while ‖y_{t+1} − y_t‖ > ε do
    Compute the gradient direction ∇f(y_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] (y_t − J(y_i)) / ‖y_t − J(y_i)‖
    Compute the step size s_t = ( Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y_t − J(y_i)‖ )^{-1}
    Update y_{t+1} = y_t − s_t · ∇f(y_t)
    t ← t + 1
end while
Output: Estimated robust estimator F̂_RE(x) = J^{-1}(P(y*))   ▷ y* the optimal value

We are now in a position to present the convergence analysis of the proposed algorithm. Before proceeding, the following results on the convergence of the classical Weiszfeld algorithm should be noted. For nonsmooth convex optimization problems, most existing gradient-descent-type algorithms are known to converge to the optimal solution at a rate of O(1/√t); however, with the initial value proposed by Vardi and Zhang (2001), one can show that the Weiszfeld algorithm attains a sublinear convergence rate of O(1/t) (see Beck and Sabach, 2015). More surprisingly, using a smooth approximation of the objective function enables the algorithm to achieve an accelerated convergence rate of O(1/t²). Additionally, since the proposed algorithm for solving RELR is performed in Euclidean space, the algorithmic techniques and convergence properties previously established for the classical Weiszfeld algorithm can be utilized immediately.

First, we will henceforth make use of the following initial point for the RELR algorithm, which is an extension of the scheme in Vardi and Zhang (2001) to the nonparametric regression setting.
For p ∈ argmin_{i∈{1,...,n}} f(J(y_i)), we set the initial value for Algorithm 2 as y₀ = J(y_p) + t_p d_p, where

R_p = Σ_{i≠p} [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] (J(y_p) − J(y_i)) / ‖J(y_i) − J(y_p)‖,
d_p = −R_p / ‖R_p‖,
t_p = ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) ) / L(J(y_p)),

and the operator L is given by

L(y) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y − J(y_i)‖,  for y
∉ {J(y₁), ..., J(y_n)}, and by

L(y) = Σ_{i≠p} [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖J(y_p) − J(y_i)‖,  for y = J(y_p) (1 ≤ p ≤ n).

Using the above initial point, we obtain the following result. Proposition 1 is of interest in its own right: without any further manipulation of the algorithm, the carefully chosen starting value enables Algorithm 2 to achieve the sublinear convergence rate O(1/t).

Proposition 1.
Suppose that all the embedded response values are not collinear. Then, for any t ≥
0, we have

f(y_t) − f* ≤ L(J(y_p)) ‖y₀ − y*‖² / [ t ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² ],   (13)

where f* is the minimum of f.

Proof.
To prove the sublinear convergence rate of Algorithm 2, we use a collection of results from Beck and Sabach (2015). We first derive an upper bound for the sequence {L(y_t)}_{t≥0}, where {y_t}_{t≥0} is the sequence generated by the algorithm. For any i = 1, ..., n and any y satisfying f(y) ≤ f(y₀), the inequality ‖y − J(y_i)‖ ≥ f(J(y_i)) − f(y₀) holds (Lemma 8.1 in Beck and Sabach, 2015). Combining this with the monotonicity f(y_t) ≤ f(y₀) (Corollary 3.1 in Beck and Sabach, 2015) and f(J(y_p)) ≤ f(J(y_i)), we have ‖y_t − J(y_i)‖ ≥ f(J(y_p)) − f(y₀). Then, by the definition of the operator L, we obtain

L(y_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ‖y_t − J(y_i)‖ ≤ 1 / (f(J(y_p)) − f(y₀)) ≤ 2 L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )².   (14)

In the last inequality we have used Lemma 7.1 in Beck and Sabach (2015), which states that for some j ∈ {1, ..., n}, f(J(y_j)) − f(J(y_j) + t_j d_j) ≥ ( ‖R_j‖ − K_H(x_j − x)/Σ_{i=1}^n K_H(x_i − x) )² / (2 L(J(y_j))). Finally, from Lemma 5.2 in Beck and Sabach (2015), which states that f(y_{n+1}) − f* ≤ (L(y_n)/2)( ‖y_n − y*‖² − ‖y_{n+1} − y*‖² ) and ‖y_{t+1} − y*‖ ≤ ‖y_t − y*‖, the upper bound on f(y_t) − f* can be derived in the following manner:

Σ_{n=0}^{t−1} ( f(y_{n+1}) − f* ) ≤ Σ_{n=0}^{t−1} (L(y_n)/2)( ‖y_n − y*‖² − ‖y_{n+1} − y*‖² )
  ≤ [ L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² ] ( ‖y₀ − y*‖² − ‖y_t − y*‖² )
  ≤ L(J(y_p)) ‖y₀ − y*‖² / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )².   (15)

Since the sequence {f(y_t)}_{t≥0} is non-increasing, t( f(y_t) − f* ) ≤ Σ_{n=0}^{t−1} ( f(y_{n+1}) − f* ), which completes the proof.
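To make the above concrete, the kernel-weighted Weiszfeld iteration used by Algorithm 2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the Gaussian product kernel, the function names, and the stopping safeguards are our own choices, and the embedded responses J(y_i) are assumed to be given as the rows of a matrix.

```python
import numpy as np

def kernel_weights(X, x0, h):
    """Normalized Gaussian product-kernel weights w_i = K_H(x_i - x0) / sum_j K_H(x_j - x0)."""
    u = (X - x0) / h
    k = np.exp(-0.5 * np.sum(u**2, axis=1))
    return k / k.sum()

def weiszfeld_relr(Z, w, tol=1e-10, max_iter=1000):
    """Kernel-weighted Weiszfeld iteration (update rule (12)) in the ambient space.

    Z : (n, D) array of embedded responses J(y_i); w : (n,) normalized weights.
    Returns the weighted geometric median y* of the embedded points.
    """
    y = np.average(Z, axis=0, weights=w)        # start from the weighted mean
    for _ in range(max_iter):
        d = np.linalg.norm(Z - y, axis=1)
        if np.any(d < 1e-12):                   # iterate landed on a data point
            return Z[np.argmin(d)]
        inv = w / d
        y_new = (inv[:, None] * Z).sum(axis=0) / inv.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

The manifold-valued estimate is then recovered by projecting the returned point onto J(M) and applying J^{-1}, as in the output step of Algorithm 2. A quick check of the robustness claim: for three nearly coincident points and one gross outlier with equal weights, the returned median stays near the cluster, whereas the weighted mean is dragged far away.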
Even though we have achieved the improved convergence rate O(1/t), there still exists a gap in the order of the convergence rate compared to the O(1/t²) rate commonly attained by accelerated gradient-based methods for solving smooth convex optimization problems. To further enhance the convergence rate, we here introduce a modified version of the algorithm for RELR. Since the slow convergence rate is essentially caused by the inherent nonsmoothness of the objective function f, it can be resolved by using a smooth alternative to f that yields exactly the same optimal solution. Besides the improved convergence rate, what makes this approach even more appealing is that the assumption made on the sequence generated by Algorithm 2, namely {y_t}_{t≥0} ∉ {J(y₁), ..., J(y_n)}, is no longer required. Given the lack of knowledge of conditions under which that assumption can be guaranteed, the smooth approximation method has a practical advantage.

We finish this section by describing the derivation of the modified Weiszfeld algorithm for RELR, along with its convergence analysis. To do this, we begin by considering the following smooth function f̃_s : R^D → R,

f̃_s(y) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] g_{b_i}(y − J(y_i)),   (16)

where

g_{b_i}(y − J(y_i)) = ‖y − J(y_i)‖ if ‖y − J(y_i)‖ ≥ b_i,  and  g_{b_i}(y − J(y_i)) = ‖y − J(y_i)‖²/(2b_i) + b_i/2 if ‖y − J(y_i)‖ < b_i,   (17)

and b_i = f(J(y_i)) − f(y₀). Note first that the function f̃_s is convex and continuously differentiable over R^D, and its gradient is Lipschitz continuous, i.e., ‖∇f̃_s(y) − ∇f̃_s(z)‖ ≤ L_s ‖y − z‖ for all y, z ∈ R^D, with Lipschitz constant

L_s = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_i)) − f(y₀) ).
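The smoothing g_{b_i} in (17) is a Huber-type modification of the norm: linear outside a ball of radius b_i and quadratic inside it, glued so that both the function and its gradient are continuous. A minimal sketch (the function names are ours):

```python
import numpy as np

def g_b(r, b):
    """Huber-type smoothing of ||r|| with threshold b > 0, as in (17):
    equals ||r|| when ||r|| >= b, and ||r||^2/(2b) + b/2 otherwise."""
    nr = np.linalg.norm(r)
    return nr if nr >= b else nr**2 / (2.0 * b) + b / 2.0

def grad_g_b(r, b):
    """Gradient of g_b: r/||r|| outside the ball, r/b inside it,
    so the gradient is Lipschitz continuous with constant 1/b."""
    nr = np.linalg.norm(r)
    return r / nr if nr >= b else r / b
```

At the boundary ‖r‖ = b the two branches agree, since b²/(2b) + b/2 = b, and both gradient branches equal r/b there; the weighted sum Σ_i w_i g_{b_i} therefore has a gradient-Lipschitz constant Σ_i w_i/b_i, in line with L_s above.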
Since g_{b_i}(y − J(y_i)) ≥ ‖y − J(y_i)‖ for all y ∈ R^D, it follows that f̃_s(y) ≥ f(y), which indicates that f̃_s plays the role of an upper bound of f. Moreover, following from Lemma 8.1 in Beck and Sabach (2015), ‖y* − J(y_i)‖ ≥ f(J(y_i)) − f(y₀) = b_i holds for i = 1, ..., n, where y* is the strict global minimizer of the original objective function f. Then, according to the construction in (17), g_{b_i}(y* − J(y_i)) = ‖y* − J(y_i)‖ is always satisfied, which indicates f̃_s(y*) = f(y*) < f(y) ≤ f̃_s(y) for any y ≠ y*. Thus, it is clear that the minimizer of the inner optimization problem in (11) must also be a global minimizer of (16), i.e., y* = argmin_y f̃_s(y) = argmin_y f(y). In this sense, f̃_s allows us to smoothly approximate the original objective function f. Therefore, rather than working directly with f, we aim to minimize f̃_s, which leads to Algorithm 3.

Algorithm 3
Fast Weiszfeld algorithm for RELR
Input: n observations D = {(x₁, y₁), ..., (x_n, y_n)}, evaluation point x
Initialize: s₁ = 1, u₁ = y₀ ∈ R^D, and ε
while ‖y_t − y_{t−1}‖ > ε do (for t = 1, 2, ...)
    Update y_t = u_t − (1/L_s) ∇f̃_s(u_t), where ∇f̃_s(u_t) = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] × (u_t − J(y_i)) / ‖u_t − J(y_i)‖ if ‖u_t − J(y_i)‖ ≥ b_i, and × (u_t − J(y_i)) / b_i if ‖u_t − J(y_i)‖ < b_i
    Update s_{t+1} = ( 1 + sqrt(1 + 4 s_t²) ) / 2
    Update u_{t+1} = y_t + ( (s_t − 1)/s_{t+1} )( y_t − y_{t−1} )
end while
Output: Estimated robust estimator F̂_RE(x) = J^{-1}(P(y*))   ▷ y* the optimal value

Now let {y_t}_{t≥0} be the sequence generated by Algorithm 3; then we obtain

f̃_s(y_t) − f* ≤ 2 L_s ‖y₀ − y*‖² / (t + 1)².   (18)

The above convergence result follows immediately from Theorem 9.1 in Beck and Sabach (2015); see Beck and Teboulle (2009) for a proof. Furthermore, in our RELR setting, L_s is bounded from above by

L_s = Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_i)) − f(y₀) )
    ≤ Σ_{i=1}^n [K_H(x_i − x) / Σ_{j=1}^n K_H(x_j − x)] / ( f(J(y_p)) − f(y₀) )
    = 1 / ( f(J(y_p)) − f(y₀) )
    ≤ 2 L(J(y_p)) / ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )²,

where the last inequality uses (14). Putting all the pieces together and using f̃_s(y) − f* ≥ f(y) − f*, we see that the fast Weiszfeld algorithm for RELR attains the following O(1/t²) convergence rate:

f(y_t) − f* ≤ 4 ‖y₀ − y*‖² L(J(y_p)) / { (t + 1)² ( ‖R_p‖ − K_H(x_p − x)/Σ_{j=1}^n K_H(x_j − x) )² },

which is a substantial improvement on the convergence rate in (13).

In this section, the benefit of the robust extrinsic local regression over the extrinsic regression is demonstrated on the basis of simulation studies.
While the simulation conducted in this paper focuses only on the case of planar shapes, the proposed method can be applied to other manifolds in a straightforward manner. Recalling from Section 3.2, we let Y = (z₁, ..., z_k) be the response variable, a planar shape of k-ads defined on Σ₂^k, and let X ∈ R^p be a Euclidean predictor. By slightly modifying the polar-coordinate-based scheme in Lin et al. (2017), we generate synthetic planar shape data in the following manner:

Generate covariate: X = (X₁, ..., X_p), where X_i ∼ Uniform(a, b)
Coefficients: β = (β₁, ..., β_k) = (1/k, ..., k/k) ∈ R^k
Intercept angles: φ₀ = (φ₀₁, ..., φ₀ₖ) ∈ R^k
Intercept radii: γ₀ = (γ₀₁, ..., γ₀ₖ) ∈ R^k
Generate shape angles: φ'_j ∼ Normal(φ₀ⱼ + β_j Σ_{i=1}^p X_i, σ²_φ)
Standardize angles: φ = (φ₁, ..., φ_k), where φ_j = φ'_j (mod 2π)
Generate shape radii: γ = (γ₁, ..., γ_k), where γ_j ∼ Normal(γ₀ⱼ + β_j Σ_{i=1}^p X_i, σ²_γ)
Convert to the complex form for the landmarks: z_j = γ_j (cos(φ_j) + i sin(φ_j)).

Further, in order to conduct simulation studies under the outlier-contaminated setting, we randomly add a fixed number of outliers to the response variable, i.e., Y* = Y + Ψ = (z*₁, ..., z*_k). The extreme-valued outlier Ψ ∈ C^k was generated from the k-dimensional complex normal distribution CN(μ, Γ), where μ = E(Ψ) and Γ = E((Ψ − μ)(Ψ − μ)^H). We note that since the contaminated response Y* is no longer an element of Σ₂^k, both translation and scale effects have to be filtered out. To help understand the process by which the data are generated, a representative illustration of the simulated data is presented in the left panel of Figure 4. For illustrative purposes, the planar shape data are generated under the univariate setting, and only the first landmark is contaminated.
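As an illustration, the generation scheme above can be sketched as follows. The concrete constants (k, the intercept angles and radii, and the noise scales) are assumptions of ours, since the paper's exact values are not reproduced here; the returned landmark vector is centered and scaled to the preshape sphere so that translation and scale are filtered out.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_planar_shape(p=1, k=10, a=0.0, b=1.0,
                          sigma_phi=0.1, sigma_gamma=0.05):
    """One draw from the polar-coordinate scheme described above.

    The intercept angles/radii and noise scales are illustrative choices.
    Returns the covariate X and a centered, unit-norm complex landmark vector.
    """
    X = rng.uniform(a, b, size=p)                   # Euclidean covariate
    j = np.arange(1, k + 1)
    beta = j / k                                    # coefficients (1/k, ..., k/k)
    phi0 = 2 * np.pi * j / k                        # intercept angles (assumed)
    gamma0 = np.full(k, 0.5)                        # intercept radii (assumed)
    phi = rng.normal(phi0 + beta * X.sum(), sigma_phi) % (2 * np.pi)
    gamma = rng.normal(gamma0 + beta * X.sum(), sigma_gamma)
    z = gamma * (np.cos(phi) + 1j * np.sin(phi))    # complex landmarks
    z = z - z.mean()                                # remove translation
    z = z / np.linalg.norm(z)                       # remove scale (preshape)
    return X, z
```

Contamination as in Y* = Y + Ψ can then be added to selected landmarks, after which translation and scale are filtered out again.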
In the right panel of the same figure, the estimated curves are given, along with the true underlying function. As shown in the figure, the curve obtained from the usual ELR (red) deviates substantially from the true curve (blue), while the curve corresponding to RELR (green) almost overlaps with the true values.

Figure 4: Left: an example of the simulated data (univariate case) with r = 0.2; each point is colored according to the value of the predictor X. Right: the estimation results. The optimal bandwidths of RELR and ELR, selected by 5-fold cross validation, are h_Med = 1.37 and h_Mean = 2.27, respectively.

One important remaining issue of the proposed RELR model, which has not been highlighted in the previous section, is bandwidth selection. It is well known that the performance of local-polynomial-type methods relies significantly on a tuning parameter h, called the bandwidth, which plays a crucial role in controlling the degree of smoothing. To be specific, a large value of h leads to a smooth estimate but, by failing to account for local variation, may introduce significant estimation bias, whereas a small h produces a jagged estimate, leading to a large variance. Thus h should be selected properly to balance the trade-off between variance and squared bias. Throughout the simulation, we consider the smoothing matrix that gives the same bandwidth in all p dimensions, i.e., H = h I_p. Though bandwidth selection methods have been studied extensively since the early days of nonparametric regression modeling, in this paper we adopt 5-fold cross validation for the sake of simplicity.

The performance of RELR is evaluated by comparing its results with those achieved by ELR in terms of two measures associated with the full Procrustes distance ρ_FP(z₁, z₂) = sqrt(1 − |⟨z₁, z₂⟩|²), where z₁, z₂ ∈ Σ₂^k. First, we consider MD_obs = Σ_{i=1}^n ρ_FP(y_i, f̂(x_i)) / n, which measures the difference between the estimated value f̂(x_i) and the observed value y_i. Moreover, to assess whether an estimator has the benefit of capturing the true signal, it is more appropriate to examine the following root-mean-squared-error-like measure, RMSE_true = sqrt( Σ_{i=1}^n ρ²_FP(f(x_i), f̂(x_i)) / n ), which quantifies the difference between the predictor and the true value f(x_i). To investigate how the methods are affected by outliers, the contamination level r was varied on an evenly spaced grid over [0, 0.3]. The resulting values of MD_obs and RMSE_true, with n = 200 and p = 1, are presented in Figure 5 (left and right panels, respectively).

Figure 5: Results of the univariate case as a function of the contamination level r ∈ [0, 0.3]. Left: averaged value of MD_obs. Right: averaged value of RMSE_true against the underlying true regression function f(x_i). To assist visualization, we also give a loess smooth with Monte Carlo standard error.

Overall, the performance curves obtained across the experimental conditions are roughly linear and slope upward from left to right as the contamination rate runs from 0 to 0.3, which indicates that the performance of both RELR and ELR degrades as r increases. We also found that the values of both MD_obs and RMSE_true corresponding to RELR are consistently lower than those of ELR, except at r = 0. The result in the left panel makes intuitive sense, as RELR minimizes the empirical risk function associated with the unsquared extrinsic distance, which differs from the full Procrustes distance only by a multiplicative constant. In the right panel, however, the slope of the ELR curve, inclined toward its upper-right endpoint starting from the origin, is 1.67, which is approximately four times greater than that of RELR. This clearly suggests that RELR outperforms ELR by a wide margin in identifying true patterns of shape changes that are masked by noise.

Table 2: Comparisons of RELR and ELR in terms of MD_obs and RMSE_true.

                                 Ratio of outliers
  N    Measure      Method    0        0.05     0.1      0.2      0.3
  50   MD_obs       ELR       0.0323   0.0716   0.1389   0.2481   0.3505
                    RELR      0.0320   0.0662   0.1192   0.2052   0.2806
       RMSE_true    ELR       0.0233   0.0348   0.0692   0.1446   0.2326
                    RELR      0.0242   0.0257   0.0400   0.0669   0.0781
  100  MD_obs       ELR       0.0333   0.0810   0.1306   0.2441   0.3524
                    RELR      0.0338   0.0758   0.1200   0.2098   0.2884
       RMSE_true    ELR       0.0242   0.0358   0.0550   0.1279   0.2219
                    RELR      0.0255   0.0274   0.0414   0.0617   0.0698
  200  MD_obs       ELR       0.0336   0.0802   0.1351   0.2486   0.3516
                    RELR      0.0341   0.0768   0.1235   0.2126   0.2880
       RMSE_true    ELR       0.0237   0.0327   0.0566   0.1226   0.2093
                    RELR      0.0248   0.0277   0.0386   0.0586   0.0720

As shown in Table 2, similar results were obtained from further experiments in a multivariate setting (p = 3) with different sample sizes. In this simulation study, the proposed RELR is seen to perform well universally across all examined simulation scenarios. Contrary to RELR, however, the performance of ELR appears prone to being adversely affected by the presence of outliers and noise, which is consistent with what would be expected from the location estimation performed in Section 3.2.

In this paper, we have proposed robust statistical methods on manifolds and demonstrated that, when outliers exist in the dataset, they are capable of achieving considerable improvements over existing methods. Building upon the idea of the geometric median and the extrinsic framework, our method takes full advantage of what the two approaches can provide: (i) the robustness property is attained by employing the unsquared extrinsic distance induced by the Euclidean embedding, which prevents the estimator from amplifying the effects of noise and outliers; (ii) our approach can also be adapted universally to any manifold on which a proper Euclidean embedding is available. For example, RELR can be implemented straightforwardly on the space of p × p symmetric positive definite (SPD) matrices, which has especially emerged as a form of data in neuroimaging.
To be specific, for 3 × 3 SPD matrices one can use the matrix logarithm log : SPD(3) → Sym(3); for X = UΛU^{-1} ∈ SPD(3), log(X) = U log(Λ) U^{-1}, and the extrinsic distance is defined via the Frobenius norm, i.e., ρ_E(X₁, X₂) = ‖log(X₁) − log(X₂)‖_F.

While our main focus was on developing robust statistical methods for the central location and regression problems on manifolds, some important problems remain to be investigated further as part of future work. We end the paper by outlining some promising directions for research that may benefit immediately from our proposed framework. First, borrowing ideas from K-medians algorithms (Cardot et al., 2012), clustering on manifolds can be carried out in a more robust manner. Second, an intriguing research question raised by our study is the possible extension of the notion of the quantile to manifolds. Unlike measures of central tendency (the mean and the median), the generalization of the quantile to non-Euclidean spaces is nontrivial, because it has to take into account the direction and magnitude of changes from the central location. To the best of our knowledge, no research has been reported in this context yet, which may be due in part to the difficulty of defining directions on the surface of a manifold. This challenge may be tackled by combining the geometric quantile (Chaudhuri, 1996), which has the geometric median as a special case, with our extrinsic framework. We expect that it will be especially useful in medical imaging analysis: because quantile information enables us to quantify the pathological state or the abnormality of a certain organ shape, it will be a useful tool for diagnosing and prognosticating critically ill patients.

References

Beck, A. and Sabach, S. (2015). Weiszfeld's method: Old and new results.
Journal of Optimization Theory and Applications, 164:1–40.

Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183–202.

Bhattacharya, A. and Bhattacharya, R. (2012). Nonparametric Inference on Manifolds: With Applications to Shape Spaces. IMS Monograph. Cambridge University Press.

Bhattacharya, R. N., Ellingson, L., Liu, X., Patrangenaru, V., and Crane, M. (2011). Extrinsic analysis on manifolds is computationally faster than intrinsic analysis with applications to quality control by machine vision. Applied Stochastic Models in Business and Industry, 28:222–235.

Bhattacharya, R. N. and Patrangenaru, V. (2003). Large sample theory of intrinsic and extrinsic sample means on manifolds—part I. Annals of Statistics, 31:1–29.

Bhattacharya, R. N. and Patrangenaru, V. (2005). Large sample theory of intrinsic and extrinsic sample means on manifolds—part II. Annals of Statistics, 33:1211–1245.

Cardot, H., Cénac, P., and Godichon-Baggioni, A. (2017). Online estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence balls. Annals of Statistics, 45(2):591–614.

Cardot, H., Cénac, P., and Monnez, J. M. (2012). A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics & Data Analysis, 56(6):1434–1449.

Cardot, H., Cénac, P., and Zitt, P.-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19(1):18–43.

Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. Journal of the American Statistical Association, 91(434):862–872.

Chikuse, Y. (2003). Statistics on Special Manifolds. Springer-Verlag New York.

Cornea, E., Zhu, H., Kim, P., Ibrahim, J. G., and the Alzheimer's Disease Neuroimaging Initiative (2017). Regression models on Riemannian symmetric spaces. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2):463–482.

Dryden, I. L. and Mardia, K. V. (1991). General shape distributions in a plane. Advances in Applied Probability, 23:259–276.

Dryden, I. L. and Mardia, K. V. (1998). Statistical Shape Analysis. Wiley.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, London.

Fisher, N. I., Lewis, T., and Embleton, B. J. J. (1987). Statistical Analysis of Spherical Data. Cambridge University Press.

Fletcher, P. T., Venkatasubramanian, S., and Joshi, S. (2009). The geometric median on Riemannian manifolds with application to robust atlas estimation. NeuroImage, 45:S143–S152.

Fréchet, M. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'Institut Henri Poincaré, 10:215–310.

Haldane, J. B. S. (1948). Note on the median of a multivariate distribution. Biometrika, 35(3-4):414–417.

Hampel, F., Ronchetti, E., Rousseeuw, P., and Stahel, W. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley.

Huang, C., Styner, M., and Zhu, H. (2015). Clustering high-dimensional landmark-based two-dimensional shape data. Journal of the American Statistical Association, 110(511):946–961.

Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101.

Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Wiley, 2nd edition.

Jammalamadaka, S. R. and SenGupta, A. (2001). Topics in Circular Statistics. Series on Multivariate Analysis, Vol. 5. World Scientific Press, Singapore.

Kemperman, J. (1987). The median of a finite measure on a Banach space. In Statistical Data Analysis Based on the L1-norm and Related Methods. North-Holland, Amsterdam.

Kendall, D. G. (1984). Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16:81–121.

Kent, J. T. (1992). New directions in shape analysis. In: The Art of Statistical Science. John Wiley & Sons, Ltd, Chichester.

Kuhn, H. W. (1973). A note on Fermat's problem. Mathematical Programming, 4:98–107.

Lin, L., Thomas, B. S., Zhu, H., and Dunson, D. B. (2017). Extrinsic local regression on manifold-valued data. Journal of the American Statistical Association, 112(519):1261–1273.

Lopuhaä, H. P. and Rousseeuw, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Annals of Statistics, 19(1):229–248.

Mardia, K. V. and Dryden, I. L. (1989). The statistical analysis of shape data. Biometrika, 76:271–281.

Mardia, K. V. and Jupp, P. E. (1999). Directional Statistics. Wiley.

Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4):2308–2335.

Möttönen, J., Nordhausen, K., and Oja, H. (2010). Asymptotic theory of the spatial median. In Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in Honor of Professor Jana Jurečková, volume 7 of IMS Collections, pages 182–193. Institute of Mathematical Statistics, Beachwood, Ohio, USA.

Petersen, A. and Müller, H.-G. (2019). Fréchet regression for random objects with Euclidean predictors. Annals of Statistics, 47(2):691–719.

Shi, X., Styner, M., Lieberman, J., Ibrahim, J. G., Lin, W., and Zhu, H. (2009). Intrinsic regression models for manifold-valued data. In Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., and Taylor, C., editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2009, pages 192–199, Berlin, Heidelberg. Springer.

Small, C. G. (1990). A survey of multidimensional medians. International Statistical Review, 58(3):263–277.

Vardi, Y. and Zhang, C. H. (2001). A modified Weiszfeld algorithm for the Fermat–Weber location problem. Mathematical Programming, 90:559–566.

Weber, A. (1929). Über den Standort der Industrien (Alfred Weber's Theory of the Location of Industries). University of Chicago Press.

Weiszfeld, E. (1937). Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Mathematical Journal, 43:355–386.

Whitney, H. (1944). The self-intersections of a smooth n-manifold in 2n-space. The Annals of Mathematics, 45:220–246.

Yuan, Y., Zhu, H., Lin, W., and Marron, J. S. (2012). Local polynomial regression for symmetric positive definite matrices. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(4):697–719.

Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009). Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging.