Nonparametric Spherical Regression Using Diffeomorphic Mappings
M. Rosenthal^a, W. Wu^b, E. Klassen^c, Anuj Srivastava^b

^a Naval Surface Warfare Center, Panama City Division - X23, 110 Vernon Avenue, Panama City, FL 32407-7001
^b Department of Statistics, Florida State University, Tallahassee, FL 32306
^c Department of Mathematics, Florida State University, Tallahassee, FL 32306
Abstract
Spherical regression explores relationships between variables on spherical domains. We develop a nonparametric model that uses a diffeomorphic map from a sphere to itself. The restriction of this mapping to diffeomorphisms is natural in several settings. The model is estimated in a penalized maximum-likelihood framework using gradient-based optimization. Towards that goal, we specify a first-order roughness penalty using the Jacobian of diffeomorphisms. We compare the prediction performance of the proposed model with state-of-the-art methods using simulated and real data involving cloud deformations, wind directions, and vector-cardiograms. This model is found to outperform others in capturing relationships between spherical variables.
Keywords:
Nonlinear; Nonparametric; Riemannian Geometry; Spherical Regression.
1. Introduction
Spherical data arise naturally in a variety of settings. For instance, a random vector with a unit-norm constraint is naturally studied as a point on a unit sphere. The statistical analysis of such random variables was pioneered by Mardia and colleagues (1972; 2000) in the context of directional data. Common application areas where such data originate include geology, gaming, meteorology, computer vision, and bioinformatics. Examples from geographical domains include plate tectonics (McKenzie, 1957; Chang, 1986), animal migrations, and the tracking of weather formations. As mobile devices become increasingly advanced and prevalent, an abundance of new spherical data is being collected in the form of geographical coordinates. Another source of spherical data is the study of directions, e.g., the vector-cardiograms studied in Downs (2003). Directional data also characterize the orientations of objects or limbs, which are particularly relevant to biometric applications, including the study of human kinematics (Rancourt et al., 2000) and gait data in the context of gait identification (Boyd and Little, 2005).

Spherical regression is an analysis of paired data on a unit hyper-spherical domain S^{d-1} = { z ∈ R^d : ||z|| = 1 }, where ||·|| denotes the Euclidean norm. Given n paired observations (x_i, y_i) ∈ S^{d-1} × S^{d-1} for i = 1, ..., n, one wishes to describe the relationship between the predictor x and the response y in order to make predictions and inferences. To define a spherical regression model, one must decide on the functional form of µ(x), the mean of y for a given x. This function µ : S^{d-1} → S^{d-1} characterizes the expected relationship between the x and y variables, and can take a parametric, semi-parametric, or nonparametric form.

Email addresses: [email protected] (M. Rosenthal), [email protected] (W. Wu), [email protected] (E. Klassen), [email protected] (Anuj Srivastava)
The estimation of µ additionally depends on the chosen spherical error distribution for y given x.

In past work, the function µ has predominantly taken a parametric form. Several parametric models have been proposed for the unit circle S^1, including those in Rivest (1997) and Downs and Mardia (2002), but the choices become limited in higher dimensions. There, the rigid rotation model has been the most common choice (Chang, 1986, 1989; Rivest, 1989; Kim, 1991; Prentice and Mardia, 1995). Downs (2003) used complex Möbius transformations on S^2. More recently, Rosenthal et al. (2014) expanded the parametric class by including projective linear transformations. Additionally, there has also been some progress in nonparametric formulations. For instance, Marzio et al. (2014) modified standard kernel-smoothing methods to derive a general nonparametric spherical regression model.

Parametric models, especially those with a small number of parameters, are often too restrictive to adequately capture the broad correspondences observed in real data. On the other hand, the high dimensionality of certain nonparametric models can lead to overfitting and, thus, hinder model performance. Our proposed framework includes flexible classes of deformations using diffeomorphic nonparametric representations, and it avoids overfitting by using penalty functions. Smooth bijective relationships occur naturally in many systems. An example is fluid motion, where mass can be compressed or expanded but is generally not allowed to occupy the same space simultaneously. For the same reason, the relationships between atmospheric variables over short intervals can be modeled using diffeomorphisms. Since diffeomorphisms are invertible mappings, they are also useful in situations where the roles of x and y can be reversed. In this paper, we take a nonparametric approach using a diffeomorphism from S^2 to itself.
The techniques we develop are restricted to S^2 due to the optimization procedure we have used, although the underlying penalized-likelihood framework itself is valid for arbitrary dimensions.
2. Previous Methods and Their Limitations
A majority of past work in spherical regression has imposed parametric forms on µ. Chang (1986, 1989) created this field in the 1980s using µ(x) = Ax, where A is a rotation in SO(d) = { A ∈ R^{d×d} : det(A) = 1, A^T A = A A^T = I_d }. This approach was followed by Rivest (1989) and Kim (1991). The solution that minimizes the sum of squared errors is given by McKenzie (1957) and Stephens (1979) as the Procrustes rotation. The rigid rotation model can characterize changes in location and orientation, but is unable to handle any difference in spread between the predictor and response variables. It is akin to using the fixed additive model y = (x + a) + e in the classical Euclidean case.

Downs (2003) proposed an extension that uses Möbius transformations, a larger parametric family than rotations. This model can handle some additional differences in spread between the response and predictor variables, but the framework has not been extended to dimensions higher than two. A recent model that is applicable to arbitrary S^{d-1} uses the group of projective linear transformations. A projective linear transformation is a map x → Ax/|Ax| and is parametrized by the transformation matrix A ∈ SL(d), the set of all matrices with determinant one. In Rosenthal et al. (2014), the authors derive an intrinsic Newton-Raphson algorithm for maximum-likelihood estimation and present an asymptotic analysis of the estimator under the von Mises-Fisher error distribution.

These parametric models are likely to be useful for a variety of inference and prediction applications. However, more general situations require more flexible models in order to adequately characterize relationships between spherical variables. While one possibility is to explore even larger classes of parametric families, a more natural approach is nonparametric regression. Nonparametric methods allow flexible expressions of µ at the cost of model parsimony. In recent work, Marzio et al. (2014) developed a nonparametric regression model that locally fits weighted polynomials at each point on the sphere. They demonstrate improvements in predictive performance over simpler kernel smoothing and rigid rotations. This model is applicable across arbitrary hyper-spheres, dimensions, and geometries. For example, it can be used to define relationships between spherical and linear domains. As with most kernel-smoothing approaches, however, it does not ensure invertibility of the regression maps. This limitation becomes evident in applications such as meteorological studies, where short-term changes are often diffeomorphic and where data is often scarce over large areas of the sphere. This is a common occurrence with data generated from satellites. In these cases, kernel-smoothing methods can overfit data in ways that the bandwidth or smoothing parameter cannot adequately address. A restriction to the group of diffeomorphisms is a natural alternative in such settings.

Our proposed approach is similar to Glaunès et al. (2004), which utilizes diffeomorphic maps on the sphere in the context of landmark matching. They define an objective function for landmark-guided point registration, but this function is not motivated from a regression-model perspective. Thus, their approach is not directly applicable to prediction and model estimation. The spherical regression setting is additionally concerned with prediction accuracy, model parsimony, interpretability, and statistical inference. Our approach will use a different objective function, roughness penalty, and optimization strategy that are more suitable for spherical regression.
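As a concrete illustration of the rigid rotation model discussed above, the least-squares (Procrustes) fit can be sketched as follows. This is a minimal sketch assuming NumPy; the function name `fit_rotation` is our own, not from any of the cited works.

```python
import numpy as np

def fit_rotation(X, Y):
    """Estimate A in SO(d) maximizing sum_i y_i^T A x_i (Procrustes rotation).

    X, Y are (n, d) arrays of paired unit vectors. The fit uses the
    sign-corrected SVD of D = sum_i y_i x_i^T so that det(A) = +1.
    """
    D = Y.T @ X
    U, _, Vt = np.linalg.svd(D)
    d = D.shape[0]
    # Flip the sign of the last singular direction if needed to force det = +1.
    S = np.diag([1.0] * (d - 1) + [float(np.linalg.det(U @ Vt))])
    return U @ S @ Vt
```

When the pairs are exactly related by a rotation, y_i = A x_i, this estimate recovers A; with noisy pairs it returns the rotation closest to the data in the least-squares sense.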
3. Proposed Regression Framework
The proposed method characterizes relationships between spherical data on S^{d-1} using diffeomorphisms. The set of such diffeomorphisms Γ consists of mappings γ : S^{d-1} → S^{d-1} that are smooth and invertible with smooth inverses, and it forms a group under composition. The identity element of Γ is the mapping γ_id(x) = x. The group Γ forms a very flexible class of deformations that includes the subgroup of rigid rotations and the projective linear group. Additionally, the complex Möbius maps, considered as maps on S^2, are diffeomorphisms of the Riemann sphere. For more details on diffeomorphisms, see Kac (1990).

We will need elements of the Riemannian geometry of S^{d-1} and Γ to derive estimation algorithms. The tangent space at p ∈ S^{d-1} is denoted by T_p(S^{d-1}) = { x ∈ R^d : <x, p> = 0 } and is a vector space of dimension d-1. Similarly, the tangent space of Γ at γ_id is denoted by T_{γ_id}(Γ) and is infinite dimensional. This tangent space T_{γ_id}(Γ) is the set of smooth tangent vector fields on S^{d-1}. That is, for any V ∈ T_{γ_id}(Γ) and p ∈ S^{d-1}, V(p) ∈ T_p(S^{d-1}) is a tangent vector, and it varies smoothly with p.

We assume that the error distribution of y given x follows a von Mises-Fisher distribution with mean direction γ(x) (γ ∈ Γ) and concentration parameter κ > 0. That is,

  p_x(y) = C_d(κ) exp{ κ y^T γ(x) },   C_d(κ) = κ^{d/2-1} / ( (2π)^{d/2} I_{d/2-1}(κ) ),   (1)

with y, x, γ(x) ∈ S^{d-1}. Here, C_d is the normalizing constant, and I_ν is the modified Bessel function of the first kind and order ν. The von Mises-Fisher density is isotropic about its mean direction γ(x). The parameter κ measures the degree of concentration: κ = 0 implies a uniform density on S^{d-1}, while κ = ∞ implies a Dirac delta at γ(x). For further details see Mardia and Jupp (2000). The normalizing constant C_d(κ) does not depend on the mean direction, so maximum-likelihood estimation of γ separates from that of κ.

Since the group of diffeomorphisms is an infinite-dimensional function space, the maximum-likelihood estimate may over-fit the data. To overcome this problem, we seek a penalized maximum log-likelihood solution that takes the form

  γ̂ = argmax_{γ ∈ Γ} ( n^{-1} Σ_{i=1}^n y_i^T γ(x_i) − λ R(γ) ).

The term f_n(γ) = n^{-1} Σ_{i=1}^n y_i^T γ(x_i) is the log-likelihood from equation (1), and the function R : Γ → R_+ measures the roughness of γ. We introduce the scale n^{-1} in the log-likelihood term in order to compare it across sample sizes. The scalar λ > 0 controls the relative influence of the roughness penalty R(γ).

It is common in nonparametric statistics involving Euclidean variables to use first or second derivatives of functions to define their roughness. The first-order penalty is zero for translations and penalizes large slopes. The second-order penalty is zero for linear maps and penalizes curvatures. The interpretation of transformations in spherical domains is different. Translations in the Euclidean domain become rigid rotations and reflections in spherical domains, but the analogs of higher-order transformations are not clear. We require that if the diffeomorphism is either a rotation or a reflection, i.e.
γ(p) = Op, O ∈ O(d) = { O ∈ R^{d×d} : O^T O = O O^T = I_d }, with I_d denoting the d × d identity matrix, then its roughness measure should be zero. For this purpose, it is sufficient to impose a penalty that uses the first derivative, or the Jacobian, of γ. Not only do we lack a proper interpretation for the second derivative of γ, but it also becomes computationally complex. Consequently, we will derive a first-order roughness penalty on γ.

Let J_p(γ) denote the Jacobian of γ ∈ Γ evaluated at a point p ∈ S^{d-1}. By definition, J_p(γ) is a linear mapping from the tangent space T_p(S^{d-1}) to T_{γ(p)}(S^{d-1}), each of these tangent spaces being (d-1)-dimensional. Hence, J_p(γ) can be expressed as a (d-1) × (d-1) matrix with respect to chosen bases for these tangent spaces, as elaborated next. Any linear map between the two tangent spaces can first be expressed in R^d coordinates as u → Bu for B ∈ R^{d×d} and each u ∈ T_p(S^{d-1}) viewed as an element of R^d. Let E, F ∈ R^{d×(d-1)} denote orthonormal bases for T_p(S^{d-1}) and T_{γ(p)}(S^{d-1}), respectively. Then, under the chosen bases, the Jacobian matrix becomes J_p = F^T B E ∈ R^{(d-1)×(d-1)}. In case γ(p) = Op for an O ∈ O(d), then OE ∈ R^{d×(d-1)} also forms an orthonormal basis of T_{γ(p)}(S^{d-1}). Since F and OE are two orthonormal bases of the same space, there exists an orthogonal matrix A ∈ O(d-1) such that F = OEA. Thus, the expression for the Jacobian in this case reduces to J_p = F^T O E = A^T E^T O^T O E = A^T ∈ O(d-1).

A natural measure of the roughness of γ at p is ||J_p(γ)^T J_p(γ) − I_{d-1}||, where ||·|| denotes the Frobenius matrix norm. The roughness at p is zero if and only if J_p(γ) is an orthogonal matrix, i.e., a rotation or a reflection. The resulting first-order roughness measure for the full map γ is then Q(γ) = ∫_{S^{d-1}} ||J_p(γ)^T J_p(γ) − I_{d-1}|| dp. Another interesting property of this definition is that the natural action of the subgroup O(d) on Γ leaves the roughness measure unchanged. That is, for any γ ∈ Γ and O ∈ O(d), if we define a new map γ̃(p) = Oγ(p), then Q(γ̃) = Q(γ).

Another measure of the distance from isometry can be computed as ||logm(J_p^T J_p)||, where logm denotes the matrix logarithm and ||·|| again denotes the Frobenius norm. Unlike the previous distance-from-isometry measure, this one diverges to infinity as the Jacobian becomes singular. The resulting roughness measure is defined as R(γ) = ∫_{S^{d-1}} ||logm{ J_p(γ)^T J_p(γ) }|| dp. This has the added property that the roughness measure will be infinite if the Jacobians are singular over a set of positive measure. As a penalty term, this ensures that the deformation's Jacobian has a nonzero determinant everywhere, except perhaps on a set of measure zero.

Because of the group structure of Γ, any finite sequence of composed diffeomorphisms results in a diffeomorphism. In the limit, however, an infinite sequence of compositions may converge to a non-diffeomorphic map. For example, a set of points with positive measure may converge to a set of measure zero in the limit. On the other hand, the proposed roughness measure of such a deformation is infinite, so the roughness measure should push the deformation away from such solutions.
In this paper, we will use R as our chosen roughness measure. Details for computingthese roughness measures Q and R are presented in the Appendix.
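To make the two pointwise roughness terms concrete, the following sketch (a helper of our own, assuming NumPy) evaluates both the Frobenius distance from isometry used in Q and the matrix-log term used in R for a single Jacobian matrix. The matrix logarithm of the symmetric positive-definite matrix J^T J is computed by eigendecomposition.

```python
import numpy as np

def pointwise_roughness(J):
    """Return (q, r): the integrands of Q and R at one point.

    J is the (d-1) x (d-1) Jacobian of gamma at p, expressed in
    orthonormal tangent bases. q = ||J^T J - I||_F and
    r = ||logm(J^T J)||_F, with logm computed via eigendecomposition
    of the symmetric positive-definite matrix J^T J.
    """
    G = J.T @ J                       # pullback metric at p
    k = G.shape[0]
    q = np.linalg.norm(G - np.eye(k))
    w, V = np.linalg.eigh(G)          # eigenvalues are positive for nonsingular J
    r = np.linalg.norm((V * np.log(w)) @ V.T)
    return q, r
```

Both terms vanish when J is orthogonal, and r diverges as J approaches singularity, matching the properties described above.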
4. Optimization Algorithm
Returning to the problem of model estimation, we treat it as the optimization of the objective function E(γ) = n^{-1} Σ_{i=1}^n y_i^T γ(x_i) − λ R(γ) by gradient ascent. Since we are optimizing over a nonlinear group, the Riemannian geometry of Γ plays a vital role. The gradient of a function at a point on a Riemannian manifold is, by definition, an element of the tangent space at that point. Since tangent spaces are linear, one can represent their elements as coefficients with respect to corresponding orthonormal bases. Thus, the gradient of E at any γ can be expressed as a linear combination of tangent basis elements at that γ. This simplifies the computation of the gradient to solving for the corresponding coefficients, but it still requires an orthonormal basis of all tangent spaces of Γ. To avoid this, we take an iterative approach and solve for an optimal incremental diffeomorphism at every iteration, as follows.

We define an incremental cost function H(γ̃) = E(γ̃ ∘ γ) for the current estimate γ ∈ Γ. The incremental diffeomorphism γ̃ will be small, i.e., close to γ_id, and can be related to an element of T_{γ_id}(Γ). Thus, we need to specify the tangent space at γ_id only and, as mentioned earlier, this is the set of all smooth tangent vector fields on S^{d-1}. The gradient of H at γ_id, ∇_{γ_id}H, is an element of T_{γ_id}(Γ). Given an orthonormal basis { B_1, B_2, ... } of T_{γ_id}(Γ), we can write ∇_{γ_id}H = Σ_{j=1}^∞ d_j B_j, where each d_j ∈ R. Each coefficient d_j is the directional derivative of H in the direction of B_j at γ_id. In Section 4.2 we define such a basis for the tangent space T_{γ_id}(Γ). Then, in Section 4.3, we show how to compute the gradient and present the gradient algorithm.

Additionally, we will need the following tools on the sphere S^{d-1}.
The exponential map at p ∈ S^{d-1} is a mapping exp_p : T_p(S^{d-1}) → S^{d-1} given by exp_p(v) = cos(||v||) p + sin(||v||) v/||v|| ∈ S^{d-1}. The inverse exponential map at a point p ∈ S^{d-1} maps each non-antipodal point z ∈ S^{d-1} to a tangent vector v ∈ T_p(S^{d-1}) according to exp_p^{-1}(z) = (z − cos(θ) p) θ (sin(θ))^{-1}, where θ = cos^{-1}(p^T z). Two points p, z ∈ S^{d-1} are non-antipodal if p^T z ≠ −1. For V ∈ T_{γ_id}(Γ) we define a map Ψ_V expressed as Ψ_V(p) = exp_p(V(p)) for each p ∈ S^{d-1}. The map Ψ_V is a diffeomorphism on S^{d-1} when V lies in a small neighborhood around the zero vector field, i.e., the field V_0 with ||V_0(p)|| = 0 for each p ∈ S^{d-1}.

4.2. Orthogonal Basis for Incremental Diffeomorphisms

In this paper, we focus on d = 3 and develop the optimization algorithm for that case. Kurtek et al. (2011) construct an orthonormal basis of smooth tangent vector fields on a sphere for T_{γ_id}(Γ) by applying the gradient to the real and imaginary parts of the complex spherical harmonic functions Y_l^m of degree l and order m = 0, ..., l. From the first l harmonics one obtains (l+1)^2 distinct functions by taking the real and imaginary parts as separate functions. We denote these real-valued functions on S^2 by ϕ_1, ..., ϕ_{(l+1)^2} and note that they are parametrized using polar coordinates. For each i ∈ {1, ..., (l+1)^2} we evaluate the gradient at each point (θ, φ) ∈ S^2, resulting in the vector field

  ∇_{(θ,φ)} ϕ_i = ( ∂ϕ_i/∂θ, (1/sin(θ)) ∂ϕ_i/∂φ ).

For each resulting non-trivial tangent vector field ∇ϕ_i, i ∈ {2, 3, ...}, let B̃_i(θ, φ) = ∇_{(θ,φ)} ϕ_i / ||∇_{(θ,φ)} ϕ_i||. Let *B̃_i(θ, φ) denote the tangent vector at the point (θ, φ) obtained by rotating B̃_i(θ, φ) counterclockwise by π/2. The union of the tangent vector fields { B̃_i } and the rotated tangent vector fields { *B̃_i } provides an orthonormal basis for T_{γ_id}(Γ). We denote this union by { B_1, B_2, ... }. For any B ∈ T_{γ_id}(Γ) and any point p ∈ S^2, the tangent vector B(p) ∈ T_p(S^2) is represented as an element of R^3. If we restrict to basis elements obtained from spherical harmonics of order ≤ l, we obtain L = 2((l+1)^2 − 1) basis elements. As l increases, so does L, and one is able to capture more complex deformations.
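The exponential and inverse exponential maps used throughout this section can be sketched directly from their definitions. This is a minimal sketch assuming NumPy; the function names are our own.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map at p: shoot along tangent vector v in T_p(S^{d-1})."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, z):
    """Inverse exponential map at p for a non-antipodal z (p^T z != -1)."""
    theta = np.arccos(np.clip(p @ z, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros_like(p)
    return (z - np.cos(theta) * p) * theta / np.sin(theta)
```

The two maps are mutually inverse away from the antipode, which gives a quick sanity check: sphere_log(p, sphere_exp(p, v)) returns v whenever ||v|| < π.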
As mentioned earlier, each iteration is based on the optimization of the functional H(γ̃) = E(γ̃ ∘ γ), where γ ∈ Γ is the current deformation, and we optimize over the increment γ̃ in a neighborhood of γ_id. By definition, ∇_{γ_id}H ∈ T_{γ_id}(Γ), which implies that the gradient of H at the identity can be expressed as a linear combination ∇_{γ_id}H = Σ_{j=1}^∞ d_j B_j. Each coefficient d_j is the directional derivative of H in the direction of B_j ∈ T_{γ_id}(Γ), given by

  d_j = lim_{ε↓0} [ H(Ψ_{εB_j}) − H(γ_id) ] / ε = lim_{ε↓0} [ E(Ψ_{εB_j} ∘ γ) − E(γ) ] / ε.

Since the gradient operation is linear, we can separate the likelihood and roughness terms in H. We can write an analytical expression for the directional derivatives of the log-likelihood term. Let f^{(i)}(γ) = y_i^T γ(x_i) and z_i = γ(x_i); then the coefficient a_ij ∈ R is computed as follows:

  a_ij = lim_{ε↓0} [ f^{(i)}(Ψ_{εB_j} ∘ γ) − f^{(i)}(γ) ] / ε
       = lim_{ε↓0} [ y_i^T exp_{z_i}(ε B_j(z_i)) − y_i^T z_i ] / ε
       = y_i^T ( −sin(ε ||B_j(z_i)||) ||B_j(z_i)|| z_i + cos(ε ||B_j(z_i)||) B_j(z_i) ) |_{ε=0}
       = y_i^T B_j(z_i).

The directional derivative of the log-likelihood term, at the current estimate γ, is then given by

  b_j = n^{-1} Σ_{i=1}^n a_ij = n^{-1} Σ_{i=1}^n y_i^T B_j(γ(x_i)).   (2)

We compute the directional derivative of the roughness term numerically: for some small fixed ε > 0, c_j = ε^{-1} ( R(Ψ_{εB_j} ∘ γ) − R(γ) ). Thus, each coefficient d_j is approximated as d_j ≈ b_j − λ c_j, and the gradient is approximated by ∇_{γ_id}H ≈ Σ_{j=1}^L (b_j − λ c_j) B_j. Finally, γ is updated in the direction of ∇_{γ_id}H according to the mapping

  γ → Ψ_{δ ∇_{γ_id}H} ∘ γ   (3)

for a small step size δ > 0.

Algorithm:
1. Initialize γ by the rigid rotation parametrized by U_1 U_2^T, where D = U_1 Σ U_2^T is the modified singular value decomposition of D = Σ_{i=1}^n y_i x_i^T, with U_1, U_2 ∈ SO(3) and Σ the diagonal singular value matrix.
2. For each j ∈ {1, ..., L}, compute the log-likelihood coefficient b_j according to equation (2) and numerically compute the roughness coefficient c_j for some small fixed step size ε > 0. Form the approximate gradient ∇_{γ_id}H.
3. Update γ according to equation (3) for a small step size δ > 0.
4. If E(γ) has converged, stop. Otherwise return to step 2.
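A minimal sketch of one iteration's likelihood part (equation (2)) and update (equation (3)) follows. For illustration only, the spherical-harmonic basis of Section 4.2 is replaced by three infinitesimal-rotation vector fields, the roughness coefficients c_j are omitted (set to zero), and all names are our own.

```python
import numpy as np

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def rotation_field(axis):
    # Infinitesimal rotation about `axis`: a smooth tangent vector field on S^2,
    # standing in for the harmonic basis fields B_j of Section 4.2.
    return lambda p: np.cross(axis, p)

def likelihood_coeffs(X, Y, gamma, fields):
    # b_j = n^{-1} sum_i y_i^T B_j(gamma(x_i))   -- equation (2)
    Z = [gamma(x) for x in X]
    return [float(np.mean([y @ B(z) for y, z in zip(Y, Z)])) for B in fields]

def ascent_step(gamma, fields, d, delta):
    # gamma -> Psi_{delta * grad} o gamma, with grad = sum_j d_j B_j   -- equation (3)
    def new_gamma(x):
        z = gamma(x)
        v = delta * sum(dj * B(z) for dj, B in zip(d, fields))
        return sphere_exp(z, v)
    return new_gamma
```

With a single pair x = (1,0,0), y = (0,1,0) and γ initialized to the identity, only the z-axis rotation field has a nonzero coefficient, and a small step moves γ(x) toward y, increasing the likelihood term.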
5. Experimental Results
In this experiment, we explore what happens with the gradient ascent algorithm when there is no likelihood term. We do this to check that the roughness term pushes the deformation toward something which sensibly has zero roughness. When there is no data, n^{-1} Σ_{i=1}^n y_i^T γ(x_i) = 0, because the sum over the empty set is defined to be the additive identity. This implies that the objective function E has no likelihood term in this case. We can perform this experiment using the previously defined gradient and algorithm with one minor change. Instead of initializing with a rigid rotation, we initialize the deformation with an arbitrary diffeomorphism and then iteratively apply the gradient of the roughness term. Our intuition is that, by iteratively applying this gradient, the deformation will converge to something which closely resembles a rigid rotation. One can see in Fig. 1 that the initial deformation is relatively rough and distant from a rigid rotation. The resulting deformation after 10,000 iterations can be seen in the middle panel. The evolution of the roughness measure is plotted in the right panel. One can see further details and the animated results of this experiment in the supplementary material.

Figure 1: The left panel shows the initial diffeomorphism, the middle panel shows the resulting deformation after 10,000 iterations, and the right panel shows the evolution of the roughness measure.

Table 1: Mean squared error of test data from the Section 5 experiments.

              OURS        NLL         PLT         RR          TRUE
Section 5.2   18.(10^-)   56.(10^-)   22.(10^-)   21.(10^-)   1.(10^-)
Section 5.3   4.(10^-)    5.(10^-)    19.(10^-)   19.(10^-)   N/A
Section 5.4   7.(10^-)    11.(10^-)   8.(10^-)    9.(10^-)    N/A

OURS refers to the model presented in this paper. NLL refers to the nonparametric local linear regression model. PLT refers to the projective linear transformation model. RR refers to the rigid rotation model. TRUE refers to the true diffeomorphism and is only applicable to the simulated experimental results from Section 5.2. N/A stands for Not Applicable.

We demonstrate the proposed nonparametric diffeomorphic regression model on both simulated and real data. For the simulated case we compare the estimated model to the true deformation, and on the real data we compare predictive performance to some alternative models, including the Rigid Rotation model from Chang (1986), the Projective Linear Transformation model from Rosenthal et al. (2014), and the Nonparametric Local Linear model from Marzio et al. (2014). We evaluate the predictive performance of a model by splitting a data set into training and test data. The model parameters are estimated using the training data, and the performance is then evaluated by computing the mean squared error on the test data. This error is computed as
  MSE = n^{-1} Σ_{i=1}^n ||y_i − µ̂_i||^2,

where y_i and µ̂_i, respectively, denote the true and predicted values, and ||·|| denotes the Euclidean norm. Since the data and the predicted values are restricted to a sphere, the mean squared error lies in [0, 4]. We also compare the fitted spherical mappings of each model and compare the observed and predicted values.

We start by illustrating the gradient ascent algorithm using simulated paired data points on S^2 that are related by a diffeomorphism γ ∈ Γ. The true γ is obtained via a combination of Möbius and twisting transformations on S^2. Additionally, we compose a sequence of small incremental diffeomorphisms using spherical harmonics to obtain the final true diffeomorphism. See the supplementary material for details on these maps and their parametrization for this example. Once γ is generated, it is fixed throughout the experiment. The training and test data consist of 200 and 100 independently sampled data points, respectively. The predictor variables x_1, ..., x_n are simulated independently from a von Mises-Fisher distribution with mean (0, 0, 1)^T and concentration κ = 5 for the training data, and uniformly for the test data. This leaves a gap in the training data to simulate model performance with limited information. The individual responses y_i are then independently simulated from a von Mises-Fisher distribution with mean γ(x_i) and concentration κ = 100. The training and test data can be seen in the supplementary material.

The roughness parameter λ is estimated using cross-validation, i.e., by splitting the training data into 75 training observations and 25 validation observations. In the left panel of Fig. 2, one can see the mean squared error of the validation data plotted over various levels of λ and for l ∈ {3, 10}. We select the values of λ and l which minimize the validation error.
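The prediction-error criterion above can be sketched directly (NumPy assumed; spherical_mse is our own name):

```python
import numpy as np

def spherical_mse(Y, Yhat):
    """MSE = n^{-1} sum_i ||y_i - mu_hat_i||^2, rows being unit vectors.

    Since both arguments lie on the sphere, each summand equals
    2(1 - y_i^T mu_hat_i), so the result lies in [0, 4].
    """
    Y, Yhat = np.asarray(Y), np.asarray(Yhat)
    return float(np.mean(np.sum((Y - Yhat) ** 2, axis=1)))
```

The extremes are reached for perfect prediction (error 0) and for predictions antipodal to every observation (error 4).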
In this case, we construct a basis from the spherical harmonic functions of order less than or equal to l = 3, select λ = 2(10)^-, and use the model fitted with these values to make future predictions. The right panel of Fig. 2 shows the evolution of E(γ) as γ is iteratively updated according to the gradient ascent algorithm applied to the full set of 200 training data points. In Fig. 3, the true diffeomorphism γ is compared to the estimated diffeomorphisms at several roughness levels. Notice that for l = 3 the differences in the estimated diffeomorphisms are very subtle over λ. The roughness penalty's influence is more visible for l = 10.

A comparison of model performances on the test data is summarized in the first row of Table 1. None of the estimated models outperforms the true diffeomorphism, because data is scarce over a part of the sphere. The nonparametric local linear regression model performs poorly here because it does not handle the extrapolation well. All the other models outperform the local linear model because they assume the underlying deformation is a diffeomorphism, which is true in this simulated case. One can see a comparison of the estimated deformations for the other models, with observed and predicted values plotted, in the supplementary material.

Figure 2: On the left, one can see the mean squared error on the validation data for various values of λ. The dotted blue and solid red lines denote l = 3 and l = 10, respectively. The right plot shows the evolution of the penalized log-likelihood function as the algorithm iterates.

The Integrated Global Radiosonde Archive contains radiosonde and pilot balloon observations from various stations distributed across the globe. An overview of data coverage is presented in I. Durre (2006). The data set was constructed from the monthly averages of November 2013. In this data set, there are typically multiple means per station, which correspond to various pressure levels.
We take an average of the tangential wind velocities at all the available pressure levels at a point to form a direction at that point. This results in 694 spatial observations, which we split into 200 training observations and 494 test observations. For each observation i = 1, ..., n, the station location is treated as the predictor x_i, and the corresponding tangential wind velocity is denoted by v_i ∈ T_{x_i}(S^2). The tangential wind velocities are measured in scaled units of 600 meters per second for this data. Each response variable y_i = exp_{x_i}(v_i) is obtained by applying the exponential map to the respective tangent vector. The training and test data are shown in Fig. 4. Using cross-validation on the training data, we selected λ = (10)^- for the roughness penalty of our model and l = 10 for the maximum order of the basis elements. A plot of the MSE for the validation data used for tuning the model is in the supplementary material.

The test error in Table 1 shows that our method outperforms the nonparametric local linear model. In Fig. 5, the observed and predicted tangential wind directions are plotted for each model.

Figure 3: This illustration compares the observed (in red) and predicted (in green) test values of the true diffeomorphism and of deformations fitted at various values of λ for l = 3 and l = 10.

Figure 4: The figures show the training data (n = 200) and test data (n = 494) for the wind direction models. The yellow dots denote the locations of the weather monitoring stations. The red dots denote the average wind displacement. Corresponding data points are connected with light gray lines.
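The construction of responses from tangential wind vectors, y_i = exp_{x_i}(v_i), can be sketched as follows. The helper names are our own and NumPy is assumed; the radial projection is a defensive step, not part of the paper's description.

```python
import numpy as np

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def wind_response(x, v):
    """Map a station location x on S^2 and tangential wind vector v to y = exp_x(v)."""
    v = v - (v @ x) * x   # project out any radial component so v lies in T_x(S^2)
    return sphere_exp(x, v)
```

The response is the point reached by following the wind vector along a great circle from the station, so the station location and wind direction are encoded together as a single spherical variable.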
The nonparametric local linear model is in close agreement with our diffeomorphic model in regions where abundant data is available. In places where training data is scarce, as seen in the South Pacific region, the nonparametric local linear model is non-injective and allows the mesh grid to overlap and deform heavily.
Figure 5: Deformations of fitted models from the weather training data, with observed and predicted test data plotted as red squares and green circles, respectively. Corresponding data points are connected with light gray lines. OURS refers to the model presented in this paper. NLL refers to the nonparametric local linear regression model. PLT refers to the projective linear transformation model. RR refers to the rigid rotation model.

A vector-cardiogram measures the direction and magnitude of the electrical forces generated by heart actions. The directional aspect of these vectors has important applications in the diagnosis of certain diseases. The data set, which was used in Downs (2003), consists of vector-cardiogram data from 98 children ages 2-18, where each child is measured using two lead systems, namely the Frank system and the McFee system. The objective of this experiment is to define a correspondence between these two systems. To do this, models are fitted using the directional vector from the Frank system as the predictor and the directional vector from the McFee system as the response. This could be useful for combining data sets that use the two different systems. In this case, we choose to convert Frank-system data to optimally correspond with McFee-system data.

The test and training data are plotted on the sphere for each corresponding lead system from each child in Fig. 6. The selected roughness parameter is λ = 10^-, and basis elements up to order l = 3 are used for fitting the final model. A plot of the MSE for the validation data used for tuning the model is in the supplementary material. The third row of Table 1 shows that our method has the smallest predictive error, while the previous nonparametric method proposed by Marzio et al. (2014) has the largest prediction error. In Fig. 7, the observed and predicted McFee directions are plotted for each model.
6. Discussion
The asymptotic analysis is an important component of this method. This includes determining the conditions necessary for consistency to occur. Unfortunately, the classical arguments for these types of proofs usually involve many strong and subtle assumptions which are difficult to verify when analyzing over infinite-dimensional nonlinear domains such as the group of diffeomorphisms on S². To help bring this problem forward, we outline the general argument typically used and point out the parts about which we are uncertain.

Figure 6: The figure shows the direction of greatest magnitude using the Frank system in yellow (training data, n = 70) and the McFee system in red (test data, n = 28). Corresponding data points are connected with light gray lines.

Figure 7: Deformations of fitted models (Ours, NLL, PLT, RR) from vector-cardiogram training data with observed and predicted test data respectively plotted as red squares and green circles. Corresponding data points are connected with light gray lines. NLL refers to the nonparametric local linear regression model. PLT refers to the projective linear transformation model. RR refers to the rigid rotation model.

Several extensions of the proposed framework can result from alternative optimization strategies. This includes extensions to hyper-spherical domains. It is likely that one can extend the deformation basis utilized in this paper to hyper-spherical domains; however, this may be costly and inefficient. For higher-dimensional spheres it may be better to construct custom orthonormal bases derived for specific applications in order to reduce the number of model coefficients. Additionally, one may wish to add a second-order roughness penalty to the objective function. Other possible extensions include developing more general classes of deformations. For example, to better handle sliding plate boundaries one may wish to utilize a mapping that allows for discontinuity on sets of measure zero.
Acknowledgements
This research was supported in part by the NSF grants DMS 1208959 and IIS 1217515.
References
Boyd, J., Little, J., 2005. Biometric gait recognition. In: Tistarelli, M., Bigun, J., Grosso, E. (Eds.), Advanced Studies in Biometrics. Vol. 3161 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 19–42. URL http://dx.doi.org/10.1007/11493648_2
Chang, T., 1986. Spherical regression. Annals of Statistics 14 (3), 907–924.

Chang, T., 1989. Spherical regression with errors in variables. Annals of Statistics 17 (1), 293–306.

Downs, T. D., 2003. Spherical regression. Biometrika 90 (3), 655–668.

Downs, T. D., Mardia, K. V., 2002. Circular regression. Biometrika 89 (3), 683–697.

Glaunès, J., Vaillant, M., Miller, M. I., Jan. 2004. Landmark matching via large deformation diffeomorphisms on the sphere. J. Math. Imaging Vis. 20 (1-2), 179–200.

Durre, I., Vose, R. S., Wuertz, D. B., 2006. Overview of the integrated global radiosonde archive. Journal of Climate 19, 53–68.

Kac, V. G., 1990. Infinite-Dimensional Lie Algebras, Third Edition. Cambridge University Press.

Kim, P. T., 1991. Decision theoretic analysis of spherical regression. Journal of Multivariate Analysis 38, 233–240.

Kurtek, S., Klassen, E., Ding, Z., Jacobson, S., Jacobson, J., Avison, M., Srivastava, A., March 2011. Parameterization-invariant shape comparisons of anatomical surfaces. IEEE Transactions on Medical Imaging 30 (3), 849–858.

Mardia, K. V., 1972. Statistics of Directional Data. Academic Press.

Mardia, K. V., Jupp, P. E., 2000. Directional Statistics. John Wiley & Sons.

Marzio, D., Panzera, A., Taylor, C., 2014. Nonparametric regression for spherical data. Journal of the American Statistical Association.

McKenzie, J. K., 1957. The estimation of an orientation relationship. Acta Crystallographica 10, 61–62.

Prentice, M. J., Mardia, K. V., 1995. Shape changes in the plane for landmark data. Annals of Statistics 23 (6), 1960–1974.

Rancourt, D., Rivest, L.-P., Asselin, J., 2000. Using orientation statistics to investigate variations in human kinematics. Journal of the Royal Statistical Society. Series C (Applied Statistics) 49 (1), 81–94.
Rivest, L.-P., 1997. A decentred predictor for circular-circular regression. Biometrika 84 (3), 717–726.

Rivest, L.-P., 1989. Spherical regression for concentrated Fisher-von Mises distributions. Annals of Statistics 17, 307–317.

Rosenthal, M., Wu, W., Klassen, E., Srivastava, A., 2014. Spherical regression models using projective linear transformations. Journal of the American Statistical Association, in press.

Stephens, M. A., 1979. Vector correlation. Biometrika 66, 41–48.
Appendix A. Details of Numerically Computing the Roughness Measure
We use polar coordinates to compute the roughness measure numerically. Given a point $(\theta, \phi) \in (0, \pi) \times (0, 2\pi]$, we can obtain the Cartesian coordinates as $\psi_{\theta,\phi} = (\sin(\theta)\cos(\phi), \sin(\theta)\sin(\phi), \cos(\theta))^T$. Conversely, given a point $p = (p_1, p_2, p_3)^T \in S^2 \subset \mathbb{R}^3$ in Cartesian coordinates, we can get its polar coordinates as $\psi^{-1}(p) = (\theta, \phi) = (\cos^{-1}(p_3), \tan^{-1}(p_2/p_1))$. For a warping $\gamma$ which is parametrized in Cartesian coordinates, let $z = (z_1, z_2, z_3)^T = \gamma(\psi_{\theta,\phi})$. The image of $\gamma$ at $(\theta, \phi)$ in polar coordinates can thus be computed as $(\tilde\theta, \tilde\phi) = (\cos^{-1}(z_3), \tan^{-1}(z_2/z_1))$. The variable $\phi$ can be visualized as a rotation about the $z$ axis, and $\theta$ can be thought of as the arc-length distance from the north pole.

To compute the Jacobian of $\gamma$, we take derivatives with respect to an orthonormal basis at each point on the sphere. An orthonormal basis for the tangent space at the point $(\theta, \phi)$ is
\[
\left[ \frac{d\psi_{\theta,\phi}}{d\theta}, \; \frac{1}{\sin(\theta)} \frac{d\psi_{\theta,\phi}}{d\phi} \right]
= \left[ \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix}, \;
\begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix} \right].
\]
Note that $\sin(\theta) = 0$ at the north pole $\theta = 0$ and the south pole $\theta = \pi$. This implies that the change in distance on the sphere is zero at these two points when moving in the direction of $\phi$, so there are two points of discontinuity. Since we are integrating the roughness over the entire surface of the sphere, these two points will not theoretically change the measure of roughness. We exclude small neighborhoods around the north and south poles in our computation of the roughness.

Recall that the Jacobian operation is a linear mapping between the tangent spaces $T_{\theta,\phi}$ and $T_{\tilde\theta,\tilde\phi}$. Therefore $J_{\theta,\phi}$, the Jacobian matrix with respect to the sphere at $(\theta, \phi)$, can be computed by applying a change of basis to the linear Jacobian matrix as
\[
J_{\theta,\phi}
= \begin{pmatrix} 1 & 0 \\ 0 & \sin(\tilde\theta) \end{pmatrix}
\begin{pmatrix} \frac{d\tilde\theta}{d\theta} & \frac{d\tilde\theta}{d\phi} \\[2pt] \frac{d\tilde\phi}{d\theta} & \frac{d\tilde\phi}{d\phi} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sin(\theta)} \end{pmatrix}
= \begin{pmatrix} \frac{d\tilde\theta}{d\theta} & \frac{1}{\sin(\theta)} \frac{d\tilde\theta}{d\phi} \\[2pt] \sin(\tilde\theta) \frac{d\tilde\phi}{d\theta} & \frac{\sin(\tilde\theta)}{\sin(\theta)} \frac{d\tilde\phi}{d\phi} \end{pmatrix}.
\]
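The coordinate conversions and the per-point distance-from-isometry computation above can be sketched numerically. The following Python fragment is an illustrative sketch, not the authors' code: the function names are our own, the Jacobian is approximated by finite differences rather than the grid scheme given below, and `arctan2` is used in place of $\tan^{-1}(p_2/p_1)$ to resolve quadrants.

```python
import numpy as np

def sph_to_cart(theta, phi):
    # psi_{theta,phi}: polar -> Cartesian coordinates on S^2
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def cart_to_sph(p):
    # psi^{-1}: Cartesian -> polar; atan2 handles the quadrant ambiguity of tan^{-1}(p2/p1)
    return np.arccos(np.clip(p[2], -1.0, 1.0)), np.arctan2(p[1], p[0])

def jacobian_on_sphere(gamma, theta, phi, h=1e-6):
    # Finite-difference approximation of J_{theta,phi}, including the change of
    # basis diag(1, sin(theta_tilde)) . L . diag(1, 1/sin(theta)) from the text.
    def image(th, ph):
        return cart_to_sph(gamma(sph_to_cart(th, ph)))

    def wrap(d):
        # bring an angular difference to the nearest branch (role of m(a, b) in the text)
        return (d + np.pi) % (2.0 * np.pi) - np.pi

    t0, p0 = image(theta, phi)
    t_th, p_th = image(theta + h, phi)   # perturb theta
    t_ph, p_ph = image(theta, phi + h)   # perturb phi
    L = np.array([[(t_th - t0) / h, (t_ph - t0) / h],
                  [wrap(p_th - p0) / h, wrap(p_ph - p0) / h]])
    return np.diag([1.0, np.sin(t0)]) @ L @ np.diag([1.0, 1.0 / np.sin(theta)])

def roughness_at_point(J):
    # Distance of J from an isometry: ||J^T J - I||_F^2 and log^2(l1) + log^2(l2)
    A = J.T @ J
    q = np.sum((A - np.eye(2)) ** 2)
    lam = np.linalg.eigvalsh(A)  # A is symmetric positive definite
    r = np.sum(np.log(lam) ** 2)
    return q, r
```

For any isometry of the sphere (e.g., a rigid rotation) both quantities vanish up to finite-difference error; summing them over a $(\theta, \phi)$ grid with the $\sin(\theta)$ area weight yields estimates of the roughness measures $Q(\gamma)$ and $R(\gamma)$.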
Looking at this mapping in reverse, the change of basis on the left-hand side transforms elements of the tangent space $T_{\tilde\theta,\tilde\phi}$ into coordinates with respect to $e_1, e_2$, which is then applied to the linear Jacobian. Finally, the matrix on the right transforms the basis back to elements of $T_{\theta,\phi}$. A measure of the distance from isometry can be computed as $\|\mathrm{logm}(J_{\theta,\phi}^T J_{\theta,\phi})\|$, where logm denotes the matrix logarithm and $\|\cdot\|$ denotes the matrix Frobenius norm. The elements of the matrix $A_{\theta,\phi} = J_{\theta,\phi}^T J_{\theta,\phi}$ can be computed as
\[
A_{\theta,\phi}
= \begin{pmatrix} \frac{d\tilde\theta}{d\theta} & \sin(\tilde\theta) \frac{d\tilde\phi}{d\theta} \\[2pt] \frac{1}{\sin(\theta)} \frac{d\tilde\theta}{d\phi} & \frac{\sin(\tilde\theta)}{\sin(\theta)} \frac{d\tilde\phi}{d\phi} \end{pmatrix}
\begin{pmatrix} \frac{d\tilde\theta}{d\theta} & \frac{1}{\sin(\theta)} \frac{d\tilde\theta}{d\phi} \\[2pt] \sin(\tilde\theta) \frac{d\tilde\phi}{d\theta} & \frac{\sin(\tilde\theta)}{\sin(\theta)} \frac{d\tilde\phi}{d\phi} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},
\]
where
\[
a_{11} = \left(\frac{d\tilde\theta}{d\theta}\right)^2 + \sin^2(\tilde\theta) \left(\frac{d\tilde\phi}{d\theta}\right)^2, \qquad
a_{12} = a_{21} = \frac{1}{\sin(\theta)} \frac{d\tilde\theta}{d\theta} \frac{d\tilde\theta}{d\phi} + \frac{\sin^2(\tilde\theta)}{\sin(\theta)} \frac{d\tilde\phi}{d\theta} \frac{d\tilde\phi}{d\phi},
\]
\[
a_{22} = \frac{1}{\sin^2(\theta)} \left(\frac{d\tilde\theta}{d\phi}\right)^2 + \frac{\sin^2(\tilde\theta)}{\sin^2(\theta)} \left(\frac{d\tilde\phi}{d\phi}\right)^2.
\]
In this manner, we can measure the distance from isometry for the tangent space at each point by measuring how far $J_{\theta,\phi}$ is from a rotation or reflection via $\|J_{\theta,\phi}^T J_{\theta,\phi} - I\| = \|A_{\theta,\phi} - I\|$, where $I$ denotes the $2 \times 2$ identity matrix and $\|\cdot\|$ denotes the Frobenius norm. This results in the first-order roughness measure
\[
Q(\gamma) = \int_0^{2\pi} \int_0^{\pi} \|A_{\theta,\phi} - I\|^2 \sin(\theta) \, d\theta \, d\phi.
\]
Alternatively, the square of the Frobenius norm of the matrix logarithm of $A_{\theta,\phi}$ can be computed using its eigenvalues as $\|\mathrm{logm}(A_{\theta,\phi})\|^2 = \log^2(\lambda_1) + \log^2(\lambda_2)$, where
\[
\lambda_1 = \frac{a_{11} + a_{22} + \sqrt{(a_{11} + a_{22})^2 - 4(a_{11} a_{22} - a_{12} a_{21})}}{2}, \qquad
\lambda_2 = \frac{a_{11} + a_{22} - \sqrt{(a_{11} + a_{22})^2 - 4(a_{11} a_{22} - a_{12} a_{21})}}{2}.
\]
This results in the following first-order roughness measure:
\[
R(\gamma) = \int_0^{2\pi} \int_0^{\pi} \|\mathrm{logm}(J_{\theta,\phi}^T J_{\theta,\phi})\|^2 \sin(\theta) \, d\theta \, d\phi.
\]

For a small $\delta > 0$, let $\Theta$ denote an $M \times M$ matrix such that $\Theta_{i,j} = \pi(j - 1 + \delta)/(M - 1 + 2\delta)$ for $i, j = 1, \ldots, M$. Similarly, let $\Phi$ denote an $M \times M$ matrix such that $\Phi_{i,j} = 2\pi(i - 1)/(M - 1)$ for $i, j = 1, \ldots, M$. In this manner, $\Theta$ and $\Phi$ represent a parametrization of the sphere where each point in the matrices can be mapped back to Cartesian coordinates using the mapping $\psi(\Theta_{i,j}, \Phi_{i,j})$. Let $(\tilde\Theta_{i,j}, \tilde\Phi_{i,j}) = \psi^{-1}(\gamma(\psi(\Theta_{i,j}, \Phi_{i,j})))$ for $i, j = 1, \ldots, M$. One can discretely represent the warping function $\gamma \in \Gamma$ by these four matrices $(\Theta, \Phi)$ and $(\tilde\Theta, \tilde\Phi)$. If the resolution $M$ is fine enough, one can obtain numerically approximated derivatives for $(\tilde\Theta, \tilde\Phi)$. Since $\cos^{-1}$ and $\tan^{-1}$ are typically defined on restricted range intervals, we account for wrapping with $m(a, b) = a - b + j\pi$, where $j = \mathrm{argmin}_{i \in \mathbb{Z}} |a - b + i\pi|$. A numerically approximated $2 \times 2$ Jacobian $J_{i,j}$ evaluated at $(\Theta_{i,j}, \Phi_{i,j})$ can be computed as
\[
J_{i,j} = \begin{pmatrix}
\frac{m(\tilde\Theta_{i,j}, \tilde\Theta_{i,j+1})}{m(\Theta_{i,j}, \Theta_{i,j+1})} & \frac{m(\tilde\Theta_{i,j}, \tilde\Theta_{i+1,j})}{m(\Phi_{i,j}, \Phi_{i+1,j})} \\[4pt]
\frac{m(\tilde\Phi_{i,j}, \tilde\Phi_{i,j+1})}{m(\Theta_{i,j}, \Theta_{i,j+1})} & \frac{m(\tilde\Phi_{i,j}, \tilde\Phi_{i+1,j})}{m(\Phi_{i,j}, \Phi_{i+1,j})}
\end{pmatrix}.
\]
A numerical estimate of $Q(\gamma)$ is given by
\[
Q(\gamma) \approx \sum_{i=1}^{M-1} \sum_{j=1}^{M-1} \mathrm{tr}\left((J_{i,j}^T J_{i,j} - I)^T (J_{i,j}^T J_{i,j} - I)\right) \sin(\Theta_{i,j}) \, 2\pi^2 / M^2.
\]
If we let $A_{i,j} = J_{i,j}^T J_{i,j}$ and respectively denote the elements of the matrix as $a_{11,i,j}$, $a_{12,i,j}$, $a_{21,i,j}$, and $a_{22,i,j}$, then the eigenvalues can be computed as
\[
\lambda_{1,i,j} = \frac{a_{11,i,j} + a_{22,i,j} + \sqrt{(a_{11,i,j} + a_{22,i,j})^2 - 4(a_{11,i,j} a_{22,i,j} - a_{12,i,j} a_{21,i,j})}}{2},
\]
\[
\lambda_{2,i,j} = \frac{a_{11,i,j} + a_{22,i,j} - \sqrt{(a_{11,i,j} + a_{22,i,j})^2 - 4(a_{11,i,j} a_{22,i,j} - a_{12,i,j} a_{21,i,j})}}{2}.
\]
A numerical estimate of $R(\gamma)$ is given by
\[
R(\gamma) \approx \sum_{i=1}^{M-1} \sum_{j=1}^{M-1} \left( \log^2(\lambda_{1,i,j}) + \log^2(\lambda_{2,i,j}) \right) \sin(\Theta_{i,j}) \, 2\pi^2 / M^2.
\]

Appendix B. Generating Arbitrary Diffeomorphisms
In Sections 5.1 and 5.2, diffeomorphic maps were generated by composing several parametric diffeomorphisms. The parametric families are:
• Rigid Rotation:
One can uniquely parametrize rigid rotations of the sphere using a special orthogonal matrix R ∈ SO(n, R), which denotes the set of n × n real-valued matrices with det(R) = 1 and R^T R = I_n, where I_n denotes the n × n identity matrix. A rotation diffeomorphism γ parametrized via a rotation matrix R can be evaluated at each z ∈ S² by the map γ(z) = Rz.

• Projective Linear Transformation:
One can uniquely parametrize projective linear transformations using a special linear matrix P ∈ SL(n, R), which denotes the set of n × n real-valued matrices with det(P) = 1. A projective linear diffeomorphism γ parametrized via the matrix P can be evaluated at each z ∈ S² by the map γ(z) = Pz/‖Pz‖, where ‖·‖ denotes the Euclidean norm. This class includes the group of rigid rotations.

• Conformal map:
Consider $S^2 \subset \mathbb{R}^3$ and identify $\mathbb{R}^3$ with $\mathbb{C} \oplus \mathbb{R}$. So, instead of writing an element of $S^2$ as $(x, y, z)$, we write it as $(z, t)$, where $z \in \mathbb{C}$, $t \in \mathbb{R}$, and $|z|^2 + t^2 = 1$. For $S^2$, the conformal maps are precisely the Möbius transformations studied by Downs (2003). Suppose we are given the matrix
\[
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in GL(2, \mathbb{C}).
\]
Associated to this matrix is a conformal map $A: S^2 \to S^2$, given by the following formula:
\[
A(z, t) = \left( \frac{2\,\overline{(cz + d(1 - t))}\,(az + b(1 - t))}{|az + b(1 - t)|^2 + |cz + d(1 - t)|^2}, \;
\frac{|az + b(1 - t)|^2 - |cz + d(1 - t)|^2}{|az + b(1 - t)|^2 + |cz + d(1 - t)|^2} \right).
\]
This formula is well defined for every point of the sphere except the point where $z = 0$ and $t = 1$. At this point, $A$ is defined by
\[
A(0, 1) = \left( \frac{2 a \bar{c}}{|a|^2 + |c|^2}, \; \frac{|a|^2 - |c|^2}{|a|^2 + |c|^2} \right).
\]
Essentially this map is derived by conjugating a Möbius transformation by stereographic projection from $\mathbb{C}$ to $S^2$. Note that every conformal map $S^2 \to S^2$ is obtained from a matrix in this manner.

• Twist map:
Consider $S^2 \subset \mathbb{R}^3$ and identify $\mathbb{R}^3$ with $\mathbb{C} \oplus \mathbb{R}$, so that an element of $S^2$ is written as $(z, t)$ with $z \in \mathbb{C}$, $t \in \mathbb{R}$, and $|z|^2 + t^2 = 1$. A very simple formula for a twist map is as follows. Start by fixing $r \in \mathbb{R}$. Then define a map $T: S^2 \to S^2$ by $T(z, t) = (e^{irt} z, t)$. Essentially this twist map takes each latitude circle to itself by a rotation, and this rotation varies from the south pole to the north pole.

• Small Incremental Diffeomorphism Using Spherical Harmonic Basis:
Let $B_1, B_2, \ldots, B_L$ denote the basis elements obtained from the spherical harmonics up to order $l$ as described in Section 4.2. One can parametrize a small incremental diffeomorphism which is close to the identity transformation via a coefficient vector $c \in \mathbb{R}^L$. If $\|c\|$ is sufficiently small, then the map $\gamma(z) = \exp_z(\sum_{i=1}^L c_i B_i(z))$ for $z \in S^2$ will represent a diffeomorphism which is close to the identity transformation. Since the set of diffeomorphisms forms a group under composition, $\gamma_1 \circ \gamma_2$ is also a diffeomorphism for any two $\gamma_1, \gamma_2 \in \Gamma$. In this manner one can obtain a larger diffeomorphism by iteratively applying these small incremental diffeomorphisms. Let $\gamma = \gamma_1 \circ \gamma_2 \circ \ldots \circ \gamma_J$ denote $J$ compositions of small incremental diffeomorphisms.

The initialized deformation in Section 5.1 is generated from a composition of a rigid rotation, a projective linear transformation, a conformal transformation parametrized by a complex $2 \times 2$ matrix $M$, and a twist map, followed by small incremental diffeomorphisms using up to order $l = 2$ spherical harmonics. The 150 coefficients were generated randomly.

Appendix C. Diffeomorphisms and Their Jacobian Eigenvalues
Any diffeomorphism $\gamma: S^2 \to S^2$ is by definition differentiable at each point $p \in S^2$. A map $\gamma$ is differentiable at $p \in S^2$ if $J_{\gamma,p}$, the Jacobian of $\gamma$ at $p$, exists. If it exists, then $J_{\gamma,p}$ is defined as the full-rank linear map from $T_p(S^2)$ to $T_{\gamma(p)}(S^2)$ such that for each $v \in T_p(S^2)$ the limit
\[
J_{\gamma,p}(v) = \lim_{\epsilon \downarrow 0} \frac{\gamma(p + \epsilon v) - \gamma(p)}{\epsilon}
\]
exists and yields a well-defined linear map between the two tangent spaces. If the limit fails to exist for some $v \in T_p(S^2)$, or maps to a lower-dimensional subset of $T_{\gamma(p)}(S^2)$, then we say that $J_{\gamma,p}$ does not exist, so that $\gamma$ is not differentiable at $p$.

If $J_{\gamma,p}$ exists, then it can be represented as a full-rank $2 \times 2$ matrix $B$ with respect to some choice of orthonormal bases for $T_p(S^2)$ and $T_{\gamma(p)}(S^2)$, respectively. There is no standard basis to use, so the matrix representation will depend on the chosen bases. A change of basis results in an orthogonal transformation $OBO^T$ for some $O \in O(2)$, so the eigenvalues are not affected by the choice of basis.

Let $\lambda_{1,p}$ and $\lambda_{2,p}$ denote the two eigenvalues of $J_{\gamma,p}$. If $J_{\gamma,p}$ exists, then both eigenvalues must be finite and non-zero. Because $\gamma$ is a diffeomorphism, $J_{\gamma,p}$ is continuous with respect to $p$, so that the eigenvalues $\lambda_{1,p}$ and $\lambda_{2,p}$ are also continuous with respect to $p$.

Appendix D. Additional Plots and Results

Appendix D.1. Demonstration Using Simulated Data
Some additional plots from Section 5.2 which may be of interest are presented here. In Figure D.8 the training and testing data are presented. In Figure D.9, several other models fitted from the training data are compared. Notice the large overlapping regions produced by the nonparametric local linear model in this case. This suggests a problem with extrapolation here.

Figure D.8: Training data (n = 200) and test data (n = 100). The yellow points denote the x's and the red points denote the y's. Corresponding data points are connected with light gray lines.

Appendix D.2. Weather Balloon Wind Directions Data
An additional plot from Section 5.3 which may be of interest is presented here. In Figure D.10 one can see the validation error used for tuning the model. One can see that l = 10 has a smaller validation error and is minimized around λ = 1.0 × 10−. Since the error is not greatly reduced with respect to λ, we select λ = 10−. There may be some higher-frequency variability which might be better captured with a larger value of l in this case.

Appendix D.3. Vector-Cardiogram Data
An additional plot from Section 5.4 which may be of interest is presented here. In Figure D.11 one can see the validation error used for tuning the model. One can see that l = 3 has a smaller validation error and is minimized with a heavy penalty. Since the roughness seems to be going down, we select λ = 10−. The variability seems to be low frequency and may be better characterized using up to order l = 2 basis elements.

Figure D.9: Deformations of fitted models (Ours, NLL, PLT, RR) from simulated training data with observed and predicted test data respectively plotted as red squares and green circles. Corresponding data points are connected with light gray lines.
Figure D.10: One can see the mean squared error on the weather validation data for various values of λ. The dotted blue and solid red lines respectively denote l = 3 and l = 10.
Figure D.11: One can see the mean squared error on the vector-cardiogram validation data for various values of λ. The dotted blue and solid red lines respectively denote l = 3 and l = 10.