[PDF] Novel Bayesian Procrustes Variance-based Inferences in Geometric Morphometrics & Novel R package: BPviGM1

Abstract

Compared to abundant classical statistics-based literature, to date, very little Bayesian literature exists on Procrustes shape analysis in Geometric Morphometrics, probably because of being a relatively new branch of statistical research and because of inherent computational difficulty associated with Bayesian analysis. Moreover, we may obtain a plethora of novel inferences from Bayesian Procrustes analysis of shape parameter distributions. In this paper, we propose to regard the posterior of Procrustes shape variance as morphological variability indicators. Here we propose novel Bayesian methodologies for Procrustes shape analysis based on landmark data's isotropic variance assumption and propose a Bayesian statistical test for model validation of new species discovery using morphological variation reflected in the posterior distribution of landmark-variance of objects studied under Geometric Morphometrics. We will consider Gaussian distribution-based and heavy-tailed t distribution-based models for Procrustes analysis. To date, we are not aware of any direct R package for Bayesian Procrustes analysis for landmark-based Geometric Morphometrics. Hence, we introduce a novel, simple R package \textbf{BPviGM1} ("Bayesian Procrustes Variance-based inferences in Geometric Morphometrics 1"), which essentially contains the R code implementations of the computations for proposed models and methodologies, such as R function for Markov Chain Monte Carlo (MCMC) run for drawing samples from posterior of parameters of concern and R function for the proposed Bayesian test of model validation based on significance morphological variation. As an application, we can quantitatively show that primate male-face may be genetically viable to more shape-variation than the same for females.

Full PDF

NN OVEL B AYESIAN P ROCRUSTES V ARIANCE - BASED I NFERENCES IN G EOMETRIC M ORPHOMETRICS & N

OVEL R PACKAGE : BPviGM1

A P

REPRINT

Debashis Chatterjee ∗ Interdisciplinary Statistical Research UnitIndian Statistical InstituteKolkata, India [email protected]

January 20, 2021 A BSTRACT

Classical Procrustes analysis (Bookstein, 1997) has become an indispensable tool under GeometricMorphometrics. Compared to abundant classical statistics-based literature, to date, very few Bayesianliterature exists on Procrustes shape analysis in Geometric Morphometrics, probably because ofbeing a relatively new branch of statistical research and because of inherent computational difﬁcultyassociated with Bayesian analysis. On the other hand, although we can easily obtain the pointestimators of the shape parameters using classical statistics-based methods, we cannot make inferencesregarding the distribution of those shape parameters in general and, cannot put forward our priorbelief and update to posterior belief on the same. We need to shift to Bayesian methods for that.Moreover, we may obtain a plethora of novel inferences from Bayesian Procrustes analysis ofshape parameter distributions. In this paper we propose to regard the posterior of Procrustes shapevariance as morphological variability indicators, which is on par with various laws of mathematicalpopulation genetics like Hardy-Weinberg law (Stern, 1943; Masel, 2012). Here we propose novelBayesian methodologies for Procrustes shape analysis based on landmark data’s isotropic varianceassumption and propose a Bayesian statistical test for model validation of new species discovery usingmorphological variation reﬂected in the posterior distribution of landmark-variance of objects studiedunder Geometric Morphometrics. We will consider Gaussian distribution-based and heavy-tailed tdistribution-based models for Procrustes analysis.To date, we are not aware of any direct R package for Bayesian Procrustes analysis for landmark-basedGeometric Morphometrics. Hence, we introduce a novel, simple R package

BPviGM1 (”BayesianProcrustes Variance-based inferences in Geometric Morphometrics 1”), which essentially containsthe R code implementations of the computations for proposed models and methodologies, such asR function for Markov Chain Monte Carlo (MCMC) run for drawing samples from posterior ofparameters of concern and R function for the proposed Bayesian test of model validation based onsigniﬁcance morphological variation.As an application, we applied our proposed Bayesian Procrustes Analysis on “apes” data of O’Higgins& Dryden (1993) (also documented in R package “shapes”). We compared the posterior varianceof the shape of female vs. male for Gorilla, Chimpanzee, Orangutan and conclude that there mightbe “Bayesian evidence” in favor of a novel hypothesis on face-shape: “male primate manifests moreﬂuctuation in face-shape than females,” which suggests further research in future. K eywords Bayesian Analysis · Procrustes Shape Analysis · Geometric Morphometrics ∗ For suggestions and bug-report regarding novel R package

BPviGM1 , please send email to [email protected] , or,please report new issue at https://github.com/debashischatterjee111/BPviGM1/issues a r X i v : . [ s t a t . M E ] J a n PREPRINT - J

ANUARY

20, 2021

Contents

BPviGM1 & Source Code Availability 17

BPviGM1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178.2 Installation of R package

BPviGM1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178.3 Main functions described in Novel R package

BPviGM1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188.4 source code availability for Application part using R package ”BPviGM1” . . . . . . . . . . . . . . . . . . . . . . 19

A.1 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19A.1.1 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19A.1.2 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19A.2 Helmert matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19A.3 Bayesian Predictive p-value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20A.4 Gibbs-Sampling Algorithm for Bayesian Procrustes Analysis under fully-known conditionals . . . . . . . . . . . . 20A.5 Detailed Examples of usage of R functions in “BPviGM1” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 PREPRINT - J

ANUARY

20, 2021

Procrustes shape analysis is one of the most important method for morphological identiﬁcations and morphological variation study.Morphometrics is the subject of the statistical study of biological shape and change of shape (Bookstein, 1997). Procrustes (Shapeanalysis) problems arise in a wide range of scientiﬁc disciplines, especially when the geometrical shapes of objects are compared,contrasted Theobald & Wuttke (2006) and analyzed in Theobald & Wuttke (2006). The shape is ”the geometrical information thatremains when location, scale, and rotational effects are ﬁltered out from an object” (Kendall, 1977). Two objects will have thesame shape if one can be translated, rescaled, and rotated to the other to match exactly, in the sense that they are similar objectsMicheas et al. (2006). Geometric morphometrics is a discipline that focuses on the study of shape and morphological variationsusing statistical tools. In geometric morphometrics, the shape is usually diagnosed by collecting and analyzing length measurements,counts, ratios, and angles using Cartesian landmark and landmark coordinates capable of capturing morphologically distinct shapevariables. Using classical Procrustes analysis Bookstein (1997), we can estimate shape parameters, but we will not get the posteriordistribution of those. In principle, the Bayesian approach does not have such a problem because it provides multiple realizations togenerate an ”a-posteriori” distribution of parameters of the model Fox et al. (2016).Bayesian methods start with a prior belief about a parameter, thereby involving computation of update on belief after data isobserved, known as “posterior distribution”. Posteriors involve a mathematical superposition of prior belief and evidence providedby observations. Hence, Bayesian data analysis with suitable models offers a highly ﬂexible, intuitive, and transparent alternativeto classical statistics Demˇsar et al. (2020). We may obtain a wide range of novel inferences from Bayesian Procrustes analysisof shape parameter distributions, which we may not achieve if we stick to the classical statistics-based Procrustes approach. Forinstance, in this paper, we propose to regard the posterior of Procrustes shape variance as morphological variability indicators. Forexample, inter-species and intra-species morphological variability may be reﬂected in Kulberk-Leibler divergence (KL-divergence)between the posterior and the respective variance parameters. KL-divergence is a divergence between distributions, and hence, thedivergence between posteriors of shape-variance of different populations may be viewed as an indicator of morphological variability.We may note that we may not get such reasonable indicators merely from the euclidean distance between point estimates in general.Moreover, such novel indicators involving posteriors of biometric shape-variance may reﬂect a natural extension of various laws ofmathematical population genetics like Hardy-Weinberg law (Stern, 1943; Masel, 2012) (see section 6). Unfortunately, much of themodern era of science, Bayesian approaches remained on the sidelines of data analysis compared to classical statistical methods,mainly because computations required for Bayesian analysis are usually quite convoluted, often involving numerical calculationapproximations of multi-dimensional integrals. Markov chain Monte Carlo (MCMC) methods are popular algorithms to sampleefﬁciently from posterior, which we predominantly use in our paper.In this paper, we propose novel Bayesian methodologies for Procrustes shape analysis based on landmark data’s isotropic varianceassumption and propose a Bayesian statistical test for model validation of new species discovery using inter-species intra-speciesmorphological variation reﬂected in the distribution of landmark-variance of objects studied under Geometric Morphometrics. Wewill assume isotropy of variance for landmark data, i.e., the nonsingularity of variance-covariance matrices.For an application to real geometric morphometric data, we applied our proposed Bayesian Procrustes Analysis methodology on“apes” data of O’Higgins & Dryden (1993) (documented in R package “shapes”). In particular, under isotropic variance assumption(Assumption 1), We compared the posterior of variance of shape of female vs. male for different primates (Gorilla, Chimpanzee,Orang utang) and come to a conclusion that there might be “Bayesian evidence” in favor of a novel biometric hypothesis on shape:“male primate manifests more ﬂuctuation in face-shape than females”, which suggests further research in future.At present, there is no R package for Bayesian Procrustes shape analysis using landmark-based objects as data, at least notthat we know to date. Hence, we combine the whole computational parts of this paper into a novel, simple and ﬂexible Rpackage

BPviGM1 (”Bayesian Procruste Analysis using Landmark based on Isometric or Anisometric Error under GeometricMorphometrics”), which will implement the novel bayesian models and methodologies introduced in this paper.The rest of the paper is organized as follows. Section 1 contains the Introduction part as well as overview,section 6 contains existingliterature review, section 3 contains preliminaries on Landmark based Object Analysis, Kendal’s Shape Space, Principle of ProcrustesShape Analysis and overview of principle behind Classical Method for Procruste Analysis. Section 5 contains our proposed novelBayesian Models & methods for Procrustes Analysis under Isotropic Error Variance, along with sampling Procedure ( Markov ChainMonte Carlo Approach for posterior sampling), which we illustrate using Simplest 2D Regression Modelwith Isotropic LandmarkVariance and both uniform prior and Empirical prior with a detailed discussion on problem of Ill-Posedness & Remedy usingEmpirical Prior. The theory behind our proposed novel measure of intra-inter-species shape variability in subsection 5.4. Section 7contains Applications of our proposed methodologies on1. Convergence diagnostics using simulated random convex-concave polygon in 2D,2. Bayesian Procrustes Analysis on “apes” Data: ﬂuctuation in face-shape for Male vs. Female,3. Novel Bayesian Procrustes Analysis on Paleontological Objects : Trilobite shape data (2D Landmark) & Dinosaurbone-shape data (3D Landmark),4. Application of Novel Measure of intra-species and Inter-species Shape Variability on “apes” data,5. Over-ﬁt Problem using Objects already in Shape-Space.section 8 contains details of installation procedure for novel R Package

BPviGM1 & Source Code availability for the same and, mainfunctions described in our R package. We conclude in section 9. PREPRINT - J

ANUARY

20, 2021

The book of Bookstein (1997) contains systematic survey of morphometric methods and shape analysis for landmark data usingconventional multivariate statistical analysis, solid geometry and biomathematics for biological insights into the features of manydifferent organs and organisms. Klingenberg et al. (2002) contains methodologies for analysis of symmetric structures usingquantiﬁcation of variation among objects. Rohlf & Slice (1990) contains a review and some generalizations of superimpositionmethods for comparing conﬁgurations of landmarks in two or more specimens. Use of Procruste-Variance is common in literaturefor study of morphological variation, e.g.,Stange et al. (2018). Landmark-based geometric morphometric study has becomingincreasingly popular and the popularity of statistical methodology research regarding shape analysis is tending to shift from classicalto bayesian. Although there are plenty of classical statistics based literature which addresses various aspects of shape analysis basedon landmark and semi-landmark under geometric morphometrics, bayesian research on the same is comparatively new and till date,there are very few bayesian literature on the same. Nevertheless, bayesian methods becoming increasingly popular and there aresome recent advancements on the bayesian theory of shape analysis like Theobald & Wuttke (2006), Fox et al. (2016), Guti´errez et al. (2019), Micheas & Peng (2010), although most of the existing bayesian literatures depend heavily on isotropic error variance ofthe landmark data. In this paper we will generalize the existing methods and propose novel models and methods to handle even theanisotropic error variance using Mahalanobis distance of Mahalanobis (1925), Mahalanobis (1936). Interestingly, Mahalanobis’sdeﬁnition was prompted by shape analysis and shape comparison problems (Mahalanobis, 1925).

Landmarks are point locations that are biologically and morphologically homologous between specimens (Gunz & Mitteroecker,2013). We will consider a particular object with a ﬁnite number k of points in d dimensions (For instance, d = 2 or d = 3 for two orthree dimensions, respectively). We often select these points on complex objects’ continuous surface, such as a fossilized dinosaurbone. Sometimes too few landmarks are available, or a situation occurs when some structures cannot be quantiﬁed using traditionallandmarks. For example, a traditional homologous landmark can not capture the shape of visible muscle attachments on bones. Insuch a scenario, semi landmarks make it possible to quantify two- or three-dimensional homologous curves and surfaces and analyzethem together with traditional landmarks (Gunz & Mitteroecker, 2013). We can digitize the outlines as a series of discrete pointswith the individual points that must be slid along a tangential direction to remove tangential variation (Perez et al. , 2006), known assemilandmarks or sliding landmarks.A 2D landmark or 3D landmark data can be represented by 3D array ( p × k × n ) , which is the required input format for manyfunctions in package geomorph (see section). where p = the number of landmark points, d = the number of landmark dimensions (2 or 3 for two or three dimensions, respectively), n = the number of specimens.For Procrusts analysis, we cannot directly work with raw landmark data. First, we need to transform into Kendal’s Shape space(Dryden & Mardia, 1998, 2016) (see Subsection 3.2).For a 2D landmark, we can equivalently represent a particular landmark with coordinate ( x, y ) with a complex number z = x + iy ,where x is the real part, and y is the imaginary part. The plane of matrices M (2 , (cid:60) ) generated by (cid:18) x y − y x (cid:19) , is isomorphic to thecomplex number plane C , and the rotation matrix R θ := (cid:18) cos θ − sin θ sin θ cos θ (cid:19) in 2D is equivalent to multiplication by e iθ in complexplane C , which acts on the plane as a rotation of θ radians. Hence, using complex number notation, the rotation operation by angle θ can be represented easily using multiplication by a factor exp( iθ ) , which is equivalent to a rotation matrix. For 3D landmark, wecannot directly use complex number and have to make use of 3D rotation matrix (see Appendix A). Shape is the geometry of an object modulo position, orientation, and size. Mathematically, A shape is a point in a high-dimensional,nonlinear manifold, called a shape space. For preliminaries about manifold, see Appendix A. We require a metric space structure fora comparison between two shapes. Each shape determined by a set of landmarks can be represented by a point in Kendall’s shapespace.The data for raw objects can be arranged such that the coordinates of each landmark are found on a separate row, or that each rowcontains all landmark coordinates for a single specimen. Following Dryden & Mardia (1998), we can mathematically express at leastthree spaces on which models can be deﬁned

Conﬁguration space:

The space for raw objects representation. Formally, The Euclidean space (cid:60) p × d , where conﬁgurations z , · · · , z n ∈ (cid:60) p × d are d-dimensional shapes such that z i = ( z i , z i , · · · , z ip ) T are vector containing p many landmarks PREPRINT - J

ANUARY

20, 2021 coming from (cid:60) d , for i = 1 , , · · · , n , where T denotes transposition operator. For 2 dimension, d = 2 and can equivalentlyrepresented by complex plane. Pre-shape space:

The space obtained from conﬁguration space after translations and scalings have been removed. Formally, itis represented by a hypersphere of unit radius in d ( p − dimensions. We can go from the conﬁguration space to thepre-shape space using the transformation z + = H z (cid:107) H z (cid:107) , (1)where H is the Helmert sub-matrix (Lancaster, 1965; Dryden & Mardia, 1998). In particular, examples of Helmert matrixincludes many orthogonal matrices and rotation matrices in 2D and 3D, many of which will be used in this paper (seesubsection A.2 in Appendix A). Shape space:

Space for which conﬁgurations are invariant under location, rotation, and scaling. We can go from the conﬁgurationspace to the shape space via Procrustes Analysis. The method might be classical or Bayesian.Figure 1 shows triangles (objects) in conﬁguration space, whereas ﬁgure 4 shows those objects in pre-shape space (after makingtranslation & scale invariance). Figure 5 shows the corresponding shape space (using classical full Procrustes analysis). Lastly,ﬁgure 6 shows the Bayesian version (using a novel methodology of this paper, see section 5), which shows the posterior contourof Landmark variance. Intuitively, the blue triangle shape is different from others because one of its side is very small relative toits other two sides, whereas all other triangles have roughly proportional sides. As we can infer from the posterior variance-plotusing methodologies discussed in section 5, blue triangle fails to capture the contour’s sufﬁcient weight, indicating a probableoutlier-object.

Figure 1: Plot of 5 triangles of randomized coordinates.This is an illustration of objects (triangles with vertices aslandmarks) in conﬁguration space (cid:60) × . Here the land-mark coordinates (vertices) are taken from 2D uniformdistribution, except the orientation of red triangle hasmade different, which shows red triangle forms differentshape from the rest (reﬂection is not included) Figure 2: Plot of location shifted (not scaled) space (aftertranslation, scale invariant applied using Helmert matrixin comparison to raw data plot of triangles (ﬁgure 1).Observe that, green & yellow triangle pair are now al-most overlapped (if scaled) because of similarity of theirshapes.5 PREPRINT - J

ANUARY

20, 2021Figure 3: Plot of 5 triangles of randomized coordinatesin Pre-shape space(translation & scale invariance). Herethe landmark coordinates (vertices) are taken from 2Duniform distribution, except the orientation of red trian-gle has made different, which shows red triangle formsdifferent shape from the rest (reﬂection is not included) Figure 4: Plot of posterior-contour of Landmark varianceparameter σ using novel Bayesian Procrustes Analysisof section 5. Here the rotation has not been considered.Those 5 triangles of randomized coordinates in Pre-shapespace(translation & scale invariance). The word “Procrustes”refers to a bandit from Greek mythology who made his victims ﬁt his bed either by stretching their limbs orcutting them off. In its most general formulation, Procrustes analysis involves the optimal matching of two or more form matrices.Here we consider only sets of forms where each form matrix has the same dimensions. The deﬁnition of “shape” (Kendall, 1977)is “all the geometrical information that remains when location, scale and rotational effects are ﬁltered out from an object”. Theword “scale” can be regarded as positive scalar s satisfying the equation g ( s · z ) = s · g ( z ) , where g ( X ) represents any positivereal-valued function of the complex vector z or (equivalently conﬁguration matrix).2-dimensional shapes is deﬁned by a set of p landmarks represented by a complex vector z = ( z , z , · · · , z p ) T where the real partof each z i is the x-coordinate and the imaginary part is the y-coordinate. Fox et al. (2016) considered matching cells based on eithertheir location (centroid based), or their shape differences (Procrustes matching).For the Full Procrustes Fit (FPF) an object is translated, rotated and dilated to produce an exact match with another object.Consider two conﬁgurations w = ( w , w , · · · , w p ) T and z = ( z , z , · · · , z p ) T both in (cid:60) d × p (alternatively, using complex planefor 2D landmark, both in C p ), where z is usually centered, i.e., z (cid:63) p = 0 , ( z (cid:63) denotes the transpose of the complex conjugate of z and p is the p × vector of ones). In order to compare the conﬁgurations in shape we need to establish a measure of distancebetween the two shapes (Dryden & Mardia, 1998). The object is to match to a mean object z using the complex regression (orequivalently, multivariate regression for 2D or 3D euclidean coordinate). For scalar dilation parameter b , c ∈ (cid:60) d (for example, c = ( c , c ) for 2D landmark or equivalently, for complex plane c = c + ic ), and for w , z ∈ (cid:60) d × p we consider the multivariateregression equation w d × p = c Td × p + bR θ z d × p + (cid:15) d × p = [ p , z ] A + (cid:15) = X D A + (cid:15), (2)or, equivalently, w = [ p , z ] A + (cid:15) = X D A + (cid:15), (3)where the matrix containing parameters is is the ( d + 1) × d dimensional A = [ c, bR θ ] T (using rotation matrix notation R θ ), X D := [ p , z ] is the ( d + 1) × p ’design matrix’, c is the d dimensional vector corresponding to translation, b is the complex numbercorresponding to the dilation, and θ is the rotation angle and (cid:15) is a d × p -dimensional complex error vector. For our convenience, wewill omit the notations for dimension. Under isotropic error variance assumption (Fox et al. , 2016), where it is assumed that thevariations of the landmarks are independent, The popular distribution for (cid:15) d × p = ( (cid:15) , (cid:15) , · · · , (cid:15) d ) T (each (cid:15) i ∈ (cid:60) d ) will be (cid:15) i ∼ N p (0 , σ I d ) , (4)where N ( · ) is multivariate normal distribution (Goodman, 1963)(see Appendix) with I p being the identity matrix. In practice,assumption of isotropic error variance may not hold and in this paper we will not assume isotropy. In this paper we will assume (cid:15) ∼ N d × p (0 , Σ dp × dp ) (5) PREPRINT - J

ANUARY

20, 2021 where Σ dp × dp is the variance-covariance matrix of (cid:15) , which in general, represent the dependency structure among the landmarks(which almost always happens in reality). Special case (using complex plane):

If we use complex plane, then we can rewrite (2) as w = c p + be iθ z + (cid:15) = [ p , z ] A + (cid:15) = X D A + (cid:15), (6)where A = (cid:2) c, be iθ (cid:3) (equivalently, A = [ c, bR θ ] is the d × ( d + 1) dimensional location-dilation-rotation matrix using rotationmatrix notation R θ in place of e iθ ) is the matrix containing parameters and X D := [ p , z ] is the ( d + 1) × p ’design matrix’, c isthe complex number corresponding to translation, b is the complex number corresponding to the dilation, and θ is the rotation angleand (cid:15) is a p-dimensional complex error vector. Under isotropic error variance assumption (Fox et al. , 2016), where it is assumed thatthe variations of the landmarks are independent, The popular distribution for (cid:15) is (cid:15) ∼ C N p (0 , σ I p × d ) , (7)where C N ( · ) is complex normal distribution (Goodman, 1963)(see Appendix) with I p being the identity matrix. In practice,assumption of isotropic error variance may not hold and in this paper we will not assume isotropy. In this paper we will assume (cid:15) ∼ C N p (0 , Σ) (8)where Σ is the variance-covariance matrix of (cid:15) . For d = 2 and complex plane, Σ = Σ + i Σ . Assumption 1 (Isotropy of variance parameter) . For the general Bayesian regression model 3, variance for landmark-based objectsis isotropic, i.e., the variance-covariance matrices Σ of (7) is nonsingular. In particular, the landmark variances are independent. Assumption 1 is needed to by pass the pathological scenario of having more unknown parameters than number of observations,known as “ill-posed” problem (refer to subsection 5.4).

Existing bayesian literature like Fox et al. (2016) has assumed that, in equation (6), (cid:15) is a p-dimensional complex error vector tohave an isotropic covariance matrix, i.e., any direction for a landmark is equally preferred. In other words, for isotropic case, wecan model To carry out the superimposition by classical statistical method we estimate A by minimizing the least squares objectivefunction, the sum of square of errors D iso ( w , z ) = (cid:15) (cid:63) (cid:15) = ( w − X D A ) (cid:63) ( w − X D A ) = (cid:107) w − c p − be iθ z (cid:107) , (9)The classical statistical idea is to ﬁnd the FPF of w onto z that minimizes the magnitude of the difference in shape between w and z .Following Dryden & Mardia (1998), we can rewrite in terms of FPD (full Procrustes distance between complex conﬁgurations w and z ) is D F,iso ( w , z ) = inf b,θ,c D iso ( w , z ) , (10)which shows the magnitude of the difference in shape between the two objects (Dryden & Mardia, 1998). The classical full Procrustessuperimposition of z ; on w is obtained by estimating A with ˆ A , where ˆ A = arg min (cid:15) (cid:63) (cid:15). (11) Remark 1 (usefulness of Bayesian method over Classical) . Classical (frequentist) statistical method-based Procrustes analysishas its own advantages and disadvantages. Although it is easy to apply and obtain the point estimators of the shape parameters,their sampling distribution cannot be obtained in general (Fox et al. , 2016), (Micheas & Peng, 2010). In other words, in generalfrom classical approach it is not possible to make additional inference on the parameters except the point estimates. We cannotobtain in general the distributions of these estimates, even under the case of easiest possible error distribution (normally distributederror models) (Micheas & Peng, 2010). Moreover, from classical approach the distribution of the full Procrustes distance is also, ingeneral hard to work with, even under popular multinormal models (Dryden & Mardia, 1998; Fox et al. , 2016; Micheas & Peng,2010).

The principle of Bayesian Procrustes Shape Analysis under isotropic error variance for landmark data adapted in this paper is a novelgeneralization of principle stated in Fox et al. (2016), Micheas & Peng (2010), whereas novel models and methodologies has beenproposed under anisotropic error variance using Mahalanobis distance. PREPRINT - J

ANUARY

20, 2021

Bayesian regression ﬁt estimate does not always match the least-square estimate of the classical method (10), rather it minimizesBayes risk. let L ( θ, (cid:98) θ ) be a loss function, such as squared error (see appendix). The expected loss of an estimator (cid:98) θ of parameter θ isdeﬁned as E π ( L ( θ, (cid:98) θ )) , where the expectation is taken over the probability distribution of θ . Now, (cid:98) θ will be a ”Bayes estimator” if itminimizes the Bayes risk (the posterior expected loss) among all estimators.Hence, the aim of Bayes estimtion is to choose the most appropriate estimator (cid:101) λ = ( (cid:101) b, (cid:101) c, (cid:101) θ, (cid:101) σ ) among all estimators (cid:98) λ = ( (cid:98) b, (cid:98) c, (cid:98) θ, (cid:98) σ ) of the unknown parameter λ = ( b, c, θ, σ ) such that (cid:101) λ = arg min (cid:98) λ E ( L ( λ, (cid:98) λ ) | w , z ) . (12)The most common risk function used for Bayesian estimation is the mean square error (MSE). This is a Bayesian version of classicalcriteria (11), known as ”squared error risk”. The MSE B is deﬁned by MSE B = E (cid:104) ( (cid:98) λ − λ ) (cid:105) , (13)where the expectation is taken over the joint distribution of ( b, c, θ, σ ) and w . We will use the MSE of (13) as risk, which impliesthat the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution. Bayesian Procrustes analysis starts with idea that the observed values of w can be represented as samples from a model distributionwith a property that the highest probability of realization of a sample is around the mean value z . Assumption 2 (Essential Property of Error Density for Bayesian Model (under isotropic variance)) . Selection of Bayesian model forBayesian Procrustes analysis with density of each random variable ( unobserved random variable −→ w corresponding to the observedvalue w ) should be such that, it has its peak value (global optimum) at unobserved truth value of parameters c p + b e iθ z andthe error is such that the probability of a typical sample to fall inside a r − ball around z is higher than to fall inside a similar r − ball around some other ˜ z (cid:54) = z . In other words, bayesian density criteria (complementary to classical criteria (11) ) can be stated as P (cid:16)(cid:110) w : (cid:107) w − c p − b e iθ z (cid:107) ≤ r (cid:111)(cid:17) ≥ P (cid:16)(cid:110) w : (cid:107) w − c p − be iθ ˜ z (cid:107) ≤ r (cid:111)(cid:17) , ∀ r > , ∀ ( c, b, θ, ˜ z ) (cid:54) = ( c , b , θ , z ) . (14)To date, most of the existing bayesian literature on shape analysis like (Fox et al. , 2016; Micheas et al. , 2006) have assumed isotropicerror variance for landmarks and assumed bayesian models which satisﬁes assumption 2. Here we will also propose two Bayesianmodels which satisﬁes assumption 2, under isotropic error variance.Model A: f z ( w |· ) ∝ σ − exp (cid:0) σ − D iso ( w , z ) (cid:1) . (15) Theorem 1.

Density stated in Model (15) satisﬁes Assumption 2 for large numbers of landmarks p and large numbers of objects n ,if the truth model is distributed with mean c + b e iθ and variance σ I pn . In other words, for suitable prior density π ( c, b, θ, σ ) ifposterior consistency holds in the sense lim r → (cid:32) lim p,n →∞ (cid:90) { w : (cid:107) w − c p − be iθ ˜ z (cid:107)≤ r } f ˜ z ( w | c, b, θ, σ ) dπ ( c, b, θ, σ ) (cid:33) = f z ( w | c , b , θ , σ ) , (16) then, for all ˜ z (cid:54) = z and for all r > p,n →∞ (cid:90) { w : (cid:107) w − c p − b e iθ z (cid:107)≤ r } f z ( w | c, b, θ, σ ) dπ ( c, b, θ, σ ) ≥ (cid:90) { w : (cid:107) w − c p − be iθ ˜ z (cid:107)≤ r } f ˜ z ( w | c, b, θ, σ ) dπ ( c, b, θ, σ ) , (17) where f ˜ z ( w | c, b, θ, σ ) is as stated in 15. Based on assumption 2 we can generalize the density (15) withModel AG: f ( w |· ) ∝ g (cid:0) σ, D iso ( w , z ) (cid:1) , (18)such that the function g ( · ) has global maxima at z . PREPRINT - J

ANUARY

20, 2021Figure 5: Plot of shape space (using classical Procrustes Analysis in comparison to raw data plot of triangles (ﬁgure 1)and also in comparison to pre-shape space plot (ﬁgure 4). Observe that, blue & orange triangle-pair are NOT merged(and also green & yellow triangle pair NOT merged), which is due to the fact that Landmarks were deliberately inputtedcounter-clock wise, which accounts for the upside-down ﬂip of blue triangle in comparison to orange triangle.Figure 6: Plot of Posterior density contour using 5 triangles of randomized coordinates, using methods discussed insection 5. Observe that, the blue triangle is of different shape which has been depicted from the plot as it predominantlyremains outside of inner dense-coloured contour. 9

PREPRINT - J

ANUARY

20, 2021

Sampling for posterior is one of the main challenges for Bayesian Procrustes analysis because often the full form of prior or eventhe likelihood will not be known. There are popular sampling methods to handle these computational difﬁculties. In this paper,we will resort to

MCMC (Markon Chain Monte Carlo) based method because of its ﬂexibility general applicability. The

Gibbssampler is a special case of the Metropolis-Hastings algorithm and adopted in existing bayesian literature like Fox et al. (2016),but it differ in two ways: ﬁrst, we always accept candidate point and secondly, we Need to know full conditional distributions. Forboth models (15), (18) respectively, we can sample the parameters using Gibbs sampler only when full conditional distributions isknown, for example, when the landmarks are in pre-shape space and only parameters to be inferred are the variance-parameterswith non-informative priors (refer to Algorithm A.4 in Appendix A. In general, is often not possible to get full expressions of allconditional distributions. In particular, for empirical prior or for anisotropic landmark variance Gibbs sampler may not work atalland MCMC (Markon chain Monte Carlo, a version of Metropolis Sampling) is necessary. The outline of simple version of MCMC isas follows: after choosing initial values of ( b, c, θ, σ ) arbitrarily, we perturb the parameter point and calculate the density ratio, weaccept a new parameter point if the density ratio becomes greater than unity, in fact we accept a new sample parameter point withdensity-ratio-dependent acceptance probability. the detailed algorithm is stated in algorithm 1.We will allow for a burn in period of , we will run the sampler for samples and then take the sample mean, which willgive us the minimizer of the Bayes MSE (13) (Bayes estimator), which we denote as (cid:101) λ = ( (cid:101) b, (cid:101) c, (cid:101) θ, (cid:101) σ ) . Then the BFPF (Bayesian FullProcruste Fit) will be w BFPF = (cid:101) c p + (cid:101) bR (cid:101) θ z . (19)Alternatively, if we write using complex number (for 2D landmark), then w BFPF = (cid:101) c p + (cid:101) be i (cid:101) θ z (20) For illustration, as a special case of (6), with number of objects being n . Here we will assume that we are in pre-shape space, i.e.,both w and z has been transformed in the sense of subsection 3.2. In other words, in Bayesian language we are giving empiricalprior on a neighbourhood of the classical point estimates of ( c, b ) = (¯ c, ¯ b ) (see subsection 4.1). In this new empirical prior choicethe convergence of all the parameters will follow.The whole object data can be represented as 3-dimensional array { w ijk } ≤ i ≤ p ≤ j ≤ ≤ k ≤ n .For k = 1 , , · · · , n suppose w k = { w ,k , w ,k , · · · , w p,k } , z = { z , z , · · · , z p } , where for all i = 1 , , · · · , p each w i =( w i , w i ) T and z i = ( z i , z i ) T . We can write the bayesian regression (6) as (cid:18) w i k w i k (cid:19) = (cid:20) c k c k (cid:21) + b k · (cid:20) cos θ k sin θ k − sin θ k cos θ k (cid:21) (cid:18) z i k z i k (cid:19) + (cid:18) (cid:15) i k (cid:15) i k (cid:19) (21)Assuming the objects coming from same family (e.g., same species or same genera), the same error parameter σ for all the landmarkdata points are justiﬁed on the basis of isotropic error variance (simplest model). model 21 with bivariate Gaussian density for f z ( w |· ) we can rewrite the regression model, for k ∈ { , , · · · , n } (cid:18) w i k w i k (cid:19) ∼ N (cid:18)(cid:20) c k c k (cid:21) + b k · (cid:20) cos θ k sin θ k − sin θ k cos θ k (cid:21) (cid:18) z i k z i k (cid:19) , σ · (cid:18) (cid:19)(cid:19) . (22)We asume the following prior for the parameter set λ = ( c , c , b, θ, σ ) (cid:20) c c (cid:21) ∼ N (cid:18)(cid:20) (cid:21) , σ (cid:20) (cid:21)(cid:19) ; b ∼ N (0 , τ − b ); θ ∼ unif ( − π, π ) , σ ∼ unif (0 , ∞ ) . (23)The algorithm for such model (22) is stated in Algorithm 1. PREPRINT - J

ANUARY

20, 2021

Algorithm 1

Markov Chain Monte Carlo based-Sampling for BFPF (Bayesian Full Procruste Fit) for Isotropic LandmarkError Variance

Input:

Object containing 2D-Landmark data ( z , w ) , Nlandmark, tune, Nsample Output: A (5 × N sample ) dimensional large matrix containing samples from posteriors of the parameters λ = ( c =( c , c ) , b, θ, σ ) . procedure MCMC S

AMPLING ((A version of Metropolis Sampling)) Initialize values of λ = ( c = ( c , c ) , b, θ, σ ) = ( c = ( c (0) , c (0)) , b (0) , θ (0) , σ (0)) (cid:46) May be chosenarbitrarily function FRATIO ( λ , λ ) Compute log-density of (Likelihood × prior) assuming parameter values λ = ( c = ( c , c ) , b, θ, σ ) . Store in f ; Compute log-density of (Likelihood × prior) assuming parameter values λ = ( c = ( c , c ) , b, θ, σ ) . Store in f . return f f end function function PURTURB ( λ ) For each given value of λ → λ old , choose a new set of parameter λ new inside a small-neighbourhood (determined by tuning value ”tune”) of λ old . Generate u ∼ unif (0 , if u ≤ . then Select λ ← λ new else Select λ ← λ old end if end function function STEP ( λ , purturb) Pick new point λ p = purturb ( λ ) Compute Acceptance probability A ← min (1 , f ratio ( λ p , λ )) Accept new point with probability A . end function function RUN ( λ, purturb, nsteps ) Allocate matrix res for ( do i = 1 , i ≤ N sample, i + + ) res [ i, ] ← λ ← step ( λ, purturb ) end forreturn res = { ( c ( t ) , c ( t ) , b ( t ) , θ ( t ) , σ ( t )) } for all t ∈ { , , · · · , } . end function end procedure Theorem 2 (Asymptotic Efﬁciency of BFPF) . Suppose λ = ( b , c , θ , σ ) be the truth value of the parameter (cid:101) λ = ( (cid:101) b, (cid:101) c, (cid:101) θ, (cid:101) σ ) .Consider the reparametrization η ( λ ) := c p + bR θ z , with the truth value η := η ( λ ) = c p + b R θ z . Then, under squareerror risk ( MSE B of (13) ), and for large samples (large values of number of objects n ), the posterior density of (cid:101) λ ≡ (cid:101) λ ( n ) isapproximately normal. In other words, for large n, p , √ n (cid:16) w BFPF − (cid:16) c T p + b R θ z (cid:17)(cid:17) d −→ N (cid:18) , J − ) T I ( λ )( J − ) (cid:19) , (24) where where I ( λ ) is the ﬁsher information of λ and J is the Jacobian matrix such that the ( i, j ) th element of the Jacobian matrix J is deﬁned by J ij = ∂η i ∂λ j . Assumption 1 is needed to by pass the pathological scenario of having more unknown parameters than number of observations,known as “ill-posed” problem. In fact, even under non-singularity assumption of variance-covariance matrix of landmarks, wemay need additional assumption of indepency, otherwise the number of parameters involving variance-covariance matrix will behigh. This type of assumptions may not always hold in practice, and under certain circumstances we may relax the assumption ofIndependency using empirical prior (a prior which depends on data). PREPRINT - J

ANUARY

20, 2021

In fact, Mardia et al. (2000)proposed a fully multivariate test of directional asymmetry for the case of object symmetry for bypassingthe assumption of equal, independent, and isotropic variation at all landmarks.The symmetry in morphological structures is a big concern. It can cause serious statistical problems, for instance ill-conditionedcovariance matrices if all the landmark conﬁgurations are very nearly symmetrical (Klingenberg et al. , 2002), (Bookstein, 1996).Algebraically, symmetry (even if symmetry is not perfect) induces predictability which in tern makes linear dependence among thelandmarks, and therefore, the covariance matrix of landmark positions will be singular (for imperfct symmetry, covariance matriceswill be ill-conditioned). This causes difﬁculties for any statistical procedures that use the inverse or determinant of the dispersionmatrix, only remedy for such problem is by taking the symmetry of the forms into account explicitly and thereby adjusting theanalysis accordingly (Klingenberg et al. , 2002), the central idea of which revolves around ﬁnding proper method for partitioning thetotal shape variation of landmark conﬁgurations with object symmetry into components of symmetric variation among individualsand asymmetry.

Suppose we have two sets of landmark based objects (3D array), namely z n and w n . Our interest is to construct suitable statisticalhypothesis test to infer whether they belong to same taxa or different taxa, based on Procrustes variance among them. For nullhypothesis of same taxa, we may think of the landmark data to come from same distribution with same posterior mean, posteriorvariance. Consider the registration object v register and w register . For i ∈ { , } , let θ i be model parameter for model M i . Forinstance, θ = Σ and θ i = ( λ i , λ , Σ , Σ for the simplest case with Gaussian likelihood. Then, H : z n ∼ w n ∼ N (cid:0) ( z register , w register ) T , Σ (cid:1) ; (25) M : L n ( θ | z n , w n , M ) = N (cid:18) ( z register , w register ) T , (cid:20) Σ 00 Σ (cid:21)(cid:19) . (26)For alternate hypothesis of different taxa, we may think of the landmark data to come from mixture-distribution with differentposterior mean, posterior variance. H : z n | λ ∼ N ( z register , Σ ); (27) w n | λ ∼ N ( w register , Σ ); , (28) M : L n ( θ | z n , w n , M ) = λ N ( z register , Σ ) + λ N ( w register , Σ ) . (29)Then the Bayes factor (interpreted as the quantiﬁcation of the evidence of model M against model M , given objects z n , w n ) ofmodel M against M is given by B n = m ( z n , w n | M ) m ( z n , w n | M ) , (30)where, for i ∈ { , } , marginal densities for the two models be m ( z n , w n | M i ) = (cid:82) θ i L n ( θ i | z n , w n , M i ) π ( dθ i | M i ) respectively,with prior π ( θ i | M i ) . Chatterjee et al. (2020) theoretically proved that, asymptotically bayes factor goes to a version of Kulberk-Leibler divergence of densities of two competing models, even under miss-speciﬁcations. Formally, lim n →∞ log (cid:0) B n (cid:1) = h ( θ ) − h ( θ ) , (31)where h i ( θ i ) = lim n →∞ n E (cid:18) log (cid:26) m ( z n , w n | M i ) L n ( θ i | z n , w n , M i ) (cid:27)(cid:19) . (32)Under Gaussian distribution assumption, one of the main application of Mahalanobis distance (Mahalanobis, 1936) is to measure thedistance between two densities which takes account of intra and inter-species (or, inter-genera) variability. This is because expressionof Mahalanobis distance for multivariate normal is very similar to Kullback–Leibler divergence, a measure of difference from twodistributions. After estimating the mean and variance parameter for landmark data of two hypothesized species populations, we cancompute the Kullback–Leibler divergence from N ( µ , Σ ) to N ( µ , Σ ) , for non-singular matrices Σ and Σ , is: D KL ( N (cid:107) N ) = 12 (cid:26) tr (cid:0) Σ − Σ (cid:1) + ( µ − µ ) T Σ − ( µ − µ ) − d + ln | Σ || Σ | (cid:27) , (33)where d is the dimension of the vector space, for instance d = 2 in 2D landmark data. From BFPF methods discussed so far, we canget estimates (cid:102) µ , (cid:102) µ , (cid:102) Σ , (cid:102) Σ and hence, from (33) we can get an estimate of the divergence between two populations (characterizedby densities N ( · ) and N ( · ) ). We may put species variability problem in the statistical hypothesis frame: H : D KL ( N (cid:107) N ) < (cid:15) ; (34) H : D KL ( N (cid:107) N ) ≥ (cid:15) ; (35)for some threshold (cid:15) > . The sample-version of KL-divergence (cid:103) D KL ( N (cid:107) N ) will form a discrepency statistic, based on which wecan calculate Bayesian predictive p-value for inference (see Appendix A). PREPRINT - J

ANUARY

20, 2021

We generate 1000 quadrilateral-objects (set of 4 landmarks for a single quadrilateral), each from two different types of shapes withtruth values of variance parameter ( σ ) as follows. Convex shaped Quadrilateral: isotropic landmark variance with σ = 1 . , Concave shaped Quadrilateral: isotropic landmark variance with σ = 0 . .Figure 7.1 shows the raw-object plot. Objectives:

As we already know the truth value of parameter(s) σ v = 1 . and σ c = 0 . , we wish to test the consistency of ourBayesian models computationally. here the assumption of isotropic error variance and assumption 2 are both automatically satisﬁedbecause we are simulating deliberately using isotropic variance for the corners (landmarks) of those quadrilaterals.First, we transform the objects into pre-shape space. Then We assume the following model for both convex and concave quadrilateralsexcept different variance parameters σ c for concave quadrilateral objects and σ v for convex quadrilateral objects): (cid:20) vq i, ,k vq i, ,k (cid:21) ∼ N (cid:18)(cid:20) (cid:21) , σ v (cid:20) (cid:21)(cid:19) ; (cid:20) cq i, ,k cq i, ,k (cid:21) ∼ N (cid:18)(cid:20) (cid:21) , σ c (cid:20) (cid:21)(cid:19) . (36)First, we assume non-informative prior for the parameter set σ v ∼∼ unif (0 , ∞ ) , σ c ∼ unif (0 , ∞ ) . As we can observe, we canachieve convergence to the truth value, Figure 7: Plot of actual simulated 1000 convex quadrilateral raw-objects (red) with isotropic landmark variance σ = 1 . and concave quadrilateral raw-objects (blue) with isotropic landmark variance σ = 0 . .13 PREPRINT - J

ANUARY

20, 2021Figure 8: Plot of posterior density contour of isotropic landmark variance. Compare with ﬁgure 7.1 with truth σ = 1 . for convex quadrilaterals (red) and σ = 0 . for concave quadrilaterals (blue). The assumption of isotropic landmarkvariance is crucial here. The data ”apes” can be found in R package “shapes”. It is taken from O’Higgins & Dryden (1993) and also in the famous book ofstatistical shape analysis (Dryden & Mardia, 2016). The data falls in the category of geometric morphometrics landmark-based data.It has the following attributes. apes$x : An array of dimension × × , apes $group: Species and sex of each specimen:”gorf”: Female gorilla skull data. 8 landmarks in 2 dimensions, 30 individuals,”gorm”: Male gorilla skull data. 8 landmarks in 2 dimensions, 29 individuals,”panf”: Female chimpanzee skull data. 8 landmarks in 2 dimensions, 26 individuals,”panm”: Male chimpanzee skull data. 8 landmarks in 2 dimensions, 28 individuals,”pongof”: Female orang utan skull data. 8 landmarks in 2 dimensions, 30 individuals,”pongom”: Male orang utan skull data. 8 landmarks in 2 dimensions, 30 individuals.

Objective:

To obtain Bayesian evidence for the hypothesis: “male primate manifests more ﬂuctuation in face-shape than females”. PREPRINT - J

ANUARY

20, 2021Figure 9: MCMC mixing plot of face-shape landmarkdata of Gorilla-female. Figure shows satisfactory mixingfor samples with burn-ins with tune= . ,even from arbitrary start = 1 . Figure 10: MCMC mixing plot of face-shape landmarkdata of Gorilla-male. Figure shows satisfactory mixingfor samples with burn-ins with tune= . ,even from arbitrary start = 1 . Figure 11: Plot of MCMC posterior of Gorilla male vs. female Variance in Landmark after Pre-shape Space-formation15

PREPRINT - J

ANUARY

20, 2021Figure 12: Plot of MCMC posterior of Chimpanzee male vs. female Variance in Landmark after Pre-shape Space-formation tune= . , even from far start = 1 . Figure 13: Plot of MCMC posterior of Orang-utan male vs. female Variance in Landmark after Pre-shape Space-formation

Conclusion from Bayesian Procrustes Analysis using novel R package

BPviGM1

It is being observed from posteriorFace-shape variance density comparison-plot that there are Bayesian evidence for more face-shape variability in all 3 primates PREPRINT - J

ANUARY

20, 2021 (Gorilla, Chimpanzee & Orang-utan) male than the same for respective Females. In other words, our result supports the hypothesisthat primate male-face may be genetically viable to more shape-variation than the same for females.

For Bayesian Procrustes analysis, we may use landmark data in conﬁguration space, or in pre-shape space. Problem arises when wetry to make Bayesian inference on landmark already in shape-space by use of classical methods. This is because of over-ﬁtting arisingfrom twice use of same data. This results in apparent decrease of variance. For instance, although we know the actual variance, in ﬁg7.3 we demonstrate “over-ﬁt” problem, where the posterior of variance fails to capture the truth (black line) and demonstrates lesservariation than it actually is. Here we have used landmarks already from shape-space for concave quadrilateral data.

Figure 14: Plot of MCMC posterior of simulated 1000 concave quadrilateral raw-objects with isotropic landmarkvariance σ = 0 . , demonstrating the problem of over-ﬁt BPviGM1 & Source Code Availability

The computation part for the implementation of the Bayesian method and posterior sampling is a challenging task. Moreover, weare not aware of any direct Bayesian R package for Procrustes analysis using Geometric Morphometric objects. Hence we havebuilt a novel and simple R package

BPviGM1 (Bayesian Procrustes Variance-based Inferences in Geometric Morphometrics) whichessentially contains the R code version of the Bayesian methods discussed in this paper.

BPviGM1

A novel R package

BPviGM1 (”Bayesian Procruste Analysis using Landmark based on Isometric/Anisometric Error under GeometricMorphometrics”), which mainly contains functions corresponding to R code implimentation of algorithm for MCMC based posteriorsampling. These functions correspond to the above-discussed novel bayesian Procruste Analysis using isometric/anisometricLandmark error, are available as a novel R package through GitHub https://rdrr.io/github/debashischatterjee111/BPviGM1/ , ( https://github.com/debashischatterjee111/BPviGM1 ). BPviGM1

The package is currently available freely from Github. If download from GitHub, you can use devtools by the commands: install . packages (" devtools ") PREPRINT - J

ANUARY

20, 2021 require ( devtools ) install _ github (" debashischatterjee111 / BPviGM1 ") Alternatively, you ﬁrst install R package ”githubinstall”, thereby call

BPviGM1 from it, using following command in R: install . packages (" githubinstall ") require ( githubinstall ) githubinstall (" BPviGM1 ") Once the packages are installed, it needs to be made accessible to the current R session by the commands: require ( BPviGM1 ) BPviGM1

There will be mainly 4 kinds of functions available in the R package, all of which generates samples from posterior of the bayesianregression parameters discussed in section 5. For details, type R command ? < function >Cmat { BPviGM1 } : This function changes a 3D array to a matrix using row-bind.

Helmert { BPviGM1 } : ”Helmert” computes The Helmert sub-matrix, MCMCpostPsample2D { BPviGM1 } : MCMC posterior sampling for 2D landmark data (in Pre-shape space) (Gaussian likelihoodwith Isotropic Error Variance)

MCMCpostsample2D { BPviGM1 } : MCMC posterior sampling for 2D landmark data (Gaussian likelihood with Isotropic ErrorVariance), able to draw posterior from 5 parameters ”c1”,”c2”, ”b”, ”theta”, ”Sigma”.

PLOTpostvar2D { BPviGM1 } : ”PLOTpostvar2D” Plot of posterior of Landmark variance parameter from MCMC sampling. PPLOTpostvar2D { BPviGM1 } : ”PPLOTpostvar2D” Plot of posterior of Landmark variance parameter from MCMC sampling (single or double parameters) Pfratio2D { BPviGM1 } :

2D landmark data in Pre-shape space (Gaussian likelihood with Isotropic Error Variance) as the namesuggest, it evaluates fratio for two parameter vectors. fratio2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) as the name suggest, it evaluatesfratio for two parameter vector (for multi-dimensional vector) purturb2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) generates pertrubed point from 5parameter space.

Ppurturb2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) generates purtubed point from 5parameter space (single or double parameter set).

TMCMCpostsample2D { BPviGM1 } : MCMC posterior sampling for 2D landmark data (Gaussian likelihood with Isotropic ErrorVariance), able to draw posterior from 5 parameters ”c1”,”c2”, ”b”, ”theta”, ”Sigma”.

Prun2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) Accepts new parameter vector pointwith probability alpha (single or double parameter set). step2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) Accepts new parameter 5*1 point withprobability alpha.

Pstep2D { BPviGM1 } :

2D landmark data(Gaussian likelihood with Isotropic Error Variance) Accepts new parameter 5*1 pointwith probability alpha (single or double parameter set).

Simulated Polygon Dataset(s) { BPviGM1 } : ccq, cvq, ﬁvetr. MCMCpostsample3D { BPviGM1 } : function for MCMC sampling for 2D landmark data (Gaussian likelihood with generalanisotropic Error Variance with Empirical Bayes Prior), **( To do in 2021) PLOTpostvar3D { BPviGM1 } : function for MCMC sampling for 3D landmark data (Gaussian likelihood with general anisotropicError Variance with Empirical Bayes Prior). **( To do in 2021) COMPAREpostvar2D { BPviGM1 } : function for MCMC sampling for 3D landmark data (Gaussian likelihood with generalanisotropic Error Variance with Empirical Bayes Prior). **( To do in 2021) COMPAREpostvar3D { BPviGM1 } : function for MCMC sampling for 3D landmark data (Gaussian likelihood with generalanisotropic Error Variance with Empirical Bayes Prior). **( To do in 2021) Detailed Examples of the usage of R functions in ”BPviGM1”:

See in Appendix A.5. PREPRINT - J

ANUARY

20, 2021

The following information is to be noted regarding source code availability for Application part using R package "BPviGM1" . Source code:

All the source code of all the simulations and data analysis conducted in this paper are freely avail-able through GitHub https://rdrr.io/github/debashischatterjee111/BPviGM1/ , ( https://github.com/debashischatterjee111/Sourcecode1 ). Bayesian approach to geometric morphometrics is a new interdisciplinary branch with huge potential because of many naturaladvantages of bayesian methods that classical analysis lacks. In this paper, novel bayesian models with different kind of priorshas been implemented for the analysis of morphometric variability along with test for model validation under different situations.Moreover, a novel R package has been provided for that, with explanation for how to use it. Please send suggestions and report bugsto https://github.com/debashischatterjee111/BPviGM1/issues , or email to [email protected] . Acknowledgments

Author thanks Geological Studies Unit of Indian Statistical Institute for providing motivations and encouragements to pursue researchon novel bayesian methods for Geometric morphometrics. Author also thanks the anonymous reviewers for their constructivecomments and suggestions.

A Appendix

A.1 ProofsA.1.1 Proof of Theorem 1

Proof 1.

Both theorems follows trivially from properties of gaussian density. For ddetailed discussion, we refer to Mahalanobis(1936) and Mardia et al. (2000).

A.1.2 Proof of Theorem 2

Here we stae two Lemmas (refer to Lehmann & Casella (2006) for detailed proof).

Lemma 1. suppose λ be the truth value of unknown parameter λ , then under MSE B , the posterior mean (cid:101) λ ≡ (cid:101) λ ( n ) satisﬁes √ n ( (cid:101) λ ( n ) − λ ) → N (cid:18) , I ( λ ) (cid:19) , (37) in distribution, where I ( λ ) is the ﬁsher information of λ . It follows that the Bayes estimator (cid:101) λ under MSE is asymptoticallyefﬁcient. Lemma 2 (Reparametrization of Fisher Information) . Let λ and η ( λ ) := c T p + bR θ z be two parametrizations of our Bayesianregression estimation problem, where η ( λ ) is continuously differentiable function of λ with the truth value η := η ( λ ) := c T p + b R θ z . then, I ( λ ) = J T I ( η ( λ )) J , (38) where J is the Jacobian matrix such that the ( i, j ) th element of the Jacobian matrix J is deﬁned by J ij = ∂η i ∂λ j . The detailed proofis given Lehmann & Casella (2006).

Proof 2 (Proof of Theorem 2) . Directly follows from Lemma 2 and Lemma 1.

A.2 Helmert matrix

Standard Helmert matrix (Helmert, 1876; Lancaster, 1965) of order n is an orthogonal square matrix such that(i) h ij = 0 for j > i > ,(ii) h j = + √ w j ; w j > n (cid:88) i =1 w j = 1 (in the strict sense).Helmert matrix in the strict sense has been used widely in statistics. A generalized helmert matrix is which can be transformedby permutations of its rows and columns or by transposition or by change of sign of rows, to a form of standard Helmert matrix(Helmert, 1876; Lancaster, 1965). PREPRINT - J

ANUARY

20, 2021

Theorem 3 (Helmert (1876); Lancaster (1965)) . All standard Helmert matrix of size n has rank ( n − . In other words, standardHelmert matrix of size n depends on ( n − independent parameters. Remark 2 (Lancaster (1965)) . As standard Helmert matrix of size n is orthogonal, we can take the ( n − independent parametersof a standard Helmert matrix as the angles of certain rotations, namely in the plane of the st and j − th coordinate axes, for j = 2 , , · · · , n . Moreover, A Helmert matrix comes in the evaluation of the Jacobian of the transformation from rectangularCartesian to polar coordinates and vice versa. Examples of Helmert matrixExample 1:

Helmert (1876) has taken h j = √ w j = 1 √ n . Example 2: (Rotation matrices in 2D)

For n = 2 , there cannot be any zero above the diagonal and below the ﬁrst row.Hence, the 2D rotation matrix R θ are standard Helmert matrix, where R θ = (cid:20) cos θ sin θ − sin θ cos θ (cid:21) (39)Rotation matrices on euclidean space are square matrices, characterized by orthogonal matrices with determinant 1. A square matrix R is a rotation matrix if and only if R T = R − and determinant ( R ) = 1 . The set of all orthogonal matrices of size n withdeterminant ± forms the orthogonal group O ( n ) , and he set of all orthogonal matrices of dimension d × d with determinant forms the special orthogonal group SO ( n ) , for example, the rotation group SO (2) in 2D and the rotation group SO (3) in 3D.Let L = = (cid:26)(cid:18) a bc − a (cid:19) : a + bc + 1 = 0 (cid:27) . Let I be the identity matrix, then it can be shown that P m = { xI + yt : x, y ∈(cid:60) , t ∈ L } is a plane of matrices isomorphic to complex plane C . It follows from Eular’s formula that, any complex rotation exp( θt ) = cos( θ ) I + t sin( θ ) has a one-one correspondence with rotation matrix.In 2D, we may use matrix notation for counterclockwise rotation by angle θ for orthogonal matrix multiplication (rotation matrix),for example (cid:20) cos θ − sin θ sin θ cos θ (cid:21) (cid:20) xy (cid:21) = (cid:20) x cos θ − y sin θx sin θ + cos θ (cid:21) (40) Example 3: (Rotation matrices in 3D) R θ X =  θ x − sin θ x θ x cos θ x  ; R θ Y =  cos θ y θ y − sin θ y θ y  ; R θ Z =  cos θ z − sin θ z θ z cos θ z

00 0 1  (41) A.3 Bayesian Predictive p-value

We use Bayesian p-values when one would like to check how a model ﬁts the data. Given a model M, we wish to examine how well itﬁts the observed data x obs based on a statistic T . In other words, it measures the goodness of ﬁt of data and model. Consider model M with probability density function f ( x | θ ) and with prior g ( θ ) . The prior predictive p-value under the predictive distribution is p = P ( T ( x ) ≥ T ( x obs ) | M ) = (cid:90) T ( x ) ≥ T ( x obs ) h ( x ) dx, (42)where h ( x ) dx = (cid:82) f ( x | θ ) g ( θ ) dθ is the prior predictive density. (42) may be inﬂuenced by the choice of the prior (Ghosh et al. ,2007). For this reason, the posterior predictive p-value was introduced. If we consider empirical prior which depends on the observeddata g ( θ | x obs ) , then, h ( x | x obs ) = (cid:82) f ( x | θ ) g ( θ | x obs ) dθ . A.4 Gibbs-Sampling Algorithm for Bayesian Procrustes Analysis under fully-known conditionalsA.5 Detailed Examples of usage of R functions in “BPviGM1”

Refer to R package “BPviGM1” R Suppliment(s) kept in Github: https://github.com/debashischatterjee111/Sourcecode1 . References

Bookstein, Fred L. 1996. Combining the tools of geometric morphometrics.

Pages 131–151 of: Advances in morphometrics .Springer.Bookstein, Fred L. 1997.

Morphometric tools for landmark data: geometry and biology . Cambridge University Press. PREPRINT - J

ANUARY

20, 2021

Algorithm 2

Gibbs Sampling for BFPF (Bayesian Full Procruste Fit), When all Conditional distributions are known] procedure G IBBS S AMPLING (from posterior of ( b, c, θ, σ ) ) Initialize values of ( b, c, θ, σ ) = ( b (1) , c (1) , θ (1) , σ (1)) (cid:46) May be chosen arbitrarily Assign step t = 1 while step t ≤ do (cid:46) Assuming burn in value to be 1000 Sample c ( t + 1) from π ( c ( t ) | b ( t ) , θ ( t ) , σ ( t ) , z , w ) Sample b ( t + 1) from π ( b ( t ) | c ( t ) , θ ( t ) , σ ( t ) , z , w ) , Sample θ ( t + 1) from π ( θ ( t ) | c ( t ) , b ( t ) , σ ( t ) , z , w ) , Sample σ ( t + 1) from π ( σ ( t ) | b ( t ) , θ ( t ) , z , w ) . t ← t + 1 end whilereturn { ( b ( t ) , c ( t ) , θ ( t ) , σ ( t )) } for all t ∈ { , , · · · , } . end procedure Chatterjee, Debashis, Maitra, Trisha, & Bhattacharya, Sourabh. 2020. A short note on almost sure convergence of Bayes factors inthe general set-up.

The American Statistician , (1), 17–20.Demˇsar, Jure, Repovˇs, Grega, & ˇStrumbelj, Erik. 2020. bayes4psy—An Open Source R Package for Bayesian Statistics in Psychology. Frontiers in Psychology , .Dryden, Ian L, & Mardia, Kanti V. 1998. Statistical shape analysis: Wiley series in probability and statistics .Dryden, Ian L, & Mardia, Kanti V. 2016.

Statistical shape analysis: with applications in R . Vol. 995. John Wiley & Sons.Fox, Neil I, Micheas, Athanasios C, & Peng, Yuqiang. 2016. Applications of Bayesian Procrustes shape analysis to ensemble radarreﬂectivity nowcast veriﬁcation.

Atmospheric Research , , 75–86.Ghosh, Jayanta K, Delampady, Mohan, & Samanta, Tapas. 2007. An introduction to Bayesian analysis: theory and methods . SpringerScience & Business Media.Goodman, Nathaniel R. 1963. Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction).

The Annals of mathematical statistics , (1), 152–177.Gunz, Philipp, & Mitteroecker, Philipp. 2013. Semilandmarks: a method for quantifying curves and surfaces. Hystrix, the Italianjournal of mammalogy , (1), 103–109.Guti´errez, Luis, Guti´errez-Pe˜na, Eduardo, Mena, Rams´es H, et al. Bayesian Analysis , (2), 427–447.Helmert, FR. 1876. Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers directorBeobachtungen gleicher Genauigkeit. Astronomische Nachrichten , , 113.Kendall, David G. 1977. The diffusion of shape. Advances in applied probability , (3), 428–430.Klingenberg, Christian Peter, Barluenga, Marta, & Meyer, Axel. 2002. Shape analysis of symmetric structures: quantifying variationamong individuals and asymmetry. Evolution , (10), 1909–1920.Lancaster, HO. 1965. The helmert matrices. The American Mathematical Monthly , (1), 4–12.Lehmann, Erich L, & Casella, George. 2006. Theory of point estimation . Springer Science & Business Media.Mahalanobis, Prasanta C. 1925. Analysis of race-mixture in Bengal.Mahalanobis, Prasanta Chandra. 1936. On the generalized distance in statistics. National Institute of Science of India.Mardia, Kanti V, Bookstein, Fred L, & Moreton, Ian J. 2000. Statistical assessment of bilateral symmetry of shapes.

Biometrika ,285–300.Masel, Joanna. 2012. Rethinking Hardy–Weinberg and genetic drift in undergraduate biology.

BioEssays , (8), 701–710.Micheas, Athanasios C, & Peng, Yuqiang. 2010. Bayesian Procrustes analysis with applications to hydrology. Journal of AppliedStatistics , (1), 41–55.Micheas, Athanasios C, Dey, Dipak K, & Mardia, Kanti V. 2006. Complex elliptical distributions with application to shape analysis. Journal of statistical planning and inference , (9), 2961–2982.O’Higgins, Paul, & Dryden, Ian L. 1993. Sexual dimorphism in hominoids: further studies of craniofacial shape differences in Pan,Gorilla and Pongo. Journal of Human Evolution , (3), 183–205.Perez, S Ivan, Bernal, Valeria, & Gonzalez, Paula N. 2006. Differences between sliding semi-landmark methods in geometricmorphometrics, with an application to human craniofacial and dental variation. Journal of anatomy , (6), 769–784.Rohlf, F James, & Slice, Dennis. 1990. Extensions of the Procrustes method for the optimal superimposition of landmarks. SystematicBiology , (1), 40–59. PREPRINT - J

ANUARY

20, 2021

Stange, Madlen, Aguirre-Fern´andez, Gabriel, Salzburger, Walter, & S´anchez-Villagra, Marcelo R. 2018. Study of morphologicalvariation of northern Neotropical Ariidae reveals conservatism despite macrohabitat transitions.

BMC evolutionary biology , (1),38.Stern, Curt. 1943. The hardy-weinberg law. Science , (2510), 137–138.Theobald, Douglas L, & Wuttke, Deborah S. 2006. Empirical Bayes hierarchical models for regularizing maximum likelihoodestimation in the matrix Gaussian Procrustes problem. Proceedings of the National Academy of Sciences , (49), 18521–18527.(49), 18521–18527.