A robust multivariate linear non-parametric maximum likelihood model for ties
Landon Hurley
Yale University School of Public Health & VA Connecticut Healthcare System, West Haven CSP Coordinating Centre
Statistical analysis in applied research, across almost every field (e.g., biomedical, economics, computer science, and psychology), makes use of samples for which the explicit error distribution of the dependent variable is unknown or, at best, difficult to linearly model. Yet these assumptions are extremely common. Misspecified error distributions of course yield biased estimates, compromising the generalisability of our interpretations: the linearly unbiased Euclidean distance is very difficult to correctly identify upon finite samples, and therefore results in an estimator which is neither unbiased nor maximally informative when incorrectly applied. The common alternative solution to the problem, non-parametric statistics, has its own fundamental flaws. In particular, these flaws revolve around the problem of order-statistics and estimation in the presence of ties, which often precludes the introduction of multiple independent variables and the estimation of interactions. We introduce a competitor to the Euclidean norm, the Kemeny norm, which we prove induces a valid Banach space, and construct a multivariate linear expansion of the Kendall-Theil-Sen estimator, which performs without compromising the extensibility of the parameter space, and establish its linear maximum likelihood properties. Empirical demonstrations upon both simulated and empirical data shall be used to demonstrate these properties, such that the new estimator is nearly equivalent in power to the GLM upon Gaussian data, but grossly superior across a vast array of analytic scenarios, including finite ordinal sum-score analysis, thereby aiding in the resolution of replication in the Applied Sciences.

Introduction
Achieving the general construction of a non-parametric linear regression framework, wherein the distribution, linearity, and closed-form expressions of the estimating equations between errors and covariates may be easily presented, has been a long-desired result in applied statistics. The first major development was that of Kendall (1938) τ_a and the corresponding univariate Kendall-Theil-Sen estimator, which developed a locally consistent Gauss-Markov estimator insensitive to outliers, subject only to the requirement of orderability to ensure these properties. This allows the estimator to compete well against least squares even for normally distributed data, while also allowing linear single slopes to be applied to discrete ordinal data. The estimator, however, does not provide all of the necessary properties required in experimental statistical designs, in particular for use in the Applied Social Sciences. Specifically, we refer to higher-order factorial and polynomial design matrices, which cannot be effectively estimated with the introduction of ties (or collisions and surjective mappings), which preclude finite sample (strong) identification and convergence. To resolve this, we introduce a Banach norm metric topological vector space, which possesses the same structure as the Kendall τ-metric, but which does not possess the selection bias upon the sample space, as it is naturally robust to the occurrence of ties, unlike Kendall's τ_b.
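The univariate Kendall-Theil-Sen estimator referenced above has a compact form: the slope is the median of all pairwise slopes, and the intercept is the median residual at that slope. A minimal sketch (the function name and data are ours, not the paper's):

```python
import numpy as np

def theil_sen(x, y):
    """Univariate Kendall-Theil-Sen fit: slope = median of all pairwise
    slopes; intercept = median residual at that slope."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # pairs tied in x have no defined slope
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return intercept, slope
```

Because the median of pairwise slopes ignores the magnitude of any single residual, one gross outlier leaves the fit untouched, which is the outlier-insensitivity the text describes.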
Further, this same topology allows for the estimation of finite sample interactions, which are mathematically identical to ties, and therefore a substantial unresolved problem in applied research. We introduce the mathematical properties of a complete metric upon a linear sub-space, and compare performance in numerous scenarios to that of the traditional Gaussian linear regression model, which demonstrate support for the superiority of the Kemeny norm, in particular as an unbiased second-order consistent (i.e., replicable) estimator. In addition, we derive estimating equations similar to those of OLS regression in terms of the variance-covariance matrices for both parameter estimates and standard errors, and the approach has been shown effective at addressing missing-at-random data with an EM algorithm.
In applied analysis, researchers are often presented with a measurement of interest y_m, the dependent variable, along with a covariate set X_nm which we use to estimate and explore a stable relationship between the expected changes in the target relative to the differences observed or controlled upon our sample. In this manuscript, unless otherwise stated, we assume each column vector in {y, X} is of length m, for which y is a scalar while X is a rectangular matrix of order m × n, with the restriction that m ≫ n, with uniform sampling selection independent and identical upon the population wrt the row-space. The goal of a regression framework is to remove the linear dependencies between all n choose 2 features in the design matrix X of n features, and then to project the optimally weighted linear combination of these unique pieces of information onto y. This unbiased Gauss-Markov optimality is such that we may interpret and approximate how the average fixed unit change in the similarity of X ↪ y may be expected to correspond to an estimated observed change in y. Linear systems with complete metrics are extremely beneficial for such applications, both in terms of their parameter flexibility in establishing complicated yet estimable linear relations, and in their ability to establish, upon relatively small samples, learned patterns which strongly generalise outside of the sample at hand. This beneficence comes at a cost, however: the conditional normality and linearity of errors must be correctly established, in order to maintain the orthonormal separability of the bias of incomplete sampling from the bias in the parameters.
Thus, we provide a robust multivariate mathematical framework, in the style of the general linear model, which may be applied to almost any partially orderable probability distribution function definable upon a common population; thus any distribution which is independently sampled, but which does not require linearity to be established across the column space of X. We will also demonstrate how the Kendall τ and similar non-parametric work (e.g., the Wilcoxon rank-sums test and the Friedman test) may be resolved to produce an unbiased linear estimator which is efficient and easily capable of addressing non-parametric multivariate families within a linear sub-space. The Kemeny (1959) metric defines a complete mathematical framework whose methods are shown to be a maximum likelihood estimator, with both probabilistic and closed-form solutions, for almost any sortable distribution. It is further demonstrated to be only mildly less efficient in the presence of a truly normal distribution, and substantially more Gauss-Markov optimal when addressing non-normal data. We introduce a non-parametric linear regression system whose solutions and standard errors are demonstrably and theoretically a maximum likelihood estimator (MLE), robust to non-normality and more informative even under applications to estimation scenarios such as summative scores and even applications of the polychoric correlations.

Contribution and organisation of the paper
Supposing the Kemeny metric ρ_K to be a convex functional for a topological vector space (X, ρ_K), we prove and provide empirical demonstrations that the relation between (α̂_n, ϵ) is both uniquely determined and linear under a relatively weak set of conditions, as well as being an estimator of minimum variance. We define the necessary characteristics of the parametric error family which satisfies this functional linear basis, and demonstrate how it enables minimum uncertainty with respect to α̂_n as compared to other unbiased estimators. We conclude with several simulations and an applied data analysis, all validated under jackknife resampling, to demonstrate that the performance conditions expected under maximum likelihood are validated as a primal-dual characterisation for our introduced methodology, without the introduction of a non-identity link function, as a consequence of the affine relationship upon (X, ρ_K) for a much wider array of the exponential family of distributions.

Motivation and literature review
We posit that the maximum likelihood properties of the Euclidean ℓ2 norm are non-robust in terms of their consistency and breakdown in finite samples. While asymptotic consistency in expectation is provably true, the ability of a finite sample to possess a sub-additive representation of the population, from said subset, is much less forthcoming, especially when the conditional distribution (i.e., the error distribution, ϵ) is non-normally distributed. We argue that this empirical failing is largely a function of the over-generalisation of the normal distribution of errors as a continuous random field orthonormal to the covariate space, which directly implies that the finite sample selection and parameter biases are inconsistent with our asserted inductive interpretations of how to understand a population. A brief introduction to the foundational basis of this error may be found with the James and Stein (1961) lemma, decomposing bias into orthonormal components upon any arbitrary additive norm space, wherein γ_T represents the total estimation bias, and the Bayes error ε denotes the total irreducible error. γ_T may be further expanded to denote bias with respect to the sampling upon the population, γ_m; bias wrt the parameter estimation, γ_n (e.g., scenarios in which the model is not correctly identified, as well as traditional Tikhonov ridge regression or restricted maximum likelihood estimation); and the interaction of these two pieces, γ_n · γ_m. With a complete metric topological vector space (TVS), under the limit wrt m for uniform sampling, it is expected by definition that γ_n is strongly convergent to 0, and therefore that the bias γ_T is solely a function of the proportional representation of the population within the sample. This bias, if reflective of uniform sampling, should tend to 0 as well, revealing the structure of the common population from which γ_m arose.
ϵ = γ_T + ε,  (1)

where ϵ is the sample error, γ_T the bias, and ε the Bayes error;

ϵ = (γ_m + γ_n + γ_m · γ_n) + ε = γ_T + ε;  (2)

‖ϵ − ε‖ = p-lim_{m→∞} (γ_m + γ_n + γ_m · γ_n) = p-lim_{m→∞} γ_T = 0,  (3)

with γ_n and γ_m · γ_n struck to zero in the limit. Early non-parametric work arguably foundered upon the problem of model identification in the presence of ties, which naturally arise in both the sample and parameter spaces, such that the James and Stein (or bias-variance tradeoff) inequality for a Banach norm-space has both: (1) non-zero γ_m, or bias with respect to the sampling, resulting from ties being excluded, and (2) non-ignorable bias with respect to the parameters, γ_n, if the ties are averaged over. For finite samplings on a normal distribution of errors, though, it naturally follows that both γ_n and γ_m converge to 0 in the population as m → ∞. The Kemeny (1959) metric was constructed to explicitly resolve the problem of sub-additivity in the presence of ties, and from this metric space, the Kemeny distance function and a probability density function can be shown to be exponentially related, and in fact may be isometrically embedded.
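The decomposition in equations 1-3 can be checked numerically for the parameter-bias component γ_n: fit a deliberately misspecified linear model to a quadratic truth, and the mean squared sample error separates, up to sampling noise, into the squared bias plus the Bayes-error variance. The setup below is our own illustration, not the paper's simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000
x = rng.uniform(-1.0, 1.0, m)
bayes = rng.normal(0.0, 0.5, m)          # irreducible (Bayes) error
y = 1.0 + 2.0 * x + 3.0 * x**2 + bayes   # true DGP includes a quadratic term

# Misspecified model: the quadratic term is omitted, so gamma_n > 0
X = np.column_stack([np.ones(m), x])
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
eps = y - X @ alpha                      # sample error (epsilon)

bias = (y - bayes) - X @ alpha           # bias component: truth minus fit
# E[eps^2] ~= E[bias^2] + Var(bayes); the cross term vanishes in expectation
lhs = np.mean(eps**2)
rhs = np.mean(bias**2) + 0.25
```

The near-equality of `lhs` and `rhs` is the orthogonal split of equation 1; shrinking the omitted quadratic coefficient towards zero drives the bias term, and with it γ_T, towards zero.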
This will allow us to characterise the Kemeny distance within the Gaussian probability family, which may be shown to asymptotically converge to the same point. Unsurprisingly, the Euclidean metric is more informative for Gaussian data; however, the linearity of the Kemeny metric, as well as its minimal loss of power when selected, presents a convenient means of constructing a linear regression model space, without the assumption of the normality of errors and without losing the ability to estimate more complex terms, as is otherwise typically observed. When we know the distribution we are interested in modelling (i.e., y), the introduction of such a complete metric, measuring the distance between our predictions (the estimands ŷ) and the true values y, is the definition of a maximum likelihood estimator, solely characterised by the minimisation of ϵ, for which γ_m tends to zero as the sample becomes the population, and ϵ → ε when the regression model is correctly specified. The ability to leverage the inner-product space defined by the Euclidean norm enables a minimisation procedure of approximation for which γ_n strongly tends towards 0, and therefore so does γ_m · γ_n. Consequently, assuming that we have representative sampling upon the population, the function we learn upon our data is very stable, independent of the specific objects recorded in our data: of course, γ_m > 0 for |m| < ∞. Therefore, these approximations are imperfect, but these imperfections do not compromise the estimation of the sample relations, merely the inductive capacity to link the understandings in our sample to the larger population with an unknown 'truth'. As stated, these techniques are valid for Euclidean spaces; however, knowing the appropriate transformations to establish additivity, to allow the decomposition of ϵ, is much more difficult. If γ_n > 0, our ability to learn relations which are approximately correct is suddenly affected by every other uncertainty in the sample, under a model which asserts these terms are correctly fixed to zero. This increases the distance between our ability to fit a sample's data and our ability to understand a population, with the uniqueness of the likelihood function weakened and the sharpness of the convexity diminished as well. When well-posed, all bias in the population is 0, and therefore the correct modelling structure is solvable to produce a unique model solution. However, for any bias which is non-zero, the distance between the sample error ϵ and the true Bayes error ε grows: as the uniqueness of our induced relationships (i.e., the existence of a unique 'truth') is only established under the axiomatic veracity of said conjecture, the generalisability of all our interpretations is unknowingly compromised. If we take Box (1976) in earnest, then ϵ ≠ ε, equivalent to stating that the bias γ_T > 0 includes γ_m, and that the pieces of the bias all interact together to deform the holomorphic mapping onto our parameter space upon our covariates.
Of course, the incorrect application of a non-sub-additive metric introduces a positive third term in equation 2, in which the function learned is inseparable by a singular regularity criterion of error minimisation wrt the unique sample. Our inability to replicate interpretability across independent samplings, without merely relying upon a weakly consistent cheat, is arguably a reason behind the replication crisis in the Social Sciences (Wald, 1949; White, 1982), since ordinal scales are certainly not continuous, let alone normally distributed even upon a population. Therefore, the likelihood tests and partial Wald tests may be presumed not to be strongly consistent under the conditions in which they are commonly published, if the normality of errors is false as well (Wald, 1949; Le Cam, 1953). Weak consistency (under which γ_m → 0 only as m ≈ ∞) is an undesirable solution, since it requires the researchers to accurately characterise the function we are approximating only when the population is exhaustively sampled, which defeats the purpose of inductive argumentation in favour of description, and is therefore meaningless unless the population may be accurately collected. It should also be noted that the utilisation of meta-analysis does not pose an adequate solution, since the bias in the multiple levels amongst a collection of studies is typically not resolved, nor is it clearly addressed that the estimates themselves are biased. However, this presumption remains the current default for non-normality in the use of both Kendall's τ_b and Spearman's ρ.

Consider a data sampling process (DSP) which produces an (m × 1) vector y = (y_1, y_2, ..., y_m)ᵀ of observable real numbers, y ⊂ R^m. Said data is immutably capable of describing, with probability 1, the data in itself, the sample. However, there exists no descriptive capacity to address anything beyond itself: no inductive inferences concerning the characteristics of either the DSP or the data generating process (DGP) are possible (Solomonoff, 1964). Functional data analysis is a process by which we may approximate upon an unknown function space, and a framework allowing us the ability to predict, within our sample. In Statistics, we are often presented with such an unknown data generating function drawn upon a finite sample, which we must approximate in an attempt to understand the population. The identification of a specific error structure (a parametric probability distribution family; pdf) is what, conditionally, allows us to linearly separate a structure of interest (a model space amongst the universe of all possible identifiable parameters, α_n ⊂ Ω) from the complete uncertainty of the system. Traditional solutions of maximum likelihood (ML) and ordinary least squares (OLS) have linearised the ℓ2-metric space (see equation 4), for certain specific conditions:

y = α_0 + α_1 X + ϵ.  (4)

We define the data as independently and identically sampled upon a random variable from an unknown joint probability distribution, whose characteristics will be further expanded upon. Upon these m independently distributed outcomes of this endogenous process y, we wish to calculate estimates and conduct inference about unknown specifications of the relations between X and y. The estimators of focus are upon a single level (constant within the population) vector (α_0, ..., α_n) ∈ R^{n+1}, α ⊂ Ω, wherein Ω denotes the space of all identifiable parameters, and α_0 denotes the intercept.
If we view the Euclidean distance as a characterisation of the Pearson correlation, then it immediately follows that a simple regression is another form of said correlation (see equation 4). Consider then an empirical scenario, wherein X ∼ N(μ, σ²) and y ∼ N(μ, σ²_ϵ) upon m units. Within such a static (fixed) empirical system, the normal MLE is clearly applicable, and a provable minimum variance estimator. However, consider instead the same system of n random variables transformed by a copula, wherein the scores upon X are such that minimising the Euclidean distance no longer satisfies the properties of the minimum variance maximum likelihood estimator which converges to an expected error of 0 for the population (by the smoothing theorem, or the law of total expectation). The specific transformations are completely arbitrary; however, we assume that they continue to maintain the properties of a complete probabilistic metric space, as per Sklar's theorem (Schweizer & Sklar, 2005; Menger, 1942). This ensures, by using a data generating function such as the generalised partial credit model to link the original scores X → X′, that due to the probabilistic mapping, there is no guarantee of satisfying the triangle inequality upon the Euclidean topology. This is because the distances between any adjacent ordinal values are no longer linear given three distinct points of origin (i.e., ρ(x, ·) − ρ(y, ·) ≠ ρ(x′, ·) − ρ(y′, ·)) for all m and n. Therefore, the estimators α̂ and σ̂² are not independent, as required, a contradiction whose resolution is fundamental for construction of the valid classical t- and F-tests with respect to both first order approximations (coefficient bias) and, more importantly, second order (standard error) bias. The estimator α̂ represents the coefficients of the vector decomposition ŷ = Xα̂ = Py = Xα + Pϵ, from which it follows that α̂ is a function of Pϵ.
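The projection argument can be made concrete: with the hat matrix P = X(XᵀX)⁻¹Xᵀ and its annihilator M = I − P, the fitted values are Py and the residuals My, with PM = 0. A small numerical sketch (our own construction, with illustrative coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.3, size=m)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(m) - P                      # projection onto the residual space

y_hat = P @ y                          # equals X @ alpha_hat
resid = M @ y                          # equals y - y_hat
```

P and M are idempotent and mutually orthogonal, so the decomposition y = Py + My is exact; the text's point is that this orthogonality alone does not buy independence of α̂ and σ̂² once joint normality fails.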
Simultaneously, the estimator σ̂² is a norm of the vector Mϵ divided by n, and thus also a function of Mϵ. Now, the random variables (Pϵ, Mϵ) are jointly normal as a linear transformation of ϵ, and orthogonal because PM = 0; absent joint normality, however, this uncorrelatedness does not make Pϵ and Mϵ independent, and therefore the estimators α̂ and σ̂² are also not independent (Hoeffding, 1948). Moreover, given established biases for finite samples, the minimum variance replicability of the Gaussian likelihood function is not a valid presumption, entirely consistent with current research findings. As the error cannot be linearly separated from the regularity parameters, it follows that the interaction from equation 1, as presented in equation 4, is non-zero, from which follows the introduction of non-zero bias wrt γ_n.

To address non-normal data, Nelder and Wedderburn (1972) introduced a linking function between the coefficients, α, and the error, ε, which linearised the function to allow the additive decomposition of ϵ from y as a function of αX. This still maintains the parametric nature of the approximation distribution, such that we may correctly establish sub-additivity upon the parameter space in terms of our objective goal min ϵ, which solely defines our learning process (Vapnik, 2013), by an appropriately selected monotonic transformation. Non-parametric functional families based upon data ranking (so-called order statistics; Thurstone, 1927; Lipovetsky, 2007), which are invariant to the specific distribution, have been a popular alternative resource, seeking to define relations between relative data orderability, rather than the original data scores. However, the definition of an ℓ2 space has been widely preferable, conditional upon the correct selection of the empirical distribution wrt y. Typical order-statistic methods, such as the Kendall τ_b and, to a lesser extent, the Spearman footrule (Kendall, 1938; Spearman, 1906), rely upon a topological sub-space constructed upon a symmetric group S_m, for which each individual sample realisation is unique.
This error structure was assumed to originate upon an explicitly (and without error) observed continuous random variable, such that P(x_i = x_i′) = 0, almost surely, thereby precluding the existence of two subjects with different covariates possessing the same rank (ties). This characterisation excludes many common empirical measurements in both continuous and discrete empirical spaces, and results in a biased functional approximation due to the Heckman selection process, which asserts non-uniform probability of representation within the sampling from the population, as a function of the characteristics of each X_i, demonstrating the non-ignorability of the γ_n · γ_m and γ_m terms. This follows, as many multivariate and univariate probability distributions are consequently incapable of being uniquely embedded upon S_m for finite (and generalisable) learning as a maximum likelihood problem, due to the lack of identifiability with respect to ties. These approaches further preclude higher-order polynomial terms and interactions, a substantive necessity in observational and experimental research. Moreover, the existence of ties is a substantially larger problem within the multivariate space, as they become substantially more frequent. Consider, for instance, the Rubin (1971) causal model, whose counterfactual approach is wholly constructed upon the existence of multivariate surjective mappings onto common points of non-identical multivariate X. The problem of ties (or collisions) in rank and order-statistic methods has been extensively detailed (Diaconis, 1988; Lehmann, 2009; Hollander, Wolfe, & Chicken, 2013; Harlow, 2013). However, these resolutions rely upon asymptotic weak order convergence wrt m, rather than properly defining a complete compact metric space. As interaction terms in the coefficient space are ties, almost all rank-based techniques avoid estimating them, leading to their disuse in common experimental frameworks, which we address here.

An unbiased linear metric estimator upon the expanded permutation space
Complete metrics are an extensively studied field in theoretical statistics and topology (Schechter, 1997), although applied practice often limits use to the ℓ2, or Euclidean, metric space. Non-linear transformations in the form of the Generalised Linear Model (McCullagh & Nelder, 1989) induce linear additive separation between the model and error structures, while satisficing the primal-dual characterisation of error minimisation, from which follows the highly desirable generalisability of the unique learned patterns upon the sample. We prove that the Kemeny (1959) metric is such a linear metric for any sortable cumulative distribution function which is permutation non-invariant, thereby implying, as a necessary characteristic, the ability to sort distributional probability as a direct linear function of the relative ordering in the sample, which grows linearly to become the population, and is therefore an MLE with minimum variance. A realisation upon either (X_ij, y_i) which maps to a non-unique collision under either Spearman's footrule or Kendall's τ_b distances, and their respective correlational measures, fails to satisfy the properties of a complete metric (Fagin, Kumar, & Sivakumar, 2003), due to the uncertainty of the surjective mapping. For such a common empirical scenario, it follows that for finite sample ties, both distances are invalid maximum likelihood estimators. Said distances are finitely biased, due to the correlation which now exists between the error and the specific data realisations (Hoeffding, 1948), and thus the relative sparseness of the sample space restriction enables only weak convergence under the weak law of large numbers. Therefore, the development of a metric topological distance and corresponding quotient space for all reals upon i ∈ {1, ..., m}, ∀ m < ∞, across the column rank spaces remains an unavoidable necessity; however, such a measure has been largely neglected (Kemeny, 1959; Diaconis, 1988; Fagin et al., 2003).
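The size of the expanded space that admits ties can be counted: strict rankings of m items number m!, while rankings with ties (weak orderings) follow the Fubini, or ordered Bell, numbers, computable by the kind of recursive summation over Stirling-type terms that the text attributes to Good (1975). A sketch (the recurrence and naming are standard combinatorics, not taken from the paper):

```python
from math import comb, factorial

def ordered_bell(m):
    """Count weak orderings (rankings with ties allowed) of m items via the
    recurrence a(i) = sum_{k=1..i} C(i, k) * a(i - k), a(0) = 1: choose the
    k items tied in first place, then order the remainder."""
    a = [1] + [0] * m
    for i in range(1, m + 1):
        a[i] = sum(comb(i, k) * a[i - k] for k in range(1, i + 1))
    return a[m]
```

For example, three items admit 3! = 6 strict rankings but 13 weak orderings, and the gap widens rapidly with m, which is why the tie-admitting population space is so much denser than S_m.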
We present the utility of this Kemeny norm in Figure 1, wherein the relative size of the population permutation space is expanded for five observations, from a population of 24 unique permutations upon the sample space, to 256 (Good, 1975). The Kemeny norm is constructed upon a score matrix, which denotes pairwise discretisation across all pairs of observed subjects, as presented in equation 6 for comparison of subjects i and i′ in the space of (m choose 2) pairs. This score matrix describes a relative orderability with respect to all other empirical observations, with the simple image of greater than (a), equal to (0), or less than (−a), upon which the fixed constant a = 1 recovers Kendall's τ, with the adjustment of a valid image for tied elements, which were merely assumed to occur with probability almost surely 0 for continuous non-normal data. However, empirical measure spaces such as ordinal survey responses, which contain fixed ordered sets of possible choices in response to prompts, are substantially more likely to incur such ties, thereby raising the loss of efficient MLE properties to an immediate point of concern; traditional approaches such as the polychoric correlation (Pearson & Pearson, 1922; Olsson, 1979; Savalei, 2011) fail to address this issue, as empty cells upon the cross-tabulation of responses (the inverse need of the original assumption of almost no ties) produce unstable approximations of the correlation matrix.

Figure 1. [Two panels: a monotonically increasing CDF F_n(x), and a monotonically non-decreasing CDF F_n(x).] Comparison of the two rank metrics upon the permutation space S_n, in which ties are first explicitly avoided and then permitted, in order to demonstrate the empirical population space for unbiased estimators. It is seen that the latter, advocated, Kemeny metric is visually more dense, corresponding to faster convergence to the ECDF in the population under the strong law of large numbers.

Unsurprisingly, the summation across the columns of the score matrix (equation 6) results in a complete metric, the Kemeny distance (ρ_K), as found in equation 5 (and re-expressed as a bijective linear cross-product in Emond and Mason (2002)):

ρ_K(A, B) = ½ Σ_i Σ_j |κ^A_ij − κ^B_ij|,  (5)

κ_ii′ = a if f(x_i) > f(x_i′); −a if f(x_i) < f(x_i′); 0 if f(x_i) = f(x_i′),  (6)

for which ρ: R × R → R, where ρ belongs to the space of possible monotonic metric functions which smoothly approximate the Heaviside function in aggregation across m, producing the cumulative distribution function. In Table 1, we demonstrate a repeated random simulation of the bivariate correlation of a bivariate Poisson distribution, with a population correlation of 0, with 100 subjects. It is seen that, consistent with our hypothesis, the Kemeny correlation does possess the smallest standard deviation under replication, with a minimum 25% greater concentration and a maximum ratio nearly 250 times smaller. This serves to demonstrate our contention that numerous alternative metrics are less efficient in comparison to our proposed estimator, in both a univariate and multivariate space. The Kemeny metric may be shown to be a continuous space from Schechter (1997), as it is a complete metric, and further to be strongly convergent wrt m → ∞, presenting a sufficient basis equivalent to conventional Banach (1934)-norm spaces (Cauchy-Schwarz convergence is an explicit consequence upon any complete metric space).
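Equations 5 and 6 translate directly to code. The sketch below uses the ½·|difference| form of the Kemeny distance, which is consistent with an adjacent transposition costing 2 when a = 1, as the text notes; the function names and example data are ours:

```python
import numpy as np

def score_matrix(x, a=1.0):
    """Equation 6: pairwise scores of a, 0, or -a for each ordered pair
    (i, i'), according to whether x_i is above, tied with, or below x_i'."""
    x = np.asarray(x, dtype=float)
    return a * np.sign(np.subtract.outer(x, x))

def kemeny_distance(x, y, a=1.0):
    """Equation 5 (one common form): half the total absolute disagreement
    between the two score matrices."""
    return 0.5 * np.abs(score_matrix(x, a) - score_matrix(y, a)).sum()
```

Note that moving an element into a tied position costs 1 while a full adjacent transposition costs 2, so ties are genuine points of the space rather than excluded events.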
When combined with an empirically observable space measured with the Kemeny metric, the space is closed, complete, and continuous, and therefore compact for finite m ∈ Z⁺. We begin the proof here, with the assumption that any convex continuous metric space must be shown to be connected, as defined upon our population space, which here is the permutation space H_m, X ∈ R^n ≡ H_m. We treat the graph of m² − m distinct bands to reflect the unique distances of the permutations from an arbitrary origin ρ(u, π) ∈ H_m, for any orderable sequence u_i ∈ R, i = {1, . . . , m² − m}, which may be compared to an arbitrary point of origin π upon the norm-space of H_m. However, it is recommended that π = (1, 2, . . . , m) = I_m, the identity permutation, due to its uniqueness for any permutation group upon the sample of m individual units. The inverse identity permutation, I′_m, may also be utilised under the same reasoning.

We call this graph of all non-isolated nodes G, with m(m − 1) unique distances, which contains k elements, for which k may be computed using a recursive summation of Stirling numbers (Good, 1975). k may be less than or equal to m, denoting the presence of unique real-valued measurements under equality, such that a non-zero probability of ties holds. Restricting to the tie-free subspace S_m ⊆ H_m, in which all distances are multiplied by a scalar of 2 for a = 1, obtains a bijection between the Kendall and Kemeny metrics. This corresponds to the demonstrated point-wise equality upon the empirical cumulative distribution function (ECDF) for each specific distance in Figure 1. An adjacency swap of distance 1 (transposition of two rankings) under the Kendall distance now asserts a distance of 2 under the Kemeny distance, as the tied position is occupied and then moved past, indicating two distinct locations from {ρ_K(u, I_m), ρ_K(v, I_m)} → {u ∼ v} → {v, u} upon said graph. Here u ∼ v denotes an equivalence (an incomplete, or partial, ordering) for the two specific elements on a single random variable X_j.

The elements in the compact support upon ρ_K are connected by the underlying commonality of conditional exchangeability, adjusting for the generating function, thereby defining a residual conditional upon the sufficient statistics, as follows from the linearity of the metric. Said linearity enables a connected function upon an exhaustive permutation space H_m, as defined with the Kemeny metric, to converge to a normal distribution as the number of bins within which subjects' measurements may be placed grows to equality with m.

Table 1
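The doubling described above is easy to verify exhaustively on tie-free data. A small sketch (helper names hypothetical), which also checks that the normalised correlation later given in equation 13 recovers Kendall's τ_a whenever no ties are present:

```python
import itertools
import numpy as np

def kemeny_distance(x, y):
    """Kemeny distance with a = 1 and the assumed 1/2 normalisation."""
    sx = np.sign(np.subtract.outer(x, x))
    sy = np.sign(np.subtract.outer(y, y))
    return 0.5 * np.abs(sx - sy).sum()

def kendall_distance(x, y):
    """Classical Kendall distance on S_m: the number of discordant pairs."""
    return sum(1 for i, j in itertools.combinations(range(len(x)), 2)
               if (x[i] - x[j]) * (y[i] - y[j]) < 0)

x = [1, 2, 3, 4]
m = len(x)
for p in itertools.permutations(x):
    y = list(p)
    # tie-free bijection: Kemeny distance = 2 * Kendall distance
    assert kemeny_distance(x, y) == 2 * kendall_distance(x, y)
    # and the normalised correlation coincides with Kendall's tau_a
    r_K = 1 - 2 * kemeny_distance(x, y) / (m * (m - 1))
    tau_a = 1 - 4 * kendall_distance(x, y) / (m * (m - 1))
    assert abs(r_K - tau_a) < 1e-12

adjacent_swap = kemeny_distance([1, 2, 3, 4], [2, 1, 3, 4])  # transposition -> 2
```

The adjacent transposition costing 2 (rather than the Kendall cost of 1) is exactly the "occupied, then moved past" behaviour described in the text.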
Comparison of bias and power for 1,500 randomly generated ordinal responses
Statistic Mean St. Dev. Skew Min 25% 75% Max
ρ_K − − − − − − −
r − − − − − − −
ρ − − − − − − −
τ_b − − − − − − −

The claim holds for n = {1, 2} and, by induction, for all finite sample sets. For n = 2, where points of H_m on the graph G are realised as (u, v) ∈ V(G), we wish to identify a path from u to v, termed the (uv)-path. If u = v, or u ≠ v on the edges of G, E(G), then the distance between the points must be either 0 or a, denoting either an isomorphism or adjacency. It is also trivially obvious that for n = 2 with u ≠ v and uv ∉ E(G), the transitive property of the segments of the graph which do not contain both (u, v) holds. Such a representation may be considered as a neighbourhood of each endpoint:

\[
A = \{a \in V(G) \mid ua \in E(G)\} \tag{7}
\]
\[
B = \{b \in V(G) \mid vb \in E(G)\} \tag{8}
\]

and as long as these neighbourhoods possess a non-empty set intersection, a common element w must connect u ↦ w and w ↦ v; ∴ u ↦ w ↦ v, allowing construction of a continuous path from u ↦ v for any beginning and end point. The inclusion-exclusion principle operating upon the neighbourhood of each node guarantees that a common neighbour must exist for all points (see equation 9) upon all such H_m graphs, for n ≥ 2 and x_j ∈ X_n, as the graph is connected.

\[
|A \cap B| = |A| + |B| - |A \cup B| \tag{9}
\]
\[
= \deg(u) + \deg(v) - |A \cup B| \tag{10}
\]
\[
\geq (n - 1) + (n - 1) - (n - 2) \tag{11}
\]
\[
= n > 0, \;\therefore\; |A \cap B| \neq \emptyset \tag{12}
\]

The metric is therefore uniquely defined (up to the proportional constant a) to be a consistent, bounded distance defining a population topological vector space (Ω, ρ_K), which Kemeny (1959) treated as a functional mapping of the domain X onto a univariate real number. Said functional follows from the nesting of the image of the score matrix upon X (which may be expanded to be considered as the design matrix without complication) within the summation in equation 5, resulting in the measurement of the metric distance between any two points. The diameter of G for all nodes (u, v) ∈ V(G) is found within the realised finite countable interval {0 ≤ ρ_K(u, v) ≤ m(m − 1)}, which is always known and fixed in the sample. As the distance is closed by the existence of the upper bound for all finite realisations of m, the image of the support is both a closed and a bounded set. Therefore, given any two points (u, v) ∈ X on H_m, the pairwise distance is continuous and homogeneous for all finite subsamples of the universe of populations u ∈ V(G) as the sample grows asymptotically under a uniform and independent sampling of all observable permutations in the population, by Slutsky's theorem, assuming a single population is sampled. We have thus shown that the Kemeny metric is a linear convex function upon the compact permutation support of H_m, and is therefore continuous. A simple proof by contradiction may be used to establish distribution over additivity, and it will then be shown that, by homogeneity, the evaluation of ρ = ρ_K commutes with multiplication by a constant vector α_n representing the coefficient parameters.
Assume, as a function of α_n (for fixed intercept α_0), that ρ_K(xα_n + yα_n, I_m) + α_0 = α_n ρ_K(x, I_m) + α_n ρ_K(y, I_m) + α_0, which reduces to α_0 = α_0; from this immediately follows that the solution is a linear equation wrt α_n. For α_0 = 0 the solution is defined wrt I_m, and therefore represents the normalised scores for which the central location is 0 for all variables under analysis: the introduction of the non-zero intercept term as an additive constant merely serves to translate the expectation of the errors in prediction as a Cauchy-Schwarz convergent function series under the limit as m → ∞, demonstrating with probability 1 that the linear estimator is unbiased upon the Kemeny metric.

We next proceed to prove that the Kemeny metric is unbiased with minimum variance. This also demonstrates the conclusion that the Kendall rank distance is biased for finite samples, as a direct result of the restriction to S_m for conventional data collection. However, this may also be seen by noting that all elements of x ∈ X cover ρ_K and therefore X: ∪_{i=1}^{m}(x)_m ↔ Domain(G), from which follows x_m ∩ x_{m+1} = ∅, thereby demonstrating both the completeness and compactness of the metric. Any norm space which establishes a Banach space, for which the three properties of a metric must hold, must also be homogeneous, which provisions several useful properties, including the power-metric property. Kemeny (1959) proved the first three properties for a complete metric for ρ_K; however, we must also prove αρ(P, Q) = ρ(α · P, Q) ∀ α ≠ 0 wrt (α, ε), the regression parameters and their error, as necessary to establish both addition and multiplication as valid functors. Assume an intercept-only regression model, x_j = 1, for which hold both the properties ρ(αx, I_m) = αρ(x, I_m), ∀ x ∈ R, α ∈ R, and ρ(x + y, I_m) = ρ(x, I_m) + ρ(y, I_m), ∀ x ∈ R, y ∈ R. By a ∈ R (see equation 6), the homogeneity property of the Kemeny metric follows such that ρ_K(aX, I_m) ≡ a · ρ_K(X, I_m), by linear scaling of the penalty term, which forces the monoid scalar a to always be a finite non-zero real, but is otherwise unbounded, without affecting the relative ordering of all numbers. Exhaustive enumeration establishes that H_2, the permutation group with repetition, possesses cardinality 4, with ρ_K(H_2, I_2) ∈ {0, a, a, 2a} ∝ a · {0, 1, 1, 2}. Simple induction by m + 1, where m is a finite number, demonstrates that any axiomatic conditions which hold upon S_2 must also hold upon S_m, i.e., S_2 + S_3 + · · · + S_m + S_{m+1}. The validity of this induction is proven by seeking the equivalence from S_2 that all elements in the set {S_{m+1}} = {S_m + S_{m+1}}. The cardinality of these two groups was proven in Good (1975), so it is immediately known that the groups are correctly sized, and always begin at 0, for the ascending sequence m (since, by the property of indiscernibility, any group of size 1 must be equivalent to itself, and hence ρ_K(I_m, I_m) = a · (m − m) = 0), using the established multiplicity.

From these, we see that, under the assumption that there is an element k ∈ S_m for m ≥ 1, the distance from I_m may be calculated according to equations 6 and 5, with I_m as the origin. As already established, ρ_K(x, I_m), for x ∈ S_m, is correctly calculated upon the entire group, from both the left (I_m) and the right (I′_m). As there is no finite number on the reals for which S_m is not capable of calculating the Kemeny distance, due to the connectedness of G, it is immediately seen that lim_{m→∞} S_1(x = aI_1) ⊂ S_2(ax) ⊂ · · · ⊂ S_{m−1}(ax) ≤ S_m(ax) ≡ a(S_1(x = I_1) ⊂ S_2(x) ⊂ · · · ⊂ S_{m−1}(x) ≤ S_m(x)), by the distributivity of linear multiplication. Thus, the Kemeny metric is shown to be a linear Banach space, which allows utilisation of the power-metric property.

From these properties follows a means for consistent estimation of linear parameters (by the Cauchy-Schwarz convexity of all complete metrics), such as interactions, with respect to almost any homogeneous error distribution whose cumulative distribution F is monotonically non-decreasing. For a function F to be monotonically non-decreasing is a complementary extension of the typical assumption of a monotonically increasing cumulative distribution function (as in Mann & Whitney, 1947; Cox, 1972). Under a monotonically increasing function, the ordered indices, or statistics, of the dependent variable are uniquely sortable, such that each of m observations possesses a relative ordering upon the sample with respect to its largest (or smallest) realised value.
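Both the homogeneity in the score constant a and the H_2 enumeration above can be checked mechanically. A hedged sketch, taking the H_2 elements to be rank vectors with repetition (an assumption consistent with the stated cardinality m^m = 4), with a = 1 and the 1/2 normalisation assumed throughout:

```python
import itertools
import numpy as np

def kemeny_distance(x, y, a=1.0):
    """Kemeny distance (equation 5) with score constant a, 1/2-normalised."""
    sx = a * np.sign(np.subtract.outer(x, x))
    sy = a * np.sign(np.subtract.outer(y, y))
    return 0.5 * np.abs(sx - sy).sum()

# homogeneity in the score constant: rho_K(.; a) = a * rho_K(.; 1)
x, y = [3, 1, 2, 2], [1, 2, 3, 4]
for a in (0.5, 2.0, 7.0):
    assert np.isclose(kemeny_distance(x, y, a), a * kemeny_distance(x, y, 1.0))

# exhaustive enumeration of H_2: rank vectors with repetition, cardinality 2^2 = 4
H2 = list(itertools.product([1, 2], repeat=2))   # (1,1), (1,2), (2,1), (2,2)
dists = sorted(kemeny_distance(list(u), [1, 2]) for u in H2)   # [0, 1, 1, 2]
```

The two tied elements (1,1) and (2,2) each sit at distance a from the identity, while the reversal sits at 2a, matching the proportionality a · {0, 1, 1, 2}.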
Assume F(x) explicitly characterises the space under the CDF with the point-wise inequality F_{x_i}(t) < F_{x_{i+1}}(t), wherein t satisfies the properties of the order statistics, which are exchangeable with realisations upon the raw observations x as a consequence of the unique probability metric, justified by Sklar's theorem. A bijective relation therefore exists between the probability and empirically observed measure spaces for each individual x_i, ∀ i ∈ {1, 2, . . . , m}. Under the Kemeny metric, the inequality upon F is replaced with the relation F_{x_i}(t) ≤ F_{x_{i+1}}(t), which induces a transformation of the CDF as visualised in Figure 1. This allows finite-sample first- and second-order convergence, under the substantive increase in density upon the realisable sample space. A simple algebraic adjustment of the Kemeny metric further allows the estimation of a finite-sample correlation estimating equation, which will later be shown to also be a minimum variance maximum likelihood estimator. This expression is provided in equation 13, which, due to the Banach-norm properties of the Kemeny metric, is linearly strongly consistent, unbiased, and invariant to monotonic transformations. The Kemeny correlation is later expanded to demonstrate a variance-covariance matrix, for which it is shown to enable the estimation of a multivariate linear non-parametric regression, for a parameter space which includes the introduction of polynomial terms and interactions, an immense improvement over current non-parametric estimators in terms of γ_n.

\[
r_K = 1 - \frac{2\,\rho_K(x_j, x_{j'})}{m(m-1)}. \tag{13}
\]

U-statistic estimator properties
Let P be an arbitrary family of probability distributions which are homogeneous upon the space (X, ρ_K), restricted only such that each distribution P_j is a vector of length m composed upon the family of weakly-orderable distributions. Said data are permutation non-invariant (i.e., orderable), but include weak orderings such that all pairwise elemental comparisons may be determined to be greater than, lesser than, or equal to an arbitrary point of origin, with Hermitian positive semi-definite distances. As the Kemeny distance is a continuous and convex metric, the sole restriction to functional analysis lies upon the existence of a common and independent generating random probability function for the errors. In extension to the multivariate distribution, it will not be expected that the parametric families be identical, but merely that P_j ∈ P ∀ j = {1, . . . , n}. Let λ(P_j) be a real-valued function defined for P_j ∈ P, which is estimable for the observable data space X, a rectangular matrix of order m × n, upon which X ⊆ X is identically and independently randomly distributed. Further allow λ(P_j) to be an estimable parameter for some integer m, for which there exists an unbiased estimator with property λ(P), which by the linearity of the space defined by ρ_K produces a symmetric function over all permutations upon the row-space of the data, which may be countably infinite as per the Axiom of Choice. The nature of the Kemeny metric as Borel measurable follows from the finite countable nature of H_m ∀ m, as established by Good (1975). The central location which minimises the distribution of distances upon the permutation space H_m, ρ_K(π_j, π) ∀ π ∈ H_m, is satisfied by the midpoint distance of the unique extrema and its inverse, expressed as a point of distance m(m − 1)/2 in the population, about which the distribution of distances is linearly symmetric.
The expectation of the metric for said unique point of origin is equivalent to the exhaustive symmetric sampling amongst all permutation points upon ρ_K defining the order statistics, and therefore the point of symmetry (the first moment, or expectation) is finite for all finite samples, m < ∞. Therefore, U_j upon ρ_K produces a linear functional of the expectation, identical to the one-sample Wilcoxon rank-sum statistic upon S_m = Ω, thus establishing the expectation of the linear operator as the median. The variance for a finite mean is definable by the power-metric property of the ultrametric (or any Hilbert) space as a quadratic Taylor expansion about the expectation. The variance of said U-statistic is expressed wherein Ξ_jj = Var(h(x_1, . . . , x_k)), defined for all finite realisations upon x_m ∈ R^n. We express the variance of the univariate variable x_j as ξ_j = ξ(x_j), and that of a multivariate set of variables as the diagonal of the n × n matrix Ξ. Consider two subsets of the population D for which there are exactly k common subjects between the two subsets. The distinct choices for the construction of both subsets are \(\binom{n}{m}\binom{m}{k}\binom{n-m}{m-k}\); as the estimate ĥ is symmetric and independent of the construction of each subset (by the unbiasedness of said estimand upon the metric), it follows that the point of inflection in the probability distribution of the distance function ρ_K allows for the construction of a minimum variance estimator of order n − k, and therefore that the estimator amongst the exponential family P ∈ P is √n-consistent for a singular random variable. A closed multivariate solution for α_n is based upon Ξ, the Kemeny metric variance-covariance upon the union of the design matrix X with n parameters and the dependent variable y, whose row and column n + 1 correspond to y, which is linearly endogenous wrt the residual ε.
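The subset-counting term above can be evaluated directly. A small sketch with illustrative values (the helper name is hypothetical):

```python
from math import comb

def subset_choices(n, m, k):
    """Ways to draw two size-m subsets of n units sharing exactly k common subjects."""
    return comb(n, m) * comb(m, k) * comb(n - m, m - k)

count = subset_choices(10, 4, 2)   # -> 210 * 6 * 15 = 18900
```

This is the standard combinatorial step in deriving the covariance of two U-statistic kernels sharing k arguments.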
From this matrix the covariate parameters (α_n) and their respective n standard errors may be estimated. The n coefficients α_j for j = {1, 2, . . . , n} may be simultaneously estimated upon the sub-matrix Ξ_{n,n}, expressing the variance-covariance matrix of the feature space, along with the residual error variance ξ_ε as the complement of the linear covariance between the dependent variable and the regression function, subtracted from the total variance, together with the intercept α_0 and the variance of the parameters σ²_α:

\[
\alpha_0 = \nu(y) - \nu(X)_n\,\alpha_n \tag{14}
\]
\[
\alpha_n = \Xi_{n,n}^{-1}\,\Xi_{n+1,n} \tag{15}
\]
\[
\xi_\epsilon = \sqrt{\frac{m}{m-n-1}}\,\bigl(\xi_{jj} - \xi(\alpha_0 + X\alpha_n,\, y)\bigr) \tag{16}
\]
\[
\sigma_\alpha^2 = \xi_\epsilon \cdot \operatorname{diag}\bigl((X^{\top} X)^{-1}_{n,n}\bigr) \tag{17}
\]

From these established properties for linearly unbiased estimators, we conclude a valid realisation of the Gauss-Markov theorem, establishing the minimum variance properties of the unique linear model space parameters solved for under ρ_K, by the continuity of the metric for a vector space of full column rank, from which is justified the application of the Gram-Schmidt solution for m ≫ n. This further allows us to define not only the correlation as a linear rescaling of the compact Kemeny distance about the expectation, but also to scale the correlation by the roots of the likelihood function, thereby defining both the variances and covariances by the inner-product scaling of the compact sufficient statistic. The cross-product X^⊤X requires no additional computational adjustment, due to the linear additive and multiplicative equivalence of the parameter space upon the Kemeny metric, and is therefore validly defined as such upon this topology as well.
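The linear algebra of equations 14 and 15 is a covariance-based normal-equations solve. The sketch below substitutes the ordinary covariance matrix for the Kemeny variance-covariance Ξ purely to illustrate the structure; the paper's estimator would instead populate Ξ with Kemeny covariances, and the sample mean stands in for the median functional ν:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3
X = rng.normal(size=(m, n))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=m)

# Xi: (n+1) x (n+1) variance-covariance of [X, y]; ordinary covariance as stand-in
Z = np.column_stack([X, y])
Xi = np.cov(Z, rowvar=False)

alpha_n = np.linalg.solve(Xi[:n, :n], Xi[:n, n])   # equation 15: slopes
alpha_0 = y.mean() - X.mean(axis=0) @ alpha_n      # equation 14: intercept
```

The same two-step structure (solve the feature-space block, then centre for the intercept) carries over unchanged when Ξ is built from Kemeny covariances.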
Probability distribution of F upon y = f(x)

As the Kemeny metric is henceforth definable in a unique linear metric space with established U-statistic properties, with support f: αX → [0, m² − m] ⊂ R, a specific probability distribution for the population must be definable as well. As was previously shown, the first and second moments of the Kemeny metric are linearly expressible in closed form by the median (ν) and the dispersion about the median (ξ), in lieu of the conventional mean and variance. Since the median naturally converges to the mean upon a population, the finite robustness of the median as the linear convex expectation under independent sampling is greatly beneficial, yet asymptotically equivalent, while possessing a breakdown point of 50%, consistent with expectations regarding a closed-form expression for the median.

We propose that the pdf, given its linear nature, be defined as a Gaussian function, which is strongly consistent as a linear function upon the Kemeny metric, and which may be defined for the population of univariate reals x ∈ R^m, |x| = m:

\[
F(x_j) = \int_{-\infty}^{\infty} \frac{1}{\xi_j\sqrt{2\pi}} \exp\!\left(-\frac{(x_j-\nu)^2}{2\xi_j^2}\right) dx = 1 \tag{18}
\]
\[
\nu = \frac{1}{m}\sum_{i=1}^{m} \rho_K(x_i, I_m) \tag{19}
\]
\[
\xi = \frac{1}{m-1}\sum_{i=1}^{m}\left(\rho_K(x_i, I_m) - \frac{m^2-m}{2}\right)^{2} \tag{20}
\]

for all finite orderable distributions of reals of length m. This implies that a multinomial distribution is explicitly not well-posed, but any monotonic partial ordering would be. From this perspective, we may construct a robust Fisher z-score distribution wherein the expected value ν is 0 and the scale ξ = 1.

Probabilistic MLE
It is assumed, for any MLE, that any real parameter on the interior of the parameter space, α̇ = (α_n ∪ α_0) ∈ Ω, possesses a cumulative distribution function, for which there also exists a probability density function f(x_i; α̇) for the random variable x_i associated with the i-th empirical realisation within a study. We further assume that F(x_j, α̇) is either discrete for all α̇ or absolutely continuous for all α̇.

Under a linear pdf, an estimator α̂̇ is regularly and linearly obtained to satisfy the likelihood score, or estimating equation, which equals 0 and is invariant to logarithmic transformation, possessing a unique and singular optimum. Said optimum is defined by selection of the minimal sufficient statistics as equivalent to the already provided closed-form expressions of ν and ξ, for which the error term then becomes the object of minimisation wrt the joint set α̇ = {α_0, α_n}. Said property is established by demonstrating that the value of the sufficient statistic z, ∀ T(z), may be consistently estimated once the loss function L(α̇ | X) is known. The likelihood function for α̇ arises under the assumption of independent and full-rank data which is linearly and independently distributed in a multivariate field of dimension n, controlling for the sample estimates ν̂|X and Ξ̂|X. To determine for each m the most likely estimate (wrt α̇) or the corresponding m predictions (wrt y), we must demonstrate convergence with probability 1 to the local optimum in the interior of the parameter space which individually characterises α̇, and which is essentially unique (Perlman, 1983) under certain well-established conditions.
We will demonstrate that our characterisation upon the space (X, y, α̇; ρ_K) satisfies these requirements under the Kemeny metric for a matrix of sufficient order, such that even when the likelihood itself may be unbounded, or the Bayes error may not coincide with zero, our proposal will remain a consistent estimator of α̇. The Fisher expected information matrix about the interior parameter set α̇ is defined such that I(α̇) = E_α̇{S(α̇; X) S^⊤(α̇; X)}, for which S(α̇; x_i) = ∂ log L(α̇)/∂α̇ is the gradient score statistic of the log-likelihood, and x_j = (x_1, · · · , x_m) is the j-th column vector of X from which the Fisher information is estimated, and from which follow the standard errors (SE),

\[
\mathrm{SE}(\hat{\dot\alpha}_r) \approx \sqrt{\bigl(I^{-1}(\hat{\dot\alpha})\bigr)_{rr}}, \qquad r = 1, \cdots, n+1, \tag{21}
\]

as the cross-product of the score statistics over the n + 1 diagonal elements of X^⊤X. As ρ(X, ·) is convex, all points α̇ on the interior of the parameter space Ω are well-posed.

The uniqueness clearly follows for any finite selection of α̂̇, wherein for a fixed finite sample space X there exists a function L(α̇; x) = h(x) · f(x; α̇), defined upon the three sequences of partial derivatives of equation 18 wrt α_n, utilising the probability field as indicated in equation 18, which is already established to be a linear function, with both derivatives of the same sign, and thus positive definite. This thereby satisfies the selection of α̂_n, a unique point which maximises the likelihood and minimises the Kemeny metric, invariant to logarithmic transformations of said loss function, which the maximisation of the log of this same equation produces. It therefore immediately follows that log{L(α̇; x, ρ_K)} = l(α̇; x, ρ_K) characterises a uniquely solvable score equation of the log of the unbiased linear estimate in terms of the pdf found in the derivative of equation 18 at point x.
\[
l(\nu, \xi \mid x) = -\frac{m}{2}\bigl(\log 2\pi + \log \xi\bigr) - \frac{1}{2\xi}\sum_{i=1}^{m}(x_i - \nu)^2 \tag{22}
\]

The sufficiency of the two provided statistics may be established for both S_m and the expanded space H_m, wherein m continues to denote the unique set of permutations realisable upon the vector x, which may then be established for all m < ∞ by recurrence. Begin upon H_1 = x = {1}, for which, using equations 19 and 20, we produce the estimates ν_j = ξ_j = 0, respectively. This is immediately observably valid, as there is only one possible permutation upon
H_1: |H_1| = 1. The population probability is then complete, integrable to 1, and ∑_x f(x) · x = 1 · x ≡ x = ν. The variance expression ξ(x_j) is identically established, as f(x) · (x_i − ν)² = 0, thereby realising in expectation over the population both the necessary and sufficient statistics to satisfy our Gaussian characterisation of the probability random error. Application of the score function provides

\[
\frac{\partial l}{\partial \nu} = \frac{1}{\xi}\sum_{i=1}^{m}(x_i - \nu) = \frac{m}{\xi}\bigl(\hat\nu - \nu\bigr),
\]

from which it may be immediately seen that there exists only one optimum, at 0, with ν̂ = ν. For the score function taken wrt ξ, for known ν̂, is obtained

\[
\frac{\partial l}{\partial \xi} = -\frac{m}{2\xi} + \frac{1}{2\xi^2}\sum_{i=1}^{m}(x_i - \nu)^2 = -\frac{m}{2\xi^2}\Bigl(\xi - \frac{1}{m}\sum_{i=1}^{m}(x_i - \nu)^2\Bigr),
\]

which, given the identity ν(x) = E(x), yields

\[
\xi(x) = \frac{1}{m}\sum_{i=1}^{m}(x_i - \hat\nu)^2.
\]

Said estimators are biased upon finite samples, in that neither ν nor ξ is typically known, and therefore a reduction in the degrees of freedom is necessary to correct for finite samples. The variance estimator may be corrected upon finite samples by substitution of m − 1 for m, as likewise for the median. Therefore, both necessary and sufficient statistics are producible from these two estimating score equations, for which the minimal sufficient conditions are thereby satisfied. The variance of the parameters, and the construction of the Fisher information matrix, is thereby produced by taking the derivatives of each equation wrt the target parameters, as already well established.

As a linear function space, the asymptotic sampling distribution of the MLE α̂(x) is expected to be (multivariate) normally distributed, with expectation α̇ and variance I(α̇)^{-1}, wherein the established properties of the linear metric space ensure that a quadratic approximation of I(α̇) is sufficient as the sum of orthonormal variances (as previously utilised to demonstrate the bias of the ℓ2-norm MLE), as calculated using equation 20.
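Both score equations above can be verified numerically; a minimal sketch (with ξ denoting the variance, as in equation 22):

```python
import numpy as np

x = np.array([2.0, 5.0, 1.0, 4.0, 3.0])
m = len(x)

nu_hat = x.mean()                                  # root of the score wrt nu
xi_hat = ((x - nu_hat) ** 2).sum() / m             # root of the score wrt xi (biased)
xi_corr = ((x - nu_hat) ** 2).sum() / (m - 1)      # m - 1 degrees-of-freedom correction

# score wrt nu, (1/xi) * sum(x_i - nu), vanishes at nu_hat
score_nu = (x - nu_hat).sum() / xi_hat
# score wrt xi, -m/(2 xi) + sum((x_i - nu)^2)/(2 xi^2), vanishes at xi_hat
score_xi = -m / (2 * xi_hat) + ((x - nu_hat) ** 2).sum() / (2 * xi_hat ** 2)
```

Both scores evaluate to zero at the stated roots, and the m − 1 correction reproduces the usual unbiased variance estimator.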
A linear regression upon the Kemeny space is a linear equation y_i = α_0 + α_n X_i + ε_i, where ε_i is distributed as an independent normal distribution with median 0 and unknown error variance, as previously established, for elements i = 1, · · · , m in the sample. Therefore, the joint density for ε_i upon (X, y; ρ_K) is as follows, as specified according to the likelihood equation:

\[
\frac{1}{\sqrt{2\pi\xi}}\exp\!\Bigl(-\frac{\epsilon_1^2}{2\xi}\Bigr)\cdot\frac{1}{\sqrt{2\pi\xi}}\exp\!\Bigl(-\frac{\epsilon_2^2}{2\xi}\Bigr)\cdots\frac{1}{\sqrt{2\pi\xi}}\exp\!\Bigl(-\frac{\epsilon_m^2}{2\xi}\Bigr) = \frac{1}{\sqrt{(2\pi\xi)^m}}\exp\!\Bigl(-\frac{1}{2\xi}\sum_{i=1}^{m}\epsilon_i^2\Bigr).
\]

By substituting \(\epsilon_i = y_i - (\alpha_0 + \alpha_n X_i)\), the likelihood function is therefore

\[
L(\alpha_0, \alpha_n, \xi \mid y, x) = \frac{1}{\sqrt{(2\pi\xi)^m}}\exp\!\Bigl(-\frac{1}{2\xi}\sum_{i=1}^{m}\bigl(y_i - (\alpha_0 + \alpha_n x_i)\bigr)^2\Bigr),
\]

from which follows the score function

\[
l(\alpha_0, \alpha_n, \xi \mid y, x) = -\frac{m}{2}\bigl(\log(2\pi) + \log(\xi)\bigr) - \frac{1}{2\xi}\sum_{i=1}^{m}\bigl(y_i - (\alpha_0 + \alpha_n x_i)\bigr)^2.
\]

Consequently, it is seen that optimising the likelihood function for the parameter space α̇ is equivalent to minimising the residual sum of squares as previously defined, with n + 1 estimated parameters and m − (n + 1) residual degrees of freedom for any well-posed regression scenario with an error distribution which is monotonically non-decreasing.

It should be noted that in a regression problem for finite samples, both the expectation and the expected dispersion (i.e., the median, and the variances and covariances) are unknown. Therefore, it is recommended that Student's t statistics be employed, which is reasonable since the sharp convexity previously established for finite samples ensures that the sample mean and variances are both orthonormal and strongly consistent, thereby precluding the typical necessity for asymptotic weak convergence, as typically utilised for the Wilcoxon rank-sum statistic, Kendall's τ_b, and the Theil-Kendall-Sen non-parametric estimators when one or more ties occur. The Hessian matrix is as follows, established separately for the parameters α̇ and the residual sum of squares, for which a second derivative must be taken upon each of the likelihood equations previously provided. These result in the estimators

\[
H = \frac{\partial^2 l}{\partial \dot\alpha\,\partial \dot\alpha^{\top}} =
\begin{pmatrix}
-\dfrac{X^{\top} X}{\xi_\epsilon} & -\dfrac{X^{\top} \epsilon}{\xi_\epsilon^2} \\[2ex]
-\dfrac{\epsilon^{\top} X}{\xi_\epsilon^2} & \dfrac{m}{2\xi_\epsilon^2} - \dfrac{\epsilon^{\top} \epsilon}{\xi_\epsilon^3}
\end{pmatrix}. \tag{23}
\]

The expectation of H(α̇) follows, under which the Gauss-Markov assumptions cancel out the covariances between ε and X on the off-diagonals. The expectation of the model variance (which is a fixed constant) is equivalent to the closed-form expression already provided, therefore establishing asymptotic maximum likelihood estimator candidacy.
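The vanishing off-diagonal expectation in the Hessian of equation 23, and the standard errors implied by the inverse information, can be illustrated on a simulated regression (all names hypothetical, with the ordinary residual variance standing in for ξ_ε):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # intercept + one covariate
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=m)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)            # least-squares / ML solution
resid = y - X @ beta_hat
xi_eps = resid @ resid / (m - 2)                        # residual variance estimate

# off-diagonal Hessian block -X^T eps / xi^2 vanishes at the optimum
off_block = -(X.T @ resid) / xi_eps ** 2

# standard errors from the inverse expected information (Cramer-Rao bound)
se = np.sqrt(np.diag(xi_eps * np.linalg.inv(X.T @ X)))
```

At the fitted optimum the residuals are exactly orthogonal to the columns of X, which is why the expected Hessian is block diagonal and the information matrix reduces to the familiar closed form.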
The second term, concerning the error variance, may be reduced:

\[
E(H) = E\!\left[\frac{\partial^2 l}{\partial \dot\alpha\,\partial \dot\alpha^{\top}}\right] = -I(\dot\alpha) =
\begin{pmatrix}
-\dfrac{X^{\top} X}{\xi_\epsilon} & 0 \\[2ex]
0 & -\dfrac{m}{2\xi_\epsilon^2}
\end{pmatrix}, \tag{24}
\]

from which follows the typical relation estimator as previously established for linear estimators upon this topological manifold. Thus the information matrix can be seen to be equivalent to the closed-form expressions already provided for the 'least squares' estimator upon the Kemeny metric space. The Cramer-Rao lower bound is further shown to be satisfied, var(α̇) ≥ (−E(H(α̇)))^{-1}, and is in fact equivalent to the formulations provided in equation 24, thereby establishing that this presents a minimum variance estimator, and therefore an efficient MLE in the Gauss-Markov satisfaction of first- and second-order consistency.

Geometric perspective upon a non-parametric interaction
The idea of main effects does not necessarily guarantee that a collision upon the covariate space will occur; however, the multiplication of two features, especially in the common problem of finite observation spaces (demographics or trial conditions), necessitates that ties will occur. We demonstrate that the Kemeny metric maintains its maximum likelihood properties in a linear framework while allowing for interactions to be estimated. This is a compelling improvement over all other currently employed non-parametric techniques, as it allows for the estimation of linearly first- and second-order consistent interactions without a need to conduct sub-study stratification. A brief simulation study was conducted to demonstrate the superiority of the method proposed in this manuscript in comparison to established alternatives and traditional OLS regression, and the results are provided in Table 2. Also of note is the finding, reported in Table 2, that the standard deviation of the estimated parameters is nearly equivalent for all parameters to the closed-form expression, as expected for a linear estimator.

A cursory inspection of Figure 2 reveals the distribution of all coefficients under the Kemeny metric to be both normally distributed and less dispersed (i.e., more informative) compared to the alternative formulations, as expected of a minimum variance estimator. The predominant component of the calculation of the asymptotic standard errors, the sum of squared errors, is geometrically identical to the Kemeny distance between the regression predictions and the target. It also immediately follows from this geometric equivalence that the Kemeny metric is, both empirically and theoretically, a more powerful estimator for any cumulative distribution

Table 2
Estimation of interaction parameters over 2,500 jackknife resamplings under the Kloke (2009), Kemeny (1959), and OLS metrics, with mean bootstrapped linear standard errors reported as α̃, indicating expected higher efficiency of the parametric standard errors

Statistic Mean St. Dev. Min 25% 75% Max
α̃ (Intercept) 0.231 0.082 − − − −

function upon homogeneous but non-Gaussian data samples. This is of course not true for instances in which a proper ℓ2 contraction may be imposed; however, this typically induces a non-linearity, whereas our approach is linear. Computing the distance between the projection and the target, divided by the residual degrees of freedom, provides the mean squared error, as would be expected for any linear functional basis. The product of the MSE with the individual parameter cross-product produces the standard errors as a simple linear function. Further, all matrix multiplications which result in an n × n product are supplanted by the covariance matrix under the Kemeny distance, enabling an exhaustive implementation of a purely ordinal linear framework. This ensures both primal and dual characterisations of a maximum likelihood estimator as theorised, without the loss of generality induced by norming a continuous score function. This is because the units of the Kemeny distance are invariant over all data spaces under the strong consistency, by the Glivenko-Cantelli theorem, of both ECDF → CDF and F̂ → F.

We also must demonstrate that the standard errors are (second-order) minimised with respect to the fixed data input into the model. To address this, we utilise the Anscombe (1973) dataset with all appropriate bivariate pairings, which are bootstrapped with replacement to produce 15,500 datasets of 550 subject measurements, for each pairing.
We would expect, under valid characterisations, that the standard deviations of the parametric formulas would be smaller than the corresponding bootstrapped estimates, unless the parametric assumptions were violated, and would otherwise approach a ratio of 1 between the two metric spaces. The substantive finding reflects the approximately constant scaling difference in the four bivariate data sets between the bootstrapped and formula-estimated standard errors.

Figure 2
Mardia distributional examination of regularity parameters with interaction. [Density estimates (N = 2,500) of the Intercept, Sociological, Gender, Age, and Interaction coefficients under the Kloke, Kemeny, and OLS metrics, together with a normal Q–Q plot.]

Compared to the OLS formulation of the first- and second-order statistics (coefficients and standard errors, respectively, of the intercept and regression slope), which possesses a ratio approaching 1 (the minimum, for (x, y), the bivariate Gaussian distribution) between the bootstrapped standard deviations of the empirical coefficient distribution and the mean standard errors, as expected for the constant (but heavily biased) correlation coefficient r̂. However, as stated, the greater magnitude (by a constant scaling) of the values reported in Table 3 is expected, as the sums of squared errors are in fact non-constant, and this does not affect the comparison of relative variability. In Table 3 the standard errors show that the only approximately unit scaling between the two estimates is found for the sole case in which the Kemeny metric is invalidly applied, i.e., the quadratic function approximated by a single slope upon (x, y). For all other cases, the standard errors are consistently smaller (approximately 92%), as would be expected for the scenario in which the parametric error family was correctly derived. More interesting, though, is the comparison of (x, y; ρ_E, ρ_K), which is the only instance of an unbiased estimator upon the Anscombe dataset. Here, presented in the first results for Table 3, the estimated standard errors are approximately twice the size of the empirically estimated standard deviation, whereas for (x, y) the ρ_E ratio is again approximately 1; however, the ratio upon ρ_K is nearly 5, but under replication the constant scaling is found to approximate √(m − n), the residual rescaling of the degrees of freedom of the variance. Once this is corrected for, the linear relative efficiency we hypothesised is validated, with a ratio of approximately 92% upon two or more coefficients in a linear system of m equations.
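The ratio diagnostic discussed above (bootstrap standard deviation of a coefficient against the mean formula standard error) can be sketched as follows. This is a Python illustration on simulated Gaussian data under the Euclidean norm, with far fewer resamples than the 15,500 used in the text; the sample size of 550 mirrors the bootstrapped Anscombe pairings, but the data are not the Anscombe data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bivariate sample standing in for one Anscombe-style pairing.
m = 550
x = rng.normal(size=m)
y = 0.5 * x + rng.normal(size=m)
X = np.column_stack([np.ones(m), x])

def fit(Xb, yb):
    # Least-squares projection, MSE over residual df, and formula standard errors.
    b, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
    r = yb - Xb @ b
    mse = r @ r / (len(yb) - Xb.shape[1])
    se = np.sqrt(mse * np.diag(np.linalg.inv(Xb.T @ Xb)))
    return b, se

slopes, ses = [], []
for _ in range(500):                      # fewer resamples than in the text
    idx = rng.integers(0, m, m)           # bootstrap with replacement
    b, se = fit(X[idx], y[idx])
    slopes.append(b[1])
    ses.append(se[1])

# Under a correctly specified error family, the ratio of the bootstrap standard
# deviation to the mean formula standard error approaches 1.
ratio = np.std(slopes, ddof=1) / np.mean(ses)
print(round(ratio, 2))
```

A ratio persistently far from 1 signals either a violated parametric assumption or, as discussed for ρ_K above, a constant rescaling such as √(m − n) that must be corrected before comparison.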
This necessary correction however, for (x, y) [...]

Decomposition of Sums-of-Squares
A final connection for any theory of a general linear model requires the demonstration of decomposition by sums of squares. Such a decomposition upon a centred data set X (of dimension m × n), in the form of a square X⊤X or XX⊤ matrix, provides
the same fundamental information necessary to produce, in terms of the design matrix, the total sum of squares and cross-products. As previously stated, the sum of squared errors (SSE) and the mean squared error (MSE) are already defined as the distance, and the expected distance, between the predictions and the target. The intercept-only model in turn defines the origin of the coefficient parameter space as the median, about which the regression terms minimise the distance. Thus, each operation with respect to α̂ upon the linear convex Kemeny distance is expected to minimise the discrepancy in ordination of the prediction set ρ_K(y, ŷ). While we will not exhaustively address comparisons with alternative non-parametric ANOVA models (e.g., Rizzo & Székely, 2010), it should be immediately obvious to the reader that, as a function of the removal of the free parameter p upon the Minkowski distance, our proposed method is superior, requiring fewer parameters and introducing less bias in order to establish the linearity between the decomposed sums of squares and the target.

Table 3
Unscaled estimation of the Kemeny-based non-parametric bootstrapping of the coefficients (α, α) and unscaled standard errors (multiply by (√(m − n))⁻¹ to correct) across 15,500 datasets constructed from 550 bivariate elements resampled from the Anscombe dataset. [The coefficient rows were not recovered; the σ(α) rows are reproduced below.]

            mean    sd      median  min     max     range   skew     kurtosis
(x, y)
  σ(α)      0.2624  0.0083  0.2622  0.2337  0.2972  0.0635  0.1384   0.1517
  σ(α)      0.0275  0.0007  0.0275  0.0247  0.0307  0.0060  0.0413   0.0975
(x, y)
  σ(α)      0.2962  0.0101  0.2957  0.2615  0.3588  0.0973  1.3671   5.9482
  σ(α)      0.0311  0.0009  0.0310  0.0281  0.0371  0.0090  1.7509   8.1898
(x, y)
  σ(α)      0.0915  0.0118  0.0912  0.0542  0.1409  0.0867  0.1885   0.0643
  σ(α)      0.0096  0.0011  0.0096  0.0059  0.0140  0.0081  0.1324   0.0234
(x, y)
  σ(α)      0.3509  0.0035  0.3508  0.3279  0.3771  0.0492  -0.8226  7.5510
  σ(α)      0.0368  0.0009  0.0367  0.0326  0.0428  0.0102  0.4238   1.3680
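The sums-of-squares decomposition described above can be verified numerically. The sketch below is a Python illustration under the Euclidean norm on simulated, centred data (under that metric the intercept-only origin is the mean; under the Kemeny metric it is the median), showing the exact identity SST = SSR + SSE produced by orthogonal projection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated centred design: m observations, n regressors.
m, n = 60, 2
X = rng.normal(size=(m, n))
y = X @ np.array([1.5, -1.0]) + rng.normal(size=m)

# Centre the target and design, so the intercept-only model is the origin.
yc = y - y.mean()
Xc = X - X.mean(axis=0)

beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
yhat = Xc @ beta

sst = yc @ yc                    # total sum of squares about the origin
ssr = yhat @ yhat                # sum of squares captured by the projection
sse = (yc - yhat) @ (yc - yhat)  # sum of squared errors

# Orthogonality of the projection yields the exact decomposition SST = SSR + SSE.
assert np.isclose(sst, ssr + sse)
print(round(sst, 4), round(ssr + sse, 4))
```

The Kemeny-metric version replaces the Euclidean inner products with the Kemeny distance terms, while the structure of the decomposition is unchanged.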
We demonstrate that the one-way ANOVA decompositions under DISCO (Rizzo & Székely, 2010) and Kloke, McKean, and Rashid (2009) are both outperformed by the Kemeny metric, and that our method is further less restrictive in the coefficient space in that it allows interactions. We explore this by repeated jackknife resampling upon the Warpbreaks data set (R Core Team, 2020) for an ANOVA-based decomposition of breakage counts per tension (a three-level factor: L, M, and H) and wool type (binomial levels, A or B), treating tension as both an ordered factor and a multinomial distribution. The dependent measure in the data set is a total count of warp breaks per standardised length of yarn, well represented by a Poisson distribution. The results of these exploratory analyses, treating the outcome as Gaussian, are provided in Table 4, demonstrating bootstrapped consistency and relative stability of all estimates, while providing both omnibus and local Wald tests for each effect under the assumption of a weakly-orderable measurement data space.

More interesting, though, is the recognition that, while the tension variable is an ordered magnitude (Low, Medium, High) which must be categorically coded under the Frobenius norm, the Kemeny norm allows the ordinal nature of the data to be assessed as part of the data. In order to compare this, we recoded the Warpbreaks design matrix to treat the tension variable and all interactions as single variables, reducing the necessary model degrees of freedom from 5 to 3; the results are presented in Table 6.
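The contrast between the two codings can be sketched as follows. This is a Python illustration on a toy stand-in for the warpbreaks factors (not the actual R data set): dummy coding of the ordered tension factor requires two columns plus two interaction columns, whereas a single ordinal score column halves those degrees of freedom.

```python
import numpy as np

# Toy stand-in for the warpbreaks factors: tension in {L, M, H}, wool in {A, B}.
tension = np.array(["L", "M", "H"] * 6)
wool = np.array(["A", "B"] * 9)

# Categorical (Frobenius-norm) coding: two dummies for tension, one for wool,
# plus two interaction columns -> 5 model degrees of freedom beyond the intercept.
tM = (tension == "M").astype(float)
tH = (tension == "H").astype(float)
wB = (wool == "B").astype(float)
X_cat = np.column_stack([tM, tH, wB, wB * tM, wB * tH])

# Ordinal coding: tension as a single ordered score, reducing the model
# degrees of freedom from 5 to 3.
t_ord = np.array([{"L": 0, "M": 1, "H": 2}[t] for t in tension], dtype=float)
X_ord = np.column_stack([t_ord, wB, wB * t_ord])

print(X_cat.shape[1], X_ord.shape[1])  # 5 vs 3 model df
```

The equal spacing of the ordinal scores (0, 1, 2) is an illustrative assumption; the Kemeny norm itself requires only the ordering, not the spacing.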
The interesting result, presented here with the 25,500 bootstrapped replications of the 54 subjects with replacement, was that the conclusions are in fact similar between the OLS and Kemeny approaches: both identified two of the same elements as significant; however, the comparative range and standard deviations for the coefficients of determination and the MSEs are substantially larger for the Frobenius norm.

Provided in Table 5 are the empirical standard deviations of the distribution of the Mean Squares, which show that non-normality presents more widely dispersed ANOVA terms than the proposed Kemeny metric. Interestingly, in all cases of the application of a Wald test, the cross-validated resampling indicates significant non-zero effects which are concordant with the conclusions of the Wald tests under the Kemeny metric space. Comparisons to DISCO analysis (Rizzo & Székely, 2010) provide one-way ANOVA results; however, those tests are shown to be less powerful, for Minkowski adjustment p = .5, demonstrating that the Kemeny metric provides a more robust and consistent alternative to the alternative analyses, for a broader hypothesis space of the models themselves.

Table 4
Effects and interaction upon ρ_K. [Columns: Statistic, Mean, St. Dev., Min, 25%, 75%, Max, σ̃(α); only the (Intercept) row, 0.055 and 0.017, was recovered.]

Table 5
ANOVA Wald test decomposition for ρ_K over 2,500 cross-validations in comparison to two univariate DISCO analyses with p = .5 and traditional ρ_E ANOVA.

Method  Term           df  MS         Min(MS)  25% MS    75% MS     Max(MS)   σ̃(MS)
Kemeny  tension        2   872.13     789.25   856.75    888.25     942.25    90.19
        wool           1   3,437.726  2,699    3,295.8   3,582      4,063     103.79
        tension:wool   2   833.40     756.75   817.50    849.50     909.75    94.69
        Error          48  27.098     2.644    18.67     25.24      36.31     2.67
DISCO   tension        2   127.06     32.40    173.46    318.94     812.00    110.28
        wool           1   134.25     5.69     56.86     187.18     656.73    100.84
        Error(tension) 52  27.275     15.53    24.78     29.71      42.27     3.61
        Error(wool)    53  31.664     18.47    28.45     34.82      46.22     3.56
ANOVA   tension        2   823.40     119.05   1077.43   2083.32    5233.44   785.63
        wool           1   768.45     0.02     299.67    1,097.64   4,883.08  605.23
        tension:wool   2   413.36     0.87     243.59    543.66     1592.54   450.42
        Error          48  105.810    0.81     1.26      1.55       2.18      16.41

Table 6
Empirical comparison of the standard errors upon the ordinal characterisation of the Tension feature.

                  α        se(α)    σ(α)     t        p(t)
(Intercept)       28.1528  1.4780   1.4865   19.0484  0.0000
woolB             -2.8857  1.4780   1.4786   -1.9525  0.0282
tensionM          -5.0127  1.8049   2.0120   -2.7773  0.0038
tensionH          -3.2424  1.0444   0.9212   -3.1046  0.0016
woolB:tensionM    5.2573   1.8049   2.0098   2.9129   0.0027
woolB:tensionH    -0.0000  1.0444   0.9242   -0.0000  0.5000
MSE               106.57            21.52
(Intercept)       33.2425  2.4557   3.7112   13.5371  0.0000
woolB             -1.8232  .9085    1.3234   -2.006   0.0251
tension           -3.7919  1.1441   1.5777   -3.3142  0.0009
woolB:tension     -1.3541  1.6319   0.9925   -0.8298  0.2053
MSE               151.575           18.57

Empirical Demonstrations
We conclude with several empirical data demonstrations, to validate the mathematical fact that the results of several estimators are inconsistent and underpowered in the presence of ties, even with jackknife resampling. In the presence of ties, conventional estimators do not possess the properties of an MLE, which is empirically explored by the application of jackknife resampling and statistical comparison of the results, with the null hypothesis of comparable performance expected to produce no observable differences in the first-order results of the coefficients. The second contention, that the application of a non-metric topology results in inconsistent (and therefore unrealisable) representations of a population, is explored as well, with the null hypothesis of performance less than or equal to that of conventional methods; under this null, the proposed methods would fail to demonstrate any improvement over current methods for scenarios such as normal distributions. These conjectures, if validated, therefore demonstrate that even non-parametric tests do not provide adequate unbiased understanding outside the population when we introduce tied subjects, a point of presumed necessity for empirical studies in any field. All simulations in this section are performed using R v.4.0.3 (R Core Team, 2020) and custom-written software which is available from the authors.

We next compare the Tukey–Siegel, Kendall τ_b, and Wilcoxon rank-sum performance to the Pearson correlation and the proposed estimator, with respect to the point-estimate differences and empirical variance (power) of each estimator.
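The jackknife comparison of estimators on tied data can be sketched as follows. This Python illustration uses simulated, heavily tied ordinal scores and compares only the Pearson and (average-rank) Spearman estimators; it is not the paper's R software, and the Kemeny estimator itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated, heavily tied ordinal scores (hypothetical 5-point sum scores).
x = rng.integers(0, 5, 60).astype(float)
y = np.clip(x + rng.integers(-1, 2, 60), 0, 4).astype(float)

def avg_rank(a):
    # Average ranks: tied values receive the mean of their sorted positions.
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

pearson = lambda a, b: np.corrcoef(a, b)[0, 1]
spearman = lambda a, b: pearson(avg_rank(a), avg_rank(b))

def jackknife(est, x, y):
    # Leave-one-out resampling: point estimate and empirical variability.
    vals = [est(np.delete(x, i), np.delete(y, i)) for i in range(len(x))]
    return float(np.mean(vals)), float(np.std(vals, ddof=1))

for name, est in [("pearson", pearson), ("spearman", spearman)]:
    mean, sd = jackknife(est, x, y)
    print(f"{name}: {mean:.3f} (jackknife sd {sd:.4f})")
```

Comparing the jackknife means and standard deviations across estimators is the first-order diagnostic described above; divergence between them on tied data is the behaviour the text attributes to the non-metric rank interpretation.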
We rely upon the natural relationship between a distance measure and a similarity measure as mutual reflections about an arbitrary, but appropriate, point of origin, and thus estimations of similar and comparable bivariate relationships. This is a valid assertion, as otherwise the data must be multinomially distributed, resulting in a substantially divergent perspective as to the nature of the data, for which all data-analytic methods are invalid without a priori data processing in the form of binomial coding, and would therefore be expected to perform poorly by definition.

Minimum variability of rank-tests with ties

Table 7
Empirical distribution of relations estimated upon both jackknife and 7,500 times repeated resampling (CV).
[Columns: Statistic, Mean, St. Dev., Min, 25%, 75%, Max. Rows: ρ_K(x, x), r(x, x), ρ(x, x), τ(x, x), and r − r_K(x, x) (mean 0.042, st. dev. 0.079) for the first pairing, with corresponding rows for the second pairing (r: mean -0.134, st. dev. 0.187); the remaining values were not recovered.]

A second brief demonstration, with a 14-observation bivariate data set, is presented; these data may be equivalently
viewed as an unpaired Wald t-test, an OLS regression, a point-biserial correlation, a rank-sum test of differences between groups, or any other functional linear basis analytic technique; asymptotically, these are all consistent under Wilks' theorem. In effect, each proposes a test of the shift between two groups' measurements for a one-unit change in group membership. Here, we are solely interested in the distributional properties of the various estimators under the conceptual meaning of the unbiased maximum likelihood estimator. We expect that a correctly identified estimator with an appropriately chosen error distribution will possess a global representation of the bivariate relationship, with the point of minimum error in turn being the most informative (smallest standard error) estimate as well. Under these two definitions of the maximum likelihood estimator, the point is made that solving for the minimum error of approximation must best approximate the population value. We demonstrate that this is not true for any estimator which is not bivariate normal, even with standard 'non-parametric' testing procedures, due to the non-metric nature of the rank interpretation with ties. Further, not only does the Kemeny correlation coefficient most accurately approximate the population true value, but it also appears to do so with the expected smallest distribution of scores, thereby demonstrating that this estimator, for any homogeneous score distribution, stochastically dominates not only the Pearson-based estimators (for scores which are approximately normally distributed) upon non-Gaussian spaces but also the bivariate non-parametric tests as well. This provides support for our conjecture that, in the presence of ties, even 'tie-resolving' estimators for both Spearman ρ and Kendall τ_b are biased estimators of the population relationship as tested within a Null Hypothesis framework.

Real world VHA dataset
A data set from a Veterans Affairs Office of Research and Development study approved by the VA Central Institutional Review Board (CIRB) is utilised to demonstrate both finite-sample and bootstrap comparative performance upon an endogenous ordinal summative score in evaluating randomised treatment outcomes for m = 594 Major Depressive Disorder (MDD) complete-case patients. All participants provided written informed consent and privacy authorisation. These data were previously reported in Zisook et al. (2019), where the non-normality of the data was addressed with a Cox survival model of time to remission, remission being defined as a quantised decrease of at least a fixed reduction in depression score. Here we instead provide a method of quantitative assessment of the reduction in overall depressive impairment over time for the three treatment strata, which is compared to an equivalent model estimated under OLS. This allows explicit shifts in the quantity of depressive impairment to be assessed as a continuous measure, without parametric assumptions beyond the existence of a suitable cumulative distribution function. However, the non-normality of the response variable is unresolved both by definition (a finite set of responses demonstrates only weakly consistent normality in the population) and in the overall ECDF. As previously reviewed, traditional alternative techniques explicitly preclude the estimation of substantive parameters of explicit interest, such as interactions, resulting in model-selection choices favouring the general linear model in spite of its inappropriateness. We demonstrate here how these conclusions are inconsistent with the data-learning process, resulting in both false positives and false negatives, for both the complete-cases sample and a bootstrapped resampling with replacement of 15,500 samples of 594 subjects each.

The participants were VHA patients aged 18 years or older who were both diagnosed with and treated for MDD.
Diagnostic eligibility was supplemented with the 9-Item Patient Health Questionnaire, an ordered scale with support [0, [...] difference of 14.83. Controlling for ancillary baseline covariates, it was desired to assess the change in score at week twelve upon the EuroQOL, with explicit focus upon the interaction of social support (a categorical four-level variable) and treatment arm. The hypothesis was held that greater availability of social support would present differential changes in treatment efficacy, and further that these findings would not be found under the traditional use of an OLS regression. The results of the model are presented in Table 8, comparing the two techniques. It should be noted that if the normality assumption were met, all statistics would be expected to replicate with greater stability for the OLS model, and would present with smaller standard errors than those found under the methods proposed in this paper. However, as seen from the disjoint findings between the cross-validation resamplings performed for both models, while the OLS approach does imply a higher coefficient of determination and therefore better fit, the resampling demonstrates that this value is biased, resulting in a lower estimated MSE compared to the true MSE, and therefore also raises concerns with respect to the partial Wald tests, which may be invalidly over-powered as a result.

As was hypothesised, the order-statistic-based technique demonstrated greater power on the non-normally distributed data, and further, these findings replicated with greater success than the cross-validation performed under OLS regression. It was found that the non-parametric linear basis was a better substantive average of all observations, satisfying the replicability assumptions of the model, unlike the parametric regression (White, 1982). In particular, while the coefficients of determination were twice as large for the OLS model, the variability of said statistic under replication was found to be 13 times greater for OLS than for the Kemeny regression under jackknife resampling. The estimates were conducted upon the correlation matrix for the rank-based method, and thus valid comparisons are found upon the Wald tests, which are proportionally similar between the two models (Krane & McDonald, 1978). These findings would indicate substantive sensitivity to which units remained within the sample, and therefore contraindicate the conditional exchangeability assumption upon which the maximum likelihood generalisability principles are founded. Moreover, from the perspective of experimental replicability, consistency for OLS upon non-normally distributed data would appear to be satisfied only under the weak law of large numbers, and therefore does not present evidence of being a valid maximum likelihood estimator for this data set, as would be theoretically expected for both first- and second-order estimator biases. In addition, attention should be drawn to the coefficients of determination estimated for each of the 15,500 replicated samplings of the models, as reported in Table 8. There it can be seen that while the sampled R² values were in fact always substantially higher (by an average scaling of 173), the variability of the estimators upon the bootstrapped samples was 28 times greater for the Gaussian general linear model than for our Kemeny linear model, thereby betraying the performance which would be expected of a maximum likelihood estimator upon a more informative metric topology, as identified under valid linear Gaussian parametric assumptions.

Table 8
VHA data example, conducted upon 594 subjects, for 15,500 cross-validated resamplings. [The F, R², and σ(R²) values reported for ρ_E and ρ_K were not recovered.]

OLS
                          Coefficients                                         Standard errors
                          mean     sd       min        max       range        mean     sd      min     max      range
(Intercept)               55.0681  4.7306   35.9979    72.6146   36.6167      4.704    0.2188  3.8504  5.5423   1.6919
AGE                       0.0039   0.0729   -0.2661    0.3009    0.567        0.068    0.0029  0.0583  0.082    0.0238
TrtcodeB                  -1.1438  2.5716   -10.9435   8.6931    19.6366      2.5958   0.1298  2.1816  3.1298   0.9481
TrtcodeC                  -0.6499  2.4634   -10.8852   9.4516    20.3368      2.6167   0.132   2.1693  3.1337   0.9644
marital_status1           -2.5109  2.9317   -13.8083   9.161     22.9693      2.9556   0.1561  2.3653  3.8533   1.488
marital_status2           -0.9593  3.781    -18.3705   13.344    31.7145      3.966    0.3353  3.0119  6.4659   3.454
marital_status3           -2.4362  6.3579   -36.5888   22.6965   59.2853      6.5796   1.3334  3.9924  17.8907  13.8983
race1                     0.6867   1.7443   -6.6847    7.0143    13.699       1.7369   0.0779  1.4769  2.1382   0.6613
race2                     1.3162   2.6922   -12.2875   12.2162   24.5037      2.6354   0.1898  2.0273  3.6687   1.6414
education                 0.9011   0.5914   -1.399     3.1646    4.5636       0.6038   0.0242  0.5203  0.72     0.1997
EUROHLTH_BASE             0.2965   0.0376   0.1389     0.4393    0.3005       0.0336   0.0014  0.0287  0.04     0.0114
F20TotalACE               -0.021   0.277    -1.2014    0.9195    2.1208       0.2815   0.0121  0.237   0.3318   0.0948
CIRSscore                 -0.4233  0.1739   -1.0077    0.1992    1.2068       0.1561   0.0066  0.1298  0.1835   0.0537
TrtcodeB:marital_status1  3.3191   3.8553   -10.5904   17.7299   28.3203      3.869    0.1603  3.278   4.7756   1.4975
TrtcodeC:marital_status1  0.0388   4.0232   -15.8494   15.2499   31.0992      3.8563   0.156   3.2334  4.5409   1.3075
TrtcodeB:marital_status2  1.1768   4.9905   -23.1863   24.5952   47.7814      5.6276   0.398   4.5075  7.8708   3.3632
TrtcodeC:marital_status2  2.5672   4.9334   -15.2903   24.2221   39.5124      5.359    0.3419  4.3294  7.5138   3.1844
TrtcodeB:marital_status3  -1.5576  12.6209  -45.8561   45.1172   90.9733      12.6189  2.809   7.4795  25.3048  17.8252
TrtcodeC:marital_status3  4.4462   8.063    -32.2378   39.813    72.0508      9.3286   1.4704  6.1888  20.7778  14.5889

Kemeny
                          Coefficients                                         Standard errors
                          mean     sd       min        max       range        mean     sd      min     max      range
(Intercept)               58.4679  2.8408   47.1501    70.3194   23.1693      1.3868   0.0516  1.1896  1.638    0.4484
AGE                       0.0193   0.0404   -0.1465    0.1763    0.3228       0.0207   0.0008  0.0178  0.0245   0.0067
TrtcodeB                  0.1778   0.9585   -3.7916    3.9957    7.7874       0.8034   0.0298  0.6845  0.9481   0.2635
TrtcodeC                  -0.0677  0.8878   -3.968     3.2428    7.2109       0.8177   0.0305  0.673   0.9653   0.2923
marital_status1           -0.969   0.9431   -4.6262    3.3803    8.0065       0.8653   0.0326  0.742   1.065    0.3230
marital_status2           1.0144   2.5456   -10.7434   11.0517   21.7952      1.125    0.0463  0.9282  1.6365   0.7083
marital_status3           -1.6155  10.4665  -69.0194   57.148    126.1674     1.7262   0.108   1.3734  5.144    3.7706
race1                     0.4943   1.2988   -4.8352    5.6206    10.4557      0.5573   0.0212  0.4533  0.6584   0.2051
race2                     1.8235   3.0921   -11.6497   14.3368   25.9865      0.8725   0.0357  0.6624  1.0319   0.3695
education                 0.771    0.4065   -0.743     2.2454    2.9884       0.1962   0.0076  0.1599  0.2319   0.0719
EUROHLTH_BASE             0.2102   0.0253   0.1185     0.3072    0.1887       0.0101   0.0004  0.0086  0.0118   0.0032
F20TotalACE               0.0593   0.1798   -0.6474    0.7208    1.3682       0.0881   0.0033  0.073   0.104    0.0310
CIRSscore                 -0.2781  0.0955   -0.67      0.0715    0.7415       0.0473   0.0017  0.0398  0.0558   0.0160
TrtcodeB:marital_status1  1.6503   1.9457   -5.3977    8.7065    14.1042      1.1722   0.0428  1.0064  1.3822   0.3758
TrtcodeC:marital_status1  0.8088   1.8886   -7.3827    8.5681    15.9508      1.2083   0.0447  1.0313  1.4265   0.3951
TrtcodeB:marital_status2  -3.1926  6.2124   -41.8194   22.7819   64.6013      1.665    0.0637  1.394   2.1636   0.7697
TrtcodeC:marital_status2  2.2548   5.261    -22.3428   27.3877   49.7305      1.5949   0.0604  1.3289  1.9826   0.6536
TrtcodeB:marital_status3  -3.0289  33.3953  -183.8365  189.0291  372.8656     5.501    0.4002  2.3222  7.2848   4.9626
TrtcodeC:marital_status3  3.2242   19.7106  -125.7341  178.219   303.953      2.5136   0.1344  2.0154  5.5258   3.5105
We suggest the focus of attention be upon the effect TrtcodeB:marital_status2, as this term is explicitly significant and indicative of a finding contradictory to that under OLS assumptions. This demonstrates evidence in favour of the interpretation that married individuals were less depressed under treatment B, and further that marriage in general provided a strongly significant positive effect upon the overall assessment of depression at treatment termination. Of particular note is the difference in MSE between the two models: while OLS does in fact establish higher predictive accuracy within sample, the variability of this point estimate under bootstrapping is nearly four times greater, and while under resampling the median coefficient of determination was nearly halved for OLS, it remained nearly constant for the Kemeny methodology proposed herein. As well, the standard errors are found to perform exactly as expected, being in most cases smaller and producing Wald test statistics which are consistently found to be significant, unlike with the OLS approach. The consistency of these results is empirically demonstrated by comparison of the standard deviations across bootstrapped data sets, which are, again, smaller for the Kemeny metric.

Discussion
This paper introduces both a closed-form and maximum likelihood solution to an endogenous unbiased linear learning problem upon an ordinal, and more generally non-parametric, metric topological manifold, with several demonstrations of performance upon arbitrary data sets. Further, this metric space was shown to be linear and unbiased, possessing MLE properties in the presence and absence of ties, thereby encompassing a much wider breadth of empirical applications in terms of the estimable parameters (Mukherjee, 2016; Chatterjee & Mukherjee, 2019). This allows for the calculation of indeterminate points (e.g., polynomial terms and interactions) for incomplete rankings and sum-of-squares decompositions in a similar, but more comprehensive, manner to the Theil–Kendall–Sen estimator in the model regularity parameter space α ∈ Ω, for which an intuitive and computationally simple deconstruction may be applied for finite samples. Pragmatically, this Kemeny distance estimator has been shown to satisfy the requirements of a linear maximum likelihood estimator, in particular for ordinal data, both in the absence and presence of all non-nominal data, for which we have derived and empirically shown that the local and global model parameters may be assessed under the conventional Wald test framework. Further demonstrated in this text are the information superiority and the reduced parametric assumptions necessary to validate the inductive generalisation beyond the sample to a homogeneous and potentially multivariate population. Explicit comparisons were made for nested models, along with the computational stability in the estimation of a linear and well-posed parameter space for joint coefficients of determination, as well as the ability to examine the marginal distributions of the conditionally independent parameter space.
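For concreteness, a common formulation of the Kemeny (pairwise-sign) distance between two score vectors, which handles ties natively through zero entries of the sign matrix, can be sketched as follows. This Python sketch follows the standard Kemeny–Snell construction; the normalisation convention (each unordered pair counted once) is an assumption and may differ from the paper's.

```python
import numpy as np

def kemeny_distance(a, b):
    """Total disagreement between the pairwise sign (concordance) matrices of
    two score vectors; ties enter as zeros, so no tie-breaking is needed."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    # The full matrix counts each pair twice; halve to count unordered pairs once.
    return float(np.abs(sa - sb).sum() / 2)

print(kemeny_distance([1, 2, 3], [1, 2, 3]))  # 0.0: identical orderings
print(kemeny_distance([1, 2, 3], [3, 2, 1]))  # 6.0: maximal disagreement on 3 items
print(kemeny_distance([1, 2, 2], [1, 2, 3]))  # 1.0: a tie gives half-disagreement
```

Because the distance depends only on the sign matrices, it is invariant under any strictly monotone transformation of the scores, which is the unit-invariance property the text appeals to.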
Over all of these assertions, we have demonstrated the relative power gains which are expected to hold under the strong law of large numbers, under much weaker parametric conditions than those typically associated with maximum likelihood principles.

Importantly, this embodies an important empirical demonstration of the conjecture of the replication crisis, in that even significant empirical findings upon improper error distributions fail to allow replication to validly be a strongly consistent estimator. That is, the variability of replication efforts is greater than the standard errors typically estimated and reported. Since a finite set of ordinal measurements, where the number of responses is greater than 2, was shown to produce biased estimates under the Gaussian estimator, this questions the widespread validity of Gaussian assumptions upon the Frobenius norm, as is commonly taught at both the undergraduate and graduate levels of the Social Sciences. Such a consideration is extremely important in the context of the near-universal prevalence of, and reliance upon, ordinal measurements in the Social Sciences, and more generally the failure to correctly establish distributions of normal errors upon finite samples under the ℓ-space. This mathematical framework demonstrates that the false assumption of the sufficiency of the Euclidean norm upon data spaces can result in scenarios which are markedly similar to the replication crisis of statistical tests, wherein significant but biased point estimates may be found, for which the variability is drastically underestimated. This follows from the simple fact that, with the loss of MLE properties, the strong convexity and consistency of our estimators upon a finite sample are lost, and may therefore only be abstracted to a description of the population when exhaustive sampling has been enacted.
Since this is infeasible, the choice of maximum likelihood procedures which are appropriate to our data is recommended. Similar arguments have been extended to contexts such as latent variable modelling and Psychometrics, a popular methodological topic. It has been demonstrated in another manuscript under submission that the techniques upon which this process is constructed allow us to remove the multivariate normality assumption while maintaining first- and second-order consistency, thereby presenting an acceptable maximum likelihood estimator for ordinal data which avoids the issues described with the polychoric correlation, and addresses the common failure of goodness-of-fit statistics upon non-normal data. Similar work under submission has been employed to address missing data under the expectation–maximisation algorithm for finite samples upon the (X, ρ_K)-space, to construct a non-parametric formulation of the M-statistic, and to address mixture models under non-parametric feature spaces.

References

Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, (1), 17–21. doi: 10.2307/
Theory of linear operations (pp. 23–32). Elsevier. doi: 10.1016/s0924-6509(08)70023-0
Box, G. E. P. (1976, December). Science and statistics. Journal of the American Statistical Association, (356), 791–799. doi: 10.1080/
IEEE Transactions on Information Theory, (6), 3525–3539. doi: 10.1109/tit.2019.2893911
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), (2), 187–202. doi: 10.1111/j.2517-6161.1972.tb00899.x
Diaconis, P. (1988). Group representations in probability and statistics (Vol. 11). Institute of Mathematical Statistics, Hayward, CA.
Emond, E. J., & Mason, D. W. (2002). A new rank correlation coefficient with application to the consensus ranking problem.
Journal of Multi-Criteria Decision Analysis, (1), 17–28. doi: 10.1002/mcda.313
Fagin, R., Kumar, R., & Sivakumar, D. (2003). Comparing top k lists. SIAM Journal on Discrete Mathematics, (1), 134–160. doi: 10.1137/s0895480102412856
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, (200), 675–701. doi: 10.1080/
IEEE Transactions on Information Theory, (11), 5130–5139. doi: 10.1109/tit.2008.929943
Good, I. J. (1975). The number of orderings of n candidates when ties are permitted. Fibonacci Quarterly, (1), 11–18. doi: 10.1080/
What if there were no significance tests? Psychology Press. doi: 10.4324/
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, (3), 293–325. doi: 10.1214/aoms/
Non-parametric statistical methods (Vol. 751). John Wiley & Sons.
James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).
Kemeny, J. G. (1959). Generalized random variables. Pacific Journal of Mathematics, (4), 1179–1189. doi: 10.2140/pjm.1959.9.1179
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 81–93. doi: 10.2307/
Journal of the American Statistical Association, (485), 384–390. doi: 10.1198/jasa.2009.0116
Krane, W. R., & McDonald, R. P. (1978). Scale invariance and the factor analysis of correlation matrices. British Journal of Mathematical and Statistical Psychology, (2), 218–228. doi: 10.1111/j.2044-8317.1978.tb00586.x
Lane, S. M. (1971). Categories for the working mathematician. Springer New York. doi: 10.1007/
Journal of Non-parametric Statistics, (4), 397–405. doi: 10.1080/
Mathematical and Computer Modelling, (7-8), 917–926. doi: 10.1016/j.mcm.2006.09.009
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, (1), 50–60. doi: 10.1214/aoms/
Generalized linear models (Vol. 37). Boca Raton, FL: CRC Press.
Menger, K. (1942). Statistical metrics. Proceedings of the National Academy of Sciences, (12), 535–537. doi: 10.1073/pnas.28.12.535
Mukherjee, S. (2016). Estimation in exponential families on permutations. The Annals of Statistics, (2), 853–875. doi: 10.1214/
Journal of the Royal Statistical Society. Series A (General), (3), 370. doi: 10.2307/
the polychoric correlation coefficient. Psychometrika, (4), 443–460. doi: 10.1007/bf02296207
Owen, A. B. (2001). Empirical likelihood. Boca Raton, FL: CRC Press.
Pearson, K., & Pearson, E. S. (1922). On polychoric coefficients of correlation. Biometrika, (1-2), 127–156. doi: 10.1093/biomet/
Recent advances in statistics (pp. 339–370). Elsevier. doi: 10.1016/b978-0-12-589320-6.50020-4
R Core Team. (2020). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from
Redner, R. (1981, January). Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. The Annals of Statistics, (1), 225–228. doi: 10.1214/aos/
The Annals of Applied Statistics, (2), 1034–1055. doi: 10.1214/
The use of matched sampling and regression adjustment in observational studies (Unpublished doctoral dissertation). Harvard University.
Savalei, V. (2011). What to do about zero frequency cells when estimating polychoric correlations. Structural Equation Modeling: A Multidisciplinary Journal, (2), 253–273. doi: 10.1080/
Handbook of analysis and its foundations. Elsevier. doi: 10.1016/b978-0-12-622760-4.x5000-6
Schweizer, B., & Sklar, A. (2005). Probabilistic metric spaces. Mineola, NY: Dover Publications, Inc.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 1–22. doi: 10.1016/S0019-9958(64)90223-2
Spearman, C. (1906). 'Footrule' for measuring correlation. British Journal of Psychology, (1), 89–108. doi: 10.1111/j.2044-8295.1906.tb00174.x
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, (4), 273–286. doi: 10.1037/h0070288
Vapnik, V. N. (2013). The nature of statistical learning theory (3rd ed.). New York, NY: Springer Science & Business Media.
Wald, A. (1949). Note on the consistency of the maximum likelihood estimate.
The Annals of Mathemat-ical Statistics , (4), 595–601. doi: 10.1214 / aoms / Econometrica , (1), 1. doi:10.2307 / Biometrics Bulletin , (6), 80. doi: 10.2307 / American Journal of Psychiatry , (5), 348–357. doi: 10.1176 / appi.ajp.2018.18091079 Appendix AKullback-Leibler divergence
The Kullback-Leibler divergence is defined for any correctly posed probability distribution, from which follows a monotonically non-increasing primal problem (error minimisation) convergence sequence which is unbiased and converges to the target when the probability distribution is linear and orthonormal. For any measurable space on the reals which is sortable, it then follows that the Kemeny metric is an unbiased estimator, which is linear but less informative than the logarithmic linearisation undergone by the use of the canonical linking function upon the ℓ₂ metric. For conditions in which the family of distributions is not identified, the likelihood function is almost surely maximised in the neighbourhood of Ω as defined upon the sample, as long as the quotient space is defined (Redner, 1981), and is asymptotically normal as long as, for each α with sufficiently small radius r in the concave neighbourhood about the optima, f(·, α, r) is measurable and the following inequalities hold:

∫ₓ log f*(x, α, r) dα* < ∞  and  ∫ₓ log h*(x, s) dθ < ∞,

where α* denotes the true parameter set. If true, then as δ(θᵢ, θₜ) → 0⁺, f(x, θᵢ) → f(x, θₜ), and the condition is thus satisfied for any sub-additive metric topology. Further assuming that

∫ |log f(x, θ)| dμ_θ < ∞,

and therefore finite, then if θᵢ → θ, and in turn f(x, θᵢ) → f(x, θ), for a non-deterministic functional space, then with probability one the likelihood function converges to the true function, which for any linear function is best approximated by its expectation, which minimises the error for all M. Upon any mixture family, which is linear in δ(X, Ω), it follows that both extensions to Wald's theorems (Redner, 1981, p. 226) are satisfied upon the Kemeny metric, and therefore f(X, θ̄ₘ) → f(X, θ̂) with probability 1. From Redner
Theorem 5, it follows that the Kemeny metric, as a compact ultrametric, is strongly consistent for any orderable and independently sampled distribution, including a multivariate normal distribution. Any distribution for which the ℓ₂ metric is complete and compact is also linearly strongly consistent for the Kemeny metric, as the latter is invariant to any monotonic transformation (non-identity canonical link), and thus the properties generalise without any additional restrictions. However, the selection of an appropriate non-linear transformation is subjective and subject to misjudgement due to idiosyncrasies such as cultural modelling norms, non-uniform sampling between the data and the population, and incomplete data observations. Analysing ordinal data as an endogenous measure space, as opposed to continuous data, requires fewer structural assumptions in the endogenous response to introduce maximum likelihood estimation (Friedman, 1937; Wilcoxon, 1945). However, this must be paired with the introduction of more assumptions upon the sampling procedure. This follows under the conventional construction of both Spearman's ρ and Kendall's τₐ, in which ties are explicitly excluded from the space with probability 1, from which it directly follows, by contradiction, that a tie occurs in any sample upon the population with probability 0.
By the pigeon-hole principle, though, ties are almost surely observed for any ordinal measure space, since the number of empirical paired orderings is always less than the number of possible observations in the sample, as is demonstrated by the construction of a polychoric correlation matrix (Olsson, 1979; Pearson & Pearson, 1922). Unfortunately, the polychoric correlation introduces an assumption concerning the latent variable linearity upon the ℓ₂ metric, and requires greater sample sizes with respect to g responses upon an ordinal scale (i.e., g² cells). Each bivariate relation pairing must possess sufficient cell size to just identify each latent variable, requiring at least two parameters; multidimensional latent spaces in turn require more parameters to be estimated, which quickly inflates the minimum sufficient sample size. As a continuous distribution upon a compact metric is discrete, Gibbs' inequality may be directly leveraged for any two orderable distributions P = {p₁, …, pₘ} and Q = {q₁, …, qₘ} for which P, Q ∈ [0,
1], with equality holding only when P = Q for the metric space δ. Since, by our previous assumptions, the sample is defined for a finite support in the field [0, m(m−1)/(1+m²−m)] in the population, the Kullback-Leibler divergence may be constructed:

D_KL(P ‖ Q) = Σᵢ₌₁ᵐ pᵢ log(pᵢ/qᵢ) ≥ 0.

If one were to expand the definition of P to be a function of parameters α ∈ Ω in the parameter universe, then a Bregman divergence naturally follows for F: Ω → ℝ, wherein the observed space is strictly non-negative, D_F(P, Q) ≥ 0, and which is linear with respect to the endogenous empirical process P, for which the expectation is the optimal sufficient statistic (Frigyik, Srivastava, & Gupta, 2008). A further proof of uniqueness for any F-norm, which includes any Banach norm-space, may be constructed from the law of the excluded middle and dependent choice, given a proper metric upon any random partially orderable field. Thus, the property of dependent choice holds upon it, unlike the sub-graph formed by the traditional rank methods of the Sₘ-space (Diaconis, 1988). A normed space X (which includes Banach spaces, F-spaces, and G-spaces) defined with the topology (X, ρ_K) is provably both linear and continuous with finite dimensionality n upon the stochastic space (ε, αₙ). It is therefore a valid dream space, which is explicitly uniquely defined with an inner-product space (Lane, 1971). Let X be complete and ρ_K be compact, as previously established. It then follows that if Y is any topological vector space and f: X → Y is any linear operator, then f is continuous. It also follows that should Ω be an open convex subset of X, Y a locally full space, and f: Ω → Y a convex operator, then f is continuous. By Schechter (1997, p. 751), any two complete F-norms on a vector space are topologically equivalent; this is proven as the identity mapping X → X is a linear operator.
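Gibbs' inequality above admits a direct numerical check. A minimal sketch for strictly positive discrete distributions follows; the vectors p and q are illustrative only, not drawn from the paper:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), for two strictly
    positive discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Gibbs' inequality: D_KL >= 0, with equality only when P = Q.
assert kl_divergence(p, p) == 0.0
assert kl_divergence(p, q) > 0.0
```

The divergence is not symmetric (D_KL(P ‖ Q) ≠ D_KL(Q ‖ P) in general), which is consistent with its role here as a divergence rather than a metric.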
This proves the uniqueness of the maximum likelihood estimator upon the Kemeny metric, which is also a proper MLE for any partially orderable distribution (Schechter, 1997), as also defined for the linear function space we have established. Further, since the Gaussian nature of the probability space is always monotonically non-decreasing, the Hessian of the likelihood function is always non-negative, and it therefore follows that the likelihood function attains a linear local maximum for any identically and independently distributed error function which is partially orderable.

Appendix B
Positive semi-definiteness of the Kemeny correlation matrix
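Before the formal derivation below, the claim of this appendix admits a quick numerical illustration. The sketch substitutes a τ-like pairwise sign concordance for the paper's ρ_K (equations 13, 19, and 20 lie outside this excerpt), and verifies the defining property x′Ξx ≥ 0 on random vectors:

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def kemeny_correlation(a, b):
    """Normalised pairwise sign concordance between two score vectors
    (a tau-like correlation in which a tied pair contributes zero)."""
    m = len(a)
    num = sum(sign(a[i] - a[j]) * sign(b[i] - b[j])
              for i in range(m) for j in range(m))
    return num / (m * (m - 1))

def quadratic_form(M, x):
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

random.seed(1)
# 4 continuous variables observed on 20 subjects (no ties, a.s.).
data = [[random.gauss(0, 1) for _ in range(20)] for _ in range(4)]
Xi = [[kemeny_correlation(u, v) for v in data] for u in data]

# x' Xi x >= 0 for arbitrary x: empirical positive semi-definiteness.
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(4)]
    assert quadratic_form(Xi, x) >= -1e-12
```

The non-negativity is structural rather than accidental: each matrix entry is an inner product of the two variables' pairwise sign vectors, so Ξ is a (scaled) Gram matrix.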
For m observations upon n measures, the matrix Ξ may be constructed as a square matrix of order m × m or n × n, summarising pairwise similarity over all subjects or variables, respectively. A square real matrix A is positive semi-definite (p.s.d.) when, for any m × 1 vector x, x′Ax ≥ 0. If A and B are p.s.d. upon the space (X, ρ_K), then by the established additive and multiplicative properties of the Kemeny metric, so is A + B. We construct the following bounded equivalence: x′(A + B)x = x′Ax + x′Bx ≥
0; from this follows the general statement that the sum of any p.s.d. matrices is itself positive semi-definite. For a sample of vectors xᵢ = (xᵢ₁, …, xᵢₙ)⊤, with i = 1, …, m, the sample median ν̂ is estimated as per equation 19 and the sample correlation matrix Ξ as given for each bivariate pair in equation 13 (with the covariance scaling following by the use of equation 20). For the non-zero vector ξ ∈ ℝⁿ, it follows that each vector is non-constant, and therefore has positive variance, from which we use equation 20:

ξ⊤ ρ_K(xₙ, xₙ) ξ ∝ ξ⊤ Ξ ξ > 0.

Allow zᵢ to be defined as zᵢ = (xᵢ − ν̂(x)), for i = 1, …, m. Any non-zero x ∈ ℝⁿ is therefore equal to zero if and only if ρ_K(xᵢ⊤, Iₘ) =
0, for each i = 1, …, n. Upon the set {z₁, …, zₙ} spanning ℝⁿ, there exist real numbers β₁, …, βₘ such that z = β₁x₁ + … + βₘxₘ, and we also possess z⊤x = α₁z₁⊤x + ··· + αₙzₙ⊤x =
0, which induces a contradiction. It therefore follows that if the span of any random sampling distribution upon zᵢ spans ℝⁿ, then Ξ is positive definite. Positive semi-definiteness may then be established from zΞₙz′ ≥ 0 for all z. As Ξₙ is 1/m times the distance ρ_K(vᵢ, Iₘ) · ρ_K(vᵢ, Iₘ), the squared length of the vector zvᵢ is m, and as m > 0 ∈ ℤ and a sum of squares is strictly non-negative, zΞₙz′ ≥
0, and thus when vᵢ spans the continuous field, z ≠ 0 and Ξₙ is positive definite, and therefore any Ξ may be inverted for the marginalised sampling distribution with respect to either m or n.

Appendix C
Proof of the probability measure for the Kemeny metric
Consider a measure space (Ω, F, P), where P(x ∈ X) denotes a measure X upon which x occurs with probability P(x), and for which P(Ω) =
1, allowing us to define said measure space as a probability space, with sample space Ω, event space F, and probability measure P. The first Kolmogorov axiom has already been proven, wherein each observed event occurs with positive probability for each element in the event space, as seen in equation 9. Concurrently, the finite and compact nature of the Kemeny measure space ensures that P(x) is also finite, with bounded support P(−∞) = 0 ≤ P(x) ≤ P(∞) = 1. The second axiom, that the probability that at least one of the elementary events in the entire sample space will occur is 1, and therefore that Ω is complete, follows from the finite and cumulative nature of the Stirling numbers with respect to the Kemeny metric, from which we may define the space |H| = I as a partial ranking, which therefore must occur with probability 1. If not, then the identity permutation is not a valid origin for the sample space, and therefore the population is non-uniquely identified (as neither the identity permutation nor its inverse is a validly measurable event). Finally, σ-additivity must be validly present to ensure that a given event occurs with unique probability, and is therefore a function. In equation 9, it is shown that the graph of the metric space is connected, and therefore that the Kemeny metric space is a function, for which each occurrence is uniquely mapped onto a singular distance with respect to the arbitrary origin π for the entire space xₘ. As such, any element in the sample space must be measured upon the probability field, which for equation 18 contains all reals.
Second, since the cumulative distribution is complete and must integrate to 1 over all disjoint events measured upon the Kemeny metric space, all disjoint subsets must also sum to 1, as shown in equation 9, which, together with the finite countability of the symmetric group Hₘ, ensures that the probability measure is always observed with probability 1, thereby satisfying the equality μ(⋃ₘ₌₁^∞ Aₘ) = Σₘ₌₁^∞ μ(Aₘ), to ensure σ-additivity.
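The σ-additivity argument can be illustrated on a small symmetric group. The following sketch assumes the Kemeny–Snell pairwise sign-disagreement distance as a stand-in for the paper's metric (equations 9 and 18 lie outside this excerpt), and partitions S₄ into disjoint distance classes about the identity permutation, whose probabilities then sum to 1:

```python
from itertools import permutations, combinations
from collections import Counter

def sign(x):
    return (x > 0) - (x < 0)

def kemeny_distance(a, b):
    """Pairwise sign-disagreement distance between two rankings."""
    return sum(abs(sign(a[i] - a[j]) - sign(b[i] - b[j]))
               for i, j in combinations(range(len(a)), 2))

m = 4
identity = tuple(range(1, m + 1))

# Disjoint events: the distance classes about the identity origin.
dist = Counter(kemeny_distance(identity, p) for p in permutations(identity))
total = sum(dist.values())               # |S_4| = 4! = 24
pmf = {d: c / total for d, c in dist.items()}

# First axiom: every event has non-negative probability.
assert all(v >= 0 for v in pmf.values())
# Second axiom / additivity over the disjoint classes: they sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```

Only the identity itself sits at distance 0, and the full reversal attains the maximum distance 2·C(4,2) = 12, so the classes exhaust the group and are pairwise disjoint, which is the finite form of the σ-additivity equality above.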