John C. Gower
Open University
Publication
Featured research published by John C. Gower.
Biometrics | 1971
John C. Gower
A general coefficient measuring the similarity between two sampling units is defined. The matrix of similarities between all pairs of sample units is shown to be positive semidefinite (except possibly when there are missing values). This is important for the multidimensional Euclidean representation of the sample and also establishes some inequalities amongst the similarities relating three individuals. The definition is extended to cope with a hierarchy of characters.
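The abstract above does not state the formula, so the following is only a minimal sketch of the commonly cited form of the coefficient: each variable contributes a score between 0 and 1, and the similarity is the average of the scores over the variables that can actually be compared. The function name and the "num"/"cat" labels are hypothetical. The special treatment of presence/absence characters and the hierarchical extension are left out.

```python
def gower_similarity(a, b, kinds, ranges):
    """Similarity between two sampling units a and b (equal-length sequences).

    kinds[k]  is "num" (quantitative) or "cat" (qualitative).
    ranges[k] is the range of variable k over the sample (used for "num" only).
    None marks a missing value; such comparisons receive weight zero.
    """
    score_sum = 0.0
    weight_sum = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # weight 0: this variable cannot be compared
        if kind == "num":
            # 1 for identical values, 0 for values a full range apart
            score = 1.0 - abs(x - y) / rng if rng > 0 else 1.0
        else:
            score = 1.0 if x == y else 0.0
        score_sum += score
        weight_sum += 1.0
    if weight_sum == 0:
        raise ValueError("no comparable variables for this pair")
    return score_sum / weight_sum


a = (1.0, "red", 10.0)
b = (3.0, "red", None)
s = gower_similarity(a, b, ("num", "cat", "num"), (4.0, None, 20.0))
print(s)
```

In this toy pair the third variable is missing for one unit, so only the first two variables contribute to the average.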
Psychometrika | 1975
John C. Gower
Suppose $P_j^{(i)}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) give the locations of $mn$ points in $p$-dimensional space. Collectively these may be regarded as $m$ configurations, or scalings, each of $n$ points in $p$ dimensions. The problem is investigated of translating, rotating, reflecting and scaling the $m$ configurations to minimize the goodness-of-fit criterion $\sum_{i=1}^{m} \sum_{j=1}^{n} \Delta^2(P_j^{(i)}, G_j)$, where $G_j$ is the centroid of the $m$ points $P_j^{(i)}$ ($i = 1, 2, \ldots, m$). The rotated positions of each configuration may be regarded as individual analyses with the centroid configuration representing a consensus, and this relationship with individual scaling analysis is discussed. A computational technique is given, the results of which can be summarized in analysis of variance form. The special case $m = 2$ corresponds to classical Procrustes analysis but the choice of criterion that fits each configuration to the common centroid configuration avoids difficulties that arise when one set is fitted to the other, regarded as fixed.
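As an illustration of the fitting problem described in this abstract, here is a minimal sketch that alternates between computing the centroid configuration and rotating (or reflecting) each configuration onto it. It is not the paper's computational technique: the scaling step is omitted, and the function names, iteration cap and tolerance are assumptions made for the example. Each configuration is taken to be an n x p NumPy array.

```python
import numpy as np


def rotate_to(X, G):
    """Orthogonal matrix Q minimising ||X Q - G|| (rotation or reflection)."""
    U, _, Vt = np.linalg.svd(X.T @ G)
    return U @ Vt


def generalised_procrustes(configs, n_iter=50, tol=1e-10):
    # Translation: centre every configuration at the origin.
    X = [C - C.mean(axis=0) for C in configs]
    prev = np.inf
    for _ in range(n_iter):
        G = np.mean(X, axis=0)                   # current centroid configuration
        X = [Xi @ rotate_to(Xi, G) for Xi in X]  # fit each configuration to it
        G = np.mean(X, axis=0)                   # centroid after the rotations
        # Sum of squared distances from every point to its centroid point.
        loss = sum(np.sum((Xi - G) ** 2) for Xi in X)
        if prev - loss < tol:
            break
        prev = loss
    return X, G, loss
```

Each pass cannot increase the criterion: rotating onto a fixed centroid reduces it, and recomputing the centroid reduces it again, so the loop stops once the improvement falls below the tolerance.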
Biometrics | 1974
John C. Gower
Consider n point populations characterised by v dichotomous variables, constant within populations. For any classification into k classes, a list of values (class predictors) can be constructed for the variables of each class, and used to predict the properties of any individual belonging to that class. The maximal predictive criterion selects that partition of the n populations into k classes which maximises the number, $W_k$, of correct predictions. The average number, $B_k$, of properties correctly predicted for members of each class using the class predictors of the other $k - 1$ classes, measures the separation between classes. The best choice of k is related to maximising $W_k - B_k$. A general method is given for defining optimal hierarchical classification using any optimal k-class criterion, not necessarily depending on taxonomic distance. Maximal predictive classes have an optimal identification property and other properties, useful for constructing search algorithms, are given. An example illustrates the results. Multilevel qualitative variables and differing probabilities of occurrence for each population are acceptable, but random variation within populations needs further consideration.
Journal of The Royal Statistical Society Series C-applied Statistics | 1999
John C. Gower; W. J. Krzanowski
Many data sets in practice fit a multivariate analysis of variance (MANOVA) structure but are not consonant with MANOVA assumptions. One particular such data set from economics is described. This set has a $2^4$ factorial design with eight variables measured on each individual, but the application of MANOVA seems inadvisable given the highly skewed nature of the data. To establish a basis for analysis, we examine the structure of distance matrices in the presence of a priori grouping of units and show how the total squared distance among the units of a multivariate data set can be partitioned according to the factors of an external classification. The partitioning is exactly analogous to that in the univariate analysis of variance. It therefore provides a framework for the analysis of any data set whose structure conforms to that of MANOVA, but which for various reasons cannot be analysed by this technique. Descriptive aspects of the technique are considered in detail, and inferential questions are tackled via randomization tests. This approach provides a satisfactory analysis of the economics data.
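A small sketch may help make the partitioning idea concrete for a single grouping factor. It relies on a standard identity rather than on the paper's own derivation: the sum of squared distances over all pairs of n units, divided by n, plays the role of a total sum of squares, and the same quantity computed inside each group gives the within-group part. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def partition_squared_distances(d2, groups):
    """Split the total squared distance into within- and between-group parts.

    d2     : symmetric matrix (list of lists) of squared distances
    groups : one group label per unit
    Returns (total, within, between).
    """
    n = len(d2)
    total = sum(d2[i][j] for i, j in combinations(range(n), 2)) / n
    within = 0.0
    for g in set(groups):
        members = [i for i, label in enumerate(groups) if label == g]
        pair_sum = sum(d2[i][j] for i, j in combinations(members, 2))
        within += pair_sum / len(members)
    return total, within, total - within


# Toy univariate data, so the result can be checked against ordinary ANOVA.
x = [1, 2, 3, 7, 8, 9]
labels = ["a", "a", "a", "b", "b", "b"]
d2 = [[(xi - xj) ** 2 for xj in x] for xi in x]
total, within, between = partition_squared_distances(d2, labels)
print(total, within, between)
```

With squared Euclidean distances on one variable, the three numbers coincide with the usual total, within-group and between-group sums of squares, which is the sense in which the partition is analogous to the univariate analysis of variance.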
Computational Statistics & Data Analysis | 2006
S. Gardner; John C. Gower; N. J. Le Roux
Canonical variate analysis (CVA) is concerned with the analysis of J classes of samples, all described by the same variables. Generalised canonical correlation analysis (GCCA) is concerned with the analysis of K sets of variables, all describing the same samples. A generalised Procrustes analysis context is used for data partitioned into J classes of samples and K sets of variables to explore the links between GCCA and CVA. Biplot methodology is used to exploit the visualisation properties of these techniques. This methodology is illustrated by an example of 1425 samples described by three sets of variables (K = 3), the initial analysis of which suggests a grouping of the samples into four classes (J = 4), followed by subsequent more detailed analyses.
Encyclopedia of Biostatistics | 2005
John C. Gower
Principal coordinates analysis, also known as classical scaling, is a metric multidimensional scaling method based on projection, which uses spectral decomposition to approximate a matrix of distances/dissimilarities by the distances between a set of points in few dimensions. The points may be used in visualizations.
Keywords: metric multidimensional scaling; principal component analysis; graphical representation; visualization; similarity; dissimilarity; distance
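The spectral-decomposition step mentioned in this entry can be sketched as follows. This is a minimal illustration of the textbook form of classical scaling, assuming a symmetric NumPy array of distances; the function name and the choice to clip negative eigenvalues to zero are assumptions of the example, not details taken from the encyclopedia entry.

```python
import numpy as np


def principal_coordinates(D, k=2):
    """Classical scaling of an n x n distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (D ** 2) @ J           # doubly centred inner products
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # indices of the k largest
    vals, vecs = vals[order], vecs[:, order]
    # Scale each eigenvector by the square root of its eigenvalue;
    # negative eigenvalues (non-Euclidean input) are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


# Four corners of a 3 x 4 rectangle: two dimensions recover them exactly.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
Y = principal_coordinates(D, k=2)
```

The returned coordinates are only determined up to rotation and reflection, so they should be compared with the input through their inter-point distances rather than coordinate by coordinate.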
Computational Statistics & Data Analysis | 2009
Jörg Blasius; Paul H. C. Eilers; John C. Gower
The elements of a biplot are (i) a set of axes representing variables, usually concurrent at the centroid of (ii) a set of points representing samples or cases. The axes are (approximations to) conventional coordinate axes, and therefore may be labelled and calibrated. Especially when there are many points (perhaps several thousand) the whole effect can be very confusing but this may be mitigated by: 1. Giving a density representation of the points. 2. While respecting the calibrations, moving the axes to new positions more remote from the points, and possibly jointly rotating axes and points. 3. The use of colour - when permissible. 4. Choosing more than one centre of concurrency. The principles are quite general but we illustrate them by examples of the Categorical Principal Component Analysis of the responses to questions concerning migration in Germany. This application introduces the additional interest of representing ordered categorical variables by irregularly calibrated axes.
Journal of Classification | 2003
John C. Gower; Mark de Rooij
We examine the use of triadic distances as a basis for multidimensional scaling (MDS). The MDS of triadic distances (MDS3) and a conventional MDS of dyadic distances (MDS2) both give Euclidean representations. Our analysis suggests that MDS2 and MDS3 can be expected to give very similar results, and this is strongly supported by numerical examples. We have concentrated on the perimeter and generalized Euclidean models of triadic distances, both of which are linear transformations of dyadic distances and so might be suspected of explaining our findings; however, an MDS3 of the nonlinear variance definition of triadic distance also closely approximated the MDS2 representation. An appendix gives some matrix results that we have found useful and also gives matrix representations and alternative derivations of some known properties of triadic distances.
Biometrics | 1990
John C. Gower
Fisher's method of optimal scores (FOS) and Guttman scaling are often cited among the origins of multiple correspondence analysis (MCA). It is shown that FOS is more strongly linked to canonical correlation analysis (CCA) and that although Fisher's method, as described in Statistical Methods for Research Workers, 7th edition, essentially leads to the MCA results, this is something of an accident. If there he had not been concerned with a balanced two-way table, the relationship between the two methods would have been weaker. Similarly, Fisher (1940, Annals of Eugenics 10, 422-429) refers mainly to simple correspondence analysis (CA) and cannot be strongly linked to MCA. Nevertheless Fisher had all the essential elements not only of CA but also of MCA.
Journal of Classification | 2003
Mark de Rooij; John C. Gower
Triadic distances t defined as functions of the Euclidean (dyadic) distances a1, a2, a3 between three points are studied. Special attention is paid to the contours of all points giving the same value of t when a3 is kept constant. These isocontours allow some general comments to be made about the suitability, or not, for practical investigations of certain definitions of triadic distance. We are especially interested in those definitions of triadic distance, designated as canonical, that have optimal properties. An appendix gives some results we have found useful.