J. A. Hartigan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where J. A. Hartigan is active.

Explore More

Publication

Featured researches published by J. A. Hartigan.

Journal of the American Statistical Association | 1972

Direct Clustering of a Data Matrix

J. A. Hartigan

Abstract Clustering algorithms are now in widespread use for sorting heterogeneous data into homogeneous blocks. If the data consist of a number of variables taking values over a number of cases, these algorithms may be used either to construct clusters of variables (using, say, correlation as a measure of distance between variables) or clusters of cases. This article presents a model, and a technique, for clustering cases and variables simultaneously. The principal advantage in this approach is the direct interpretation of the clusters on the data.

Journal of the American Statistical Association | 1993

A Bayesian Analysis for Change Point Problems

Daniel Barry; J. A. Hartigan

A sequence of observations undergoes sudden changes at unknown times. We model the process by supposing that there is an underlying sequence of parameters partitioned into contiguous blocks of equal parameter values; the beginning of each block is said to be a change point. Observations are then assumed to be independent in different blocks given the sequence of parameters. In a Bayesian analysis it is necessary to give probability distributions to both the change points and the parameters. We use product partition models (Barry and Hartigan 1992), which assume that the probability of any partition is proportional to a product of prior cohesions, one for each block in the partition, and that given the blocks the parameters in different blocks have independent prior distributions. Given the observations a new product partition model holds, with posterior cohesions for the blocks and new independent block posterior distributions for parameters. The product model thus provides a convenient machinery for allo...

Journal of the American Statistical Association | 1981

Representing Points in Many Dimensions by Trees and Castles

B. Kleiner; J. A. Hartigan

Abstract A number of points in k dimensions are displayed by associating with each point a symbol: a drawing of a tree or a castle. All symbols have the same structure derived from a hierarchical clustering algorithm applied to the k variables (dimensions) over all points, but their parts are coded according to the coordinates of each individual point. Trees and castles show general size effects, the change of whole clusters of variables from point to point, trends, and outliers. They are especially appropriate for evaluating the clustering of variables and for observing clusters of points. Their major advantage over earlier attempts to represent multivariate observations (such as profiles, stars, faces, boxes, and Andrewss curves) lies in their matching of relationships between variables to relationships between features of the representing symbol. Several examples are given, including one with 48 variables.

Journal of the American Statistical Association | 1981

Consistency of Single Linkage for High-Density Clusters

J. A. Hartigan

Abstract High-density clusters are defined on a population with density f in r dimensions to be the maximal connected sets of form {x | f(x) ≥ c}. Single-linkage clustering is evaluated for consistency in detecting such high-density clusters—other standard hierarchical techniques, such as average and complete linkage, are hopelessly inconsistent for these clusters. The asymptotic consistency of single linkage closely depends on the percolation problem of Broadbent and Hammersley—if small spheres are removed at random from a solid, at which density of spheres will water begin to flow through the solid? If there is a single critical density such that no flow takes place below a certain density, and flow occurs through a single connected set above that density, then single linkage is consistent in separating high-density clusters (by disjoint single-linkage clusters that include a positive fraction of sample points in the respective clusters and pass arbitrarily close to all points in the respective clusters...

Journal of the American Statistical Association | 1987

Estimation of a convex density contour in two dimensions

J. A. Hartigan

Abstract If a density in two dimensions has a convex contour containing probability α, the contour may be estimated from a sample by finding the convex polygon of smallest area containing a proportion α of the sample points. An algorithm for finding a particular contour is given that takes O(n 2) space and O(n 3) time for n sample points.

Biometrics | 1987

Asynchronous distance between homologous DNA sequences.

Daniel Barry; J. A. Hartigan

The distance between homologous DNA sequences of two species is proposed to be -1/4 ln[det(P)], where P is the conditional probability matrix specifying the proportions of the various nucleotides in the second sequence, corresponding to each of the four nucleotides in the first sequence. A probability model is described which supports this choice of distance. Distance measures based on a constant evolutionary rate assumption are described and compared with the proposed measure. Sampling properties of both types of distance are examined and we conclude by applying the distance measures to mitochondrial DNA sequences of the hominoids.

Journal of Classification | 1992

The runt test for multimodality

J. A. Hartigan; Surya Mohanty

Single linkage clusters on a set of points are the maximal connected sets in a graph constructed by connecting all points closer than a given threshold distance. The complete set of single linkage clusters is obtained from all the graphs constructed using different threshold distances. The set of clusters forms a hierarchical tree, in which each non-singleton cluster divides into two or more subclusters; the runt size for each single linkage cluster is the number of points in its smallest subcluster. The maximum runt size over all single linkage clusters is our proposed test statistic for assessing multimodality. We give significance levels of the test for two null hypotheses, and consider its power against some bimodal alternatives.

Random Structures and Algorithms | 2013

The number of graphs and a random graph with a given degree sequence

Alexander I. Barvinok; J. A. Hartigan

We consider the set of all graphs on n labeled vertices with prescribed degrees D = (d1,…,dn). For a wide class of tame degree sequences D we obtain a computationally efficient asymptotic formula approximating the number of graphs within a relative error which approaches 0 as n grows. As a corollary, we prove that the structure of a random graph with a given tame degree sequence D is well described by a certain maximum entropy matrix computed from D. We also establish an asymptotic formula for the number of bipartite graphs with prescribed degrees of vertices, or, equivalently, for the number of 0-1 matrices with prescribed row and column sums.

Transactions of the American Mathematical Society | 2012

An asymptotic formula for the number of non-negative integer matrices with prescribed row and column sums

Alexander I. Barvinok; J. A. Hartigan

We count mxn non-negative integer matrices (contingency tables) with prescribed row and column sums (margins). For a wide class of smooth margins we establish a computationally efficient asymptotic formula approximating the number of matrices within a relative error which approaches 0 as m and n grow.

Journal of the American Statistical Association | 1993

Choice Models for Predicting Divisional Winners in Major League Baseball

Daniel Barry; J. A. Hartigan

Abstract Major league baseball in the United States is divided into two leagues and four divisions. Each team plays 162 games against teams in the same league. The winner in each division is the team winning the most games of the teams in that division. We wish to predict the division winners based on games played up to any specified time. We use a generalized choice model for the probability of a team winning a particular game that allows for different strengths for each team, different home advantages, and strengths varying randomly with time. Future strengths and the outcomes of future games are simulated using Markov chain sampling. The probability of a particular team winning the division is then estimated by counting the proportion of simulated seasons in which it wins the most games. The method is applied to the 1991 National League season.

Explore More