Publication


Featured research published by Glenn W. Milligan.


Psychometrika | 1985

An examination of procedures for determining the number of clusters in a data set

Glenn W. Milligan; Martha C. Cooper

A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
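A stopping rule of the kind evaluated here can be sketched as follows. The abstract does not name individual rules, so the choice of the Calinski-Harabasz variance ratio is an assumption made for illustration, and the function names are hypothetical. The sketch scores each candidate partition (for example, successive levels of a hierarchy) and picks the number of clusters with the largest ratio.

```python
def calinski_harabasz(points, labels):
    """Variance-ratio criterion: (between / (k - 1)) / (within / (n - k))."""
    n = len(points)
    dims = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index is defined only for 2 <= k < n")
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        size = len(members)
        center = [sum(p[d] for p in members) / size for d in range(dims)]
        # Between-cluster sum of squares, weighted by cluster size.
        between += size * sum((center[d] - grand[d]) ** 2 for d in range(dims))
        # Within-cluster sum of squares around the cluster centroid.
        within += sum((p[d] - center[d]) ** 2 for p in members for d in range(dims))
    if within == 0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


def best_number_of_clusters(points, candidate_partitions):
    """candidate_partitions maps a cluster count k to a list of labels."""
    scores = {k: calinski_harabasz(points, labels)
              for k, labels in candidate_partitions.items()}
    return max(scores, key=scores.get), scores


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
partitions = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
best, scores = best_number_of_clusters(points, partitions)
```

As the abstract cautions, the performance of any single rule may be data dependent, so a sketch like this is a starting point rather than a recommendation.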


Psychometrika | 1980

An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms.

Glenn W. Milligan

An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.


Applied Psychological Measurement | 1987

Methodology review: Clustering methods

Glenn W. Milligan; Martha C. Cooper

A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations which are known to exist in the data. Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would benefit from continued research into the methodology. The other set offers recommendations for applied analyses within the framework of the clustering process.


Journal of Classification | 1988

A Study of Standardization of Variables in Cluster Analysis

Glenn W. Milligan; Martha C. Cooper

A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed, with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated which varied the intercluster spacing and the scales for the variables. The data sets were examined in four different types of error environments. These involved error-free data, error-perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of true cluster structure as found by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. It was found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure. The result held over different error conditions, separation distances, clustering methods, and coverage levels. The traditional z-score transformation was found to be less effective in several situations.
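The two kinds of transformation contrasted in this abstract can be sketched as below. The function names are hypothetical, the use of the population standard deviation in the z-score is an assumption, and returning zeros for a constant variable is a convenience choice rather than anything stated in the abstract.

```python
from statistics import mean, pstdev


def range_standardize(column):
    """Divide each value of one variable by that variable's range (max - min)."""
    spread = max(column) - min(column)
    if spread == 0:
        return [0.0 for _ in column]  # assumption: a constant variable maps to zeros
    return [x / spread for x in column]


def z_score(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


heights_cm = [150.0, 160.0, 170.0, 190.0]
scaled_by_range = range_standardize(heights_cm)
scaled_by_z = z_score(heights_cm)
```

Subtracting the minimum before dividing by the range would shift every value by the same constant, so it leaves Euclidean distances between objects unchanged relative to plain division by the range.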


Multivariate Behavioral Research | 1986

A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis

Glenn W. Milligan; Martha C. Cooper

Five external criteria were used to evaluate the extent of recovery of the true structure in a hierarchical clustering solution. This was accomplished by comparing the partitions produced by the clustering algorithm with the partition that indicates the true cluster structure known to exist in the data. The five criteria examined were the Rand, the Morey and Agresti adjusted Rand, the Hubert and Arabie adjusted Rand, the Jaccard, and the Fowlkes and Mallows measures. The results of the study indicated that the Hubert and Arabie adjusted Rand index was best suited to the task of comparison across hierarchy levels. Deficiencies with the other measures are noted.
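Four of the five criteria can be computed from the same pair counts, as in the minimal sketch below. The function names are hypothetical, the Morey and Agresti adjustment is omitted, and the values returned for degenerate partitions (zero denominators) are assumptions. For every pair of objects, a counts pairs grouped together in both partitions, b pairs together only in the true partition, c pairs together only in the recovered partition, and d pairs separated in both.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_true, labels_found):
    """Classify every pair of objects by agreement between two partitions."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_found = labels_found[i] == labels_found[j]
        if same_true and same_found:
            a += 1
        elif same_true:
            b += 1
        elif same_found:
            c += 1
        else:
            d += 1
    return a, b, c, d


def external_criteria(labels_true, labels_found):
    a, b, c, d = pair_counts(labels_true, labels_found)
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least two objects are required")
    # Pair-count form of the Hubert and Arabie adjusted Rand index.
    ari_denominator = (a + b) * (b + d) + (a + c) * (c + d)
    return {
        "rand": (a + d) / total,
        "adjusted_rand": 2 * (a * d - b * c) / ari_denominator if ari_denominator else 1.0,
        "jaccard": a / (a + b + c) if (a + b + c) else 1.0,
        "fowlkes_mallows": a / sqrt((a + b) * (a + c)) if (a + b) * (a + c) else 0.0,
    }


scores = external_criteria([0, 0, 1, 1], [0, 1, 0, 1])
```

In this small example the Rand index is 1/3 while the adjusted Rand index is -0.5, which illustrates how the chance correction changes the reading of a poor recovery.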


Psychometrika | 1981

A Monte Carlo study of thirty internal criterion measures for cluster analysis

Glenn W. Milligan

A Monte Carlo evaluation of thirty internal criterion measures for cluster analysis was conducted. Artificial data sets were constructed with clusters which exhibited the properties of internal cohesion and external isolation. The data sets were analyzed by four hierarchical clustering methods. The resulting values of the internal criteria were compared with two external criterion indices which determined the degree of recovery of correct cluster structure by the algorithms. The results indicated that a subset of internal criterion measures could be identified which appear to be valid indices of correct cluster recovery. Indices from this subset could form the basis of a permutation test for the existence of cluster structure or a clustering algorithm.


Multivariate Behavioral Research | 1981

A Review of Monte Carlo Tests of Cluster Analysis.

Glenn W. Milligan

A review of Monte Carlo validation studies of clustering algorithms is presented. Several validation studies have tended to support the view that Ward's minimum variance hierarchical method gives the best recovery of cluster structure. However, a more complete review of the validation literature on clustering indicates that other algorithms may provide better recovery under a variety of conditions. Applied researchers are cautioned concerning the uncritical selection of Ward's method for empirical research. Alternative explanations for the differential recovery performance are explored and recommendations are made for future Monte Carlo experiments.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1983

The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure

Glenn W. Milligan; S. C. Soon; Lisa M. Sokol

An evaluation of four clustering methods and four external criterion measures was conducted with respect to the effect of the number of clusters, dimensionality, and relative cluster sizes on the recovery of true cluster structure. The four methods were the single link, complete link, group average (UPGMA), and Ward's minimum variance algorithms. The results indicated that the four criterion measures were generally consistent with each other, of which two highly similar pairs were identified. The first pair consisted of the Rand and corrected Rand statistics, and the second pair was the Jaccard and the Fowlkes and Mallows indexes. With respect to the methods, recovery was found to improve as the number of clusters increased and as the number of dimensions increased. The relative cluster size factor produced differential performance effects, with Ward's procedure providing the best recovery when the clusters were of equal size. The group average method gave equivalent or better recovery when the clusters were of unequal size.


Psychometrika | 1985

An algorithm for generating artificial test clusters

Glenn W. Milligan

An algorithm for generating artificial data sets which contain distinct nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. The algorithm generates data sets which contain either 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in either a 4, 6, or 8 dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters while the remaining two schemes produce clusters of unequal sizes. Finally, a number of methods for introducing error in the data have been incorporated in the algorithm.
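The general idea of generating labelled, nonoverlapping test clusters can be sketched as below. This is a simplified illustration and not the published algorithm: the uniform sampling, the `gap` and `width` parameters, and the function name are all assumptions. Separation is guaranteed by giving each cluster its own disjoint interval on the first dimension, and unequal cluster sizes are obtained simply by passing different sizes.

```python
import random


def generate_clusters(sizes, n_dims=4, gap=1.0, width=2.0, seed=0):
    """Return (points, labels) with one cluster per entry in `sizes`.

    Clusters occupy disjoint intervals on dimension 0, separated by `gap`,
    so they cannot overlap; the other dimensions are sampled freely.
    """
    rng = random.Random(seed)  # seeded so that test data sets are reproducible
    points, labels = [], []
    for label, size in enumerate(sizes):
        low = label * (width + gap)
        for _ in range(size):
            first = rng.uniform(low, low + width)
            rest = [rng.uniform(0.0, width) for _ in range(n_dims - 1)]
            points.append([first] + rest)
            labels.append(label)
    return points, labels


# Three clusters of unequal size embedded in a 6-dimensional space.
points, labels = generate_clusters([10, 10, 30], n_dims=6)
```

Because the true labels are returned alongside the points, the output can be fed directly to an external criterion when evaluating a clustering method.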


Educational and Psychological Measurement | 1980

A Two-Stage Clustering Algorithm with Robust Recovery Characteristics

Glenn W. Milligan; Lisa M. Sokol

Two FORTRAN IV computer programs for a two-stage clustering algorithm with robust recovery characteristics are described. In Stage 1, a group average hierarchical algorithm generates cluster centroids which are used as starting seed points for Stage 2, Jancey's k-means nonhierarchical algorithm. Also available with the hierarchical algorithm is a hypothesis test procedure which can be used to determine whether significant cluster structure exists in the data.
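The two-stage idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a port of the FORTRAN IV programs: Stage 1 is a naive group average agglomeration, Stage 2 uses a plain centroid-update k-means rather than Jancey's variant, the hypothesis test is not included, and all function names are hypothetical.

```python
from itertools import combinations


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(members):
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def group_average_clusters(points, k):
    """Stage 1: merge the pair of clusters with the smallest average distance."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def average_distance(pair):
            i, j = pair
            total = sum(sq_dist(p, q) ** 0.5 for p in clusters[i] for q in clusters[j])
            return total / (len(clusters[i]) * len(clusters[j]))
        i, j = min(combinations(range(len(clusters)), 2), key=average_distance)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def k_means(points, seeds, max_iter=100):
    """Stage 2: standard k-means started from the supplied seed points."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


def two_stage(points, k):
    seeds = [centroid(cluster) for cluster in group_average_clusters(points, k)]
    return k_means(points, seeds)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = two_stage(points, 2)
```

Seeding k-means from hierarchical centroids replaces random starting points with ones that already reflect the structure in the data, which is the motivation the abstract gives for the two-stage design.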

Collaboration


Dive into Glenn W. Milligan's collaborations.

Top Co-Authors

Honggeng Zhou

University of New Hampshire
