Publication


Featured research published by Douglas Steinley.


British Journal of Mathematical and Statistical Psychology | 2006

K-means clustering: A half-century synthesis

Douglas Steinley

This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years. The K-means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions of K-means using different metrics or modifications of the original algorithm are given, leading to a unifying treatment of K-means and some of its extensions. Finally, several future studies are outlined that could enhance the understanding of numerous subtleties affecting the performance of the K-means method.
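
As a small illustration of what "various formulations of the minimum variance loss function" can mean, the sketch below computes the within-cluster sum of squares in two equivalent ways: as squared distances to each cluster centroid, and as within-cluster pairwise squared distances divided by cluster size. The function names and the toy data are assumptions made for this example; they are not taken from the paper.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sse_centroid(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)


def sse_pairwise(clusters):
    """Same loss, written as pairwise squared distances divided by cluster size."""
    return sum(
        sum(sq_dist(a, b) for a, b in combinations(c, 2)) / len(c)
        for c in clusters
    )


# Hypothetical toy partition: two tight groups of three points each.
clusters = [
    [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
    [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2)],
]
print(sse_centroid(clusters), sse_pairwise(clusters))
```

Both functions return the same value up to floating-point rounding, which is why the loss can be minimized without ever computing centroids explicitly.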


Journal of Classification | 2007

Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques

Douglas Steinley; Michael J. Brusco

K-means clustering is arguably the most popular technique for partitioning data. Unfortunately, K-means suffers from the well-known problem of locally optimal solutions. Furthermore, the final partition is dependent upon the initial configuration, making the choice of starting partitions all the more important. This paper evaluates 12 procedures proposed in the literature and provides recommendations for best practices.
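
The dependence on the starting configuration can be seen on a tiny example. The sketch below is a plain batch K-means loop (not any of the 12 procedures evaluated in the paper) that takes its initial centers as an argument; the four-point data set and both starting configurations are hypothetical and were chosen so that one start reaches the best partition and the other gets stuck in a local optimum.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centers, max_iter=100):
    """Batch K-means from explicit initial centers; returns (labels, loss)."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # partition unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    loss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, loss


# Corners of a 4-by-1 rectangle: the best 2-cluster split is left versus right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_labels, good_loss = kmeans(points, [points[0], points[2]])
bad_labels, bad_loss = kmeans(points, [points[0], points[1]])
print(good_labels, good_loss)
print(bad_labels, bad_loss)
```

The second start converges to a bottom-versus-top split whose loss is much larger than that of the left-versus-right split, even though neither run can improve its own partition any further.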


Psychological Methods | 2011

Evaluating mixture modeling for clustering: recommendations and cautions

Douglas Steinley; Michael J. Brusco

This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers in that mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as the information about the shape and number of clusters became unknown, degraded performance was observed for both K-means clustering and mixture-model clustering.


Alcoholism: Clinical and Experimental Research | 2010

Developmental Trajectories of Impulsivity and Their Association With Alcohol Use and Related Outcomes During Emerging and Young Adulthood I

Andrew K. Littlefield; Kenneth J. Sher; Douglas Steinley

BACKGROUND: Research has documented normative patterns of personality change during emerging and young adulthood that reflect decreases in traits associated with substance use, such as impulsivity. However, evidence suggests variability in these developmental changes.

METHODS: This study examined trajectories of impulsivity and their association with substance use and related problems from ages 18 to 35. Analyses were based on data collected from a cohort of college students (N = 489), at high and low risk for AUDs, first assessed as freshmen at a large, public university.

RESULTS: Mixture modeling identified five trajectory groups that differed in baseline levels of impulsivity and developmental patterns of change. Notably, the trajectory group that exhibited the sharpest declines in impulsivity tended to display accelerated decreases in alcohol involvement from ages 18 to 25 compared to the other impulsivity groups.

CONCLUSION: Findings highlight the developmental nature of impulsivity across emerging and young adulthood and provide an empirical framework to identify key covariates of individual changes of impulsivity.


Multivariate Behavioral Research | 2008

A New Variable Weighting and Selection Procedure for K-means Cluster Analysis

Douglas Steinley; Michael J. Brusco

A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these procedures are demonstrated in a simulation study, showing favorable results when compared with existing standardization methods. A detailed demonstration of the weighting and selection procedure is provided for the well-known Fisher Iris data and several synthetic data sets.
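
To make the idea of a variance-to-range ratio concrete, the sketch below scores each variable by its population variance divided by its squared range. The factor of 12 is an assumption made here so that a continuous uniform variable would score about 1; the abstract does not state the exact scaling or how the ratio is turned into weights, so this is not presented as the authors' formula. The variable names and data are hypothetical.

```python
from statistics import pvariance


def variance_to_range_ratio(column):
    """Scaled ratio of population variance to squared range for one variable."""
    value_range = max(column) - min(column)
    if value_range == 0:
        return 0.0  # a constant variable carries no cluster information
    # Assumed scaling: a uniform variable has variance range**2 / 12.
    return 12 * pvariance(column) / value_range ** 2


# Hypothetical variables: one with two well-separated groups, one evenly spread.
data = {
    "bimodal": [0.0, 0.0, 0.0, 10.0, 10.0, 10.0],
    "spread": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
}
ratios = {name: variance_to_range_ratio(col) for name, col in data.items()}
print(ratios)
```

The variable whose values pile up at the two ends of its range receives the larger ratio, which matches the intuition that cluster structure inflates variance relative to range.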


Psychological Methods | 2006

Profiling local optima in K-means clustering: developing a diagnostic technique

Douglas Steinley

Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate distributions of clusters; and (e) various multidimensional data structures. The results are evaluated in terms of the Hubert-Arabie adjusted Rand index, and several observations concerning the performance of K-means clustering are made. Finally, the article concludes with the proposal of a diagnostic technique indicating when the partitioning given by a K-means cluster analysis can be trusted. By combining the information from several observable characteristics of the data (number of clusters, number of variables, sample size, etc.) with the prevalence of unique local optima in several thousand implementations of the K-means algorithm, the author provides a method capable of guiding key data-analysis decisions.
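
For readers unfamiliar with the Hubert-Arabie adjusted Rand index, a minimal sketch of the standard pair-counting formula follows. It compares two label vectors over the same objects; the function name and the example partitions are assumptions for this illustration, and the code requires Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions (needs n >= 2)."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of objects placed together within each partition separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (pairs_joint - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # relabeled copy
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # no shared pairs
print(same, crossed)
```

A value of 1 means the partitions agree up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance.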


Journal of Classification | 2005

OCLUS: An Analytic Method for Generating Clusters with Known Overlap

Douglas Steinley; Robert A. Henson

The primary method for validating cluster analysis techniques is through Monte Carlo simulations that rely on generating data with known cluster structure (e.g., Milligan 1996). This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint; current cluster generation methods are framed within these definitions. An algorithm generating overlapping clusters based on shared densities from several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, a discussion is given of how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.


Psychological Methods | 2011

Choosing the number of clusters in K-means clustering

Douglas Steinley; Michael J. Brusco

Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.


Multivariate Behavioral Research | 2008

Cautionary Remarks on the Use of Clusterwise Regression

Michael J. Brusco; J. Dennis Cradit; Douglas Steinley; Gavin L. Fox

Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures.


British Journal of Mathematical and Statistical Psychology | 2008

Stability analysis in K‐means clustering

Douglas Steinley

This paper develops a new procedure, called stability analysis, for K-means clustering. Instead of ignoring local optima and only considering the best solution found, this procedure takes advantage of additional information from a K-means cluster analysis. The information from the locally optimal solutions is collected in an object by object co-occurrence matrix. The co-occurrence matrix is clustered and subsequently reordered by a steepest ascent quadratic assignment procedure to aid visual interpretation of the multidimensional cluster structure. Subsequently, measures are developed to determine the overall structure of a data set, the number of clusters and the multidimensional relationships between the clusters.
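
The object-by-object co-occurrence matrix can be sketched as follows: given the label vectors from several K-means runs, each entry records the proportion of runs in which two objects were placed in the same cluster. This covers only the collection step; the clustering of the matrix and the steepest ascent quadratic assignment reordering described in the abstract are not reproduced. The function name and the three example partitions are hypothetical.

```python
def cooccurrence_matrix(partitions):
    """Proportion of runs in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


# Hypothetical label vectors from three runs over the same four objects.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, clusters relabeled
    [0, 1, 1, 1],  # a different locally optimal solution
]
matrix = cooccurrence_matrix(partitions)
for row in matrix:
    print([round(v, 2) for v in row])
```

Because only co-membership is counted, arbitrary relabeling of clusters between runs does not affect the matrix, and entries strictly between 0 and 1 flag objects whose assignment varies across local optima.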
