Timothy C. Havens
Michigan Technological University
Publications
Featured research published by Timothy C. Havens.
IEEE Transactions on Fuzzy Systems | 2012
Timothy C. Havens; James C. Bezdek; Christopher Leckie; Lawrence O. Hall; Marimuthu Palaniswami
Very large (VL) data or big data are any data that you cannot load into your computer's working memory. This is not an objective definition, but a definition that is easy to understand and one that is practical, because there is a dataset too big for any computer you might use; hence, this is VL data for you. Clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases (including VL images) in various applications, and so, clustering algorithms that scale well to VL data are important and useful. This paper compares the efficacy of three different implementations of techniques aimed at extending fuzzy c-means (FCM) clustering to VL data. Specifically, we compare methods that are based on 1) sampling followed by noniterative extension; 2) incremental techniques that make one sequential pass through subsets of the data; and 3) kernelized versions of FCM that provide approximations based on sampling, including three proposed algorithms. We use both loadable and VL datasets to conduct the numerical experiments that facilitate comparisons based on time and space complexity, speed, quality of approximations to batch FCM (for loadable data), and assessment of matches between partitions and ground truth. Empirical results show that random sampling plus extension FCM, bit-reduced FCM, and approximate kernel FCM are good choices to approximate FCM for VL data. We conclude by demonstrating the VL algorithms on a dataset with 5 billion objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.
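The sampling-plus-noniterative-extension scheme described above can be sketched as follows: run batch FCM on a random sample, then extend the fuzzy memberships to all of the data in a single pass using the standard FCM membership formula. This is a minimal illustration under common defaults, not the paper's implementation; the function names and the sampling fraction are illustrative.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=None):
    """Batch fuzzy c-means on loadable data X (n x d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                    # columns sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)      # c x d cluster centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U_new = 1.0 / (D ** (2 / (m - 1)) * (1.0 / D ** (2 / (m - 1))).sum(axis=0))
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return V, U

def rse_fcm(X, c, sample_frac=0.1, m=2.0, seed=0):
    """Random sample plus extension: cluster a sample with batch FCM,
    then extend memberships to all of X noniteratively."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=max(c, int(sample_frac * len(X))), replace=False)
    V, _ = fcm(X[idx], c, m=m, seed=seed)
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
    U = 1.0 / (D ** (2 / (m - 1)) * (1.0 / D ** (2 / (m - 1))).sum(axis=0))
    return V, U
```

The extension step touches each object exactly once, which is what makes the scheme viable when X itself cannot be held in memory (in that setting the extension would stream over chunks of X rather than a single array).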
knowledge discovery and data mining | 2011
Radha Chitta; Rong Jin; Timothy C. Havens; Anil K. Jain
Digital data explosion mandates the development of scalable tools to organize the data in a meaningful and easily accessible form. Clustering is a commonly used tool for data organization. However, many clustering algorithms designed to handle large data sets assume linear separability of data and hence do not perform well on real world data sets. While kernel-based clustering algorithms can capture the non-linear structure in data, they do not scale well in terms of speed and memory requirements when the number of objects to be clustered exceeds tens of thousands. We propose an approximation scheme for kernel k-means, termed approximate kernel k-means, that reduces both the computational complexity and the memory requirements by employing a randomized approach. We show both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.
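The core idea of the randomized approximation above — restricting cluster centers to the span of a small random sample so only an n x m kernel slice is ever formed — can be sketched as below. This assumes an RBF kernel; the seeding heuristic and function names are illustrative additions, not the authors' code.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def approx_kernel_kmeans(X, c, m=20, gamma=1.0, max_iter=50, seed=0):
    """Centers are constrained to the span of m sampled points, so only an
    n x m kernel slice is computed instead of the full n x n matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    S = rng.choice(n, size=min(m, n), replace=False)
    KB = rbf_kernel(X[S], X[S], gamma) + 1e-8 * np.eye(len(S))   # m x m
    Kbar = rbf_kernel(X, X[S], gamma)                            # n x m
    # farthest-point seeding over the sample (an illustrative heuristic)
    seeds = [int(rng.integers(len(S)))]
    while len(seeds) < c:
        seeds.append(int(Kbar[S][:, seeds].max(axis=1).argmin()))
    labels = Kbar[:, seeds].argmax(axis=1)
    for _ in range(max_iter):
        # optimal weights for center k: alpha_k = KB^{-1} * mean Kbar row in cluster k
        A = np.vstack([np.linalg.solve(KB, Kbar[labels == k].mean(axis=0))
                       if np.any(labels == k) else np.zeros(len(S))
                       for k in range(c)])                       # c x m
        # squared distance to each center, up to a per-point constant
        dist = np.einsum('km,ml,kl->k', A, KB, A)[None, :] - 2 * Kbar @ A.T
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

Memory drops from O(n²) for the full kernel matrix to O(nm), which is the source of the scalability claim in the abstract.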
IEEE Transactions on Knowledge and Data Engineering | 2012
Timothy C. Havens; James C. Bezdek
The VAT algorithm is a visual method for determining the possible number of clusters in, or the cluster tendency of, a set of objects. The improved VAT (iVAT) algorithm uses a graph-theoretic distance transform to improve the effectiveness of the VAT algorithm for “tough” cases where VAT fails to accurately show the cluster tendency. In this paper, we present an efficient formulation of the iVAT algorithm which reduces the computational complexity of the iVAT algorithm from O(N3) to O(N2). We also prove a direct relationship between the VAT image and the iVAT image produced by our efficient formulation. We conclude with three examples displaying clustering tendencies in three of the Karypis data sets that illustrate the improvement offered by the iVAT transformation. We also provide a comparison of iVAT images to those produced by the Reverse Cuthill-McKee (RCM) algorithm; our examples suggest that iVAT is superior to the RCM method of display.
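The O(N²) formulation can be sketched as follows: VAT reorders the dissimilarity matrix with a Prim-like ordering, and then each row of the iVAT image (a min-max path distance) is filled in from previously computed rows. This is a compact sketch under the usual min-max path-distance definition, not the authors' code.

```python
import numpy as np

def vat(D):
    """VAT reordering: a Prim-like ordering of objects by minimum
    dissimilarity to the already-ordered set, starting from an endpoint
    of the largest entry in D."""
    n = len(D)
    P = [int(np.unravel_index(D.argmax(), D.shape)[0])]
    J = [j for j in range(n) if j != P[0]]
    for _ in range(n - 1):
        sub = D[np.ix_(P, J)]
        _, col = np.unravel_index(sub.argmin(), sub.shape)
        P.append(J[col])
        del J[col]
    return D[np.ix_(P, P)], P

def ivat(Dv):
    """Efficient iVAT: builds the min-max path-distance image row by row
    from the VAT-ordered matrix Dv in O(N^2) rather than O(N^3)."""
    n = len(Dv)
    Dp = np.zeros_like(Dv)
    for r in range(1, n):
        j = int(Dv[r, :r].argmin())              # nearest earlier object
        Dp[r, j] = Dv[r, j]
        for k in range(r):
            if k != j:
                Dp[r, k] = max(Dv[r, j], Dp[j, k])
        Dp[:r, r] = Dp[r, :r]                    # keep the image symmetric
    return Dp
```

Because the transform saturates within-cluster distances at the largest hop on the connecting path, tight groups become uniformly dark blocks in the displayed image, which is why iVAT sharpens the "tough" cases where raw VAT is ambiguous.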
ieee swarm intelligence symposium | 2008
Timothy C. Havens; Christopher J. Spain; Nathan Salmon; James M. Keller
There are many function optimization algorithms based on the collective behavior of natural systems - Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are two of the most popular. This paper presents a new adaptation of the PSO algorithm, entitled Roach Infestation Optimization (RIO), that is inspired by recent discoveries in the social behavior of cockroaches. We present the development of the simple behaviors of the individual agents, which emulate some of the discovered cockroach social behaviors. We also describe a “hungry” version of the PSO and RIO, which we aptly call Hungry PSO and Hungry RIO. Comparisons with standard PSO show that Hungry PSO, RIO, and Hungry RIO are all more effective at finding the global optima of a suite of test functions.
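For reference, the canonical global-best PSO that RIO adapts can be sketched as below; RIO replaces the single global best with bests shared locally among "socializing" agents, a detail omitted here. The parameter values are common defaults, not the paper's settings.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Canonical global-best PSO: each particle is pulled toward its own
    best position (pbest) and the swarm's best position (g)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())
```

A "hungry" variant, as described above, would add a drive term that grows as an agent goes unfed, pulling it toward food (high-fitness regions) regardless of social attraction.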
Pattern Recognition | 2011
Masud Moshtaghi; Timothy C. Havens; James C. Bezdek; Laurence Anthony F. Park; Christopher Leckie; Sutharshan Rajasegarar; James M. Keller; Marimuthu Palaniswami
Comparing, clustering and merging ellipsoids are problems that arise in various applications, e.g., anomaly detection in wireless sensor networks and motif-based patterned fabrics. We develop a theory underlying three measures of similarity that can be used to find groups of similar ellipsoids in p-space. Clusters of ellipsoids are suggested by dark blocks along the diagonal of a reordered dissimilarity image (RDI). The RDI is built with the recursive iVAT algorithm using any of the three (dis)similarity measures as input and performs two functions: (i) it is used to visually assess and estimate the number of possible clusters in the data; and (ii) it offers a means for comparing the three similarity measures. Finally, we apply the single linkage and CLODD clustering algorithms to three two-dimensional data sets using each of the three dissimilarity matrices as input. Two data sets are synthetic, and the third is a set of real WSN data that has one known second-order node anomaly. We conclude that focal distance is the best measure of elliptical similarity, iVAT images are a reliable basis for estimating cluster structures in sets of ellipsoids, and single linkage can successfully extract the indicated clusters.
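The final extraction step above, single linkage on a pairwise dissimilarity matrix, can be sketched generically: build a minimum spanning tree and cut its c−1 heaviest edges. This is a textbook sketch that works for any dissimilarity input (ellipsoid measures included), not the authors' CLODD or ellipsoid-specific code.

```python
import numpy as np

def single_linkage(D, c):
    """Extract c clusters from a pairwise dissimilarity matrix D by
    single linkage: grow an MST with Prim's algorithm, then cut the
    c-1 largest MST edges and label the connected components."""
    n = len(D)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    dist = D[0].copy()                    # best connection cost to the tree
    parent = np.zeros(n, dtype=int)       # tree node achieving that cost
    edges = []
    for _ in range(n - 1):
        j = int(np.where(visited, np.inf, dist).argmin())
        edges.append((int(parent[j]), j, float(dist[j])))
        visited[j] = True
        closer = D[j] < dist
        parent[closer] = j
        dist = np.minimum(dist, D[j])
    edges.sort(key=lambda e: e[2])        # keep the n-c lightest edges
    lab = np.arange(n)
    def find(i):
        while lab[i] != i:
            lab[i] = lab[lab[i]]
            i = lab[i]
        return i
    for a, b, _ in edges[:n - c]:
        lab[find(a)] = find(b)
    _, out = np.unique([find(i) for i in range(n)], return_inverse=True)
    return out
```

Single linkage's well-known chaining behavior is harmless here because the RDI step is used first to confirm that compact dark blocks (i.e., well-separated groups) actually exist.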
IEEE Transactions on Fuzzy Systems | 2015
Christian Wagner; Simon Miller; Jonathan M. Garibaldi; Derek T. Anderson; Timothy C. Havens
In this paper, a new approach is presented to model interval-based data using fuzzy sets (FSs). Specifically, we show how both crisp and uncertain intervals (where there is uncertainty about the endpoints of intervals) collected from individual or multiple survey participants over single or repeated surveys can be modeled using type-1, interval type-2, or general type-2 FSs based on zSlices. The proposed approach is designed to minimize any loss of information when transferring the interval-based data into FS models, and to avoid, as much as possible, assumptions about the distribution of the data. Furthermore, our approach does not rely on data preprocessing or outlier removal, which can lead to the elimination of important information. Different types of uncertainty contained within the data, namely intra- and inter-source uncertainty, are identified and modeled using the different degrees of freedom of type-2 FSs, thus providing a clear representation and separation of these individual types of uncertainty present in the data. We provide full details of the proposed approach, as well as a series of detailed examples based on both real-world and synthetic data. We perform comparisons with analogue techniques to derive FSs from intervals, namely the interval approach and the enhanced interval approach, and highlight the practical applicability of the proposed approach.
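The type-1 end of the modeling spectrum above can be illustrated simply: let the membership at x be the fraction of collected intervals that cover x, which uses the raw survey data directly, assumes no distribution, and removes no outliers. This agreement-style sketch is illustrative only and omits the type-2 machinery (zSlices, intra- vs. inter-source separation) that the paper develops.

```python
import numpy as np

def interval_agreement(intervals, xs):
    """Type-1 membership over a grid xs: the fraction of collected crisp
    intervals [a, b] that cover each grid point. No distributional
    assumptions, no preprocessing, no outlier removal."""
    counts = np.zeros_like(xs, dtype=float)
    for a, b in intervals:
        counts += (xs >= a) & (xs <= b)
    return counts / len(intervals)
```

With uncertain endpoints, each interval endpoint is itself a set of possibilities, and stacking the resulting memberships across sources is what motivates the move to zSlices-based general type-2 sets in the paper.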
IEEE Transactions on Fuzzy Systems | 2010
Isaac J. Sledge; James C. Bezdek; Timothy C. Havens; James M. Keller
Numerous computational schemes have arisen over the years that attempt to learn information about objects based upon the similarity or dissimilarity of one object to another. One such scheme, clustering, looks for self-similar groups of objects. To use clustering algorithms, an investigator must often have a priori knowledge of the number of clusters, i.e., c, to search for in the data. Moreover, it is often convenient to have ways to rank the returned results, either for a single value of c, a range of cs, different clustering methods, or any combination thereof. However, the task of assessing the quality of the results, so that c may be determined objectively, is currently ill-defined for object-object relationships. To bridge this gap, we generalize three well-known validity indices: the modified Hubert's Gamma, the Xie-Beni, and the generalized Dunn's indices, to relational data. In doing so, we develop a framework to convert many other validity indices to a relational form. Numerical examples on 12 datasets (samples from four normal mixtures, four real-world object datasets, and four real-world “pure relational” datasets) using the relational duals of the hard, fuzzy, and possibilistic c-means cluster algorithms are offered to illustrate and evaluate the new indices.
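For concreteness, the object-data form of one of the indices named above, Xie-Beni, is easy to state: total membership-weighted within-cluster scatter divided by n times the minimum squared center separation, with lower values indicating better-defined partitions. The paper's contribution is the relational dual of such indices, which is not reproduced in this sketch.

```python
import numpy as np

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni validity index for a fuzzy partition of object data.
    X: n x d data, V: c x d centers, U: c x n memberships.
    Lower is better; minimizing over c suggests the number of clusters."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)   # c x n squared distances
    scatter = ((U ** m) * D2).sum()
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # pairwise center separations
    np.fill_diagonal(sep, np.inf)
    return scatter / (len(X) * sep.min())
```

Evaluating such an index over a range of cs, as the abstract describes, turns the subjective choice of c into an argmin over validity scores.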
information processing and management of uncertainty | 2010
Derek T. Anderson; James M. Keller; Timothy C. Havens
Fuzzy integrals are very useful for fusing confidence or opinions from a variety of sources. These integrals are non-linear combinations of the support functions with the (possibly subjective) worth of subsets of the sources, realized by a fuzzy measure. There have been many applications and extensions of fuzzy integrals and this paper deals with a Sugeno integral where both the integrand and the measure take on fuzzy number values. A crucial aspect of using fuzzy integrals for fusion is determining or learning the measures. Here, we propose a genetic algorithm with novel cross-over and mutation operators to learn fuzzy-valued fuzzy measures for a fuzzy-valued Sugeno integral.
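The ordinary real-valued Sugeno integral that the paper extends can be sketched as a max of mins over the nested coalitions induced by sorting the support values; the paper's version replaces both the supports and the measure values with fuzzy numbers, which is omitted here.

```python
def sugeno_integral(h, g):
    """Sugeno integral of support values h (dict: source -> value in [0, 1])
    with respect to a fuzzy measure g (dict: frozenset of sources -> value
    in [0, 1]). This is the ordinary real-valued case; the paper's
    extension makes both h and g fuzzy-number valued."""
    order = sorted(h, key=h.get, reverse=True)   # sources by support, descending
    best = 0.0
    for i in range(len(order)):
        A = frozenset(order[:i + 1])             # nested coalition of the top i+1 sources
        best = max(best, min(h[order[i]], g[A]))
    return best
```

Only the c nested coalitions along the sorted order are consulted, which is why learning a measure (here, by the paper's genetic algorithm) is the hard part: the 2^c subset values must be consistent (monotone) even though any single fusion touches only c of them.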
IEEE Computational Intelligence Magazine | 2011
James C. Bezdek; Sutharshan Rajasegarar; Masud Moshtaghi; Christopher Leckie; Marimuthu Palaniswami; Timothy C. Havens
We apply a recently developed model for anomaly detection to sensor data collected from a single node in the Heron Island wireless sensor network, which in turn is part of the Great Barrier Reef Ocean Observation System. The collection period spanned six hours each day from February 21 to March 22, 2009. Cyclone Hamish occurred on March 9, 2009, roughly in the middle of the collection period. Our system converts sensor measurements to elliptical summaries. Then a dissimilarity image of the data is built from a measure of focal distance between pairs of ellipses. Dark blocks along the diagonal of the image suggest clusters of ellipses. Finally, the single linkage algorithm extracts clusters from the dissimilarity data. We illustrate the model with three two-dimensional subsets of the three-dimensional measurements of (air) pressure, temperature, and humidity. Our examples show that iVAT images of focal distance are a reliable basis for estimating cluster structures in sets of ellipses, and that single linkage can successfully extract the indicated clusters. In particular, we are able to clearly isolate the cyclone Hamish event with this method, which demonstrates the ability of our model to detect anomalies in environmental monitoring networks.
IEEE Transactions on Fuzzy Systems | 2013
Timothy C. Havens; James C. Bezdek; Christopher Leckie; Kotagiri Ramamohanarao; Marimuthu Palaniswami
We discuss a new formulation of a fuzzy validity index that generalizes the Newman-Girvan (NG) modularity function. The NG function serves as a cluster validity functional in community detection studies. The input data is an undirected weighted graph that represents, e.g., a social network. Clusters correspond to socially similar substructures in the network. We compare our fuzzy modularity with two existing modularity functions using the well-studied Karate Club and American College Football datasets.
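The crisp Newman-Girvan modularity being generalized can be sketched as below; one natural fuzzy generalization, in the spirit of the abstract, replaces the same-cluster 0/1 indicator with the inner product of the two nodes' membership vectors (this one-line description is a simplification of the paper's formulation).

```python
import numpy as np

def ng_modularity(A, labels):
    """Crisp Newman-Girvan modularity of a partition of a weighted,
    undirected graph: observed within-community edge weight minus the
    weight expected under a random degree-preserving rewiring."""
    A = np.asarray(A, dtype=float)
    two_m = A.sum()                          # twice the total edge weight
    k = A.sum(axis=1)                        # weighted degrees
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)
```

As a validity functional, higher modularity means the candidate community structure exceeds chance by more, so candidate partitions (crisp or fuzzy) can be ranked directly by this score.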