Jacalyn M. Huband | Researchain

Publication

Featured research published by Jacalyn M. Huband.

Pattern Recognition | 2005

bigVAT: Visual assessment of cluster tendency for large data sets

Jacalyn M. Huband; James C. Bezdek; Richard J. Hathaway

Assessment of clustering tendency is an important first step in cluster analysis. One tool for assessing cluster tendency is the Visual Assessment of Tendency (VAT) algorithm. VAT produces an image matrix that can be used for visual assessment of cluster tendency in either relational or object data. However, VAT becomes intractable for large data sets. The revised VAT (reVAT) algorithm reduces the number of computations done by VAT, and replaces the image matrix with a set of profile graphs that are used for the visual assessment step. Thus, reVAT overcomes the large data set problem which encumbers VAT, but presents a new problem: interpretation of the set of reVAT profile graphs becomes very difficult when the number of clusters is large, or there is significant overlap between groups of objects in the data. In this paper, we propose a new algorithm called bigVAT which (i) solves the large data problem suffered by VAT, and (ii) solves the interpretation problem suffered by reVAT. bigVAT combines the quasi-ordering technique used by reVAT with an image display of the set of profile graphs, so that the clustering tendency information is shown as a VAT-like image. Several numerical examples are given to illustrate and support the new technique.

Pattern Recognition | 2006

Scalable visual assessment of cluster tendency for large data sets

Richard J. Hathaway; James C. Bezdek; Jacalyn M. Huband

The problem of determining whether clusters are present in a data set (i.e., assessment of cluster tendency) is an important first step in cluster analysis. The visual assessment of cluster tendency (VAT) tool has been successful in determining potential cluster structure of various data sets, but it can be computationally expensive for large data sets. In this article, we present a new scalable, sample-based version of VAT, which is feasible for large data sets. We include analysis and numerical examples that demonstrate the new scalable VAT algorithm.

IEEE International Conference on Fuzzy Systems | 2005

Kernelized Non-Euclidean Relational Fuzzy c-Means Algorithm

Richard J. Hathaway; Jacalyn M. Huband; James C. Bezdek

Successes with kernel-based classification methods have spawned recent efforts to kernelize clustering algorithms for object data. Here we extend the kernelization to relational data clustering by proposing a kernelized form of the non-Euclidean relational fuzzy c-means algorithm. A numerical test result is included.

International Journal of Intelligent Systems | 2006

Approximate clustering in very large relational data

James C. Bezdek; Richard J. Hathaway; Jacalyn M. Huband; Christopher Leckie; Ramamohanarao Kotagiri

Different extensions of fuzzy c‐means (FCM) clustering have been developed to approximate FCM clustering in very large (VL, unloadable) image (eFFCM) and object vector (geFFCM) data. Both extensions share three phases: (1) progressive sampling of the VL data, terminated when a sample passes a statistical goodness of fit test; (2) clustering with (literal or exact) FCM; and (3) noniterative extension of the literal clusters to the remainder of the data set. This article presents a comparable method for the remaining case of interest, namely, clustering in VL relational data. We will propose and discuss each of the four phases of eNERF, our algorithm for this last case: (1) finding distinguished features that monitor progressive sampling, (2) progressively sampling a square N × N relation matrix RN until an n × n sample relation Rn passes a statistical test, (3) clustering Rn with literal non‐Euclidean relational fuzzy c‐means, and (4) extending the clusters in Rn to the remainder of the relational data. The extension phase in this third case is not as straightforward as it was in the image and object data cases, but our numerical examples suggest that eNERF has the same approximation qualities that eFFCM and geFFCM do.

Annals of Mathematics and Artificial Intelligence | 2009

Is VAT really single linkage in disguise?

Timothy C. Havens; James C. Bezdek; James M. Keller; Mihail Popescu; Jacalyn M. Huband

This paper addresses the relationship between the Visual Assessment of cluster Tendency (VAT) algorithm and single linkage hierarchical clustering. We present an analytical comparison of the two algorithms in conjunction with numerical examples to show that VAT reordering of dissimilarity data is directly related to the clusters produced by single linkage hierarchical clustering. This analysis is important to understanding the underlying theory of VAT and, more generally, other algorithms that are based on VAT-ordered dissimilarity data.
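As a rough illustration of the reordering this abstract refers to, here is a minimal Python sketch of a VAT-style ordering as it is commonly described: start from an object involved in the largest dissimilarity, then repeatedly append the unordered object that is closest to any object already ordered. That greedy step is the same rule Prim's algorithm uses to grow a minimal spanning tree, which hints at why a connection to single linkage is plausible. The function names `vat_order` and `reorder`, the index-based tie-breaking, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def vat_order(R):
    """Return a VAT-style ordering of the indices of a square dissimilarity matrix R.

    R is a list of lists with R[i][j] >= 0 and R[i][j] == R[j][i].
    Ties are broken by lowest index (an assumption of this sketch).
    """
    n = len(R)
    # Start from an object that takes part in the largest dissimilarity.
    first = max(range(n), key=lambda i: max(R[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # nearest[q]: smallest dissimilarity from q to any already-ordered object.
    nearest = {q: R[first][q] for q in remaining}
    while remaining:
        # Prim-like step: take the unordered object closest to the ordered set.
        j = min(remaining, key=lambda q: (nearest[q], q))
        order.append(j)
        remaining.remove(j)
        for q in remaining:
            nearest[q] = min(nearest[q], R[j][q])
    return order


def reorder(R, order):
    """Apply the same permutation to the rows and columns of R."""
    return [[R[i][j] for j in order] for i in order]


# Toy data (hypothetical): two interleaved groups of points on a line.
values = [0, 10, 1, 11, 2, 12]
R = [[abs(a - b) for b in values] for a in values]
order = vat_order(R)
R_star = reorder(R, order)
```

On this toy input the ordering groups the three small values before the three large ones, so `R_star` shows two low-dissimilarity blocks along its diagonal. Displaying `R_star` as a grayscale image is what yields the visual assessment.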

North American Fuzzy Information Processing Society | 2004

Revised Visual Assessment of (Cluster) Tendency (reVAT)

Jacalyn M. Huband; James C. Bezdek; Richard J. Hathaway

The Visual Assessment of (cluster) Tendency (VAT) method readily displays cluster tendency for small data sets as grayscale images, but is too computationally costly for larger data sets. A revised version of VAT is presented here that can efficiently be applied to larger collections of data. A numerical example is included.

IEEE International Conference on Fuzzy Systems | 2008

A new cluster validity measure for bioinformatics relational datasets

Mihail Popescu; James C. Bezdek; James M. Keller; Timothy C. Havens; Jacalyn M. Huband

Many important applications in biology have underlying datasets that are relational, that is, only the (dis)similarity between biological objects (amino acid sequences, gene expression profiles, etc.) is known and not their feature values in some feature space. Examples of such relational datasets are the gene similarity matrices obtained from BLAST, gene expression data, or gene ontology (GO) similarity measures. Once a relational dataset is obtained, a common question asked is how many groups of objects are represented in the original dataset. The answer to this question is usually obtained by employing a clustering algorithm and a cluster validity measure. In this article we describe a cluster validity measure for non-Euclidean relational fuzzy c-means that is based on the correlation between a relation induced on the data by the cluster memberships and the original relational data. This validity measure can be applied to partitions made by any fuzzy relational clustering algorithm. We illustrate our measure by validating clusters in several dissimilarity matrices for a set of 194 gene products obtained using BLAST and GO similarities.

Soft Computing | 2009

Finding the number of clusters in ordered dissimilarities

Isaac J. Sledge; Timothy C. Havens; Jacalyn M. Huband; James C. Bezdek; James M. Keller

As humans, we have innate faculties that allow us to efficiently segment groups of objects. Computers, to some degree, can be programmed with similar categorical capabilities, which stem from exploratory data analysis. Out of the various subsets of data reasoning, clustering provides insight into the structure and relationships of input samples situated in a number of distributions. To determine these relationships, many clustering methods rely on one or more human inputs; the most important being the number of distributions, c, to seek. This work investigates a technique for estimating the number of clusters from a general type of data called relational data. Several numerical examples are presented to illustrate the effectiveness of the proposed method.

Fuzzy Systems and Knowledge Discovery | 2008

(Automatic) Cluster Count Extraction from Unlabeled Data Sets

Isaac J. Sledge; Jacalyn M. Huband; James C. Bezdek

Through the years researchers have crafted algorithms to carry out the process of object partitioning (clustering). All clustering algorithms ultimately rely on human inputs, principally in the form of the number of clusters to seek. This work investigates a new technique for automating cluster assessment and estimating the number of clusters to look for in unlabeled data utilizing the VAT [visual assessment of cluster tendency] algorithm coupled with common image processing techniques. Several numerical examples are presented to illustrate the effectiveness of the proposed method.

Iberoamerican Congress on Pattern Recognition | 2006

Maximin initialization for cluster analysis

Richard J. Hathaway; James C. Bezdek; Jacalyn M. Huband

Most iterative clustering algorithms require a good initialization to achieve accurate results. A new initialization procedure for all such algorithms is given that is exact when the data contain compact, separated clusters. Our examples use c-means clustering.
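The abstract does not spell out the procedure, so the following is only a sketch of the generic maximin (farthest-first) rule that the title suggests, not necessarily the authors' exact method: pick a starting object, then repeatedly pick the object whose distance to its nearest already-chosen object is largest. The function name `maximin_init`, the `first` parameter, the use of Euclidean distance, and the toy points are assumptions made for illustration.

```python
import math


def maximin_init(points, c, first=0):
    """Return indices of c initial prototypes chosen by the maximin rule.

    points: sequence of equal-length coordinate tuples.
    first:  index of the starting object (an assumption of this sketch).
    """
    chosen = [first]
    # min_dist[i]: distance from point i to its nearest chosen prototype.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < c:
        # Take the point that is farthest from all prototypes chosen so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Toy data (hypothetical): three compact, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 9)]
seeds = maximin_init(points, c=3)
```

With compact, well-separated groups like these, each selected index falls in a different group, which matches the intuition behind the "exact" claim in the abstract. The chosen points could then seed an iterative method such as c-means.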