Martin H. C. Law
Michigan State University
Publications
Featured research published by Martin H. C. Law.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004
Martin H. C. Law; Mário A. T. Figueiredo; Anil K. Jain
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
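The feature-saliency idea above can be sketched in miniature. The following is an illustrative simplification, not the paper's full algorithm: it fixes the number of clusters at two and omits the minimum message length prior, so the saliency of an irrelevant feature merely stagnates near its initial value rather than being driven to zero; the toy data and all variable names are invented for this sketch. Each feature is modeled as drawn from a cluster-specific Gaussian with probability rho (the saliency) or from a cluster-independent common density otherwise, and EM estimates rho alongside the mixture parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 separates two clusters, feature 1 is pure noise.
n = 200
x0 = np.concatenate([rng.normal(-3, 0.5, n), rng.normal(3, 0.5, n)])
x1 = rng.normal(0.0, 1.0, 2 * n)
X = np.column_stack([x0, x1])

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

K = 2
N, D = X.shape
rho = np.full(D, 0.5)                                # feature saliencies
mu = np.array([X[:N // 2].mean(0), X[N // 2:].mean(0)])
var = np.ones((K, D))
common_mean, common_var = X.mean(0), X.var(0)        # common ("irrelevant") density

for _ in range(50):
    # E-step: a feature value is explained by its cluster density (prob. rho)
    # or by the cluster-independent common density (prob. 1 - rho).
    rel = np.stack([normal_pdf(X, mu[k], var[k]) for k in range(K)])  # (K, N, D)
    irr = normal_pdf(X, common_mean, common_var)                      # (N, D)
    mix = rho * rel + (1 - rho) * irr
    w = mix.prod(axis=2)                              # cluster posteriors
    w /= w.sum(axis=0, keepdims=True)
    u = w[:, :, None] * rho * rel / mix               # joint resp. (cluster, relevant)
    # M-step.
    rho = u.sum(axis=(0, 1)) / N
    mu = (u * X).sum(axis=1) / u.sum(axis=1)
    var = (u * (X - mu[:, None, :]) ** 2).sum(axis=1) / u.sum(axis=1)

print(np.round(rho, 2))   # saliency of the discriminative feature approaches 1
```

On this toy problem the saliency of the separating feature climbs toward one while the noise feature's saliency stays near its 0.5 initialization, which is the effect the MML term sharpens in the full method.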
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006
Martin H. C. Law; Anil K. Jain
Understanding the structure of multidimensional patterns, especially in unsupervised cases, is of fundamental importance in data mining, pattern recognition, and machine learning. Several algorithms have been proposed to analyze the structure of high-dimensional data based on the notion of manifold learning. These algorithms have been used to extract the intrinsic characteristics of different types of high-dimensional data by performing nonlinear dimensionality reduction. Most of these algorithms operate in a batch mode and cannot be efficiently applied when data are collected sequentially. In this paper, we describe an incremental version of ISOMAP, one of the key manifold learning algorithms. Our experiments on synthetic data as well as real world images demonstrate that our modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner.
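For context, the batch ISOMAP core that the incremental version maintains under sequential insertions consists of three steps: a neighborhood graph, geodesic (shortest-path) distances, and classical MDS. The sketch below runs those steps on an invented half-circle data set, using Floyd–Warshall for brevity (the incremental algorithm instead updates shortest paths and the embedding as points arrive, which is omitted here).

```python
import numpy as np

# Points along a half-circle: intrinsically 1-D, but curved in 2-D.
t = np.linspace(0, np.pi, 30)
X = np.column_stack([np.cos(t), np.sin(t)])
n = len(X)

# Step 1: k-nearest-neighbour graph of Euclidean distances.
D = np.linalg.norm(X[:, None] - X[None], axis=2)
k = 2
G = np.full((n, n), np.inf)
np.fill_diagonal(G, 0.0)
for i in range(n):
    nn = np.argsort(D[i])[1:k + 1]
    G[i, nn] = D[i, nn]
    G[nn, i] = D[i, nn]

# Step 2: geodesic distances = all-pairs shortest paths (Floyd-Warshall).
for m in range(n):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

# Step 3: classical MDS on the squared geodesic distances.
J = np.eye(n) - 1.0 / n                    # centering matrix
B = -0.5 * J @ (G ** 2) @ J
vals, vecs = np.linalg.eigh(B)
emb = vecs[:, -1] * np.sqrt(vals[-1])      # top eigenpair -> 1-D embedding
```

The recovered 1-D coordinates should vary monotonically along the curve, i.e., the embedding unrolls the half-circle.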
Computer Vision and Pattern Recognition | 2004
Martin H. C. Law; Alexander Topchy; Anil K. Jain
Conventional clustering algorithms utilize a single criterion that may not conform to the diverse shapes of the underlying clusters. We offer a new clustering approach that uses multiple clustering objective functions simultaneously. The proposed multiobjective clustering is a two-step process. It includes detection of clusters by a set of candidate objective functions as well as their integration into the target partition. A key ingredient of the approach is a cluster goodness function that evaluates the utility of multiple clusters using re-sampling techniques. Multiobjective data clustering is obtained as a solution to a discrete optimization problem in the space of clusters. At the meta-level, our algorithm incorporates conflict resolution techniques along with the natural data constraints. An empirical study on a number of artificial and real-world data sets demonstrates that multiobjective data clustering leads to valid and robust data partitions.
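The resampling-based cluster goodness function can be illustrated with a stand-in: score a candidate cluster by how often its point pairs stay together across jittered re-runs of a small k-means. This is not the authors' exact goodness function; the data, the jitter scheme, and all names are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two tight blobs plus scattered background noise.
blob_a = rng.normal([-4, 0], 0.4, (30, 2))
blob_b = rng.normal([4, 0], 0.4, (30, 2))
noise = rng.uniform(-6, 6, (20, 2))
X = np.vstack([blob_a, blob_b, noise])

def kmeans(X, k, iters=20, seed=0):
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C) ** 2).sum(2).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab

def goodness(cluster_idx, X, k=2, B=20):
    """Resampling-based goodness: how often the candidate cluster's point
    pairs end up in the same k-means cluster across perturbed re-runs."""
    scores = []
    for b in range(B):
        lab = kmeans(X + rng.normal(0, 0.2, X.shape), k, seed=b)
        sub = lab[cluster_idx]
        counts = np.bincount(sub)
        together = (counts * (counts - 1)).sum() / (len(sub) * (len(sub) - 1))
        scores.append(together)
    return float(np.mean(scores))

tight = np.arange(30)        # candidate cluster: blob A (stable)
loose = np.arange(60, 80)    # candidate cluster: the noise points (unstable)
print(goodness(tight, X), goodness(loose, X))
```

A genuine cluster scores near 1 because its points co-occur in almost every re-run, while an arbitrary group of noise points is repeatedly split apart and scores much lower.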
International Conference on Data Mining | 2004
Alexander Topchy; Martin H. C. Law; Anil K. Jain; Ana L. N. Fred
In combination of multiple partitions, one is usually interested in deriving a consensus solution with a quality better than that of the given partitions. Several recent studies have empirically demonstrated improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper, we present formal arguments on the effectiveness of cluster ensembles from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and a consensus function with plurality voting. The second is to study the property of the mean partition of an ensemble with respect to a metric on the space of all possible partitions. In both cases, the consensus solution can be shown to converge to the true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensembles.
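The plurality-voting convergence argument is easy to demonstrate numerically. The toy simulation below assumes the ensemble's partitions already share a label correspondence (it skips the re-labeling step the paper analyzes): each simulated clusterer corrupts the true labeling independently, and the per-point plurality vote becomes more accurate as the ensemble grows.

```python
import random

random.seed(0)
true = [i % 3 for i in range(300)]     # ground-truth labeling with 3 clusters

def noisy_partition(p=0.4):
    """Simulate one clusterer: each point keeps its true label with
    probability 1 - p, otherwise gets a random wrong label."""
    out = []
    for t in true:
        if random.random() < p:
            out.append(random.choice([l for l in range(3) if l != t]))
        else:
            out.append(t)
    return out

def plurality(partitions):
    """Consensus by per-point plurality vote."""
    return [max(set(votes), key=votes.count) for votes in zip(*partitions)]

def accuracy(lab):
    return sum(a == b for a, b in zip(lab, true)) / len(true)

for m in (1, 5, 25, 101):
    print(m, round(accuracy(plurality([noisy_partition() for _ in range(m)])), 3))
```

With the per-point error rate below 1/2 (split across wrong labels), the correct label wins the vote with probability approaching one as the ensemble size grows, which is the intuition behind the paper's convergence result.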
Pattern Recognition and Machine Intelligence | 2005
Anil K. Jain; Martin H. C. Law
Cluster analysis deals with the automatic discovery of the grouping of a set of patterns. Despite more than 40 years of research, there are still many challenges in data clustering from both theoretical and practical viewpoints. In this paper, we describe several recent advances in data clustering: clustering ensemble, feature selection, and clustering with constraints.
Computer Vision and Pattern Recognition | 2005
Tilman Lange; Martin H. C. Law; Anil K. Jain; Joachim M. Buhmann
Classification problems arise abundantly in many computer vision tasks, being of a supervised, semi-supervised, or unsupervised nature. Even when class labels are not available, a user still might favor certain grouping solutions over others. This bias can be expressed either by providing a clustering criterion or cost function and, in addition to that, by specifying pairwise constraints on the assignment of objects to classes. In this work, we discuss a unifying formulation for labelled and unlabelled data that can incorporate constrained data for model fitting. Our approach models the constraint information by the maximum entropy principle. This modeling strategy allows us (i) to handle constraint violations and soft constraints, and, at the same time, (ii) to speed up the optimization process. Experimental results on face classification and image segmentation indicate that the proposed algorithm is computationally efficient and generates superior groupings when compared with alternative techniques.
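The maximum-entropy treatment of constraints can be sketched as follows: under an expected-cost constraint, the maximum-entropy assignment distribution is a Gibbs (softmax) distribution, and a must-link pair contributes a soft reward for agreeing with its partner's current soft label rather than a hard requirement. The data, the constraint set, and the parameter values below are all invented, and the update is a simplified fixed-point iteration rather than the paper's full optimization.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two overlapping Gaussian blobs.
X = np.vstack([rng.normal([-1, 0], 1.0, (40, 2)),
               rng.normal([1, 0], 1.0, (40, 2))])
# Hypothetical side information: these index pairs should co-cluster.
must_link = [(0, 1), (2, 3), (40, 41)]

def soft_assign(X, centers, q=None, beta=2.0, lam=1.5):
    """Maximum-entropy (Gibbs) assignment: softmax of the negative distance
    cost, plus a soft reward for agreeing with must-linked partners."""
    cost = ((X[:, None] - centers) ** 2).sum(2)      # (n, k)
    bonus = np.zeros_like(cost)
    if q is not None:
        for i, j in must_link:
            bonus[i] += lam * q[j]                   # violation is penalized,
            bonus[j] += lam * q[i]                   # not forbidden (soft)
    logits = -beta * cost + bonus
    logits -= logits.max(1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(1, keepdims=True)

centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
q = None
for _ in range(10):                                  # fixed-point iteration
    q = soft_assign(X, centers, q)
    centers = (q.T @ X) / q.sum(0)[:, None]
```

Because assignments stay probabilistic, a constraint can be traded off against the clustering cost instead of being enforced absolutely, which is what makes violations and soft constraints tractable.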
International Conference on Pattern Recognition | 2004
Anil K. Jain; Alexander Topchy; Martin H. C. Law; Joachim M. Buhmann
Numerous clustering algorithms, their taxonomies and evaluation studies are available in the literature. Despite the diversity of different clustering algorithms, solutions delivered by these algorithms exhibit many commonalities. An analysis of the similarity and properties of clustering objective functions is necessary from the operational/user perspective. We revisit conventional categorization of clustering algorithms and attempt to relate them according to the partitions they produce. We empirically study the similarity of clustering solutions obtained by many traditional as well as relatively recent clustering algorithms on a number of real-world data sets. Sammon's mapping and a complete-link clustering of the inter-clustering dissimilarity values are performed to detect a meaningful grouping of the objective functions. We find that only a small number of clustering algorithms are sufficient to represent a large spectrum of clustering criteria. For example, interesting groups of clustering algorithms are centered around the graph partitioning, linkage-based and Gaussian mixture model based algorithms.
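The first ingredient of such a study is a dissimilarity between clustering solutions. A common choice is one minus the Rand index, sketched below on hypothetical partitions of eight points (the study then applies Sammon's mapping and complete-link clustering to such a matrix, which this sketch omits; the partitions here are invented for illustration).

```python
import numpy as np
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree
    (same-cluster in both, or split in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical outputs of four clustering criteria on the same 8 points.
parts = {
    "kmeans":        [0, 0, 0, 0, 1, 1, 1, 1],
    "gmm":           [0, 0, 0, 1, 1, 1, 1, 1],  # nearly identical to k-means
    "single_link":   [0, 0, 1, 1, 2, 2, 3, 3],
    "complete_link": [0, 0, 1, 1, 2, 2, 3, 3],  # identical to single-link here
}
names = list(parts)
D = np.array([[1 - rand_index(parts[a], parts[b]) for b in names]
              for a in names])
print(np.round(D, 3))
```

In this contrived matrix the two center-based criteria sit close together and the two linkage criteria coincide, mirroring the kind of grouping of objective functions the paper detects at scale.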
Lecture Notes in Computer Science | 2004
Martin H. C. Law; Alexander Topchy; Anil K. Jain
Several clustering algorithms equipped with pairwise hard constraints between data points are known to improve the accuracy of clustering solutions. We develop a new clustering algorithm that extends mixture clustering in the presence of (i) soft constraints, and (ii) group-level constraints. Soft constraints can reflect the uncertainty associated with a priori knowledge about pairs of points that should or should not belong to the same cluster, while group-level constraints can capture larger building blocks of the target partition when afforded by the side information. Assuming that the data points are generated by a mixture of Gaussians, we derive the EM algorithm to estimate the parameters of different clusters. Empirical study demonstrates that the use of soft constraints results in superior data partitions normally unattainable without constraints. Further, soft constraints yield solutions that are more robust than those based on hard constraints when some of the specified constraints are incorrect.
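The group-level constraints can be sketched concretely: in the E-step, all members of a group share a single responsibility vector, computed from the product of their likelihoods, so the group moves between clusters as one block. The 1-D data, group layout, and initialization below are invented for this sketch, which omits the paper's soft pairwise constraints.

```python
import numpy as np

rng = np.random.default_rng(3)
# 1-D data: two Gaussian clusters.
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
# Group-level side information: each block of 5 consecutive points is known
# to lie in the same (unknown) cluster.
groups = [list(range(i, i + 5)) for i in range(0, 100, 5)]

K = 2
mu, var, pi = np.array([-0.5, 0.5]), np.ones(K), np.full(K, 0.5)
for _ in range(40):
    # E-step: one shared responsibility per group, since its members
    # are constrained to co-cluster.
    ll = -0.5 * ((X[None, :] - mu[:, None]) ** 2 / var[:, None]
                 + np.log(2 * np.pi * var[:, None]))          # (K, n)
    R = np.empty((K, len(X)))
    for g in groups:
        lg = np.log(pi) + ll[:, g].sum(axis=1)                # group log-posterior
        lg -= lg.max()
        w = np.exp(lg) / np.exp(lg).sum()
        R[:, g] = w[:, None]
    # M-step: standard weighted Gaussian updates.
    Nk = R.sum(axis=1)
    pi, mu = Nk / Nk.sum(), (R @ X) / Nk
    var = (R * (X[None, :] - mu[:, None]) ** 2).sum(axis=1) / Nk
```

Summing log-likelihoods over a group sharpens its posterior, so even weakly separated components get pulled apart faster than in unconstrained EM.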
Archive | 2006
Tin Kam Ho; Mitra Basu; Martin H. C. Law
When popular classifiers fail to achieve perfect accuracy in a practical application, possible causes can be deficiencies in the algorithms, intrinsic difficulties in the data, and a mismatch between methods and problems. We propose to address this question by developing measures of geometrical and topological characteristics of point sets in high-dimensional spaces. Such measures provide a basis for analyzing classifier behavior beyond estimates of error rates. We discuss several measures useful for this characterization, and their utility in analyzing data sets with known or controlled complexity. Our observations confirm their effectiveness and suggest several future directions.
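One of the classical geometric measures in this line of work is Fisher's discriminant ratio (often labeled F1): the maximum over features of the squared between-class mean gap divided by the summed within-class variances. The sketch below computes it on invented easy and hard two-class problems; it is a minimal illustration of a single measure, not the full measure suite.

```python
import numpy as np

rng = np.random.default_rng(4)

def fisher_ratio(Xa, Xb):
    """F1 measure: max over features of the between-class mean gap squared
    over the summed within-class variances. Large values = easy problem."""
    f = (Xa.mean(0) - Xb.mean(0)) ** 2 / (Xa.var(0) + Xb.var(0))
    return float(f.max())

easy_a, easy_b = rng.normal(0, 1, (100, 3)), rng.normal(5, 1, (100, 3))
hard_a, hard_b = rng.normal(0, 1, (100, 3)), rng.normal(0.3, 1, (100, 3))
print(fisher_ratio(easy_a, easy_b), fisher_ratio(hard_a, hard_b))
```

A large F1 flags at least one feature along which the classes are well separated; a value near zero signals intrinsic overlap that no classifier can resolve from these features alone.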
Iberian Conference on Pattern Recognition and Image Analysis | 2003
Mário A. T. Figueiredo; Anil K. Jain; Martin H. C. Law
We propose a feature selection approach for clustering which extends Koller and Sahami’s mutual-information-based criterion to the unsupervised case. This is achieved with the help of a mixture-based model and the corresponding expectation-maximization algorithm. The result is a backward search scheme, able to sort the features by order of relevance. Finally, an MDL criterion is used to prune the sorted list of features, yielding a feature selection criterion. The proposed approach can be classified as a wrapper, since it wraps the mixture estimation algorithm in an outer layer that performs feature selection. Preliminary experimental results show that the proposed method has promising performance.
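The backward search scheme can be sketched with a stand-in criterion: starting from all features, repeatedly drop the one whose removal hurts an unsupervised relevance score the least, yielding a relevance ordering. For self-containment the sketch replaces the paper's mutual-information-based criterion (and the MDL pruning step) with a crude cluster-separation score; the data and all names are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
# Feature 0 carries the cluster structure; features 1 and 2 are noise.
f0 = np.concatenate([rng.normal(-3, 1, n), rng.normal(3, 1, n)])
X = np.column_stack([f0, rng.normal(0, 1, 2 * n), rng.normal(0, 1, 2 * n)])

def kmeans2(Xs, iters=25):
    C = Xs[[0, -1]].copy()                 # deterministic init at two points
    for _ in range(iters):
        lab = ((Xs[:, None] - C) ** 2).sum(2).argmin(1)
        C = np.array([Xs[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(2)])
    return lab, C

def separation(Xs):
    """Between-cluster over within-cluster scatter of a 2-means partition;
    a crude stand-in for the mutual-information relevance criterion."""
    lab, C = kmeans2(Xs)
    within = sum(((Xs[lab == j] - C[j]) ** 2).sum() for j in range(2))
    between = sum((lab == j).sum() * ((C[j] - Xs.mean(0)) ** 2).sum()
                  for j in range(2))
    return between / within

# Backward search: repeatedly drop the feature whose removal hurts least.
feats, order = [0, 1, 2], []
while len(feats) > 1:
    drop = max(feats,
               key=lambda f: separation(X[:, [g for g in feats if g != f]]))
    feats.remove(drop)
    order.append(drop)
order.append(feats[0])
print(order)   # features sorted least-relevant first
```

This is the wrapper pattern the abstract describes: the clustering routine sits inside the loop, and the outer layer does the feature search; the structure-carrying feature survives to the end of the ordering.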