Publication

Featured research published by William F. Punch.


IEEE Transactions on Evolutionary Computation | 2000

Dimensionality reduction using genetic algorithms

Michael L. Raymer; William F. Punch; Erik D. Goodman; Leslie A. Kuhn; Anil K. Jain

Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection and extraction and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k nearest neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection, and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.
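As a rough illustration of the abstract's core idea, the sketch below evolves a feature-weight vector and a binary masking vector for a leave-one-out 1-NN classifier. The dataset, GA settings, and operators are all invented for illustration and are not taken from the paper:

```python
import random

random.seed(0)

# Toy 2-class data: feature 0 is informative, feature 1 is noise (hypothetical).
X = [(0.1, 5.0), (0.2, 1.0), (0.15, 3.0), (0.9, 4.0), (1.0, 0.5), (0.95, 2.5)]
y = [0, 0, 0, 1, 1, 1]

def loo_1nn_accuracy(weights, mask):
    """Leave-one-out 1-NN accuracy with weighted, masked features."""
    correct = 0
    for i, xi in enumerate(X):
        best, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum(m * (w * (a - b)) ** 2
                    for m, w, a, b in zip(mask, weights, xi, xj))
            if d < best_d:
                best_d, best = d, y[j]
        correct += best == y[i]
    return correct / len(X)

def evolve(pop_size=10, gens=15):
    """Tiny elitist GA over (weights, mask); fitness = LOO 1-NN accuracy."""
    pop = [([random.random() for _ in range(2)],
            [random.randint(0, 1) for _ in range(2)])
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: -loo_1nn_accuracy(*c))
        parents = pop[: pop_size // 2]
        children = []
        for w, m in parents:
            # Gaussian mutation on weights, bit-flip mutation on the mask.
            w2 = [min(1.0, max(0.0, wi + random.gauss(0, 0.1))) for wi in w]
            m2 = [1 - mi if random.random() < 0.1 else mi for mi in m]
            children.append((w2, m2))
        pop = parents + children
    return max(pop, key=lambda c: loo_1nn_accuracy(*c))

weights, mask = evolve()
print(weights, mask, loo_1nn_accuracy(weights, mask))
```

On this toy data the GA tends to mask out or down-weight the noise feature, which is the mechanism the paper exploits at much larger scale.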


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Clustering ensembles: models of consensus and weak partitions

Alexander Topchy; Anil K. Jain; William F. Punch

Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial, or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum-likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intraclass variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world data sets.
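The mixture-of-multinomials consensus model can be sketched in a few lines of EM. Each object is represented by the vector of labels it received from the base clusterings; the toy label matrix and the deterministic initialisation (seeded from the first base clustering rather than at random) are assumptions made for this illustration:

```python
import math

# Label matrix: row n = labels given to object n by H=3 hypothetical base
# clusterings; K=2 consensus clusters, L=2 labels per base clustering.
labels = [(0, 0, 1), (0, 0, 1), (0, 1, 1),
          (1, 1, 0), (1, 0, 0), (1, 1, 0)]
N, H, K, L = len(labels), 3, 2, 2

# Deterministic initialisation: soft responsibilities seeded from the
# first base clustering (a random initialisation is also common).
resp = [[0.6, 0.4] if lab[0] == 0 else [0.4, 0.6] for lab in labels]

for _ in range(30):
    # M-step: mixture weights and per-clustering multinomial parameters.
    Nk = [sum(resp[n][k] for n in range(N)) for k in range(K)]
    pi = [nk / N for nk in Nk]
    theta = [[[(sum(resp[n][k] for n in range(N) if labels[n][h] == l) + 1e-6)
               / (Nk[k] + L * 1e-6)
               for l in range(L)] for h in range(H)] for k in range(K)]
    # E-step: posterior over consensus clusters for every object.
    for n, lab in enumerate(labels):
        lik = [pi[k] * math.prod(theta[k][h][lab[h]] for h in range(H))
               for k in range(K)]
        s = sum(lik)
        resp[n] = [v / s for v in lik]

consensus = [max(range(K), key=lambda k: resp[n][k]) for n in range(N)]
print(consensus)
```

The combined partition is the maximum-likelihood hard assignment after EM converges; here it recovers the two groups that the three base clusterings mostly agree on.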


International Conference on Data Mining | 2003

Combining multiple weak clusterings

Alexander Topchy; Anil K. Jain; William F. Punch

A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that the final partitioning of data provides better clustering? The answer depends on the quality of clusterings to be combined as well as the properties of the fusion method. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. As a result, we show that the consensus function is related to the classical intra-class variance criterion using the generalized mutual information definition. Second, we show the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. We analyze the combination accuracy as a function of parameters controlling the power and resolution of component partitions as well as the learning dynamics vs. the number of clusterings involved. Finally, some empirical studies compare the effectiveness of several consensus functions.
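A minimal sketch of the weak-clustering idea, assuming invented 2-D data: each weak partition projects the data onto a random direction and splits at the median, and the partitions are then fused through a co-association matrix (a common fusion heuristic, used here in place of the paper's consensus functions):

```python
import random

random.seed(2)

# Toy 2-D data: two well-separated blobs (hypothetical).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (2.0, 2.1), (2.2, 2.0), (2.1, 2.3)]

def weak_partition(pts):
    """One weak clustering: project onto a random direction, split at median."""
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    proj = [a * x + b * y for x, y in pts]
    cut = sorted(proj)[len(proj) // 2]
    return [1 if p >= cut else 0 for p in proj]

# Co-association: fraction of partitions that group each pair together.
H = 50
parts = [weak_partition(points) for _ in range(H)]
n = len(points)
co = [[sum(p[i] == p[j] for p in parts) / H for j in range(n)] for i in range(n)]

# Consensus: transitively merge pairs with co-association above 0.5.
labels = list(range(n))
for i in range(n):
    for j in range(n):
        if co[i][j] > 0.5:
            lo, hi = sorted((labels[i], labels[j]))
            labels = [lo if l == hi else l for l in labels]
print(labels)
```

Individually each random split is a very weak clusterer, yet the combination recovers the blob structure, which is the point the abstract makes.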


Frontiers in Education | 2003

Predicting student performance: an application of data mining methods with an educational Web-based system

Behrouz Minaei-Bidgoli; Deborah A. Kashy; Gerd Kortemeyer; William F. Punch

Newly developed Web-based educational technologies offer researchers unique opportunities to study how students learn and what approaches to learning lead to success. Web-based systems routinely collect vast quantities of data on user patterns, and data mining methods can be applied to these databases. This paper presents an approach to classifying students in order to predict their final grade based on features extracted from logged data in an educational Web-based system. We design, implement, and evaluate a series of pattern classifiers and compare their performance on an online course dataset. A combination of multiple classifiers leads to a significant improvement in classification performance. Furthermore, by learning an appropriate weighting of the features used via a genetic algorithm (GA), we further improve prediction accuracy. The GA is demonstrated to improve the accuracy of the combined classifiers by about 10 to 12% compared with non-GA classifiers. This method may be of considerable use in identifying at-risk students early, especially in very large classes, allowing the instructor to provide appropriate advising in a timely manner.


International Parallel and Distributed Processing Symposium | 1994

Coarse-grain parallel genetic algorithms: categorization and new approach

Shyh-Chang Lin; William F. Punch; Erik D. Goodman

This paper describes a number of different coarse-grain GAs, including various migration strategies and connectivity schemes to address the premature convergence problem. These approaches are evaluated on a graph partitioning problem. Our experiments showed, first, that the sequential GAs used are not as effective as parallel GAs for this graph partition problem. Second, for coarse-grain GAs, the results indicate that using a large number of nodes and exchanging individuals asynchronously among them is very effective. Third, GAs that exchange solutions based on population similarity instead of a fixed connection topology get better results without any degradation in speed. Finally, we propose a new coarse-grained GA architecture, the Injection Island GA (iiGA). The preliminary results of iiGAs show them to be a promising new approach to coarse-grain GAs.
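The coarse-grain structure can be sketched as an island model: several subpopulations evolve independently and periodically exchange their best individuals around a ring. Everything below (OneMax as a stand-in fitness, population sizes, the synchronous ring migration) is a simplifying assumption for illustration, not the paper's setup:

```python
import random

random.seed(3)

GENES, POP, ISLANDS, GENS = 12, 8, 4, 40

def fitness(ind):
    """OneMax: count of 1-bits (a stand-in for the graph-partition objective)."""
    return sum(ind)

def step(pop):
    """One elitist generation: keep the best half, refill with mutated copies."""
    pop.sort(key=fitness, reverse=True)
    keep = pop[: POP // 2]
    children = [[1 - g if random.random() < 1 / GENES else g for g in p]
                for p in keep]
    return keep + children

islands = [[[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
           for _ in range(ISLANDS)]

for gen in range(GENS):
    islands = [step(pop) for pop in islands]
    if gen % 5 == 4:  # every 5 generations, migrate the best along a ring
        bests = [max(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            pop.sort(key=fitness)
            pop[0] = list(bests[(i - 1) % ISLANDS])  # replace the worst

best = max(fitness(ind) for pop in islands for ind in pop)
print(best)
```

Migration lets good building blocks spread between otherwise isolated subpopulations, which is the mechanism the paper's migration strategies tune against premature convergence.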


International Conference on Pattern Recognition | 2004

Adaptive clustering ensembles

Alexander Topchy; Behrouz Minaei-Bidgoli; Anil K. Jain; William F. Punch

Clustering ensembles combine multiple partitions of the given data into a single clustering solution of better quality. Inspired by the success of supervised boosting algorithms, we devise an adaptive scheme for integration of multiple non-independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. The sampling probability for each data point dynamically depends on the consistency of its previous assignments in the ensemble. New subsamples are drawn to increasingly focus on the problematic regions of the input feature space. A measure of a data point's clustering consistency is defined to guide this adaptation. An empirical study compares the performance of adaptive and regular clustering ensembles using different consensus functions on a number of data sets. Experimental results demonstrate improved accuracy for some clustering structures.
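The adaptive-sampling loop can be illustrated on invented 1-D data with an ambiguous point in the middle. The weak clusterer (a random threshold split), the consistency measure (majority-vote margin), and all constants are assumptions of this sketch:

```python
import random

random.seed(4)

# 1-D toy data: two groups and one ambiguous point (index 5) in the middle.
data = [0.0, 0.1, 0.2, 1.0, 1.1, 0.55]
n = len(data)
weights = [1.0] * n                  # sampling weights, updated adaptively
votes = [[0, 0] for _ in range(n)]   # label votes accumulated over the ensemble

for _ in range(30):
    # Draw a subsample according to the current sampling weights.
    idx = random.choices(range(n), weights=weights, k=n)
    sub = [data[i] for i in idx]
    # Weak clusterer: split at a random cut between the subsample's extremes.
    cut = random.uniform(min(sub), max(sub))
    for i in range(n):
        votes[i][1 if data[i] >= cut else 0] += 1
    # Consistency = margin of the majority vote; inconsistent points get
    # higher sampling weight so later partitions focus on them.
    for i in range(n):
        total = sum(votes[i])
        consistency = abs(votes[i][0] - votes[i][1]) / total
        weights[i] = 1.0 - consistency + 0.05

labels = [0 if v[0] >= v[1] else 1 for v in votes]
print(labels, [round(w, 2) for w in weights])
```

The stable extreme points end up with near-zero sampling weight while the ambiguous midpoint keeps a high weight, mirroring how the paper's scheme concentrates new subsamples on problematic regions.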


Genetic and Evolutionary Computation Conference | 2003

Using genetic algorithms for data mining optimization in an educational web-based system

Behrouz Minaei-Bidgoli; William F. Punch

This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in an educational web-based system. A combination of multiple classifiers leads to a significant improvement in classification performance. By weighting the feature vectors using a genetic algorithm (GA), we can optimize the prediction accuracy and obtain a marked improvement over raw classification. It further shows that when the number of features is small, feature weighting works better than feature selection alone.


Nature | 2006

Ab initio determination of solid-state nanostructure

Pavol Juhas; David M. Cherba; Phillip M. Duxbury; William F. Punch; Simon J. L. Billinge

Advances in materials science and molecular biology followed rapidly from the ability to characterize atomic structure using single crystals. Structure determination is more difficult if single crystals are not available. Many complex inorganic materials that are of interest in nanotechnology have no periodic long-range order and so their structures cannot be solved using crystallographic methods. Here we demonstrate that ab initio structure solution of these nanostructured materials is feasible using diffraction data in combination with distance geometry methods. Precise, sub-ångström resolution distance data are experimentally available from the atomic pair distribution function (PDF). Current PDF analysis consists of structure refinement from reasonable initial structure guesses and it is not clear, a priori, that sufficient information exists in the PDF to obtain a unique structural solution. Here we present and validate two algorithms for structure reconstruction from precise unassigned interatomic distances for a range of clusters. We then apply the algorithms to find a unique, ab initio, structural solution for C60 from PDF data alone. This opens the door to sub-ångström resolution structure solution of nanomaterials, even when crystallographic methods fail.
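The paper solves the much harder case of unassigned distances extracted from the PDF. As a simplified, hypothetical illustration of distance-geometry reconstruction when the distances are already assigned to atom pairs, the sketch below rebuilds 2-D coordinates of four points of a unit square from their pairwise distance list:

```python
import math

# Pairwise distances for four points of a unit square (the assigned-distance
# analogue of the PDF input; the data here are synthetic).
d = {
    (0, 1): 1.0, (0, 2): 1.0, (0, 3): math.sqrt(2),
    (1, 2): math.sqrt(2), (1, 3): 1.0, (2, 3): 1.0,
}

def dist(i, j):
    return d[(min(i, j), max(i, j))]

# Fix the gauge: p0 at the origin, p1 on the +x axis.
pts = [(0.0, 0.0), (dist(0, 1), 0.0)]
for k in range(2, 4):
    # Triangulate point k from its distances to p0 and p1.
    r0, r1 = dist(0, k), dist(1, k)
    x = (r0 ** 2 - r1 ** 2 + pts[1][0] ** 2) / (2 * pts[1][0])
    y = math.sqrt(max(0.0, r0 ** 2 - x ** 2))
    # Distances alone leave a mirror ambiguity; pick the sign of y that
    # matches the distance to the previously placed point.
    if len(pts) > 2:
        if abs(math.dist((x, y), pts[2]) - dist(2, k)) > \
           abs(math.dist((x, -y), pts[2]) - dist(2, k)):
            y = -y
    pts.append((x, y))

# Verify every reconstructed distance against the input list.
err = max(abs(math.dist(pts[i], pts[j]) - dist(i, j))
          for i in range(4) for j in range(i + 1, 4))
print(pts, err)
```

Removing the pair assignments, as the PDF does, turns this direct triangulation into the combinatorial search problem the paper's algorithms address.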


International Conference on Information Technology: Coding and Computing | 2004

Ensembles of partitions via data resampling

Behrouz Minaei-Bidgoli; Alexander Topchy; William F. Punch

The combination of multiple clusterings is a difficult problem in the practice of distributed data mining. Both the cluster generation mechanism and the partition integration process influence the quality of the combinations. We propose a data resampling approach for building cluster ensembles that are both robust and stable. In particular, we investigate the effectiveness of a bootstrapping technique in conjunction with several combination algorithms. The empirical study shows that a meaningful consensus partition for an entire set of objects emerges from multiple clusterings of bootstrap samples, given optimal combination algorithm parameters. Experimental results for ensembles with varying numbers of partitions and clusters are reported for simulated and real data sets. Experimental results show improved stability and accuracy for consensus partitions obtained via a bootstrapping technique.
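A minimal sketch of the bootstrap approach, under invented 1-D data and a tiny 2-means base clusterer: each ensemble member clusters a bootstrap resample, the resulting partitions of the full data set are accumulated into a co-association matrix, and a majority threshold yields the consensus partition (one simple choice of combination algorithm):

```python
import random

random.seed(5)

# Toy 1-D data: two groups (hypothetical).
data = [0.0, 0.2, 0.1, 2.0, 2.2, 2.1]
n = len(data)

def two_means(sample, iters=10):
    """Tiny 1-D 2-means on a bootstrap sample; returns the two centroids."""
    c = [min(sample), max(sample)]
    for _ in range(iters):
        groups = ([x for x in sample if abs(x - c[0]) <= abs(x - c[1])],
                  [x for x in sample if abs(x - c[0]) > abs(x - c[1])])
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c

B = 40
co = [[0] * n for _ in range(n)]
for _ in range(B):
    sample = [random.choice(data) for _ in range(n)]  # bootstrap resample
    c = two_means(sample)
    # Label the entire data set by nearest centroid of this ensemble member.
    lab = [0 if abs(x - c[0]) <= abs(x - c[1]) else 1 for x in data]
    for i in range(n):
        for j in range(n):
            if lab[i] == lab[j]:
                co[i][j] += 1

# Consensus: transitively merge pairs grouped together in a majority of runs.
consensus = list(range(n))
for i in range(n):
    for j in range(n):
        if co[i][j] / B > 0.5:
            lo, hi = sorted((consensus[i], consensus[j]))
            consensus = [lo if l == hi else l for l in consensus]
print(consensus)
```

Even though individual bootstrap partitions vary, the consensus over many resamples is stable, which is the robustness property the abstract reports.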


International Conference on Machine Learning and Applications | 2004

Mining interesting contrast rules for a web-based educational system

Behrouz Minaei-Bidgoli; Pang Ning Tan; William F. Punch

Web-based educational technologies allow educators to study how students learn (descriptive studies) and which learning strategies are most effective (causal/predictive studies). Since web-based educational systems collect vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of students, assessments, and the solution strategies adopted by students. This paper focuses on the discovery of interesting contrast rules, which are sets of conjunctive rules describing interesting characteristics of different segments of a population. In the context of web-based educational systems, contrast rules help to identify attributes characterizing patterns of performance disparity between various groups of students. We propose a general formulation of contrast rules as well as a framework for finding such patterns. We apply this technique to an online educational system developed at Michigan State University called LON-CAPA.
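The core quantity behind a contrast rule is the gap in a rule's support between two segments of the population. A minimal sketch, using entirely made-up student records and attribute names:

```python
# Hypothetical student records: (group, passed_quiz_early, final_grade_high).
records = [
    ("A", True, True), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", False, True), ("B", True, True), ("B", False, False),
]

def support(group, cond):
    """Fraction of the group's records satisfying the rule's condition."""
    rows = [r for r in records if r[0] == group]
    return sum(cond(r) for r in rows) / len(rows)

# Conjunctive rule: "passed an early quiz AND finished with a high grade".
cond = lambda r: r[1] and r[2]
sA, sB = support("A", cond), support("B", cond)
print(sA, sB, abs(sA - sB))  # a large gap marks the rule as an interesting contrast
```

A mining framework like the paper's would enumerate many candidate conjunctions and rank them by such between-group support differences (alongside other interestingness measures).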

Collaboration

Dive into William F. Punch's collaborations.

Top Co-Authors

Erik D. Goodman
Michigan State University

Leslie A. Kuhn
Michigan State University

Gerd Kortemeyer
Michigan State University

David M. Cherba
Michigan State University

Shyh-Chang Lin
Michigan State University