
Publication


Featured research published by Robert E. Banfield.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

A Comparison of Decision Tree Ensemble Creation Techniques

Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W.P. Kegelmeyer

We experimentally evaluate bagging and seven other randomization-based approaches to creating an ensemble of decision tree classifiers. Statistical tests were performed on experimental results from 57 publicly available data sets. When cross-validation comparisons were tested for statistical significance, the best method was statistically more accurate than bagging on only eight of the 57 data sets. Alternatively, examining the average ranks of the algorithms across the group of data sets, we find that boosting, random forests, and randomized trees are statistically significantly better than bagging. Because our results suggest that using an appropriate ensemble size is important, we introduce an algorithm that decides when a sufficient number of classifiers has been created for an ensemble. Our algorithm uses the out-of-bag error estimate and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
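The stopping rule described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' algorithm: the bootstrap sampler, the toy 1-D stump learner, and the plateau test (out-of-bag error changing by at most `tol` over a window of rounds) are all assumptions made for the example.

```python
import random

random.seed(0)

def bootstrap(data):
    """Sample len(data) examples with replacement; return sample and out-of-bag indices."""
    n = len(data)
    idx = [random.randrange(n) for _ in range(n)]
    return [data[i] for i in idx], set(range(n)) - set(idx)

def train_stump(sample):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    t = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x > t)

def grow_until_stable(data, window=5, tol=0.01, max_size=50):
    """Add bagged classifiers until the OOB error is flat over `window` rounds."""
    ensemble, votes, history = [], [[0, 0] for _ in data], []
    while len(ensemble) < max_size:
        sample, oob = bootstrap(data)
        clf = train_stump(sample)
        ensemble.append(clf)
        for i in oob:                      # vote only on examples this member never saw
            votes[i][clf(data[i][0])] += 1
        seen = [(v, y) for v, (_, y) in zip(votes, data) if sum(v) > 0]
        err = sum(int(v[1] > v[0]) != y for v, y in seen) / len(seen)
        history.append(err)
        if len(history) >= window and max(history[-window:]) - min(history[-window:]) <= tol:
            break
    return ensemble, err

# Two well-separated 1-D classes: the OOB error flattens almost immediately.
data = [(random.gauss(0, 1), 0) for _ in range(100)] + \
       [(random.gauss(10, 1), 1) for _ in range(100)]
ensemble, oob_err = grow_until_stable(data)
print(len(ensemble), round(oob_err, 3))
```

Because each member votes only on its own out-of-bag examples, no separate validation set is needed, which is what makes the estimate cheap enough to use as a stopping criterion.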


Information Fusion | 2004

Ensemble diversity measures and their application to thinning

Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. Philip Kegelmeyer

The diversity of an ensemble of classifiers can be calculated in a variety of ways. Here a diversity metric and a means for altering the diversity of an ensemble, called “thinning”, are introduced. We evaluate thinning algorithms on ensembles created by several techniques, using 22 publicly available datasets. When compared to other methods, our percentage correct diversity measure shows the greatest correlation between the increase in voted ensemble accuracy and the diversity value. Also, the analysis of different ensemble creation methods indicates that they generate different levels of diversity. Finally, the methods proposed for thinning show that ensembles can be made smaller without loss in accuracy.
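The abstract does not define the percentage correct diversity measure, so the sketch below substitutes a generic pairwise-disagreement diversity together with a simple accuracy-based thinning step; both are stand-ins for the paper's actual methods, chosen only to make the diversity/thinning workflow concrete.

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of examples on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def ensemble_diversity(all_preds):
    """Average pairwise disagreement across the ensemble."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

def voted_accuracy(all_preds, labels):
    """Accuracy of unweighted majority vote (binary labels 0/1, ties go to 0)."""
    correct = 0
    for i, y in enumerate(labels):
        votes = [p[i] for p in all_preds]
        correct += int(sum(votes) > len(votes) / 2) == y
    return correct / len(labels)

def thin_once(all_preds, labels):
    """Drop the classifier whose removal yields the best voted accuracy."""
    best = max(range(len(all_preds)),
               key=lambda k: voted_accuracy(all_preds[:k] + all_preds[k + 1:], labels))
    return all_preds[:best] + all_preds[best + 1:]

labels    = [0, 0, 1, 1, 1]
all_preds = [[0, 0, 1, 1, 1],   # accurate member
             [0, 1, 1, 1, 1],   # one mistake
             [1, 1, 0, 0, 0]]   # always wrong: inflates diversity, hurts the vote
print(ensemble_diversity(all_preds))
thinned = thin_once(all_preds, labels)
print(voted_accuracy(all_preds, labels), voted_accuracy(thinned, labels))
```

The toy ensemble makes the paper's point in miniature: removing the always-wrong member shrinks the ensemble while the voted accuracy improves rather than degrades.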


International Conference on Multiple Classifier Systems | 2003

A new ensemble diversity measure applied to thinning ensembles

Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. Philip Kegelmeyer

We introduce a new way of describing the diversity of an ensemble of classifiers, the Percentage Correct Diversity Measure, and compare it against existing methods. We then introduce two new methods for removing classifiers from an ensemble based on diversity calculations. Empirical results for twelve datasets from the UC Irvine repository show that diversity is generally modeled by our measure and ensembles can be made smaller without loss in accuracy.


Multiple Classifier Systems | 2004

A Comparison of Ensemble Creation Techniques

Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; Divya Bhadoria; W. Philip Kegelmeyer; Steven Eschrich

We experimentally evaluated bagging and six other randomization-based ensemble tree methods. Bagging uses randomization to create multiple training sets. Other approaches, such as Randomized C4.5, apply randomization in selecting a test at a given node of a tree. Then there are approaches, such as random forests and random subspaces, that apply randomization in the selection of attributes to be used in building the tree. On the other hand, boosting incrementally builds classifiers by focusing on examples misclassified by existing classifiers. Experiments were performed on 34 publicly available data sets. While each of the other six approaches has some strengths, we find that none of them is consistently more accurate than standard bagging when tested for statistical significance.
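Two of the randomization points named in the abstract, resampling the training rows (bagging) and restricting each member to a random attribute subset (random subspaces), can be shown side by side. This is a minimal sketch of the data preparation only; the tree induction itself is omitted, and the row/attribute counts are arbitrary.

```python
import random

random.seed(1)

def bootstrap_rows(rows):
    """Bagging's randomization: resample the training rows with replacement."""
    return [random.choice(rows) for _ in rows]

def random_subspace(rows, k):
    """Random subspaces' randomization: keep only k randomly chosen attributes."""
    feats = sorted(random.sample(range(len(rows[0])), k))
    return feats, [[row[f] for f in feats] for row in rows]

# Three examples with four attributes each.
rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
bag = bootstrap_rows(rows)          # same attribute set, resampled rows
feats, sub = random_subspace(rows, 2)  # same rows, restricted attribute view
print(len(bag), feats, sub)
```

Each ensemble member would then be trained on its own `bag` or `sub`, which is what makes the members differ even though they share one base learner.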


International Conference on Data Mining | 2003

Comparing pure parallel ensemble creation techniques against bagging

Lawrence O. Hall; Kevin W. Bowyer; Robert E. Banfield; Divya Bhadoria; W.P. Kegelmeyer; Steven Eschrich

We experimentally evaluate randomization-based approaches to creating an ensemble of decision-tree classifiers. Unlike methods related to boosting, all of the eight approaches considered here create each classifier in an ensemble independently of the other classifiers. Experiments were performed on 28 publicly available datasets, using C4.5 release 8 as the base classifier. While each of the other seven approaches has some strengths, we find that none of them is consistently more accurate than standard bagging when tested for statistical significance.


International Conference on Multiple Classifier Systems | 2005

Ensembles of classifiers from spatially disjoint data

Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. Philip Kegelmeyer

We describe an ensemble learning approach that accurately learns from data that has been partitioned according to the arbitrary spatial requirements of a large-scale simulation wherein classifiers may be trained only on the data local to a given partition. As a result, the class statistics can vary from partition to partition; some classes may even be missing from some partitions. In order to learn from such data, we combine a fast ensemble learning algorithm with Bayesian decision theory to generate an accurate predictive model of the simulation data. Results from a simulation of an impactor bar crushing a storage canister and from region recognition in face images show that regions of interest are successfully identified.
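One way to read "combining with Bayesian decision theory" here is as fusing per-partition class posteriors. The sketch below simply sums each classifier's posterior over the classes it knows and renormalizes; that fusion rule is an assumption for illustration, not the paper's exact method, but it shows how a class missing from one partition can still win the combined decision.

```python
def combine(posteriors_per_classifier, all_classes):
    """Fuse per-partition posteriors by summing and renormalizing.
    A classifier trained on a partition missing a class contributes
    nothing to that class rather than vetoing it."""
    totals = {c: 0.0 for c in all_classes}
    for post in posteriors_per_classifier:
        for c, p in post.items():
            totals[c] += p
    z = sum(totals.values())
    return {c: p / z for c, p in totals.items()}

# Partition 1 never saw class "c"; partition 2 never saw class "a".
p1 = {"a": 0.7, "b": 0.3}
p2 = {"b": 0.6, "c": 0.4}
fused = combine([p1, p2], ["a", "b", "c"])
print(max(fused, key=fused.get), fused)
```

Class "b" wins even though neither partition ranked it first on its own, which is the kind of cross-partition agreement a fused decision rule is meant to capture.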


International Conference on Pattern Recognition | 2008

Semi-supervised learning on large complex simulations

J.N. Korecki; Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W.P. Kegelmeyer

Complex simulations can generate very large amounts of data stored disjointly across many local disks. Learning from this data can be problematic due to the difficulty of obtaining labels for the data. We present an algorithm for the application of semi-supervised learning on disjoint data generated by complex simulations. Our semi-supervised technique shows a statistically significant accuracy improvement over supervised learning using the same underlying learning algorithm and requires less labeled data for comparable results.
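A minimal self-training loop illustrates the general shape of semi-supervised learning when labels are scarce: train on the labeled points, adopt the unlabeled points the model is confident about, and retrain. The nearest-centroid base learner, the margin-based confidence, and the threshold below are all invented for the sketch; they are not the authors' algorithm.

```python
def centroid_classifier(labeled):
    """Nearest-centroid on 1-D points: predict the class whose mean is closer."""
    means = {}
    for c in set(y for _, y in labeled):
        pts = [x for x, y in labeled if y == c]
        means[c] = sum(pts) / len(pts)
    def predict(x):
        return min(means, key=lambda c: abs(x - means[c]))
    def confidence(x):
        d = sorted(abs(x - m) for m in means.values())
        return d[1] - d[0]          # margin between the two closest centroids
    return predict, confidence

def self_train(labeled, unlabeled, threshold=3.0, rounds=5):
    """Repeatedly adopt the unlabeled points the current model is confident about."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        predict, confidence = centroid_classifier(labeled)
        confident = [x for x in unlabeled if confidence(x) >= threshold]
        if not confident:           # nothing left to adopt: stop early
            break
        labeled += [(x, predict(x)) for x in confident]
        unlabeled = [x for x in unlabeled if x not in confident]
    return centroid_classifier(labeled)[0]

labeled   = [(0.0, "low"), (10.0, "high")]       # one labeled example per class
unlabeled = [0.5, 1.0, 9.0, 9.5, 5.1]            # 5.1 stays unlabeled: low margin
predict = self_train(labeled, unlabeled)
print(predict(2.0), predict(8.0))
```

The ambiguous point near the midpoint is never adopted, which is the safeguard that keeps self-training from reinforcing its own mistakes.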


Information Fusion | 2008

Using classifier ensembles to label spatially disjoint data

Larry Shoemaker; Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. Philip Kegelmeyer

We describe an ensemble approach to learning from arbitrarily partitioned data. The partitioning comes from the distributed processing requirements of a large scale simulation. The volume of the data is such that classifiers can train only on data local to a given partition. Because the partitioning reflects the needs of the simulation, the class statistics can vary from partition to partition. Some classes will likely be missing from some partitions. We combine a fast ensemble learning algorithm with probabilistic majority voting in order to learn an accurate classifier from such data. Results from simulations of an impactor bar crushing a storage canister and from facial feature recognition show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets.


Systems, Man and Cybernetics | 2003

Why are neural networks sometimes much more accurate than decision trees: an analysis on a bio-informatics problem

Lawrence O. Hall; Xiaomei Liu; Kevin W. Bowyer; Robert E. Banfield

Bio-informatics data sets may be large in the number of examples and/or the number of features. Predicting the secondary structure of proteins from amino acid sequences is one example of high dimensional data for which large training sets exist. The data from the KDD Cup 2001 on the binding of compounds to thrombin is another example of a very high dimensional data set. This type of data set can require significant computing resources to train a neural network. In general, decision trees will require much less training time than neural networks. There have been a number of studies on the advantages of decision trees relative to neural networks for specific data sets. There are often statistically significant, though typically not very large, differences. Here, we examine one case in which a neural network greatly outperforms a decision tree: predicting the secondary structure of proteins. The hypothesis that the neural network learns important features of the data through its hidden units is explored by using a neural network to transform data for decision tree training. Experiments show that this explains some of the performance difference, but not all. Ensembles of decision trees are compared with a single neural network. It is our conclusion that the problem of protein secondary structure prediction exhibits some characteristics that are fundamentally better exploited by a neural network model.


International Journal on Artificial Intelligence Tools | 2003

Is Error-Based Pruning Redeemable?

Lawrence O. Hall; Kevin W. Bowyer; Robert E. Banfield; Steven Eschrich; Richard Collins

Error-based pruning can be used to prune a decision tree, and it does not require the use of validation data. It is implemented in the widely used C4.5 decision tree software. It uses a parameter, the certainty factor, that affects the size of the pruned tree. Several researchers have compared error-based pruning with other approaches, and have shown results suggesting that error-based pruning produces larger trees that give no increase in accuracy. They further suggest that as more data is added to the training set, the tree size after applying error-based pruning continues to grow even though there is no increase in accuracy. It appears that these results were obtained with the default certainty factor value. Here, we show that varying the certainty factor allows significantly smaller trees to be obtained with minimal or no accuracy loss. Also, the growth of tree size with added data can be halted with an appropriate choice of certainty factor. Methods of determining the certainty factor are discussed for both small and large data sets. Experimental results support the conclusion that error-based pruning can be used to produce appropriately sized trees with good accuracy when compared with reduced-error pruning.
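The certainty factor enters through the pessimistic (upper-bound) error estimate. A common formulation, assumed here rather than taken from the paper, is the binomial upper confidence limit under a normal approximation; a subtree is pruned to a leaf when the leaf's pessimistic error count is no worse than the sum over the subtree's leaves.

```python
from statistics import NormalDist

def pessimistic_error_rate(errors, n, cf=0.25):
    """Upper confidence limit on the true error rate at a node with n
    training examples and `errors` misclassifications, in the style of
    C4.5's error-based pruning (normal approximation to the binomial).
    cf=0.25 is C4.5's default certainty factor."""
    z = NormalDist().inv_cdf(1 - cf)
    f = errors / n
    num = f + z * z / (2 * n) + z * ((f / n - f * f / n + z * z / (4 * n * n)) ** 0.5)
    return num / (1 + z * z / n)

# A lower certainty factor gives a more pessimistic estimate, so subtrees
# look worse relative to a single leaf and more of them are pruned away,
# yielding the smaller trees the abstract describes.
for cf in (0.5, 0.25, 0.05):
    print(cf, round(pessimistic_error_rate(2, 20, cf), 3))
```

At cf = 0.5 the bound collapses to the observed error rate (z = 0), and it grows as the certainty factor shrinks, which is exactly the knob the abstract argues should be tuned instead of left at its default.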

Collaboration


Dive into Robert E. Banfield's collaborations.

Top Co-Authors

Lawrence O. Hall, University of South Florida
W. Philip Kegelmeyer, Sandia National Laboratories
Larry Shoemaker, University of South Florida
Steven Eschrich, University of South Florida
Divya Bhadoria, University of South Florida
Richard Collins, University of South Florida
J.N. Korecki, University of South Florida