Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kristin P. Bennett is active.

Publication


Featured research published by Kristin P. Bennett.


Optimization Methods & Software | 1992

Robust linear programming discrimination of two linearly inseparable sets

Kristin P. Bennett; Olvi L. Mangasarian

A single linear programming formulation is proposed which generates a plane that minimizes an average sum of misclassified points belonging to two disjoint point sets in n-dimensional real space. When the convex hulls of the two sets are also disjoint, the plane completely separates the two sets. When the convex hulls intersect, our linear program, unlike all previously proposed linear programs, is guaranteed to generate some error-minimizing plane, without the imposition of extraneous normalization constraints that inevitably fail to handle certain cases. The effectiveness of the proposed linear program has been demonstrated by successfully testing it on a number of databases. In addition, it has been used in conjunction with the multisurface method of piecewise-linear separation to train a feed-forward neural network with a single hidden layer.
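
A minimal sketch of the robust LP described above, in notation not taken from the paper: store the two point sets as the rows of A (m x n) and B (k x n), write the plane as w^T x = gamma, and let e denote a vector of ones. The formulation is then

\min_{w,\gamma,y,z} \ \tfrac{1}{m}\, e^\top y + \tfrac{1}{k}\, e^\top z
\quad \text{s.t.} \quad A w - e\gamma + y \ge e, \quad -B w + e\gamma + z \ge e, \quad y \ge 0, \ z \ge 0,

where y and z measure each point's violation of its side of the plane. A zero optimal value corresponds to complete separation of the two sets; otherwise the plane minimizes the averaged misclassification error, as stated in the abstract.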


SIGKDD Explorations | 2000

Support vector machines: hype or hallelujah?

Kristin P. Bennett; Colin Campbell

Support Vector Machines (SVMs) and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression, and novelty detection. The goal of this tutorial is to provide an intuitive explanation of SVMs from a geometric perspective. The classification problem is used to investigate the basic concepts behind SVMs and to examine their strengths and weaknesses from a data mining perspective. While this overview is not comprehensive, it does provide resources for those interested in further exploring SVMs.
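
As a quick illustration of the geometric picture (not code from the paper; scikit-learn is assumed), a linear SVM on toy 2-D data exposes the separating plane, its margin, and the support vectors that determine it:

import numpy as np
from sklearn.svm import SVC

# two well-separated Gaussian blobs as a toy two-class problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]           # separating plane w.x + b = 0
print("margin width:", 2.0 / np.linalg.norm(w))  # distance between the two supporting planes
print("support vectors:", len(clf.support_vectors_))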


Machine Learning | 2002

Linear Programming Boosting via Column Generation

Ayhan Demiriz; Kristin P. Bennett; John Shawe-Taylor

We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may converge in the limit only, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse in contrast with gradient based methods. Computationally, LPBoost is competitive in quality and computational cost to AdaBoost.
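
A sketch of the restricted dual LP that the column generation loop re-solves (notation assumed, not taken verbatim from the paper): with labels y_i in {-1, +1}, hypotheses h_1, ..., h_J generated so far, and misclassification-cost cap D,

\min_{u,\beta} \ \beta
\quad \text{s.t.} \quad \sum_{i=1}^{m} u_i \, y_i \, h_j(x_i) \le \beta \ \ (j = 1, \dots, J), \quad \sum_{i=1}^{m} u_i = 1, \quad 0 \le u_i \le D.

The weak learner prices a new column by maximizing \sum_i u_i y_i h(x_i) over candidate hypotheses; if the best value is at most beta (up to tolerance) the current solution is optimal and the loop stops, otherwise the hypothesis is added as a new column and the LP is re-solved. The ensemble weights on the hypotheses are recovered from the duals of the hypothesis constraints.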


Computational Optimization and Applications | 1999

Multicategory Classification by Support Vector Machines

Kristin P. Bennett

We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machine (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, a radial basis function, or even a neural network. For the k > 2-class problems, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method, and the original k QP methods. We discuss the advantages and disadvantages of each approach.
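
One standard way to write the single-QP, piecewise-linear version sketched above (our notation, linear case only): with classes 1, ..., k and training pairs (x_i, y_i),

\min_{w,b,\xi} \ \tfrac{1}{2} \sum_{j=1}^{k} \|w_j\|^2 + C \sum_{i} \sum_{j \ne y_i} \xi_{ij}
\quad \text{s.t.} \quad w_{y_i}^\top x_i + b_{y_i} \ge w_j^\top x_i + b_j + 1 - \xi_{ij}, \quad \xi_{ij} \ge 0 \ \ (j \ne y_i),

with a new point assigned to \arg\max_j (w_j^\top x + b_j). Kernelizing each w_j^\top x yields the piecewise-nonlinear classifiers (polynomial, radial basis function, and so on) mentioned in the abstract; the k-SVM alternative instead solves k independent one-against-rest QPs.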


Optimization Methods & Software | 1994

Multicategory discrimination via linear programming

Kristin P. Bennett; Olvi L. Mangasarian

A single linear program is proposed for discriminating between the elements of κ disjoint point sets in the n-dimensional real space R^n. When the conical hulls of the κ sets are (κ−1)-point disjoint in R^{n+1}, a κ-piece piecewise-linear surface generated by the linear program completely separates the κ sets. This improves on a previous linear programming approach which required that each set be linearly separable from the remaining κ−1 sets. When the conical hulls of the κ sets are not (κ−1)-point disjoint, the proposed linear program generates an error-minimizing piecewise-linear separator for the κ sets. For this case it is shown that the null solution is never a unique solver of the linear program and occurs only under the rather rare condition that the mean of each point set equals the mean of the means of the other κ−1 sets. This makes the proposed linear programming formulation useful for approximately discriminating between κ sets that are not piecewise-linear separable. Computationa...
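
The resulting classifier is piecewise-linear: a point x is assigned to class \arg\max_j (w_j^\top x - \gamma_j). A sketch of the single LP in our own notation, with errors averaged over the class sizes m_j as the abstract's averaging suggests:

\min \ \sum_{j=1}^{\kappa} \tfrac{1}{m_j} \sum_{x_i \in S_j} \sum_{l \ne j} y_{il}
\quad \text{s.t.} \quad (w_j - w_l)^\top x_i - (\gamma_j - \gamma_l) + y_{il} \ge 1, \quad y_{il} \ge 0 \ \ (x_i \in S_j, \ l \ne j).

A zero optimum means every point scores strictly higher under its own class's linear function than under any other, i.e. the κ-piece piecewise-linear surface separates the κ sets.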


Knowledge Discovery and Data Mining | 1999

Density-based indexing for approximate nearest-neighbor queries

Kristin P. Bennett; Usama M. Fayyad; Dan Geiger

We consider the problem of performing nearest-neighbor queries efficiently over large high-dimensional databases. Assuming that a full database scan to determine the nearest-neighbor entries is not acceptable, we study the possibility of constructing an index structure over the database. It is well-accepted that traditional database indexing algorithms fail for high-dimensional data (say d > 10 or 20, depending on the scheme). Some arguments have advocated that nearest-neighbor queries do not even make sense for high-dimensional data, since the ratio of the maximum and minimum distances goes to 1 as dimensionality increases. We show that these arguments are based on over-restrictive assumptions, and that in the general case it is meaningful and possible to perform such queries. We present an approach for deriving a multidimensional index to support approximate nearest-neighbor queries over large databases. Our approach, called DBIN, scales to high-dimensional databases by exploiting statistical properties of the data. The approach is based on statistically modeling the density of the content of the data table. DBIN uses the density model to derive a single index over the data table and requires physically re-writing the data in a new table sorted by the newly created index (i.e., creating what is known as a clustered index in the database literature). The indexing scheme produces a mapping between a query point (a data record) and an ordering on the clustered index values. Data is then scanned according to the index until the probability that the nearest neighbor has been found exceeds some threshold. We present theoretical and empirical justification for DBIN. The scheme supports a family of distance functions which includes the traditional Euclidean distance measure.
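
A rough sketch of the idea (not the DBIN implementation: scikit-learn's Gaussian mixture stands in for the density model, and a fixed posterior-mass threshold stands in for DBIN's probabilistic stopping rule):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 16))                 # stand-in data table

# 1) model the density of the table and use cluster membership as the clustered index
gmm = GaussianMixture(n_components=8, random_state=0).fit(data)
index = gmm.predict(data)
order = np.argsort(index)
data, index = data[order], index[order]            # physically re-sorted by the index

def approx_nn(q, mass=0.9):
    # 2) rank clusters by the query's posterior membership and scan them in that order
    post = gmm.predict_proba(q[None, :])[0]
    best_d, scanned = np.inf, 0.0
    for c in np.argsort(post)[::-1]:
        block = data[index == c]
        if len(block):
            best_d = min(best_d, np.min(np.linalg.norm(block - q, axis=1)))
        scanned += post[c]
        if scanned >= mass:                        # crude surrogate for DBIN's stopping criterion
            break
    return best_d

print(approx_nn(rng.normal(size=16)))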


Computational Optimization and Applications | 1993

Bilinear separation of two sets in n-space

Kristin P. Bennett; Olvi L. Mangasarian

The NP-complete problem of determining whether two disjoint point sets in the n-dimensional real space R^n can be separated by two planes is cast as a bilinear program, that is, minimizing the scalar product of two linear functions on a polyhedral set. The bilinear program, which has a vertex solution, is processed by an iterative linear programming algorithm that terminates in a finite number of steps at a point satisfying a necessary optimality condition or at a global minimum. Encouraging computational experience on a number of test problems is reported.
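
The iterative algorithm referred to above can be sketched generically in our own notation, assuming for the sketch that the polyhedral constraint set separates into X × Y:

\min_{x \in X, \ y \in Y} \ x^\top M y, \qquad
x^{k+1} \in \arg\min_{x \in X} \ x^\top M y^{k}, \qquad
y^{k+1} \in \arg\min_{y \in Y} \ (x^{k+1})^\top M y.

Each step is an ordinary linear program with a vertex solution, the objective is nonincreasing, and the alternation stops after finitely many steps at a point satisfying the necessary optimality condition (or at a global minimum), as the abstract states.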


Machine Learning | 2000

Enlarging the Margins in Perceptron Decision Trees

Kristin P. Bennett; Nello Cristianini; John Shawe-Taylor; Donghui Wu

Capacity control in perceptron decision trees is typically performed by controlling their size. We prove that other quantities can be as relevant for reducing their flexibility and combating overfitting. In particular, we provide an upper bound on the generalization error which depends both on the size of the tree and on the margin of the decision nodes, so enlarging the margins in perceptron decision trees reduces the upper bound on the generalization error. Based on this analysis, we introduce three new algorithms which can induce large-margin perceptron decision trees. To assess the effect of the large-margin bias, OC1 (Journal of Artificial Intelligence Research, 1994, 2, 1–32) of Murthy, Kasif and Salzberg, a well-known system for inducing perceptron decision trees, is used as the baseline algorithm. An extensive experimental study on real-world data showed that all three new algorithms perform better than, or at least not significantly worse than, OC1 on every dataset but one, and OC1 performed worse than the best margin-based method on every dataset.
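
A toy illustration of the large-margin bias (not one of the paper's three algorithms, and not OC1): each internal node's split is taken from a max-margin linear classifier instead of an impurity-minimizing hyperplane. scikit-learn's LinearSVC is assumed, and labels are binary 0/1.

import numpy as np
from sklearn.svm import LinearSVC

def build_tree(X, y, depth=0, max_depth=3, min_leaf=10):
    # leaf: pure node, too few points, or maximum depth reached
    if depth == max_depth or len(y) < min_leaf or len(np.unique(y)) == 1:
        return {"label": int(np.bincount(y).argmax())}
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)   # large-margin split at this node
    side = clf.decision_function(X) > 0
    if side.all() or (~side).all():                    # degenerate split: stop
        return {"label": int(np.bincount(y).argmax())}
    return {"clf": clf,
            "left": build_tree(X[~side], y[~side], depth + 1, max_depth, min_leaf),
            "right": build_tree(X[side], y[side], depth + 1, max_depth, min_leaf)}

def predict_one(tree, x):
    # walk the tree, branching on the sign of each node's decision function
    while "clf" in tree:
        tree = tree["right"] if tree["clf"].decision_function(x[None, :])[0] > 0 else tree["left"]
    return tree["label"]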


Knowledge Discovery and Data Mining | 2004

Column-generation boosting methods for mixture of kernels

Jinbo Bi; Tong Zhang; Kristin P. Bennett

We devise a boosting approach to classification and regression based on column generation using a mixture of kernels. Traditional kernel methods construct models based on a single positive semi-definite kernel, with the type of kernel predefined and kernel parameters chosen according to cross-validation performance. Our approach creates models that are mixtures of a library of kernel models, and our algorithm automatically determines the kernels to be used in the final model. The 1-norm and 2-norm regularization methods are employed to restrict the ensemble of kernel models. The proposed method produces sparser solutions, and thus significantly reduces the testing time. By extending column generation (CG) optimization, previously developed for linear programs with 1-norm regularization, to quadratic programs with 2-norm regularization, we are able to solve many learning formulations by leveraging various algorithms for constructing single-kernel models. By giving different priorities to columns to be generated, we are able to scale CG boosting to large datasets. Experimental results on benchmark data are included to demonstrate its effectiveness.
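
A small illustration of the mixture-of-kernels design (not the paper's column-generation solver; here the full 1-norm-regularized problem is solved directly with a Lasso, just to show how columns drawn from several kernels compete and how sparsity prunes them):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.linear_model import Lasso

# toy regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

# kernel library: the design matrix stacks columns k_j(x, x_i) for each kernel j and training point x_i
blocks = [rbf_kernel(X, X, gamma=g) for g in (0.5, 2.0, 8.0)] + [polynomial_kernel(X, X, degree=2)]
Phi = np.hstack(blocks)

# 1-norm regularization selects a sparse subset of columns across all kernels
model = Lasso(alpha=1e-3, max_iter=50000).fit(Phi, y)
print("active columns:", np.count_nonzero(model.coef_), "of", Phi.shape[1])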


Neurocomputing | 2003

A Geometric Approach to Support Vector Regression

Jinbo Bi; Kristin P. Bennett

We develop an intuitive geometric framework for support vector regression (SVR). By examining when ε-tubes exist, we show that SVR can be regarded as a classification problem in the dual space. Hard and soft ε-tubes are constructed by separating the convex or reduced convex hulls, respectively, of the training data with the response variable shifted up and down by ε. A novel SVR model is proposed based on choosing the max-margin plane between the two shifted datasets. Maximizing the margin corresponds to shrinking the effective ε-tube. In the proposed approach, the effects of the choices of all parameters become clear geometrically. The kernelized model corresponds to separating the convex or reduced convex hulls in feature space. Generalization bounds for classification can be extended to characterize the generalization performance of the proposed approach. We propose a simple iterative nearest-point algorithm that can be directly applied to the reduced convex hull case in order to construct soft ε-tubes. Computational comparisons with other SVR formulations are also included.
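
A small illustration of the classification view (not the paper's nearest-point algorithm; a standard linear SVM solver stands in for it): shift the responses up and down by ε, separate the two shifted sets, and read the regression function off the separating plane.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, (60, 1))
y = 1.5 * X.ravel() + 0.5 + 0.05 * rng.normal(size=60)

eps = 0.2
# classification view of SVR: shift the responses up and down by eps and separate the two sets
Z = np.vstack([np.column_stack([X, y + eps]), np.column_stack([X, y - eps])])
labels = np.r_[np.ones(len(X)), -np.ones(len(X))]
clf = SVC(kernel="linear", C=10.0).fit(Z, labels)

# the separating plane w_x * x + w_y * y + b = 0 gives the regression line y = -(w_x x + b) / w_y
w, b = clf.coef_[0], clf.intercept_[0]
print("slope:", -w[0] / w[1], "intercept:", -b / w[1])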

Collaboration


Dive into Kristin P. Bennett's collaborations.

Top Co-Authors

Curt M. Breneman, Rensselaer Polytechnic Institute
Amina Shabbeer, Rensselaer Polytechnic Institute
Charles Bergeron, Rensselaer Polytechnic Institute
Bülent Yener, Rensselaer Polytechnic Institute
Cagri Ozcaglar, Rensselaer Polytechnic Institute
Gautam Kunapuli, University of Wisconsin-Madison
Mark J. Embrechts, Rensselaer Polytechnic Institute
Ayhan Demiriz, Rensselaer Polytechnic Institute
Michinari Momma, Rensselaer Polytechnic Institute