Publication


Featured research published by Paul S. Bradley.


Journal of Global Optimization | 2000

k-Plane Clustering

Paul S. Bradley; Olvi L. Mangasarian

A new finite algorithm is proposed for clustering m given points in n-dimensional real space into k clusters by generating k planes that constitute a local solution to the nonconvex problem of minimizing the sum of squares of the 2-norm distances between each point and a nearest plane. The key to the algorithm lies in a formulation that generates a plane in n-dimensional space that minimizes the sum of the squares of the 2-norm distances to each of m1 given points in the space. The plane is generated by an eigenvector corresponding to a smallest eigenvalue of a simple n × n matrix derived from the m1 points. The algorithm was tested on the publicly available Wisconsin Breast Cancer Prognosis database to generate well-separated patient survival curves. In contrast, the k-means algorithm did not generate such well-separated survival curves.
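The alternating scheme the abstract describes is straightforward to prototype. Below is a minimal NumPy sketch, assuming the distance |x·w − γ| with ‖w‖ = 1 and the smallest-eigenvalue plane fit mentioned above; the function names and the empty-cluster guard are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fit_plane(points):
    # Fit the plane (w, gamma) minimizing the sum of squared 2-norm
    # distances (x.w - gamma)^2 over the given points, with ||w|| = 1:
    # w is an eigenvector of the centered scatter matrix for its smallest
    # eigenvalue, and gamma is the mean projection onto w.
    centered = points - points.mean(axis=0)
    scatter = centered.T @ centered              # the n x n matrix of the abstract
    _, eigvecs = np.linalg.eigh(scatter)         # eigenvalues in ascending order
    w = eigvecs[:, 0]
    gamma = points.mean(axis=0) @ w
    return w, gamma

def k_plane_clustering(X, k, iters=100, seed=0):
    # Alternate between assigning each point to its nearest plane and
    # refitting each cluster's plane until the assignment stops changing.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    for _ in range(iters):
        for j in range(k):                       # guard against empty clusters
            if not np.any(labels == j):
                labels[rng.integers(len(X))] = j
        planes = [fit_plane(X[labels == j]) for j in range(k)]
        dists = np.column_stack([np.abs(X @ w - g) for w, g in planes])
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, planes
```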


INFORMS Journal on Computing | 1999

Mathematical Programming for Data Mining: Formulations and Challenges

Paul S. Bradley; Usama M. Fayyad; Olvi L. Mangasarian

This article is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems.


INFORMS Journal on Computing | 1998

Feature Selection Via Mathematical Programming

Paul S. Bradley; Olvi L. Mangasarian; W. N. Street

The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints. Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage method for reducing neural network complexity. One feature selection algorithm via concave minimization reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4.
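As a rough illustration of the concave-exponential route mentioned in the abstract, the sketch below implements one plausible successive-linearization loop with scipy.optimize.linprog. The variable layout, the parameter names lam and alpha, and the stopping rule are assumptions for illustration; this is not the authors' code or their exact formulation.

```python
import numpy as np
from scipy.optimize import linprog

def fsv_feature_selection(A, B, lam=0.5, alpha=5.0, iters=20, tol=1e-6):
    # A and B hold the two point sets, one row per point.  lam trades
    # separation error against the number of features used; alpha controls
    # how sharply the concave exponential approximates the step function.
    m, n = A.shape
    k = B.shape[0]
    # variable vector: [w (n), gamma (1), y (m), z (k), v (n)]
    nv = 2 * n + 1 + m + k
    iw = slice(0, n)
    ig = n
    iy = slice(n + 1, n + 1 + m)
    iz = slice(n + 1 + m, n + 1 + m + k)
    iv = slice(n + 1 + m + k, nv)

    # -A w + gamma e - y <= -e   (separation errors y for the A points)
    #  B w - gamma e - z <= -e   (separation errors z for the B points)
    #  w - v <= 0, -w - v <= 0   (v bounds |w| componentwise)
    A_ub = np.zeros((m + k + 2 * n, nv))
    b_ub = np.concatenate([-np.ones(m + k), np.zeros(2 * n)])
    A_ub[:m, iw] = -A
    A_ub[:m, ig] = 1.0
    A_ub[:m, iy] = -np.eye(m)
    A_ub[m:m + k, iw] = B
    A_ub[m:m + k, ig] = -1.0
    A_ub[m:m + k, iz] = -np.eye(k)
    A_ub[m + k:m + k + n, iw] = np.eye(n)
    A_ub[m + k:m + k + n, iv] = -np.eye(n)
    A_ub[m + k + n:, iw] = -np.eye(n)
    A_ub[m + k + n:, iv] = -np.eye(n)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k + n)

    v = np.ones(n)
    for _ in range(iters):
        c = np.zeros(nv)
        c[iy] = (1 - lam) / m
        c[iz] = (1 - lam) / k
        c[iv] = lam * alpha * np.exp(-alpha * v)   # linearized concave term
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        v_new = res.x[iv]
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    w, gamma = res.x[iw], res.x[ig]
    selected = np.flatnonzero(np.abs(w) > 1e-6)    # features the plane actually uses
    return w, gamma, selected
```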


Knowledge Discovery and Data Mining | 1999

Compressed data cubes for OLAP aggregate query approximation on continuous dimensions

Jayavel Shanmugasundaram; Usama M. Fayyad; Paul S. Bradley

Efficiently answering decision support queries is an important problem. Most of the work in this direction has been in the context of the data cube. Queries are efficiently answered by pre-computing large parts of the cube. Besides having large space requirements, such pre-computation requires that the hierarchy along each dimension be fixed (hence dimensions are categorical or prediscretized). Queries that take advantage of pre-computation can thus only drill-down or roll-up along this fixed hierarchy. Another disadvantage of existing pre-computation techniques is that the target measure, along with the aggregation function of interest, is fixed for each cube. Queries over more than one target measure or using different aggregation functions, would require pre-computing larger data cubes. In this paper, we propose a new compressed representation of the data cube that (a) drastically reduces storage requirements, (b) does not require the discretization hierarchy along each query dimension to be fixed beforehand and (c) treats each dimension as a potential target measure and supports multiple aggregation functions without additional storage costs. The tradeoff is approximate, yet relatively accurate, answers to queries. We outline mechanisms to reduce the error in the approximation. Our performance evaluation indicates that our compression technique effectively addresses the limitations of existing approaches.
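One way to make the idea concrete is to stand in a diagonal-covariance Gaussian mixture for the paper's cluster-based compressed representation and integrate it over a query box; the sketch below does this for COUNT queries. GaussianMixture, the diagonal-covariance choice, and all names are assumptions for illustration, not the paper's actual model.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def build_compressed_cube(data, n_clusters=32, seed=0):
    # Summarize the fact table's continuous dimensions with a diagonal
    # Gaussian mixture plus the total row count; the mixture is the
    # "compressed cube" in this sketch.
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                          random_state=seed).fit(data)
    return gmm, len(data)

def approximate_count(model, lows, highs):
    # Approximate COUNT(*) over the box lows <= x <= highs by summing,
    # over components, the product of per-dimension normal masses.
    gmm, n_rows = model
    stds = np.sqrt(gmm.covariances_)
    per_dim = norm.cdf(highs, gmm.means_, stds) - norm.cdf(lows, gmm.means_, stds)
    return n_rows * np.sum(gmm.weights_ * per_dim.prod(axis=1))

# usage (hypothetical columns): count of rows with 20 <= age <= 40 and
# 30000 <= income <= 60000
# model = build_compressed_cube(table[["age", "income"]].to_numpy())
# est = approximate_count(model, lows=[20, 30000], highs=[40, 60000])
```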


Knowledge Discovery and Data Mining | 2001

Efficient discovery of error-tolerant frequent itemsets in high dimensions

Cheng Yang; Usama M. Fayyad; Paul S. Bradley

We present a generalization of frequent itemsets allowing for the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation and collaborative filtering. Results show that the algorithm consistently uncovers structure in large sparse databases that other traditional clustering algorithms fail to find.
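A minimal way to see the error-tolerant notion: a row supports an itemset if it contains at least a (1 − ε) fraction of its items, and the itemset is error-tolerant frequent when enough rows support it. The greedy growth below only illustrates that definition; the paper's algorithm adds pruning and explicitly exploits sparseness, and the names and thresholds here are assumptions.

```python
import numpy as np

def eti_support(D, items, eps):
    # Fraction of rows containing at least a (1 - eps) fraction of the
    # given items.  D is a binary (rows x items) matrix.
    hits = D[:, items].sum(axis=1)
    return np.mean(hits >= (1 - eps) * len(items))

def greedy_eti(D, eps=0.2, min_support=0.05):
    # Grow one error-tolerant itemset greedily: start from the densest
    # column and keep adding columns as long as error-tolerant support
    # stays above the threshold.
    order = np.argsort(-D.sum(axis=0))          # columns by decreasing density
    itemset = [int(order[0])]
    for col in order[1:]:
        candidate = itemset + [int(col)]
        if eti_support(D, candidate, eps) >= min_support:
            itemset = candidate
    return itemset
```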


Optimization Methods & Software | 2000

Massive data discrimination via linear support vector machines

Paul S. Bradley; Olvi L. Mangasarian

A linear support vector machine formulation is used to generate a fast, finitely terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which contains a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to an optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems.
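The chunking loop is easy to mock up, though the sketch below substitutes scikit-learn's LinearSVC for the paper's 1-norm SVM linear program and keeps margin-active points in place of the positive-dual support vectors; labels are assumed to be ±1, and all names and tolerances are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def chunked_linear_svm(X, y, chunk_size=10000, margin_tol=1e-3):
    # Train on one chunk at a time, carry forward only the points that lie
    # on or inside the margin, and append the next chunk of data.  LinearSVC
    # is a stand-in for the 1-norm SVM linear program of the paper; y is
    # assumed to take values in {-1, +1}.
    carried_X = np.empty((0, X.shape[1]))
    carried_y = np.empty(0, dtype=y.dtype)
    clf = None
    for start in range(0, len(X), chunk_size):
        chunk_X = np.vstack([carried_X, X[start:start + chunk_size]])
        chunk_y = np.concatenate([carried_y, y[start:start + chunk_size]])
        clf = LinearSVC(C=1.0).fit(chunk_X, chunk_y)
        # keep margin-active points for the next small problem
        margins = chunk_y * clf.decision_function(chunk_X)
        keep = margins <= 1.0 + margin_tol
        carried_X, carried_y = chunk_X[keep], chunk_y[keep]
    return clf
```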


International Conference on Pattern Recognition | 2000

Clustering very large databases using EM mixture models

Paul S. Bradley; Usama M. Fayyad; Cory Reina

Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g. the expectation-maximization (EM) algorithm for fitting mixture models, because of their high memory and iteration requirements. Over large databases, the cost of the numerous scans required to converge and the algorithm's large memory requirement become prohibitive. We present a decomposition of the EM algorithm that requires a small amount of memory by limiting iterations to small data subsets. The scalable EM approach requires at most one database scan and is based on identifying regions of the data that are discardable, regions that are compressible, and regions that must be maintained in memory. Data resolution is preserved to the extent possible given the size of the memory buffer and the fit of the current model to the data. Computational tests demonstrate that the scalable scheme outperforms similarly constrained EM approaches.
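The discard/compress bookkeeping can be sketched with per-component sufficient statistics (count, sum, sum of squares): confidently assigned points are folded into these statistics and dropped from memory, and the M-step then combines the retained points with the statistics. Below is a minimal sketch for a diagonal-covariance mixture; the confidence threshold and the exact rules are assumptions, not the paper's full procedure.

```python
import numpy as np

def compress_block(X, resp, threshold=0.95):
    # Split a buffer into points to keep in memory and per-component
    # sufficient statistics (count, sum, sum of squares) for points whose
    # component membership is nearly certain and can be discarded.
    confident = resp.max(axis=1) >= threshold
    assign = resp.argmax(axis=1)
    stats = []
    for j in range(resp.shape[1]):
        block = X[confident & (assign == j)]
        stats.append((len(block), block.sum(axis=0), (block ** 2).sum(axis=0)))
    return X[~confident], stats

def m_step_with_stats(X, resp, stats):
    # M-step for a diagonal-covariance mixture that combines the retained
    # points (weighted by responsibilities) with previously compressed
    # sufficient statistics.
    k = resp.shape[1]
    means, variances, counts = [], [], []
    for j in range(k):
        n_c, s_c, ss_c = stats[j]
        n = resp[:, j].sum() + n_c + 1e-12
        mu = (resp[:, j] @ X + s_c) / n
        var = (resp[:, j] @ (X ** 2) + ss_c) / n - mu ** 2
        means.append(mu)
        variances.append(var)
        counts.append(n)
    counts = np.array(counts)
    return np.array(means), np.array(variances), counts / counts.sum()
```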


Computational Optimization and Applications | 1998

Parsimonious Least Norm Approximation

Paul S. Bradley; Olvi L. Mangasarian; J. B. Rosen

A theoretically justifiable fast finite successive linear approximation algorithm is proposed for obtaining a parsimonious solution to a corrupted linear system Ax = b + p, where the corruption p is due to noise or error in measurement. The proposed linear-programming-based algorithm finds a solution x by parametrically minimizing the number of nonzero elements in x and the error ‖Ax - b - p‖1. Numerical tests on a signal-processing-based example indicate that the proposed method is comparable to a method that parametrically minimizes the 1-norm of the solution x and the error ‖Ax - b - p‖1, and that both methods are superior, by orders of magnitude, to solutions obtained by least squares as well as by combinatorially choosing an optimal solution with a specific number of nonzero elements.
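The parametric minimization lends itself to the same successive-linearization device as in the feature-selection sketch above: split x as pos − neg with pos, neg ≥ 0, bound the 1-norm error componentwise by r, and relinearize a concave surrogate for the nonzero count at each iterate. A hedged sketch, with mu and alpha as assumed parameters and b standing for the observed (corrupted) right-hand side b + p of the abstract.

```python
import numpy as np
from scipy.optimize import linprog

def plna(A, b, mu=0.3, alpha=5.0, iters=20, tol=1e-6):
    # x is split as pos - neg with pos, neg >= 0; r bounds |Ax - b|
    # componentwise; the count of nonzeros in x is replaced by a concave
    # exponential relinearized around the current iterate at every pass.
    m, n = A.shape
    # variables: [pos (n), neg (n), r (m)]
    A_ub = np.block([[A, -A, -np.eye(m)],
                     [-A, A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(0, None)] * (2 * n + m)
    absx = np.ones(n)
    for _ in range(iters):
        grad = mu * alpha * np.exp(-alpha * absx)      # linearized concave term
        c = np.concatenate([grad, grad, (1 - mu) * np.ones(m)])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        pos, neg = res.x[:n], res.x[n:2 * n]
        new_absx = pos + neg
        if np.max(np.abs(new_absx - absx)) < tol:
            absx = new_absx
            break
        absx = new_absx
    return pos - neg
```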


Knowledge Discovery and Data Mining | 2003

Data mining as an automated service

Paul S. Bradley

An automated data mining service offers an outsourced, cost-effective analysis option for clients who want to leverage their data resources for decision support and operational improvement. In the service model, the client typically provides the service with data and other information likely to aid in the analysis process (e.g. domain knowledge). In return, the service provides analysis results to the client. We describe the required processes, issues, and challenges in automating the data mining and analysis process when the high-level goals are: (1) to provide the client with a high-quality, pertinent analysis result; and (2) to automate the data mining service, minimizing the amount of human analyst effort required and the cost of delivering the service. We argue that by focusing on client problems within market sectors, both of these goals may be realized.


International Conference on Acoustics, Speech and Signal Processing | 1998

Parsimonious side propagation

Paul S. Bradley; Olvi L. Mangasarian

A fast parsimonious linear-programming-based algorithm for training neural networks is proposed that suppresses redundant features while using a minimal number of hidden units. This is achieved by propagating sideways to newly added hidden units the task of separating successive groups of unclassified points. Computational results show improvements of 26.53% and 19.76% in tenfold cross-validation test correctness over a parsimonious perceptron on two publicly available datasets.
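Very loosely, the constructive scheme can be mimicked by training each new unit only on the points the earlier units failed to classify. The Perceptron below is just a stand-in separator and the whole fragment is an assumption-laden illustration of that growth pattern, not the paper's linear-programming formulation.

```python
import numpy as np
from sklearn.linear_model import Perceptron

def side_propagation_sketch(X, y, max_units=10):
    # Each new unit is trained only on the points the previous units left
    # unclassified, so the network grows sideways instead of retraining
    # all weights at once.
    units, remaining = [], np.ones(len(X), dtype=bool)
    for _ in range(max_units):
        if remaining.sum() == 0 or len(np.unique(y[remaining])) < 2:
            break
        unit = Perceptron().fit(X[remaining], y[remaining])
        correct = unit.predict(X[remaining]) == y[remaining]
        units.append(unit)
        idx = np.flatnonzero(remaining)
        remaining[idx[correct]] = False     # hand the rest to the next unit
    return units
```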
