Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Glenn Fung is active.

Publication


Featured research published by Glenn Fung.


Knowledge Discovery and Data Mining | 2001

Proximal support vector machine classifiers

Glenn Fung; Olvi L. Mangasarian

Instead of a standard support vector machine (SVM) that classifies points by assigning them to one of two disjoint half-spaces, points are classified by assigning them to the closest of two parallel planes (in input or feature space) that are pushed apart as far as possible. This formulation, which can also be interpreted as regularized least squares and considered in the much more general context of regularized networks [8, 9], leads to an extremely fast and simple algorithm for generating a linear or nonlinear classifier that merely requires the solution of a single system of linear equations. In contrast, standard SVMs solve a quadratic or linear program, which requires considerably more computation time. Computational results on publicly available datasets indicate that the proposed proximal SVM classifier has test set correctness comparable to that of standard SVM classifiers, but with computation time that can be an order of magnitude shorter. The linear proximal SVM can easily handle large datasets, as indicated by the classification of a 2 million point, 10-attribute dataset in 20.8 seconds. All computational results are based on 6 lines of MATLAB code.
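
To illustrate how little machinery the linear case needs, here is a minimal numpy sketch of the single-linear-system solve described above: with H = D[A, -e] and u = (w, gamma), minimizing nu/2 ||Hu - e||^2 + 1/2 ||u||^2 reduces to one linear solve. The parameter name nu, the function names, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def psvm_train(A, d, nu=1.0):
    """Linear proximal SVM: minimize
    nu/2 * ||D(Aw - e*gamma) - e||^2 + 1/2 * ||(w, gamma)||^2,
    which reduces to a single linear system in u = (w, gamma)."""
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])  # H = D [A, -e]
    u = np.linalg.solve(H.T @ H + np.eye(n + 1) / nu, H.T @ np.ones(m))
    return u[:n], u[n]  # w, gamma

def psvm_predict(A, w, gamma):
    return np.sign(A @ w - gamma)

# toy usage: two Gaussian blobs labeled +1 / -1
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
d = np.hstack([np.ones(50), -np.ones(50)])
w, gamma = psvm_train(A, d)
print((psvm_predict(A, w, gamma) == d).mean())  # training accuracy
```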


Computational Optimization and Applications | 2004

A Feature Selection Newton Method for Support Vector Machine Classification

Glenn Fung; Olvi L. Mangasarian

A fast Newton method that suppresses input space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.
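
The paper's specialized Newton solver is not reproduced here, but the linear program it targets is easy to state. The sketch below writes the 1-norm SVM as an LP via scipy.optimize.linprog, splitting w into nonnegative parts; feature suppression shows up as exact zeros in the recovered w. The parameter C and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def svm_1norm_lp(A, d, C=1.0):
    """1-norm SVM as an LP:
    min ||w||_1 + C * sum(xi)  s.t.  D(Aw - e*gamma) + xi >= e, xi >= 0.
    Variables: w = p - q with p, q >= 0; gamma = g1 - g2 with g1, g2 >= 0."""
    m, n = A.shape
    DA = d[:, None] * A
    # variable order: p (n), q (n), g1, g2, xi (m)
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(m)])
    # constraint D A w - d*gamma + xi >= e, written as A_ub x <= b_ub
    A_ub = np.hstack([-DA, DA, d[:, None], -d[:, None], -np.eye(m)])
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    w = res.x[:n] - res.x[n:2 * n]
    gamma = res.x[2 * n] - res.x[2 * n + 1]
    return w, gamma  # zeros in w mark suppressed input features
```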


European Conference on Machine Learning | 2007

Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches

Mark W. Schmidt; Glenn Fung; Rómer Rosales

L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state-of-the-art optimization techniques for solving this problem across several loss functions. Furthermore, we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that does not depend on any assumptions about the loss function used. The second addresses the non-differentiability of the L1 regularizer by casting the problem as a constrained optimization problem, which is then solved using a specialized gradient projection method. Extensive comparisons show that our newly proposed approaches consistently rank among the best in convergence speed and efficiency, as measured by the number of function evaluations required.
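
One way to make the first idea concrete: replace |w| with a smooth convex surrogate and hand the now-differentiable objective to any standard optimizer. The sketch below uses the surrogate (1/alpha)[log(1 + e^(-alpha*w)) + log(1 + e^(alpha*w))], which tends to |w| as alpha grows; pairing it with a logistic loss, and the values of alpha and lam, are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from scipy.optimize import minimize

def smooth_abs(w, alpha=1e3):
    """Smooth convex approximation of |w|; approaches |w| as alpha grows."""
    return (np.logaddexp(0, -alpha * w) + np.logaddexp(0, alpha * w)) / alpha

def objective(w, X, y, lam, alpha=1e3):
    # L1-regularized logistic loss with the smooth |.| surrogate
    margins = y * (X @ w)
    return np.logaddexp(0, -margins).sum() + lam * smooth_abs(w, alpha).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200))
res = minimize(objective, np.zeros(20), args=(X, y, 5.0))
print(np.round(res.x, 2))  # most coordinates are driven near zero
```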


Machine Learning | 2005

Multicategory Proximal Support Vector Machine Classifiers

Glenn Fung; Olvi L. Mangasarian

Given a dataset in which each element is labeled by one of k labels, we construct, by a very fast algorithm, a k-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches (Fung & Mangasarian, 2001; Suykens & Vandewalle, 1999) can be interpreted as ridge regression applied to classification problems (Evgeniou, Pontil, & Poggio, 2000). Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as little as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice in order to take advantage of its fast performance. However, there is a drawback associated with this one-from-the-rest approach: the resulting two-class problems are often very unbalanced, leading in some cases to poor performance. We propose balancing the k classes and a novel Newton refinement modification to PSVM in order to deal with this problem. Computational results indicate that these two modifications preserve the speed of PSVM while often leading to significant test set improvement over a plain PSVM one-from-the-rest application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test set correctness.
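
A hedged sketch of the two ingredients just described, reusing the single-linear-system PSVM solve from the first abstract: per-point error weights give each class equal total influence (one simple reading of the balancing idea; the paper's Newton refinement step is not shown), and one-from-the-rest classification takes the largest score.

```python
import numpy as np

def psvm_train_weighted(A, d, nu=1.0, weights=None):
    """PSVM solve with per-point error weights: minimize
    nu/2 * (Hu - e)' W (Hu - e) + 1/2 ||u||^2, one linear system."""
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])
    W = np.ones(m) if weights is None else np.asarray(weights)
    u = np.linalg.solve(H.T @ (W[:, None] * H) + np.eye(n + 1) / nu, H.T @ W)
    return u[:n], u[n]

def ovr_train(A, labels, nu=1.0):
    models = {}
    for k in np.unique(labels):
        d = np.where(labels == k, 1.0, -1.0)
        # give the positive and negative class equal total weight
        weights = np.where(d > 0, 0.5 / (d > 0).sum(), 0.5 / (d < 0).sum())
        models[k] = psvm_train_weighted(A, d, nu, weights * len(d))
    return models

def ovr_predict(A, models):
    ks = sorted(models)
    scores = np.stack([A @ models[k][0] - models[k][1] for k in ks], axis=1)
    return np.array(ks)[scores.argmax(axis=1)]
```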


Optimization Methods & Software | 2001

Semi-supervised support vector machines for unlabeled data classification

Glenn Fung; Olvi L. Mangasarian

A concave minimization approach is proposed for classifying unlabeled data based on the following ideas: (i) a small representative percentage (5% to 10%) of the unlabeled data is chosen by a clustering algorithm and given to an expert or oracle to label; (ii) a linear support vector machine is trained using the small labeled sample while simultaneously assigning the remaining bulk of the unlabeled dataset to one of two classes so as to maximize the margin (distance) between the two bounding planes that determine the separating plane midway between them. This latter problem is formulated as a concave minimization problem on a polyhedral set, for which a stationary point is quickly obtained by solving a few (5 to 7) linear programs. Such stationary points turn out to be very effective, as evidenced by our computational results, which show that clustered concave minimization yields: (a) test set improvement as high as 20.4% over a linear support vector machine trained on a correspondingly small but randomly chosen subset labeled by an expert; (b) test set correctness within an average of 5.1% of that of a completely supervised linear support vector machine trained on the entire dataset labeled by an expert.
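
Step (i) above is straightforward to prototype. A sketch, assuming scikit-learn's KMeans and LinearSVC: cluster the unlabeled pool, send the point nearest each centroid to the oracle, and train a plain linear SVM on those labels. The joint concave minimization of step (ii), which also assigns the remaining bulk, is not shown, and the synthetic oracle is a stand-in for the human expert.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def select_points_to_label(X, fraction=0.07, seed=0):
    """Cluster the unlabeled pool and return the index of the point
    closest to each centroid, to be sent to an expert or oracle."""
    k = max(1, int(fraction * len(X)))
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))  # one representative per cluster

# usage sketch with a synthetic oracle:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
idx = select_points_to_label(X)
y_small = np.sign(X[idx, 0])             # stand-in for the human oracle
clf = LinearSVC().fit(X[idx], y_small)   # SVM trained on the small labeled sample
```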


Computer Vision and Pattern Recognition | 2008

Structure learning in random fields for heart motion abnormality detection

Mark W. Schmidt; Kevin P. Murphy; Glenn Fung; Rómer Rosales

Coronary heart disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret, leading to high intra-observer variability. Previous work indicates that, in order to approach this problem, the interactions between the different heart regions and their overall influence on the clinical condition of the heart need to be considered. To do this, we propose a method for jointly learning the structure and parameters of conditional random fields, formulating these tasks as a convex optimization problem. We consider block-L1 regularization for each set of features associated with an edge, and formalize an efficient projection method to find the globally optimal penalized maximum likelihood solution. We perform extensive numerical experiments comparing the presented method with related methods that approach the structure-learning problem differently. We verify the robustness of our method on echocardiograms collected in routine clinical practice at one hospital.
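
The block-L1 penalty mentioned above groups all the features on an edge, so regularization removes whole edges rather than individual features. Below is a minimal sketch of the corresponding proximal (block soft-thresholding) step, using the L2 norm within each block as one common instance of the penalty; the paper's projection machinery and CRF likelihood are not shown.

```python
import numpy as np

def block_l1_prox(w, groups, lam):
    """One proximal step for lam * sum_g ||w_g||_2: each edge's feature
    block shrinks toward zero together, so an edge is kept or dropped
    wholesale -- this is what turns sparsity into structure learning."""
    w = w.copy()
    for g in groups:                  # g: index array for one edge's features
        norm = np.linalg.norm(w[g])
        if norm > lam:
            w[g] *= 1.0 - lam / norm  # shrink the whole block
        else:
            w[g] = 0.0                # drop the edge entirely
    return w

# usage: weights for 3 edges with 2 features each; weak edges vanish
w = np.array([0.1, -0.2, 1.5, 0.8, 0.05, 0.0])
print(block_l1_prox(w, [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)], 0.5))
```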


Neurocomputing | 2003

Finite Newton method for Lagrangian support vector machine classification

Glenn Fung; Olvi L. Mangasarian

An implicit Lagrangian [Math. Programming Ser. B 62 (1993) 277] formulation of a support vector machine classifier that led to a highly effective iterative scheme [J. Machine Learn. Res. 1 (2001) 161] is solved here by a finite Newton method. The proposed method, which is extremely fast and terminates in 6 or 7 iterations, can handle classification problems in very high dimensional spaces, e.g., over 28,000 dimensions, in a few seconds on a 400 MHz Pentium II machine. The method can also handle problems with large datasets and requires no specialized software other than a commonly available solver for a system of linear equations. Finite termination of the proposed method is established in this work.
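
To convey the flavor of the iteration, here is a hedged sketch of a generalized Newton method applied to the closely related plus-function reformulation f(u) = nu/2 ||(e - Hu)_+||^2 + 1/2 ||u||^2, not the paper's exact implicit-Lagrangian objective. The generalized Hessian changes only on the active set, and in practice such iterations settle in a handful of steps, consistent with the 6 or 7 reported above.

```python
import numpy as np

def newton_svm(A, d, nu=1.0, max_iter=50, tol=1e-8):
    """Generalized Newton on f(u) = nu/2 ||(e - Hu)_+||^2 + 1/2 ||u||^2,
    with H = D[A, -e] and u = (w, gamma); each step is one linear solve."""
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])
    u = np.zeros(n + 1)
    for _ in range(max_iter):
        r = 1.0 - H @ u                     # residual e - Hu
        grad = u - nu * H.T @ np.maximum(r, 0.0)
        if np.linalg.norm(grad) < tol:
            break
        Ha = H[r > 0]                       # rows active in the plus function
        hess = np.eye(n + 1) + nu * Ha.T @ Ha  # generalized Hessian
        u -= np.linalg.solve(hess, grad)
    return u[:n], u[n]                      # w, gamma
```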


Knowledge Discovery and Data Mining | 2006

Learning sparse metrics via linear programming

Rómer Rosales; Glenn Fung

Calculation of object similarity, for example through a distance function, is a common part of data mining and machine learning algorithms. This calculation is crucial for efficiency since distances are usually evaluated a large number of times, the classical example being query-by-example (find objects that are similar to a given query object). Moreover, the performance of these algorithms depends critically on choosing a good distance function. However, it is often the case that (1) the correct distance is unknown or chosen by hand, and (2) its calculation is computationally expensive (e.g., for high-dimensional objects). In this paper, we propose a method for constructing relative-distance-preserving low-dimensional mappings (sparse mappings). This method allows learning unknown distance functions (or approximating known ones) with the additional property of reducing distance computation time. We present an algorithm that, given examples of proximity comparisons among triples of objects (object i is more like object j than object k), learns a distance function, in as few dimensions as possible, that preserves these distance relationships. The formulation is based on solving a linear programming optimization problem that finds an optimal mapping for the given dataset and distance relationships. Unlike other popular embedding algorithms, this method easily generalizes to new points, does not have local minima, and explicitly models computational efficiency by finding a mapping that is sparse, i.e., one that depends on a small subset of features or dimensions. Experimental evaluation shows that the proposed formulation compares favorably with a state-of-the-art method on several publicly available datasets.
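
To make the triplet formulation concrete, here is a deliberately simplified diagonal variant as an LP, assuming scipy.optimize.linprog: learn nonnegative per-feature weights so that each triplet's ordering holds with unit margin and slack, with a 1-norm objective that zeroes out most features. The paper learns a more general sparse linear mapping; this sketch covers only the diagonal special case.

```python
import numpy as np
from scipy.optimize import linprog

def learn_sparse_diag_metric(X, triplets, C=1.0):
    """Learn nonnegative weights w for d_w(x, y) = sum_f w_f (x_f - y_f)^2
    so that d_w(i, j) + 1 <= d_w(i, k) for each triplet (i, j, k), with
    slack; minimizing ||w||_1 + C*sum(slack) drives most weights to zero."""
    n = X.shape[1]
    T = len(triplets)
    diffs = np.array([(X[i] - X[k]) ** 2 - (X[i] - X[j]) ** 2
                      for i, j, k in triplets])        # T x n
    c = np.concatenate([np.ones(n), C * np.ones(T)])   # variables: w, xi
    A_ub = np.hstack([-diffs, -np.eye(T)])             # -diffs@w - xi <= -1
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(T), bounds=(0, None))
    return res.x[:n]                                   # sparse feature weights
```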


Knowledge Discovery and Data Mining | 2000

Data selection for support vector machine classifiers

Glenn Fung; Olvi L. Mangasarian

The problem of extracting a minimal number of data points from a large dataset, in order to generate a support vector machine (SVM) classifier, is formulated as a concave minimization problem and solved by a finite number of linear programs. This minimal set of data points, which is the smallest number of support vectors that completely characterize a separating plane classifier, is considerably smaller than that required by a standard 1-norm support vector machine with or without feature selection. The proposed approach also incorporates a feature selection procedure that results in a minimal number of input features used by the classifier. Tenfold cross validation gives as good or better test results using the proposed minimal support vector machine (MSVM) classifier based on the smaller set of data points compared to a standard 1-norm support vector machine classifier. The reduction in data points used by an MSVM classifier over those used by a 1-norm SVM classifier averaged 66% on seven public datasets and was as high as 81%. This makes MSVM a useful incremental classification tool which maintains only a small fraction of a large dataset before merging and processing it with new incoming data.
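
The "finite number of linear programs" refers to successive linearization, the standard workhorse for concave minimization over a polyhedral set (the same device appears in the semi-supervised paper above). A generic hedged sketch, assuming scipy.optimize.linprog and a feasible region bounded in the descent directions; the MSVM-specific objective and constraints are not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def sla_concave_min(f, grad_f, A_ub, b_ub, x0, max_iter=10, tol=1e-6):
    """Successive linearization for a concave f over {x >= 0, A_ub x <= b_ub}:
    each step minimizes the tangent plane of f at the current point, which
    for concave f can never increase the objective; a stationary point is
    typically reached after only a few LPs."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        res = linprog(grad_f(x), A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        if not res.success or f(res.x) > f(x) - tol:
            break                      # no further decrease: stationary point
        x = res.x
    return x

# example concave objective of the kind used in such formulations:
# f(x) = np.sum(1 - np.exp(-5 * x)), grad_f(x) = 5 * np.exp(-5 * x)
```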


Medical Physics | 2010

Comparison of Bayesian network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy

K Jayasurya; Glenn Fung; Shipeng Yu; Cary Dehing-Oberije; Dirk De Ruysscher; Andrew Hope; W. De Neve; Yolande Lievens; P. Lambin; Andre Dekker

PURPOSE Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but often only perform well if all the input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as an SVM, and will predict survival more accurately when data are missing. METHODS A BN and an SVM model were trained on 322 inoperable NSCLC patients treated with radiotherapy in Maastricht and validated in three independent data sets of 35, 47, and 33 patients from Ghent, Leuven, and Toronto. Missing variables occurred in the data sets, with only 37, 28, and 24 patients having complete data. RESULTS BN structure and parameter learning identified gross tumor volume (GTV) size, performance status, and number of positive lymph nodes on PET as prognostic factors for two-year survival. When validated on the full validation sets of Ghent, Leuven, and Toronto, the BN model had an AUC of 0.77, 0.72, and 0.70, respectively. An SVM model based on the same variables performed worse overall (AUC 0.71, 0.68, and 0.69), especially on the Ghent set, which had the highest percentage of missing values for the important GTV size variable. When only patients with complete data sets were considered, the BN and SVM models performed more similarly. CONCLUSIONS Within the limitations of this study, the hypothesis is supported that BN models handle missing data better than SVM models and are therefore more suitable for the medical domain. Future work should focus on improving BN performance by including more patients, more variables, and more diversity.

Collaboration


Dive into Glenn Fung's collaborations.

Top Co-Authors

Jinbo Bi

University of Connecticut
