Publication


Featured research published by Balaji Krishnapuram.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Sparse multinomial logistic regression: fast algorithms and generalization bounds

Balaji Krishnapuram; Lawrence Carin; Mário A. T. Figueiredo; Alexander J. Hartemink

Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
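For reference only, the sparse multinomial model itself can be illustrated with an off-the-shelf L1-penalized fit. The snippet below uses scikit-learn's saga solver on the digits data; it is not the bound-optimization algorithm of the paper, but it shows the kind of exact-zero weight pattern a sparsity-promoting prior produces.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Illustration only: the L1 penalty stands in for the sparsity-promoting prior;
# this is scikit-learn's generic solver, not the paper's bound-optimization algorithm.
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
clf.fit(X, y)
print("fraction of exactly-zero weights:", round(float(np.mean(clf.coef_ == 0.0)), 2))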


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

A Bayesian approach to joint feature selection and classifier design

Balaji Krishnapuram; Alexander J. Hartemink; Lawrence Carin; Mário A. T. Figueiredo

This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.
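The pruning effect of such a hierarchical prior can be sketched with a crude fixed-point iteration: per-feature precisions are re-estimated from the current weights, so irrelevant features are driven toward zero. The snippet below is a simplified stand-in (synthetic data, a single Newton step per iteration, and a MacKay-style precision update), not the paper's EM algorithm with heavy-tailed priors.

import numpy as np
from scipy.special import expit

# Crude automatic-relevance sketch: per-feature precisions alpha penalize the weights
# and are re-estimated from them; features with tiny weights are effectively pruned.
rng = np.random.default_rng(0)
n, d = 300, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]            # only three relevant features in this toy data
y = (rng.random(n) < expit(X @ w_true)).astype(float)

w = np.zeros(d)
alpha = np.ones(d)                        # per-feature precisions
for _ in range(200):
    p = expit(X @ w)
    grad = X.T @ (p - y) + alpha * w                       # penalized logistic gradient
    H = X.T @ (X * (p * (1 - p))[:, None]) + np.diag(alpha)
    w -= np.linalg.solve(H, grad)                          # one Newton step
    alpha = 1.0 / (w ** 2 + 1e-6)                          # simplified precision update
print("weights:", np.round(w, 2))
print("effectively selected features:", np.where(np.abs(w) > 0.1)[0])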


International Conference on Machine Learning | 2008

Bayesian multiple instance learning: automatic feature selection and inductive transfer

Vikas C. Raykar; Balaji Krishnapuram; Jinbo Bi; Murat Dundar; R. Bharat Rao

We propose a novel Bayesian multiple instance learning (MIL) algorithm. This algorithm automatically identifies the relevant feature subset, and utilizes inductive transfer when learning multiple (conceptually related) classifiers. Experimental results indicate that the proposed MIL method is more accurate than previous MIL algorithms and selects a much smaller set of useful features. Inductive transfer further improves the accuracy of the classifier as compared to learning each task individually.
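For readers unfamiliar with the multiple-instance setting, the sketch below fits a generic noisy-OR bag likelihood on synthetic bags with scipy. The bag model, feature-selection prior, and transfer mechanism of the paper are not reproduced, and all data and dimensions are made up.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Generic noisy-OR MIL likelihood on synthetic bags; not the paper's Bayesian model.
rng = np.random.default_rng(0)
d = 5

def make_bag(positive):
    X = rng.normal(size=(int(rng.integers(3, 8)), d))
    if positive:
        X[0, :2] += 2.5          # plant one "witness" instance in each positive bag
    return X

bags = [make_bag(i % 2 == 0) for i in range(60)]
labels = np.array([1.0 if i % 2 == 0 else 0.0 for i in range(60)])

def neg_log_lik(w):
    nll = 0.0
    for X, y in zip(bags, labels):
        p_inst = expit(X @ w)                                           # instance probabilities
        p_bag = np.clip(1.0 - np.prod(1.0 - p_inst), 1e-9, 1.0 - 1e-9)  # noisy-OR bag probability
        nll -= y * np.log(p_bag) + (1.0 - y) * np.log(1.0 - p_bag)
    return nll

w_hat = minimize(neg_log_lik, np.zeros(d), method="L-BFGS-B").x
print("estimated instance-level weights:", np.round(w_hat, 2))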


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

On Classification with Incomplete Data

David A. Williams; Xuejun Liao; Ya Xue; Lawrence Carin; Balaji Krishnapuram

We address the incomplete-data problem in which feature vectors to be classified are missing data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both expectation-maximization (EM) and variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm utilizes all available data, both incomplete and complete, as well as labeled and unlabeled. Experimental results of the proposed classification algorithms are shown.
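The core idea of integrating the classifier over p(missing | observed) rather than imputing can be shown for a linear score under a single Gaussian; the paper instead uses a GMM estimated with EM/VB-EM. The snippet below uses the standard sigma(mu / sqrt(1 + pi * var / 8)) approximation for the expected logistic output, and all numbers are purely illustrative.

import numpy as np

def conditional_gaussian(mean, cov, obs_idx, mis_idx, x_obs):
    # p(x_missing | x_observed) for a joint Gaussian: standard conditioning formulas.
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_mo = cov[np.ix_(mis_idx, obs_idx)]
    S_mm = cov[np.ix_(mis_idx, mis_idx)]
    K = S_mo @ np.linalg.solve(S_oo, np.eye(len(obs_idx)))
    mu = mean[mis_idx] + K @ (x_obs - mean[obs_idx])
    Sigma = S_mm - K @ S_mo.T
    return mu, Sigma

def expected_sigmoid(w, b, mean, cov, obs_idx, mis_idx, x_obs):
    # Expected classifier output without imputing: average the linear score over the
    # conditional, then apply the usual probit-style correction to the logistic link.
    mu_m, Sig_m = conditional_gaussian(mean, cov, obs_idx, mis_idx, x_obs)
    score_mean = w[obs_idx] @ x_obs + w[mis_idx] @ mu_m + b
    score_var = w[mis_idx] @ Sig_m @ w[mis_idx]
    return 1.0 / (1.0 + np.exp(-score_mean / np.sqrt(1.0 + np.pi * score_var / 8.0)))

mean = np.zeros(3)
cov = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.0]])
w, b = np.array([1.5, -2.0, 0.5]), 0.1
print(expected_sigmoid(w, b, mean, cov, obs_idx=np.array([0]),
                       mis_idx=np.array([1, 2]), x_obs=np.array([0.8])))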


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Variational Bayes for continuous hidden Markov models and its application to active learning

Shihao Ji; Balaji Krishnapuram; Lawrence Carin

In this paper, we present a variational Bayes (VB) framework for learning continuous hidden Markov models (CHMMs), and we examine the VB framework within active learning. Unlike a maximum likelihood or maximum a posteriori training procedure, which yield a point estimate of the CHMM parameters, VB-based training yields an estimate of the full posterior of the model parameters. This is particularly important for small training sets since it gives a measure of confidence in the accuracy of the learned model. This is utilized within the context of active learning, for which we acquire labels for those feature vectors for which knowledge of the associated label would be most informative for reducing model-parameter uncertainty. Three active learning algorithms are considered in this paper: 1) query by committee (QBC), with the goal of selecting data for labeling that minimize the classification variance, 2) a maximum expected information gain method that seeks to label data with the goal of reducing the entropy of the model parameters, and 3) an error-reduction-based procedure that attempts to minimize classification error over the test data. The experimental results are presented for synthetic and measured data. We demonstrate that all of these active learning methods can significantly reduce the amount of required labeling, compared to random selection of samples for labeling.
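Of the three strategies, query by committee is the easiest to sketch in isolation: below, a committee of bootstrapped logistic regressions scores an unlabeled pool by vote entropy and the most-disputed sample is queried. This shows only the selection rule, not the variational-Bayes CHMM posterior from which the paper's committee is drawn.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Generic query-by-committee sketch on synthetic data; the committee here comes from
# bootstrap resampling rather than from a posterior over model parameters.
rng = np.random.default_rng(1)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
labeled = list(range(20))                                   # small initial labeled set
pool = [i for i in range(500) if i not in labeled]

committee = []
for _ in range(7):                                          # bootstrap a committee
    idx = rng.choice(labeled, size=len(labeled), replace=True)
    committee.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

votes = np.stack([m.predict(X[pool]) for m in committee])   # shape: (members, pool size)
frac_pos = votes.mean(axis=0)
frac = np.clip(np.stack([1 - frac_pos, frac_pos]), 1e-9, 1.0)
vote_entropy = -(frac * np.log(frac)).sum(axis=0)           # disagreement per pool sample
query = pool[int(np.argmax(vote_entropy))]
print("next sample to label:", query)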


Journal of Computational Biology | 2004

Joint classifier and feature optimization for comprehensive cancer diagnosis using gene expression data

Balaji Krishnapuram; Lawrence Carin; Alexander J. Hartemink

Recent research has demonstrated quite convincingly that accurate cancer diagnosis can be achieved by constructing classifiers that are designed to compare the gene expression profile of a tissue of unknown cancer status to a database of stored expression profiles from tissues of known cancer status. This paper introduces the JCFO, a novel algorithm that uses a sparse Bayesian approach to jointly identify both the optimal nonlinear classifier for diagnosis and the optimal set of genes on which to base that diagnosis. We show that the diagnostic classification accuracy of the proposed algorithm is superior to a number of current state-of-the-art methods in a full leave-one-out cross-validation study of five widely used benchmark datasets. In addition to its superior classification accuracy, the algorithm is designed to automatically identify a small subset of genes (typically around twenty in our experiments) that are capable of providing complete discriminatory information for diagnosis. Focusing attention on a small subset of genes is useful not only because it produces a classifier with good generalization capacity, but also because this set of genes may provide insights into the mechanisms responsible for the disease itself. A number of the genes identified by the JCFO in our experiments are already in use as clinical markers for cancer diagnosis; some of the remaining genes may be excellent candidates for further clinical investigation. If it is possible to identify a small set of genes that is indeed capable of providing complete discrimination, inexpensive diagnostic assays might be widely deployable in clinical settings.
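The evaluation protocol is easier to see in code than in words. The snippet below runs a plain L1-penalized linear classifier, not the JCFO, under the same leave-one-out regime on synthetic expression-shaped data, and reports how few features end up carrying the discriminative signal.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

# Synthetic stand-in for a microarray study: few samples, many "genes".
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)
correct, n_genes = 0, []
for train, test in LeaveOneOut().split(X):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X[train], y[train])
    correct += int(clf.predict(X[test])[0] == y[test][0])
    n_genes.append(int(np.sum(clf.coef_ != 0)))              # genes with nonzero weight
print(f"LOO accuracy: {correct / len(y):.2f}, median genes used: {int(np.median(n_genes))}")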


Artificial Intelligence in Medicine | 2015

Predicting readmission risk with institution-specific prediction models

Shipeng Yu; Alexander Van Esbroeck; Glenn Fung; Vikram Anand; Balaji Krishnapuram

The ability to predict patient readmission risk is extremely valuable for hospitals, especially under the Hospital Readmission Reduction Program (HRRP) of the Centers for Medicare and Medicaid Services (CMS), which went into effect on October 1, 2012. There is a plethora of work in the literature on developing readmission risk prediction models, but most of these models do not have sufficient prediction accuracy to be deployed in a clinical setting, partly because different hospitals may have different characteristics in their patient populations. In this work we experimented with a generic framework for institution-specific readmission risk prediction, which takes patient data from a single institution and produces a statistical risk prediction model optimized for that particular institution and, optionally, for a specific condition. This provides great flexibility in model building and also yields institution-specific insights into the readmitted patient population. We showcase initial results at three institutions for Heart Failure (HF), Acute Myocardial Infarction (AMI) and Pneumonia (PN) patients. The developed models yield better prediction accuracy than those reported in the literature.
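In outline, the framework amounts to fitting a separate model per institution and, optionally, per condition. The sketch below uses entirely hypothetical column names and synthetic records; the paper's actual features and model class are not specified here.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one institution's records; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "condition": rng.choice(["HF", "AMI", "PN"], size=n),
    "age": rng.integers(40, 90, size=n),
    "num_prior_admissions": rng.poisson(1.5, size=n),
    "length_of_stay": rng.integers(1, 15, size=n),
})
df["readmitted_30d"] = (rng.random(n) <
                        np.clip(0.10 + 0.05 * df["num_prior_admissions"], 0, 0.9)).astype(int)

def fit_institution_model(records, condition):
    # One model per institution and (optionally) per condition.
    sub = records[records["condition"] == condition]
    X = sub[["age", "num_prior_admissions", "length_of_stay"]]
    y = sub["readmitted_30d"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

for cond in ["HF", "AMI", "PN"]:
    print(cond, "held-out AUC:", round(fit_institution_model(df, cond)[1], 2))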


Knowledge Discovery and Data Mining | 2010

Designing efficient cascaded classifiers: tradeoff between accuracy and cost

Vikas C. Raykar; Balaji Krishnapuram; Shipeng Yu

We propose a method to train a cascade of classifiers by simultaneously optimizing all its stages. The approach relies on the idea of optimizing soft cascades. In particular, instead of optimizing a deterministic hard cascade, we optimize a stochastic soft cascade where each stage accepts or rejects samples according to a probability distribution induced by the previous stage-specific classifier. The overall system accuracy is maximized while explicitly controlling the expected cost for feature acquisition. Experimental results on three clinically relevant problems show the effectiveness of our proposed approach in achieving the desired tradeoff between accuracy and feature acquisition cost.
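The expected-cost bookkeeping behind a soft cascade can be sketched with two independently trained stages (the paper trains all stages jointly): stage 1 sees only cheap features and forwards each sample to stage 2 with probability equal to its stage-1 score. Feature costs and the cheap/expensive split below are hypothetical.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
cheap, expensive = slice(0, 5), slice(0, 20)        # first 5 features are "cheap"
cost_cheap, cost_expensive = 1.0, 10.0              # hypothetical acquisition costs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stage1 = LogisticRegression(max_iter=1000).fit(X_tr[:, cheap], y_tr)
stage2 = LogisticRegression(max_iter=1000).fit(X_tr[:, expensive], y_tr)

p_forward = stage1.predict_proba(X_te[:, cheap])[:, 1]      # prob. of being sent onward
expected_cost = cost_cheap + p_forward.mean() * cost_expensive
# Expected accuracy: stage 2 decides for forwarded samples; rejected samples are called negative.
acc = np.mean(p_forward * (stage2.predict(X_te[:, expensive]) == y_te)
              + (1 - p_forward) * (y_te == 0))
print(f"expected per-sample cost: {expected_cost:.1f}, expected accuracy: {acc:.2f}")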


IEEE Transactions on Biomedical Engineering | 2008

Multiple-Instance Learning Algorithms for Computer-Aided Detection

Murat Dundar; Glenn Fung; Balaji Krishnapuram; R. Bharat Rao

Many computer-aided diagnosis (CAD) problems are best modelled as a multiple-instance learning (MIL) problem with unbalanced data, i.e., the training data typically consists of a few positive bags and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a convex hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms it is guaranteed to find the globally optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
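A rough intuition for the convex-hull representation: each positive bag is collapsed to a single point inside its hull and a standard linear classifier is refit, alternating the two steps. The snippet below uses the simplest possible hull point (the current highest-scoring instance) on synthetic data; the paper optimizes the convex-combination weights properly and is built for CAD-scale imbalance.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative alternating heuristic, not the paper's CH algorithm.
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=(300, 4))                   # many negative instances
pos_bags = [np.vstack([rng.normal(0, 1, (4, 4)),            # mostly background points
                       rng.normal(2.5, 0.5, (1, 4))])       # one lesion-like point per bag
            for _ in range(15)]

reps = np.array([bag.mean(axis=0) for bag in pos_bags])     # start from hull centroids
for _ in range(5):                                          # alternate fit / re-select
    X = np.vstack([neg, reps])
    y = np.concatenate([np.zeros(len(neg)), np.ones(len(reps))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    reps = np.array([bag[np.argmax(clf.decision_function(bag))] for bag in pos_bags])
print("scores of final bag representatives:", np.round(clf.decision_function(reps), 2))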


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

A Fast Algorithm for Learning a Ranking Function from Large-Scale Data Sets

Vikas C. Raykar; Ramani Duraiswami; Balaji Krishnapuram

We consider the problem of learning a ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on the training data. Relying on an ε-accurate approximation for the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m²) to O(m), where m is the number of training samples. Experiments on public benchmarks for ordinal regression and collaborative filtering indicate that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when the algorithms are trained on the same data. However, since it is several orders of magnitude faster than the current state-of-the-art approaches, it is able to leverage much larger training data sets.
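For context, the Wilcoxon-Mann-Whitney statistic being generalized is the fraction of correctly ordered positive-negative pairs (the AUC). The snippet below computes it both by the naive O(m²) pairwise count and from ranks; the paper's O(m) result concerns the training gradient and relies on the error-function approximation mentioned above, which is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
labels = rng.integers(0, 2, size=1000)
pos, neg = scores[labels == 1], scores[labels == 0]

wmw_naive = np.mean(pos[:, None] > neg[None, :])            # O(m^2) pairwise comparison

ranks = np.argsort(np.argsort(scores)) + 1                  # 1-based ranks of all scores
wmw_ranks = (ranks[labels == 1].sum() - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))
print(f"naive: {wmw_naive:.4f}, rank-based: {wmw_ranks:.4f}")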
